Application of Protein-based Biosensors in Detection of Novel Therapeutics & Environmental Monitoring
Dissertation
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University
By
Jeevan Baretto, M.S.
Graduate Program in Department of Chemical Engineering
The Ohio State University
2014
Dissertation Committee:
Professor David W. Wood, Advisor
Professor Andre Palmer
Professor Shian-Tian Yang
© Copyright by
Jeevan Baretto
2014
Abstract
Proteins play a vital role in living systems, and their malfunction can lead to serious disease. In recent years, thousands of genes have been discovered, each with specific functions. Developing tools to control their function is a challenging task that researchers have pursued for decades. Proteins have been used as molecular switches to construct biosensors. In this work, an engineered protein-based bacterial biosensor is introduced as a tool for detecting potential ligands of nuclear hormone receptor proteins. The modular design of the biosensor protein allows the receptor protein to be swapped, so new biosensors can be created easily. The study has been extended to receptor proteins from different species and tested with a library of chemicals. A pre-synaptic protein, neurexin, known to play a role in neurodevelopmental disorders, has been incorporated into the biosensor and could open new avenues in the discovery of therapeutics. Techniques including nuclear magnetic resonance (NMR) have been used to confirm binding. A detailed study of the biosensor mechanism is presented, which can guide the intelligent design of next-generation biosensors.
To my late Daddy
&
my Amma
Acknowledgments
I have to thank David for giving me this excellent opportunity to work in his lab and for providing every possible support. I am grateful to him for believing in me despite my lack of prior biology experience. He was a constant source of inspiration in the group, a great mentor and an advisor. I also want to thank my committee members Dr. Palmer and Dr. Yang for their constant support. I have to mention Dr. Tapas Mal, who guided me extensively with the NMR experiments; I am very grateful for that.
I am very thankful to the current and past lab members who were there when I needed them. I would like to thank Drs. Jingjing, Iraj and Izabela for their guidance, support and deep insight during my initial days in the lab. I am thankful to other lab members - Mike, Dan, Miriam, Elif, Sam, Tzu Chiang, Ashwin, Steven, Samar among others for their friendship and constant support.
A special thanks to my best friends: Dr. Verena, Michi and Shadwa for cheering me up all the time when my experiments did not work.
I have to thank my roommates with whom I spent a significant amount of time while working on my thesis. Special thanks to Sanket, Deepak, Arvind, Sughosh and Sidharth.
I would have never made it to OSU without the efforts of my inspiring professors at IIT Bombay - Prof. Preeti Aghalayam, Prof. Sanjay Mahajani and Prof. Anuradda Ganesh. This journey would not be possible without the efforts of my amazing family - my mom, dad and my sister, Mangala. I miss you a lot. I would like to thank my extended family for their unconditional love and support even during hard times - Mamatha and Rajeshanna.
Vita
2006 ...... B.E. Chemical Engineering, S. V. National Institute of Technology, Surat, India
2013 ...... M.S. Chemical Engineering, The Ohio State University, Columbus, OH
2009-present ...... Graduate Teaching Associate, Department of Chemical Engineering, The Ohio State University, Columbus, OH
Fields of Study
Major Field: Chemical Engineering
Table of Contents
Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures
1. Introduction
  1.1 Goal
  1.2 Background
    1.2.1 Inteins
    1.2.2 Structure
    1.2.3 Engineered Mini-inteins
    1.2.4 Split Inteins
    1.2.5 Splicing Mechanism
    1.2.6 Applications of Inteins
    1.2.7 Construction of a Thymidylate Synthase Reporter System
2. Protein Switches for Biosensing
  2.1 Nuclear Hormone Receptors Superfamily
  2.2 Nuclear Hormone Receptors Classification
  2.3 Nuclear Hormone Receptor Ligands
  2.4 Structural features of NHRs
  2.5 Hormone-sensing protein-based Biosensor Design and Construction
  2.6 Detection of compounds with hormone-mimicking properties
  2.7 Biosensors with Estrogen Receptors from Various Species
  2.8 Discussion
  2.9 Conclusions
3. Neurexin Biosensor
  3.1 Neurexin
  3.2 Plasmid Constructions
  3.3 Biosensor Assay
  3.4 15N Labeled Protein Expression and Purification
  3.5 Sample for NMR experiments
  3.6 Saturated Transfer Difference (STD) NMR
  3.7 Heteronuclear single quantum coherence (HSQC)
  3.8 Results and Discussion
  3.9 Conclusions
4. Mechanism of Biosensors
  4.1 Results
  4.2 Discussion
  4.3 Conclusion
5. Summary
  5.1 Future Work
    5.1.1 3D structure of neurexin using NMR
    5.1.2 Molecular Docking
    5.1.3 In-vitro biosensor
6. Materials & Methods
  6.1 Reagents and Strains
  6.2 Plasmid Construction
  6.3 15N Labeled Protein Expression and Purification
Appendices
A. DNA Sequences of Ligand Binding Domains of Nuclear Hormone Receptors and Neurexin
  A.1 Human Estrogen Receptor β
  A.2 Cow Estrogen Receptor β
  A.3 Rat Estrogen Receptor β
  A.4 Zebrafish Estrogen Receptor β
  A.5 Neurexin 2b
B. Plasmid Maps
  B.1 pMIT:ERβ Human
  B.2 pMIT:ERβ Cow
  B.3 pMIT:ERβ Rat
  B.4 pMIT:ERβ Zebrafish
  B.5 pMIT:Nrx 2b
  B.6 pET:Nrx-His6
Bibliography
List of Tables
2.1 List of ICCVAM suggested chemicals
2.2 Statistical Data for human ERβ biosensor
2.3 Statistical Data for cow ERβ biosensor
2.4 Statistical Data for rat ERβ biosensor
2.5 Statistical Data for zebrafish ERβ biosensor
List of Figures
1.1 The central dogma of molecular biology
1.2 Evolution of inteins
1.3 Evolution of mini-inteins
1.4 Mechanism of protein splicing
1.5 Schematic of intein mediated protein purification
1.6 Folate Cycle
2.1 Mechanism of estrogen signaling
2.2 Steroid receptor ligands
2.3 Retinoid X receptor ligands
2.4 Estrogen receptor ligands
2.5 Tertiary structure of ligand bound ERβ
2.6 Schematic of the biosensor modules
2.7 Animal biosensor plasmid construction
2.8 Dose response curves for chemicals with human ERβ 1
2.9 Dose response curves for chemicals with human ERβ 2
2.10 Dose response curves for chemicals with human ERβ 3
2.11 Dose response curves for chemicals with cow ERβ 1
2.12 Dose response curves for chemicals with cow ERβ 2
2.13 Dose response curves for chemicals with cow ERβ 3
2.14 Dose response curves for chemicals with cow ERβ 4
2.15 Dose response curves for chemicals with cow ERβ 5
2.16 Dose response curves for chemicals with cow ERβ 6
2.17 Dose response curves for chemicals with rat ERβ 1
2.18 Dose response curves for chemicals with rat ERβ 2
2.19 Dose response curves for chemicals with rat ERβ 3
2.20 Dose response curves for chemicals with rat ERβ 4
2.21 Dose response curves for chemicals with rat ERβ 5
2.22 Dose response curves for chemicals with zebrafish ERβ 1
2.23 Dose response curves for chemicals with zebrafish ERβ 2
2.24 Dose response curves for chemicals with zebrafish ERβ 3
2.25 Dose response curves for chemicals with zebrafish ERβ 4
3.1 Schematic of Synapse Junction
3.2 Schematic of neurexin subtypes
3.3 Schematic representation of neurexin biosensor
3.4 Schematic representation of the high-throughput screening method
3.5 Scheme showing the principle of STD NMR
3.6 Dose response curve for neurexin biosensor with rosiglitazone
3.7 STD NMR spectrum of neurexin with rosiglitazone
3.8 Overlay of HSQC for neurexin in apo form and in the ligand-bound form
4.1 Schematic of different biosensor constructs
4.2 Effect of MBP and/or intein domain deletions
4.3 Effect of intein V437L mutation
4.4 Effect of length of linker between MBP and intein domain
4.5 Effect of GS linkers between intein and TS domain
4.6 Effect of intein N440A mutation
4.7 Effect of intein N440A mutation and GS linkers
Chapter 1: Introduction
Proteins are an integral part of any living organism. An organism's ability to develop and sustain life depends entirely on the active production, regulation and function of proteins. Enzymes catalyze biochemical reactions in our body, from the digestion of food to the absorption of oxygen into the bloodstream. For example, the enzyme amylase breaks down starch, which is a large molecule, into smaller molecules such as maltose, which is more readily absorbed by the intestines. Several metabolic pathways, such as glycolysis, which produces the organism's energy currency, ATP, are vital for the living system and would not be possible without enzymes catalyzing the biochemical reactions. Another example is homeostasis. The regulation of an organism's temperature and pH requires tight control of protein activity within the system. Any malfunction may lead to a serious disease. Ion channels, which are membrane proteins, play a very important role in controlling the flow of ions that mediate the action potential. They act like gates that let only certain ions pass through. Our nervous system depends on these action potentials to transmit electrochemical signals, so moving an arm or looking to the right requires working ion channels. In fact, the venoms produced by some predators target the ion channels of their prey and inactivate them. Peptide-based hormones act as co-activators and regulate the function of certain genes. Proteins therefore play a very important role in the sustenance of life.
Proteins in different species, or even within the same species, display a variety of differences in phenotype and function. They all have specific functions that are equally important for the survival of the species. Before we can understand life, we need to understand the importance of protein function. In recent years, thousands of genes have been discovered, each with a specific function in an organism. These genes interact with thousands of proteins that regulate the genes' function. Understanding their function will lead to discoveries that can help cure diseases and improve yield in agricultural crops. For example, a metabolic pathway can be engineered for higher intake of nutrients and increased yield. This method has been successfully applied in the field of agricultural sciences.
The activity and function of proteins depend on their structure and folding characteristics. According to the central dogma of molecular biology (Figure 1.1), which describes the flow of genetic information within a biological system, the DNA encoding a particular gene is transcribed to messenger RNA, which is then translated to a protein.
The proteins then fold into secondary, tertiary and quaternary structures and become functional. Proteins have a dynamic structure: they are "breathing" all the time and can undergo conformational changes to allow interactions with other proteins or small molecules. Enzymes bind their substrates in a similar fashion, often described by the lock-and-key model.
Figure 1.1: The central dogma of molecular biology
We can regulate the function of proteins by engineering their biophysical properties. Altering the biophysical characteristics of proteins can, however, have serious consequences. Mutations in DNA lead to changes in the amino acid sequence and in turn alter the function of the protein. Several known diseases, such as sickle-cell anemia and Alzheimer's disease, are caused by, or linked to, mutations in the gene encoding a protein. The protein that was supposed to carry out a particular function is then unable to do so because its amino acid sequence has changed. Hence understanding proteins and engineering them in a favorable way can help cure these diseases.
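The flow of information from DNA to protein, and the way a single-nucleotide mutation propagates into the amino acid sequence, can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: the function names `transcribe` and `translate` are hypothetical, the input is taken to be the coding (sense) strand of DNA read in frame, and the standard genetic code is assumed. The example mutation is the GAG to GTG change, which replaces glutamate with valine, as in sickle-cell anemia.

```python
# Illustrative sketch only: function names are hypothetical, and the input DNA
# is assumed to be the coding (sense) strand read in frame from its first base.

BASES = "UCAG"
# Standard genetic code, ordered by first, second and third base in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to a coding-strand DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon reached
            break
        protein.append(residue)
    return "".join(protein)


# Toy gene fragments: one with a glutamate codon (GAG) and a mutant in which
# a single base change turns that codon into a valine codon (GTG).
normal = translate(transcribe("ATGGTGCATCTGACTCCTGAGGAGTAA"))
mutant = translate(transcribe("ATGGTGCATCTGACTCCTGTGGAGTAA"))
print(normal)
print(mutant)
```

Comparing the two printed sequences shows that one altered nucleotide changes exactly one residue of the translated protein, which is the sense in which a mutation in DNA alters the protein it encodes.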
To understand any system, we first need to perturb it, observe how it reacts, and thereby learn the function of a particular module and how it affects the system. Similarly, to understand the function of a particular protein, one strategy is to knock out its gene from the chromosome, or to mutate the nucleotides of the gene sequence that encodes the protein, and observe the effect on the phenotype. This technique is simple and can target any nucleotide among billions while remaining highly specific, so that the function of only that particular gene is affected. Unfortunately, it cannot be generalized to every target gene, as some proteins are essential for cell viability and cannot be disturbed.
Bacterial and mammalian cell hosts have provided a cheap alternative for expressing human proteins. They are like cell factories that can make proteins for us in large quantities, cheaply, quickly and efficiently. We can insert a specific piece of DNA into these host cells and transform them to make our protein of interest. These host cells are considered industrial workhorses for making therapeutic proteins and other biotechnology products. Proteins have also been used as molecular switches to regulate the functioning of a gene. This means that we can make use of the biophysical features of proteins to regulate a target gene.
1.1 Goal
Our goal in this thesis is to design alternative methodologies for regulating protein function. First, a method will be described for engineering protein switches, called biosensors, that can detect chemicals binding to the protein. We will then focus on developing these ideas and applying them to specific targets such as estrogen receptors, thyroid receptors, peroxisome proliferator-activated receptors, and neurexins. We will probe the working mechanism of the biosensors by deleting domains and truncating protein sequences to study the role of individual domains.
1.2 Background
Let us now look more closely at the specific proteins central to this work. Proteins have a variety of functions, many of which are still unknown. In this chapter, we will try to understand inteins: their structure, function, and applications.
1.2.1 Inteins
Several archaeal, eubacterial and eukaryotic genes contain in-frame insertions that are excised post-translationally (Perler, 1998). These inserted sequences are termed inteins. Excision results in two products: the excised intein and the mature spliced host protein (Perler et al., 1994). Intein splicing is analogous to the self-splicing of introns at the RNA level (Derbyshire and Belfort, 1998).
Inteins were first discovered in 1988 by sequence comparison of carrot vacuolar ATPase with the exons of a Neurospora 69-kDa genomic clone from Saccharomyces cerevisiae (Zimniak et al., 1988). The comparison revealed a discrepancy between the size of the gene that encodes this protein and the mature translated product. When the gene was initially cloned, the expected size of the mature translated product was 119 kDa. After running protein samples on a polyacrylamide gel, the researchers found that the actual size of the product was 69 kDa. Northern blot analysis confirmed the presence of a single mRNA transcript corresponding to the 119 kDa size. It was therefore believed that some other mechanism was involved. Researchers at New England Biolabs carried out a systematic study of the splicing mechanism and suggested that factors such as temperature and pH play an important role in the splicing reaction (Xu et al., 1993).
Inteins exist in one of three forms in nature:
1. As maxi-inteins, which contain a DNA homing endonuclease within the reading frame of the intein, resulting in N-terminal (IN) and C-terminal (IC) intein fragments.
2. As mini-inteins, which lack the endonuclease domain but contain a contiguous protein splicing domain.
3. As split inteins, in which IN and IC are encoded by two independent genes and fused to their respective exteins (van den Heuvel et al., 1998).
Over 550 inteins have been identified in the InBase intein database (Raffo et al., 2013), the majority of them from unicellular organisms. A number of amino acids flanking the intein-extein splice junctions are well conserved. The first amino acid of an intein is typically Cys or Ser, while the first amino acid of the C-extein is Cys, Thr or Ser. These conserved features have led to the discovery of inteins in many different host proteins and species (Pietrokovski, 2001). Most intein alleles are similar, which has led to the theory that inteins evolved from a single ancestral gene (Karki et al., 2014). Although there is not much similarity among non-allelic inteins, their integration points lie in highly conserved sequence motifs (Derbyshire and Belfort, 1998). Intein host proteins are very diverse, including vacuolar ATPase, DnaB helicase, RecA, GyraseA, DNA polymerase, PEP synthase, anaerobic rNTP reductase, and others (Karki et al., 2014).
1.2.2 Structure
The size of inteins varies significantly between species and also within the same species. The smallest intein discovered so far is the RIR-1 intein from Methanothermobacter thermautotrophicus (Mth RIR-1), consisting of 134 amino acids. On the other hand, the largest discovered is the RFC-2 intein from Pyrococcus abyssi (Pab RFC-2), with 608 amino acids (Raffo et al., 2013). In general, inteins are classified as large or maxi-inteins if they are longer than 350 amino acids and as mini-inteins if they are shorter than 200 amino acids. Mini-inteins lack the sequence of amino acids that resembles homing endonucleases.
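The size-based classification above can be written as a small rule. This is a minimal sketch, not an established tool: the function name `classify_intein` and the label "intermediate" for lengths between the two thresholds are assumptions, since the text names only the maxi and mini classes.

```python
# Illustrative sketch: the function name and the "intermediate" label are
# assumptions; only the two size thresholds come from the text.

def classify_intein(length_aa: int) -> str:
    """Classify an intein by its length in amino acids."""
    if length_aa <= 0:
        raise ValueError("length must be a positive number of amino acids")
    if length_aa > 350:
        return "maxi-intein"
    if length_aa < 200:
        return "mini-intein"
    # Lengths from 200 to 350 amino acids are not named in the text.
    return "intermediate"


# The two extreme examples mentioned in the text.
for name, size in [("Mth RIR-1", 134), ("Pab RFC-2", 608)]:
    print(name, size, classify_intein(size))
```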
Inteins have two functional domains, similar to the hedgehog proteins: one is a splicing domain while the other is a central homing endonuclease domain. From the crystal structure, the homing endonuclease domain appears to have been inserted into the splicing domain. It is speculated that at some point the ancestral self-splicing gene was invaded by a homing endonuclease gene (Vogel et al., 2014). As shown in Figure 1.2, an ancestral Hedgehog and INTein (HINT) module was modified into a mini-intein by the addition of a polypeptide ligation region, and further into a maxi-intein by the addition of a homing endonuclease. The same HINT module, with the addition of a sterol recognition region, was modified into the hedgehog autoprocessing domain (Raffo et al., 2013).
1.2.3 Engineered Mini-inteins
To verify the hypothesis that an endonuclease gene invaded a sequence encoding a small, functional splicing element, variable lengths of sequence were deleted from the endonuclease domain of the Mycobacterium tuberculosis (Mtu) RecA intein to create engineered mini-inteins that are splicing proficient (Derbyshire et al., 1997). The goal was to show, using site-directed mutagenesis, that endonuclease activity is not required for protein splicing. This mini-intein was transferred to a tripartite fusion system (MIC) for in vivo characterization of splicing products. The MIC system consisted of maltose binding protein (M) fused in-frame to the intein (I) and then to the C-terminal domain of the homing endonuclease I-TevI (C). The mini-intein was able to excise itself to generate the ligated product MC, as verified by Western blot experiments (Derbyshire et al., 1997). A V67L substitution helped stabilize the mini-intein (Hiraga et al., 2005). This mini-intein plays an important part in this thesis, as discussed in the coming chapters.
Figure 1.2: Evolution of inteins and Hedgehog-like autoprocessing proteins (Raffo et al., 2013)
Figure 1.3: Evolution of mini-inteins. Originally, inteins contain two domains: a splicing domain and an endonuclease domain. The endonuclease domain was deleted to yield functional artificial mini-inteins. Figure adapted from Derbyshire et al. (1997).
1.2.4 Split Inteins
The splicing domain of a mini-intein can be split into two fragments, separated at the point where the endonuclease domain was excised, and expressed from two separate genes. The N- and C-terminal fragments reassociate in trans to regain splicing activity. An artificial mini-intein derived from the Mtu RecA intein was split at the location where the endonuclease domain was deleted and was shown to retain splicing activity in vivo; splicing was also shown to occur in vitro after the two fragments were denatured and then renatured (Derbyshire et al., 1997).
Split inteins are also naturally present in living organisms. The catalytic subunit alpha (α) of DNA polymerase III (DnaE protein) is expressed as two fragments from two separate genes, each encoding part of DnaE. When the intein fragments associate and protein splicing takes place, the DnaE fragments are ligated and the protein becomes functional. It was also recently found that a number of split inteins exist in nature with split sites other than the location where endonuclease domain deletions take place (Paterni et al., 2013).
1.2.5 Splicing Mechanism
Inteins are able to excise themselves from, and re-ligate, a variety of polypeptide sequences. This means that all the information required to carry out this process is contained within the intein sequence. Protein splicing is therefore a self-catalyzed process, in which the intein can be viewed as an enzyme that catalyzes the splicing reaction and then links the two flanking substrates with a new peptide bond.
Since the discovery of protein splicing, its mechanism has been thoroughly investigated. The initial amino acid residue of an intein is generally a cysteine or a serine. Its side chain acts as a nucleophile and attacks the carbonyl group of the N-extein/intein peptide bond. This results in the formation of an ester by an N-O acyl rearrangement in the case of a serine, and a thioester by an N-S acyl rearrangement when a cysteine is the initial amino acid of the intein (Chong et al., 1996). The proximity of the N- and C-termini of the intein then allows a transesterification reaction to take place, which results in the ligation of the two exteins. During this step, the hydroxyl or thiol group of the initial amino acid of the C-terminal extein acts as the nucleophile. It attacks the previously generated (thio)ester N-extein/intein linkage to create a new (thio)ester bond at the C-terminal splice junction between the two exteins (Xu et al., 1993). The transesterification reaction thus forms a branched intermediate in which the two exteins are linked by a (thio)ester bond and the intein remains connected to the C-extein by a peptide bond. This branched intermediate can be identified by SDS polyacrylamide gel electrophoresis, owing to its substantially reduced electrophoretic mobility, when the splicing reaction is slowed and the intermediate is allowed to accumulate (Xu et al., 1993). After this reaction, the intein releases itself through a cyclization reaction (aminosuccinimide formation), which is mediated by the asparagine present as the final amino acid of the intein sequence (Evans and Xu, 1999). Finally, the (thio)ester link between the ligated exteins spontaneously rearranges through a final acyl shift to form a native peptide bond, yielding the mature ligated extein product (Paulus, 2000). Figure 1.4 is a schematic representation of this process.
Figure 1.4: The mechanism of protein splicing. Adapted from Wood 2000.
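At the sequence level, the net outcome of the splicing reaction described in this section can be sketched as follows. This is a hedged illustration rather than a model of the chemistry: the function `splice`, its 0-based half-open index convention, and the toy sequences are hypothetical. The only rules taken from the text are that the intein begins with Cys or Ser, ends with Asn, and is followed by a C-extein beginning with Cys, Ser or Thr.

```python
from typing import Tuple

# Illustrative sketch: `splice` and the toy sequences are hypothetical. Only the
# residue rules come from the text: the intein starts with Cys or Ser and ends
# with Asn, and the C-extein starts with Cys, Ser or Thr.


def splice(precursor: str, intein_start: int, intein_end: int) -> Tuple[str, str]:
    """Return (ligated exteins, excised intein) for a precursor sequence.

    intein_start and intein_end are 0-based, half-open indices into precursor.
    """
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    if not intein or not c_extein:
        raise ValueError("precursor must contain an intein followed by a C-extein")
    if intein[0] not in "CS":
        raise ValueError("first intein residue should be Cys or Ser")
    if intein[-1] != "N":
        raise ValueError("last intein residue should be Asn")
    if c_extein[0] not in "CST":
        raise ValueError("first C-extein residue should be Cys, Ser or Thr")
    # Net result of splicing: the exteins are joined and the intein is released.
    return n_extein + c_extein, intein


# Toy precursor: N-extein "MKVLA", intein "CLAEGTRIHN", C-extein "SGGWT".
ligated, excised = splice("MKVLA" + "CLAEGTRIHN" + "SGGWT", 5, 15)
print(ligated, excised)
```

The checks on the junction residues mirror the roles these residues play in the mechanism: the first intein residue and the first C-extein residue supply the nucleophiles, and the terminal asparagine mediates release of the intein.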
1.2.6 Applications of Inteins
Certain host proteins into which inteins have been inserted are inactive but are reactivated as the intein splices itself out. Inteins that show this behavior include the Sce VMA intein (Zeidler et al., 2004), Mtu RecA (Derbyshire et al., 1997) and Mxe GyrA (Adam and Perler, 2002). Host proteins in which this behavior is seen include aminoglycoside phosphotransferase (Daugelat and Jacobs, 1999) and thymidylate synthase (Wood et al., 1999), among others. As mentioned earlier, the intein splicing reaction is self-catalyzed, meaning it does not require any cofactors or coenzymes. Inteins are therefore active both in vivo and in vitro. The Sce VMA intein has been found to be splicing-competent in bacteria, yeast (Kane et al., 1990), insects (Zeidler et al., 2004) and even mammalian cells (Mootz et al., 2003). Similarly, the Mtu RecA intein is able to splice in bacteria (Daugelat and Jacobs, 1999), in yeast (Buskirk et al., 2004) and in vitro (Gangopadhyay et al., 2003a,b).
Conventional protein purification uses an affinity tag so that the target protein can be easily separated from the rest of the proteins. For some applications the affinity tag need not be removed, but for applications such as pharmaceuticals its removal is necessary. Typically, a protease-specific site is introduced adjacent to the tag so that the tag can be cleaved off by the corresponding protease. This step is expensive for two reasons: the proteases are prohibitively expensive, and the protease needs to be removed after cleavage. To overcome these drawbacks, inteins have been used as self-cleaving tags, as shown in Figure 1.5.
Figure 1.5: Schematic of the intein mediated purification process with on-column cleavage. Inclusion of an intein between a binding domain and product protein renders the binding domain self-cleaving. A shift in pH initiates the cleavage reaction in column-bound material, allowing the collection of a pure product protein. Adapted from Wood et al. (2000).
1.2.7 Construction of a Thymidylate Synthase Reporter System
The characteristics of protein splicing mentioned earlier make intein technology an attractive candidate for the construction of molecular switches. Natural selection has favored the inteins that are most active in splicing. Furthermore, there are no examples in the literature where nature makes use of the switching properties of an intein to regulate the levels of active host protein, and thus there is no readily available natural mechanism for controlling the splicing reaction.
Since no mechanism for regulating intein activity is known, inteins must be modified to fine-tune their splicing activity so that the splicing reaction can be controlled. This requires a mode of control that can be easily manipulated, and a reporter system to evaluate the modified inteins. There are two types of reporter systems: screening systems and selection systems. A screening system requires evaluation of the properties of the reporter protein for every individual intein variant. A selection system relies on the survival of only those cells that express intein variants with desirable characteristics. The enzyme thymidylate synthase (TS) is found in all living organisms and is involved in a metabolic pathway called the folate cycle. TS activity is critical for providing cells with deoxythymidine monophosphate (dTMP), a precursor of deoxythymidine triphosphate (dTTP), which is an essential structural unit for the synthesis of DNA (Belfort and Pedersen-Lane, 1984). TS catalyzes the reductive methylation of deoxyuridine monophosphate (dUMP) by 5,10-methylenetetrahydrofolate to yield dTMP and dihydrofolate. Two other enzymes, dihydrofolate reductase (DHFR) and serine transhydroxymethylase, are required to complete the folate cycle (Carreras and Santi, 1995). Cells expressing TS can be easily selected in a thymine-free medium (-THY medium), because DNA synthesis, and therefore cell survival, depends on TS activity when no thymine is supplied. The stringency of selection can be tuned by changing the incubation temperature, as TS activity is temperature dependent. As the temperature increases, metabolic rates within the cell increase, and so does the requirement for active TS. Another useful characteristic of the TS reporter is that, in the presence of trimethoprim, a DHFR inhibitor, thymine production can be inhibited, which provides a negative selection. Genetic selection with the TS reporter system is therefore a versatile tool for evaluating the splicing performance of inteins.
Figure 1.6: TS genetic selection system. The roles of thymidylate synthase (TS), dihydrofolate reductase (DHFR) and serine transhydroxymethylase (SHT) in the folate cycle. Adapted from Carreras and Santi (1995).
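The positive and negative selection logic of the TS reporter can be summarized as a simple truth table. The sketch below is an idealized illustration, not the assay itself: the function name `grows` and the medium labels are hypothetical, and it assumes (beyond what the text states) that the negative selection is run in thymine-supplemented medium containing trimethoprim, where cells with active TS fail to grow. Temperature-dependent stringency is not modeled.

```python
# Illustrative sketch: medium labels and the function name are hypothetical.
# Assumption (not spelled out in the text): negative selection is run in
# thymine-supplemented medium containing trimethoprim, where cells with
# active TS fail to grow.


def grows(ts_active: bool, medium: str) -> bool:
    """Idealized growth phenotype of a TS-deficient host carrying the reporter."""
    if medium == "-THY":
        # Positive selection: without thymine, only cells with active TS survive.
        return ts_active
    if medium == "+THY+TMP":
        # Negative selection: with thymine and trimethoprim, active TS is detrimental.
        return not ts_active
    raise ValueError("unknown medium: " + medium)


for medium in ("-THY", "+THY+TMP"):
    for ts_active in (True, False):
        print(medium, "TS active" if ts_active else "TS inactive", grows(ts_active, medium))
```

Read this way, an intein that splices (and so restores TS activity) is recovered on -THY medium, while an intein that fails to splice is recovered under the negative selection.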
Chapter 2: Protein Switches for Biosensing
In the previous chapter, we were introduced to inteins, their different types, and how they can be used as protein switches. In this chapter, we will explore the idea of using inteins to make a simple bacterial biosensor. Before that, we will briefly touch upon the current strategies used in creating new protein switches. Over the past few decades, considerable work has gone into creating new protein switches for biosensing and regulation of protein function. While natural allosteric proteins often contain intrinsic molecular recognition and signal output functions in a single domain, artificially engineered protein switches usually rely on a chimeric protein fusion in which the ligand binding domain and the signal reporter domain are fused together. An important hurdle in constructing a protein switch is signal transmission from the binding domain to the reporter domain (Golynskiy et al., 2011; Stratton and Loh, 2011). Several protein fusion strategies have been used; examples include end-to-end fusion and domain insertion. Fluorescent proteins can be fused to a receptor protein to produce a Förster Resonance Energy Transfer (FRET) signal (Miyawaki et al., 1997). Drug discovery relies on biosensing tools for efficient and rapid detection of ligand binding to therapeutically relevant protein targets. Nuclear hormone receptors (NHRs) are one such family of proteins that participate in vital functions in our body. They all share a similar structural design in which a particular alpha helix behaves as a switch that turns the receptor on or off depending on ligand binding. This structural rearrangement can be exploited to construct a biosensor that screens for ligands that bind to the receptors.
2.1 Nuclear Hormone Receptors Superfamily
Nuclear Hormone Receptors (NHRs) are a family of receptors that participate in various vital functions in our body such as cell differentiation, reproduction, metabolism and cell growth. Physiology in mammals is subject to daily oscillations in hormone secretion, body temperature, renal activity, etc. A complex interlocked system coordinates these activities with high precision (Kotronoulas et al., 2009; Marcheva et al., 2013). This system consists of a superfamily of receptors that are collectively called NHRs. Many members of this family have been candidates for drug discovery studies due to their role in important metabolic pathways (Lin et al., 2013; Yang et al., 2013). Estrogen receptors (ERs) are among the most extensively studied members of this superfamily. ERs bind the hormone estrogen and thereby undergo a conformational change of the ligand-binding domain, which allows the receptor to dissociate from Hsp90 and form a homodimer. This homodimer can then translocate to the nucleus, interact with coactivators such as the steroid receptor co-activator (SRC)-1, and form an active complex that can bind to regulatory regions of DNA (termed estrogen response elements, EREs) and activate the expression of specific genes (Manolagas et al., 2013). There are two ER subtypes: ERα and ERβ. ERα is expressed in breast cancer cells, ovarian stroma cells and the hypothalamus (Yaghmaie et al., 2005; Cheng et al., 2013). ERβ is expressed in kidney, brain, bone, heart, lungs, intestinal mucosa, prostate and endothelial cells (Babiker et al., 2002). The ligand binding domains of nuclear hormone receptors are structurally homologous. They contain a switch element, helix 12, which acts like a door to the ligand binding pocket (Aranda and Pascual, 2001). The repositioning of helix 12 depends on the nature of the ligand. When the ligand is an agonist, helix 12 moves towards the binding pocket, creating a charged area on the protein surface. This charged area is then occupied by a co-activator, resulting in the initiation of transcription. When the ligand is an antagonist, helix 12 tends to rotate away from the binding pocket, resulting in suppression of transcription. This structural rearrangement is important for designing biosensors that detect ligand binding.
2.2 Nuclear Hormone Receptors Classification
NHRs are broadly classified into three categories: 1. the steroid receptors, which include the estrogen (ER), androgen (AR), progesterone (PR), glucocorticoid (GR), and mineralocorticoid (MR) receptors; 2. the retinoid acid-heterodimeric receptors, such as the thyroid hormone (TR), vitamin D (VTD), retinoic-acid (RAR), 9-cis-retinoic-acid (RXR) and ecdysone (EcR) receptors; and 3. the orphan receptors, such as the estrogen-related receptor and steroidogenic factor 1, for which no endogenous ligands have been identified. For some of the orphan receptors, potential endogenous and synthetic ligands have been identified recently. For others, it is unknown whether they are activated by ligand binding or by some other mechanism (Weatherman et al., 1999).
NHRs are further classified into subtypes, which adds to the complexity. The estrogen receptor, for example, has α and β subtypes, while the estrogen-related receptor has α, β, and γ subtypes. Peroxisome proliferator-activated receptor (PPAR) has α, β, γ, and δ subtypes (Bourguet et al., 2000).
Figure 2.1: Mechanism of estrogen signaling. (A) The estrogen receptor (ER) is present in the cytosol in a heterodimeric state bound to the molecular chaperone Hsp90. (B) Hormone binding induces a conformational change in the receptor, dissociation from Hsp90 and subsequent homodimerization. (C) The estrogen-bound ER dimer translocates to the nucleus and binds to estrogen response elements (EREs). (D) The ER complex interacts with coactivators (e.g. SRC) and allows the RNA polymerase to bind to the TATA box and initiate gene transcription.
2.3 Nuclear Hormone Receptor Ligands
NHRs are activated by binding to their native ligands, which are small-molecule hormones. NHR ligands are mostly hydrophobic in nature, as they have to diffuse through the cellular membrane to reach the nucleus. They are broadly classified into two categories: 1. steroidal compounds that are derivatives of cholesterol, and 2. non-steroidal compounds that are derived from various sources such as metabolites.
An important property of NHR ligands is that they are very similar in structure and have a well conserved molecular volume. This property matters because the ligand must fit into the NHR ligand binding pocket; the fit also determines affinity and specificity.
NHRs can be activated by a number of chemicals that are structurally similar to the native ligands. These chemicals function as synthetic hormones and have the potential to treat conditions that stem from hormonal deficiencies. They are termed agonists. Other chemicals can inhibit the response to natural hormones or to synthetic agonists. These hormone analogues are called antagonists and have great medicinal value.
Figure 2.2: Steroid receptor ligands
Figure 2.3: Retinoid X receptor ligands
2.4 Structural features of NHRs
The structural features of NHRs are highly conserved. They have three main functional domains: an N-terminal transactivation domain, a DNA binding domain, and a C-terminal ligand-binding domain. The N-terminal transactivation domain includes a ligand-independent activation function called AF-1. The ligand-binding domain contains a transcriptional activation function (AF-2) that is activated upon ligand binding (Bourguet et al., 2000).
The C-terminal ligand binding domains of NHRs contain 12 alpha helices organized in a three-layer helical sandwich. The 12th alpha helix is flexible and moves to accommodate ligand binding. It opens up the ligand binding pocket and thus allows the NHR to undergo a conformational change upon ligand binding. Helix 12 includes the AF-2 region and extends outwards from the ligand-binding pocket. Several NHRs have been crystallized in both apo and ligand-bound forms to confirm this phenomenon (Bourguet et al., 2000). Figure 2.5 shows an example of human ERβ bound to the estrogen E2.
Figure 2.4: Estrogen receptor ligands
Figure 2.5: The structures of ERβ in apo form and ligand-bound form. The ligand here is estrogen E2. The flexible helix 12 is shown in red and undergoes a conformational change to accommodate ligand binding.
2.5 Hormone-sensing protein-based Biosensor Design and Construction
The biosensor protein was designed by fusing the ligand binding domain of an NHR to a well characterized reporter enzyme, TS, which was discussed in the previous chapter. Expression of NHR ligand binding domains is known to suffer from stability and solubility issues, which have been resolved by fusion with other genes (Wittliff et al., 1990). Fusion with other domains should therefore improve stability and solubility. In addition, the mini splicing domain of the Mycobacterium tuberculosis RecA intein is known to fold properly and retain activity when inserted into different protein hosts, and maltose binding protein (MBP) is known to increase the solubility of fusion proteins (Jeong et al., 2014). Thus, the endonuclease domain of the full intein was replaced with the ligand binding domain of an NHR, and the resulting construct was fused at its C-terminus to the bacteriophage T4 td gene (expressing the T4 TS enzyme). In addition, the first amino acid of the intein was mutated from Cys to Ala to suppress any splicing and N-terminal cleavage. The resulting fusion was cloned into the plasmid pMal-c2, fused to the C-terminal side of E. coli MBP. The resulting plasmid is referred to as pMIT::LBD (MBP-Intein-TS::LBD), where LBD is the ligand binding domain of an NHR.
Figure 2.6: Schematic of the different modules in the biosensor. Upon successful ligand binding, the ligand binding domain undergoes conformational changes that activate TS.
E. coli D1210ΔthyA cells were transformed with the resulting pMIT::LBD plasmids. The lac operator carried a G-to-A nucleotide substitution 16 bases downstream of the TATAA motif, which reduced its affinity for the lac repressor and made expression effectively constitutive (Skretas and Wood, 2005). The cells were grown in a minimal medium without thymine, referred to as –Thy. The incubation temperature was optimized at 34 °C. Upon addition of ligand and incubation for about 16 h, the cells took up the ligand, which bound to the ligand binding domain of the NHR and thereby activated TS. As a result, a TS+ growth phenotype was observed.
Several biosensor plasmids were constructed by swapping in the LBD from different receptor proteins, such as pMIT::ERβ, pMIT::TRβ, pMIT::PPARγ and pMIT::Nrx2b. Biosensors for the α subtypes of ER and TR were also constructed (Skretas et al., 2007). With the biosensor cells in hand, we screened a library of chemicals for potential ligands, especially those that might have therapeutic properties. We plotted dose-response curves and calculated the half-maximal effective concentration (EC50) of each ligand for each receptor protein.
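The modular layout described above can be illustrated with a small data structure. This is only a sketch: the class name `BiosensorConstruct`, its field names, and the module labels are hypothetical choices made for illustration, and only the N-to-C order of modules and the pMIT::LBD naming follow the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BiosensorConstruct:
    """Illustrative model of the MBP-Intein-TS::LBD fusion (names are assumptions)."""
    solubility_tag: str = "MBP"
    intein: str = "Mtu RecA mini-intein (Cys1Ala)"
    lbd: str = "ERβ"          # the swappable ligand binding domain
    reporter: str = "T4 TS"

    @property
    def plasmid_name(self) -> str:
        # Naming convention used in the text: pMIT::LBD
        return f"pMIT::{self.lbd}"

    def modules(self) -> list[str]:
        # N- to C-terminal order: the LBD sits where the intein's
        # endonuclease domain used to be, and TS follows the intein.
        return [
            self.solubility_tag,
            f"{self.intein} N-terminal part",
            f"{self.lbd} LBD",
            f"{self.intein} C-terminal part",
            self.reporter,
        ]


er_beta = BiosensorConstruct()
# Swapping the receptor changes only the LBD module.
tr_beta = replace(er_beta, lbd="TRβ")
print(er_beta.plasmid_name, tr_beta.plasmid_name)
print(" - ".join(tr_beta.modules()))
```

Keeping the construct immutable and creating variants with `replace` mirrors the idea that a new biosensor differs from an existing one only in its receptor module.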
2.6 Detection of compounds with hormone-mimicking properties
Our endocrine system consists of a series of glands that secrete hormones directly into the blood stream. These hormones are involved in vital functions in our body such as development, reproduction, homeostasis and others (Burris et al., 2013). Endocrine disruptor chemicals (EDCs) are compounds, both natural and synthetic, that mimic the properties of hormones and alter the normal functioning of the endocrine system in both wildlife and humans. Many animal experiments have shown that they cause adverse health effects by affecting fundamental physiological processes (Agas et al., 2012). They are found in almost every commodity that we use in our daily lives, such as perfumes, baby foods, vaccines and plastic bottles. Historically, they were known only to impair reproductive and developmental processes, but recently the focus has shifted to other diseases and disorders such as obesity, diabetes, cancerous tumors, birth defects, autism spectrum disorders and others (Chen et al., 2009; Newbold, 2010; Polyzos et al., 2012).
EDCs can be classified according to the nature of their endocrine actions; examples include anti-androgenic, androgenic, estrogenic and anti-thyroid compounds, inhibitors of steroid hormone synthesis, and retinoid agonists. They are also classified according to their usage in daily life: pesticides (DDT and methoxychlor), fungicides (vinclozolin), herbicides (atrazine), industrial chemicals (PCBs, dioxins), chemicals used in the synthesis of plastics (phthalates, bisphenol A (BPA), alkylphenols), plant hormones (phytoestrogens), pharmaceutical drugs (diethylstilbestrol, DES) and personal care products (Gawrys et al., 2009; Skinner et al., 2011). It is therefore imperative to develop a tool that can detect these compounds and potentially lead to discoveries in therapeutics for diseases that are linked to exposure to these hormone-like compounds.
Several committees have been set up to develop strategies to tackle the problems related to EDCs. The Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) and the Environmental Protection Agency (EPA) are examples of such bodies that are inviting proposals to develop new tools for detection of EDCs. We therefore tested the library of compounds suggested by ICCVAM with the biosensor cells for estrogenicity. The original list contained 74 compounds, but very few of them displayed a positive response. In the next section we discuss biosensor cells built with receptors from different animal species and their use in testing the ICCVAM library.
2.7 Biosensors with Estrogen Receptors from Various Species
In the past, we have successfully constructed biosensors using the human estrogen and thyroid receptor proteins to detect chemicals with hormonal activity (Skretas and Wood, 2005a). We have extended the strategy by constructing biosensors with ERβ from various animal species including fish, pig, cow, zebrafish and mouse (Gierach et al., 2012). The biosensor design and the ligand binding domain boundaries are shown in Figure 2.7.
The experiment was then extended to test a library of chemicals suggested by the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) that were classified as estrogenic. The ICCVAM aims to reduce the use of animals and encourages replacing them with lower level species for in vivo testing. Table 1 lists the chemicals that were tested and displayed a positive growth phenotype during the test. The potencies of these test compounds were compared. The half-maximal effective concentration (EC50) of each compound was compared with that of E2 to determine the relative pseudotransactivation (RTPA). The EC50 values and the standard deviations were obtained using triplicate samples and are presented as a plot of optical density versus test compound concentration. The calculations were based on nonlinear regression with variable Hill slope (GraphPad Prism 6; GraphPad Software,
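La Jolla, CA, USA).
\[ Y = \mathrm{Bottom} + \frac{\mathrm{Top} - \mathrm{Bottom}}{1 + 10^{(\log \mathrm{EC}_{50} - X)\,\cdot\,\mathrm{HillSlope}}} \tag{2.1} \]
\[ \mathrm{RTPA} = \frac{\mathrm{EC}_{50}^{\mathrm{E2}}}{\mathrm{EC}_{50}^{\mathrm{ligand}}} \times 100\% \tag{2.2} \]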
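As a minimal sketch of how Equations 2.1 and 2.2 behave, the snippet below evaluates the variable-slope dose-response model and the RTPA ratio. The function names and all numerical values are hypothetical and chosen only for illustration; X is assumed to be the base-10 logarithm of the test compound concentration, and the fitting step itself (performed in GraphPad Prism in this work) is not reproduced here.

```python
import math


def dose_response(x, bottom, top, log_ec50, hill_slope):
    """Equation 2.1: response Y at log10 concentration x."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - x) * hill_slope))


def rtpa(ec50_e2, ec50_ligand):
    """Equation 2.2: potency of a ligand relative to E2, in percent."""
    return ec50_e2 / ec50_ligand * 100.0


# Hypothetical curve parameters (optical density units and molar EC50).
bottom, top, hill_slope = 0.1, 0.9, 1.0
log_ec50 = math.log10(5e-9)

# At X = logEC50 the response lies halfway between Bottom and Top.
print(dose_response(log_ec50, bottom, top, log_ec50, hill_slope))

# A ligand whose EC50 is 100-fold higher than that of E2 has an RTPA near 1%.
print(rtpa(ec50_e2=5e-9, ec50_ligand=5e-7))
```

A lower EC50 for the test ligand gives a larger RTPA, so E2 itself scores 100% by construction.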
To further examine the quality of the assay, the Z factor was calculated for each measurement; it is an indication of the signal to noise ratio in a measurement (Zhang et al., 1999).
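The Z factor can be sketched as follows, assuming the usual definition from Zhang et al. (1999): one minus three times the sum of the sample and control standard deviations, divided by the absolute difference of their means. The function name and the triplicate optical density readings are hypothetical and are not data from this work.

```python
from statistics import mean, stdev


def z_factor(sample, control):
    """Z factor from replicate readings of a sample and a control."""
    separation = abs(mean(sample) - mean(control))
    return 1.0 - 3.0 * (stdev(sample) + stdev(control)) / separation


# Hypothetical triplicate optical densities with and without ligand.
with_ligand = [0.80, 0.82, 0.78]
without_ligand = [0.10, 0.12, 0.08]

print(round(z_factor(with_ligand, without_ligand), 2))
```

Values approaching 1 indicate a wide separation between the two populations relative to their scatter, while values near or below 0 indicate that the two populations overlap.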