Designing a synthetic biology modulator using structural mimicry of homeodomain-zipper proteins

by

Duan Fang Tan

A thesis submitted in conformity with the requirements for the degree of Master of Science, Department of Chemistry, University of Toronto

© Copyright by Duan Fang Tan (2021)

Designing a synthetic biology modulator using structural mimicry of homeodomain-zipper proteins

Duan Fang Tan

Master of Science

Department of Chemistry, University of Toronto

2021

Abstract

In synthetic biology, an organism’s genome can be re-engineered to produce substances outside of its natural capabilities. This is achieved by incorporating exogenous genetic circuits into the host. We designed a minimalist “franken-protein” termed HinZip that binds to a unique DNA sequence comprising two 12 base-pair half-sites. HinZip combines motifs from eukaryotic and prokaryotic proteins to structurally mimic homeodomain-zipper (HD-Zip) proteins, a transcription factor family found only in plants. In vivo, HinZip has shown good binding affinity and DNA sequence specificity. These results provide a promising start to designing synthetic biology circuits controlled by HinZip.


Acknowledgments

To begin, I would like to sincerely thank my supervisor, Prof. Jumi Shin, for providing me with this research opportunity and for her guidance throughout my studies. I also want to thank Prof. Voula Kanelis for agreeing to be the second reader for my thesis and for providing valuable feedback on my writing. Without their support, completing this thesis would not have been possible.

I am very grateful to have worked with the amazing group of past and present scientists in the Shin lab: Serban Popa, Montdher Hussain, Kevin Do, Raneem Akel, and Afnan Khan. I appreciate their enthusiasm and support, which made my research experience even more interesting and fulfilling. I am also grateful for the work done by all former graduate and undergraduate students of the Shin lab, which gave me a starting point for my own project.

Finally, I wish to thank my friends and family who supported me throughout my graduate career and helped me overcome the academic and emotional obstacles along the way. I would not be where I am today without the tremendous amount of support I received from them.


Table of Contents

Acknowledgments

Table of Contents

List of Tables

List of Figures

List of Appendices

List of Abbreviations

1.1 Introduction

1.1.1 The HD-Zip family of transcription factors and their unique presence in the plant kingdom

1.1.2 Binding interactions of Hin recombinase in the DNA major and minor grooves of Salmonella’s hix sites

1.1.3 Design and characterization of HinZip

1.1.4 Applications in synthetic biology

1.1.5 Research objectives

1.2 Materials and Methods

1.2.1 Handling of E. coli cells

1.2.2 Codon optimization

1.2.3 Modification of DNA

1.2.4 Subcloning of constructs into plasmid vectors

1.2.5 Bacterial one-hybrid (B1H) protocol

1.2.6 Protein expression

1.2.7 Circular dichroism (CD)

1.2.8 Electrophoretic mobility shift assay (EMSA)

1.3 Results and Discussion

1.3.1 Assessing DNA-binding affinity of HinZip and variants in vivo via B1H

1.3.2 Linker extension between the LZ and the DBD

1.3.3 Linker deletion between the LZ and the DBD

1.3.4 Assessing autoactivation of the B1H reporter system

1.3.5 Further B1H analysis using the -16 invHixC reporter system

1.3.6 Examining secondary structure of HinZip variants for proper protein folding

1.3.7 Assessing DNA binding capability in vitro via EMSA

1.3.8 Summary

1.3.9 Future directions

1.4 References

1.5 Appendix

1.5.1 Appendix 1 – Composition of media and buffers

1.5.2 Appendix 2 – Oligonucleotide sequences and supplementary figures

List of Tables

Appendix 2. Table 1. Sequences of oligonucleotides used for site-directed mutagenesis/cloning

Appendix 2. Table 2. DNA sequences used in B1H, CD, or EMSA

List of Figures

Figure 1. Structures of an HD-DNA complex and the HD-Zip proteins

Figure 2. Crystal structure of the Hin DBD-DNA complex

Figure 3. Magnification of crystal structure of Hin DBD-DNA complex focusing on the residue- and base-specific hydrogen bonding in the minor groove

Figure 4. Structure of the designed protein HinZip

Figure 5. Schematic representation of the B1H

Figure 6. Subcloning scheme of the protein-encoding sequences into pB1H2w2 and pET28a(+)

Figure 7. Subcloning scheme of the invHixC target sites into pH3U3

Figure 8. Investigating optimal spacer lengths from the 0sp to 9sp invHixC for HinZip binding

Figure 9. HinZip, HinZip/LA, and Hin 55-mer binding to the 0sp and 9sp invHixC

Figure 10. The extended and original constructs of HinZip and HinZip/LA binding to the 0sp and 9sp invHixC

Figure 11. Structures of HinZip and tightHinZip

Figure 12. B1H testing autoactivation of the -12 0sp invHixC

Figure 13. B1H assay for autoactivation

Figure 14. Representative CD spectra of HinZip

Figure 15. EMSA of HinZip binding to 0sp invHixC

Appendix 2. Figure 1. N-terminal sequences of HD-Zip TFs

Appendix 2. Figure 2. Formation of Hin recombinase’s tetrameric complex during DNA inversion

Appendix 2. Figure 3. Plasmid maps of pB1H2w2 and pET28a

Appendix 2. Figure 4. B1H of HinZip and tightHinZip binding to the -16 invHixC reporter system

List of Appendices

1.5.1 Appendix 1 – Composition of media and buffers

1.5.2 Appendix 2 – Oligonucleotide sequences and supplementary figures

List of Abbreviations

Amp – Ampicillin
B1H – Bacterial one-hybrid
bZIP – Basic region-leucine zipper
CD – Circular dichroism
DBD – DNA-binding domain
DNA – Deoxyribonucleic acid
E. coli – Escherichia coli
E-box – Enhancer box
EMSA – Electrophoretic mobility shift assay
HTH – Helix-turn-helix
IPTG – Isopropyl β-D-1-thiogalactopyranoside
Kan – Kanamycin
Kd – Dissociation constant
LB – Lysogeny broth
LZ – Leucine zipper
NMR – Nuclear magnetic resonance
OD – Optical density
PCR – Polymerase chain reaction
PDB – Protein Data Bank
PEG 8000 – Polyethylene glycol 8000
RNA – Ribonucleic acid
RPM – Revolutions per minute
SOB/SOC – Super optimal broth/Super optimal broth with catabolite repression
SP – Spacer
TAE/TBE – Tris-acetate-EDTA/Tris-borate-EDTA
TF – Transcription factor
Zif268 – Zinc finger-containing transcription factor 268
3-AT – 3-amino-1,2,4-triazole

1.1 Introduction

1.1.1 The HD-Zip family of transcription factors and their unique presence in the plant kingdom

The homeodomain (HD) is a 60-amino-acid helix-turn-helix (HTH) motif present in eukaryotic transcription factors (TFs). The HTH comprises two scaffolding helices that provide stability and a third recognition helix that mediates DNA-binding interactions.1,2 Another common feature of the HD motif is a disordered N-terminal arm that precedes the DNA-recognition helix (Figure 1a).

HDs were first discovered in the Drosophila genome, where they bind DNA target sites as monomers with high affinity and direct the homeotic regulation of organ and body development. Since their discovery in Drosophila, HDs have been found across all eukaryotic species and share a highly conserved structure. This conservation suggests that a properly folded, consistent HD structure is important for its DNA-binding activity.3

As more HD TFs are discovered, their structures are found to encompass more than the HTH and the disordered N-terminal arm. The most notable example is the homeodomain-zipper (HD-Zip) family of TFs, found exclusively in the plant kingdom.3 As the name indicates, the DNA-binding domain (DBD) of these plant TFs contains the iconic HTH motif of the HD, including the disordered N-terminal arm. HD-Zip TFs also contain a leucine zipper (LZ) at their C-termini. The LZ is an amphipathic α-helix built from heptad repeats (abcdefg), with leucine residues at the fourth position (d) and hydrophobic residues at the first position (a).4 When two LZ monomers interact, the residues at positions a and d face each other and pack their hydrophobic side chains into a shared core,4 forming a coiled-coil LZ dimer. This dimerization introduces a new mode of DNA binding: HD-Zip TFs bind as dimers, whereas HD TFs in other eukaryotes bind as monomers.1,2
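The heptad register can be made concrete with a short Python sketch that labels each residue of a sequence with its heptad position and collects the a and d residues that form the hydrophobic core. The 28-residue sequence below is a hypothetical, idealized zipper chosen for illustration (it is not the FosW or an HD-Zip sequence), and the register of the first residue is supplied by hand rather than predicted.

```python
# Heptad register labels for a coiled-coil leucine zipper.
HEPTAD = "abcdefg"


def assign_heptad(seq: str, first_register: str = "a") -> list[tuple[int, str, str]]:
    """Label each residue with (1-based position, residue, heptad register).

    first_register is the register of the first residue. It is supplied by hand
    here; for a real zipper it has to be assigned from sequence or structure.
    """
    offset = HEPTAD.index(first_register)
    return [(i + 1, aa, HEPTAD[(offset + i) % 7]) for i, aa in enumerate(seq)]


def core_residues(seq: str, first_register: str = "a") -> tuple[str, str]:
    """Return the residues at the a and d positions, which pack into the hydrophobic core."""
    labelled = assign_heptad(seq, first_register)
    a_res = "".join(aa for _, aa, reg in labelled if reg == "a")
    d_res = "".join(aa for _, aa, reg in labelled if reg == "d")
    return a_res, d_res


# Hypothetical, idealized 4-heptad zipper (Val at a, Leu at d); not the FosW sequence.
toy_zipper = "VEALEKK" * 4
a_core, d_core = core_residues(toy_zipper)
print("a positions:", a_core)
print("d positions:", d_core)
print("Leu at every d:", set(d_core) == {"L"})
```

Because the register is an explicit parameter, the same helper can be pointed at a real zipper once its register has been assigned; the sketch does not attempt register prediction itself.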


Arabidopsis thaliana, commonly known as thale cress, contains the best-characterized HD-Zip TFs, which are divided into four classes.2,5 The TFs in each class possess an HTH and an LZ as their basic structure. Beyond this basic structure, classes II, III, and IV HD-Zips incorporate additional domains that provide new functions (Figure 1b).


Figure 1. Structures of an HD-DNA complex and the HD-Zip proteins. a) Nuclear magnetic resonance (NMR) structure of an HD-DNA complex. Human pituitary homeobox 2 (PITX2) represents the general structure of HD TFs, which comprises the recognition helix within an HTH motif and the disordered N-terminal arm (both highlighted red; PDB: 2LKX). b) Schematic representation of the domains present in each subfamily of HD-Zip TFs. Classes I and II HD-Zip TFs primarily consist of an HD and an LZ as their functional domains. Although HD-Zip II is the only class with a characterized N-terminal disordered arm (red segment), an N-terminal region of low-complexity sequence can also be seen in HD-Zip I and IV,5 suggesting that this region is conserved across the HD-Zip proteome (Figure 1 in Appendix 2). Adapted from Ref. 1.

Classes I and II HD-Zips served as templates for designing our novel protein. These two classes have minimalist structures and dimerize via LZ interactions to recognize a unique 9-bp pseudo-palindromic target sequence, CAATNATTG, where N can be any of the four DNA bases.3
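Why the site is called pseudo-palindromic can be checked directly. In the minimal Python sketch below (function names are ours, for illustration), N is treated as a wildcard that pairs with itself: the consensus CAATNATTG then equals its own reverse complement, but every concrete site differs from its reverse complement at the central base, because no base is its own complement.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; the wildcard N pairs with N."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


def expand_wildcards(seq: str) -> list[str]:
    """Enumerate every concrete sequence matched by a consensus containing N."""
    choices = ["ACGT" if base == "N" else base for base in seq.upper()]
    return ["".join(p) for p in product(*choices)]


consensus = "CAATNATTG"
print(consensus, is_palindromic(consensus))
for site in expand_wildcards(consensus):
    # The central base cannot pair with itself, so no concrete site is a perfect palindrome.
    print(site, is_palindromic(site))
```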


1.1.2 Binding interactions of Hin recombinase in the DNA major and minor grooves of Salmonella’s hix sites

Based on the HD-Zip structure, we selected Hin recombinase, which binds to DNA target sites using both an HTH motif and an N-terminal disordered arm. Hin recombinase belongs to the serine recombinase family and uses a conserved serine in its catalytic domain to invert a 1 kilobase-pair (kb) segment of the Salmonella chromosome.6 The inverted 1 kb segment is flanked by two inversion sites, hixL and hixR. Each inversion site contains two 12 base-pair (bp) half-sites that are imperfect inverted repeats of each other.7–9 Upon binding to the inversion sites, two Hin monomers dimerize at the N-termini via cysteine crosslinking of an elongated helix E that connects Hin’s DBD to its catalytic domain. The catalytic domains of the two dimers then join with the Fis proteins on adjacent enhancer sites to form a tetrameric invertasome complex that facilitates cleavage and inversion of the DNA segment (Figure 2 in Appendix 2).8

We selected Hin recombinase for two reasons. First, Hin’s HTH DBD closely resembles the HD of eukaryotic TFs, and Hin’s DBD also carries a disordered N-terminal arm attached to its HTH motif (Figure 1a). Thus, Hin’s use of the HTH and the disordered N-terminal arm for DNA binding is similar to the DNA-binding mechanism of HD TFs (Figure 1a). Second, unlike that of the HD-Zips, Hin’s DNA-binding mechanism has been characterized with a crystal structure of the DBD-DNA complex (PDB: 1HCR).10 Because HD-Zip TFs also contain an HD DBD comprising an HTH motif and an N-terminal disordered arm (Figure 1b), we extrapolated the DNA-binding mechanism from HD to HD-Zip in order to validate our design of HinZip as a mimic of the HD-Zip TFs.

In the binding complex, Hin’s DBD contacts both the major and minor grooves of the target hix sequence (Figure 2a).10 In the major groove, Ser174 and Arg178 of the recognition helix establish specific base contacts (Figure 2b). Ser174 contacts A21 through water-mediated hydrogen bonds between its carbonyl oxygen and N7 and N6 of the adenine purine ring.11 The hydroxyl group on Ser174’s side chain also makes a direct hydrogen bond to N7 of A10. N6 of A10 and O4 of the complementary T22 also take part in water-mediated hydrogen bonding with the side chain of Arg178. The terminal guanidine group of Arg178 makes an additional hydrogen bond to N7 of G9. Similar arrangements of water-mediated hydrogen bonds are seen in other TF-DNA complexes, such as that of the trp repressor.12


Figure 2. Crystal structure of the Hin DBD-DNA complex. a) Hin’s DNA-binding complex. The N-terminal arm and recognition helix are highlighted red. b) Magnification of crystal structure focusing on the orientation of the side chains of Ser174 and Arg178 in the major groove. While the carbonyl oxygen of Ser174 makes a straightforward hydrogen bond to the water molecule (represented by the blue dot), the orientation of Arg178 is more stringent as it needs to position its amine and terminal guanidine over two rungs of the DNA ladder to interact with both strands simultaneously. Both a) and b) are adapted from Ref.10 (PDB: 1HCR).

Hin also possesses a disordered N-terminal arm, an unusual DNA-binding element. Because of the high negative charge density of the poly-A/T tract in the minor groove, the positively charged Gly-Arg-Pro-Arg “warhead” of the N-terminal arm is guided into the minor groove via electrostatic interactions. Once in the minor groove, the two Arg residues make hydrogen bonds to A26, T6, and the phosphate backbone at P8 (Figure 3). The formation of these hydrogen bonds is not only charge-dependent, as the warhead must first be guided into the minor groove, but also geometry-dependent for both the DNA and the protein. For Arg140 to form hydrogen bonds to both A26 and T6, the DNA must have a larger-than-usual propeller twist. It has been proposed that a G-C substitution at base pair 6 would make the second hydrogen bond impossible, as the DNA would slightly unwind and the N2 amine of guanine would push Arg140 away.10 Meanwhile, Pro141 arches over the minor groove and positions Arg142 to form the last hydrogen bond to P8. These minor-groove interactions are highly specific, and deletion of the warhead has been shown to abolish binding completely.10


Figure 3. Magnification of crystal structure of Hin DBD-DNA complex focusing on the residue- and base-specific hydrogen bonding in the minor groove. The migration of the N-terminal disordered arm into the minor groove is facilitated by charge attraction from the tightly wound DNA. The complex is then stabilized by the three hydrogen bonds indicated by red arrows. The unusually large twist at A26-T6 swings both nitrogenous bases under Arg140’s side chain to establish two direct hydrogen bonds (PDB: 1HCR).

1.1.3 Design and characterization of HinZip

Interestingly, the HD-Zip family and Hin recombinase’s DBD both use the HTH motif to establish specific DNA binding. When Hin is compared with classes I and II HD-Zips, the similarities are even more pronounced, as they all contain characteristic N-terminal disordered arms, which previous studies have shown to be essential for Hin’s specific binding activity.10 Therefore, we fused the Hin DBD to the FosW LZ to structurally mimic HD-Zip TFs (Figure 4). FosW is a semi-rationally designed variant of the LZ from cFos, a mammalian TF that heterodimerizes with cJun.13 FosW was designed by Mason and coworkers to homodimerize while retaining the ability to heterodimerize with cJun.14,15 The Shin group has previously used FosW to create homodimerizing TFs.16–18


Figure 4. Structure of the designed protein HinZip. a) Proposed 3-D model of HinZip binding to the palindromic invHixC as a dimer. The addition of the FosW LZ to Hin’s DBD should allow HinZip (purple) to dimerize and therefore to recognize a longer DNA sequence (discussed in Section 1.1.5). Each FosW LZ was attached to the C-terminal arm of Hin’s DBD, which should give the α-helices enough freedom to swing around from either side of the DNA to form the coiled coil. b) Sequence of HinZip. All residues of the native Hin DBD were included in the design, except that KH at the end of the N-terminal arm was changed to KL (highlighted yellow) to create a HindIII restriction site for domain swapping. Two additions, GT and GSG (highlighted yellow), were also made to insert KpnI and BamHI restriction sites for subcloning and further protein modification, respectively.
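The choice of KL, GT, and GSG works because each dipeptide can be encoded by codons that spell a 6-bp recognition site: AAGCTT (HindIII) encodes KL, GGTACC (KpnI) encodes GT, and GGATCC (BamHI) followed by one more Gly codon encodes GSG. The Python sketch below checks this with the standard genetic code. The codon choices are illustrative assumptions; the codon-optimized HinZip gene may use different codons outside the sites.

```python
# Standard genetic code in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Recognition sequences of the enzymes named in the Figure 4 caption.
SITES = {"HindIII": "AAGCTT", "KpnI": "GGTACC", "BamHI": "GGATCC"}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def find_sites(dna: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site found in dna."""
    dna = dna.upper()
    hits = {}
    for name, motif in SITES.items():
        starts = [i for i in range(len(dna) - len(motif) + 1) if dna.startswith(motif, i)]
        if starts:
            hits[name] = starts
    return hits


# Hypothetical codon choices; "AAGCAT" stands in for an unmodified KH that carries no site.
for dna in ("AAGCTT", "GGTACC", "GGATCCGGC", "AAGCAT"):
    print(dna, translate(dna), find_sites(dna))
```

The last example shows why a residue change was needed at the KH position: the His codon breaks the HindIII sequence, whereas a Leu codon (CTT) completes it.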

To study this franken-protein, we used strategies from previous Shin group projects to assess the structure and binding capability of the protein both in vivo and in vitro.17 For the in vivo study, we used the bacterial one-hybrid (B1H) assay designed by Wolfe and coworkers to assess HinZip’s binding to its cognate DNA in E. coli.19 Because HinZip consists only of Hin’s DBD with FosW fused to its C-terminus, it lacks Hin’s catalytic domain and therefore has no recombinase activity. The omega subunit of RNA polymerase was fused to the N-terminus of HinZip (Figure 3a in Appendix 2). Once HinZip bound to its target site, this construct allowed the omega subunit to recruit the rest of the polymerase to the lac promoter, which then activated transcription of a His3 reporter gene positioned downstream (Figure 5). Transcription of His3 is vital to the survival of E. coli USO cells: this strain is a histidine auxotroph that relies on His3 for the biosynthesis of L-histidine.19 We therefore interpreted HinZip’s binding strength from E. coli cell growth on histidine-deficient agar. We also included 3-amino-1,2,4-triazole (3-AT) in the B1H as a competitive inhibitor of the His3 gene product, imidazoleglycerol-phosphate dehydratase, which disrupts the biosynthetic pathway of L-histidine.20 The purpose of this added competition was to reduce background growth and to select for the strongest HinZip-DNA interactions, which overcome 3-AT’s inhibition to yield the most growth.21
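One common way to summarize B1H plates is to record the most stringent 3-AT concentration that still supports growth, since a stronger protein-DNA interaction is expected to tolerate more inhibitor. The Python sketch below assumes growth has already been scored as yes/no per plate; the construct names and concentrations are placeholders, not results from this work, and the helper names are hypothetical.

```python
from typing import Optional


def highest_tolerated_3at(growth: dict[float, bool]) -> Optional[float]:
    """Highest 3-AT concentration (mM) at which colonies grew on His-deficient agar.

    growth maps each 3-AT concentration to whether growth was scored on that plate.
    Returns None if nothing grew. Raises ValueError if growth reappears above a
    concentration without growth, which points to a plating or scoring problem
    rather than a difference in binding.
    """
    tolerated = None
    first_failure = None
    for conc in sorted(growth):
        if growth[conc]:
            if first_failure is not None:
                raise ValueError(f"growth at {conc} mM but not at {first_failure} mM")
            tolerated = conc
        elif first_failure is None:
            first_failure = conc
    return tolerated


def rank_constructs(plates: dict[str, dict[float, bool]]) -> list[tuple[str, Optional[float]]]:
    """Order constructs from most to least 3-AT tolerant; no growth ranks last."""
    scored = [(name, highest_tolerated_3at(g)) for name, g in plates.items()]
    return sorted(scored, key=lambda item: -1.0 if item[1] is None else item[1], reverse=True)


# Placeholder scores for illustration only; these are not results from this work.
plates = {
    "construct_A": {0.0: True, 1.0: True, 5.0: True, 10.0: False},
    "construct_B": {0.0: True, 1.0: True, 5.0: False, 10.0: False},
    "construct_C": {0.0: False, 1.0: False, 5.0: False, 10.0: False},
}
print(rank_constructs(plates))
```

Treating non-monotonic growth as an error is a design choice of this sketch; in practice such plates would simply be flagged for re-plating.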

Figure 5. Schematic representation of the B1H. HinZip is fused to the omega subunit of RNA polymerase (RNAP) by a 21-residue linker. This protein-RNAP construct is cloned into the pB1H2w2 vector (see Methods for subcloning procedure). Binding of HinZip to the invHixC target sequence on the pH3U3 vector brings the omega subunit to the lac promoter, where the RNA polymerase complex is assembled to initiate transcription of the downstream His3 reporter gene. Successful transcription of His3 allows the USO cells (ΔrpoZ, ΔhisB, ΔpyrF) to survive on His-deficient agar.19

For the in vitro studies, we used circular dichroism (CD) spectroscopy to determine the secondary structure of HinZip in terms of α-helicity. Because the Hin DBD and FosW are largely helical, α-helicity can indicate whether HinZip adopts a proper secondary structure. We also used the electrophoretic mobility shift assay (EMSA) to characterize formation of the protein-DNA complex in terms of dissociation constants (Kd) and binding cooperativity (Hill coefficient).22,23
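The quantities named above are typically obtained with two standard relations, sketched below in Python under stated assumptions. CD signal in mdeg is converted to mean residue ellipticity, [θ] = θ / (10·c·l·n) with c in mol/L, l in cm, and n residues; helicity is then roughly estimated from [θ] at 222 nm against a reference value for a fully helical chain (the −33,000 deg cm² dmol⁻¹ used here is one common approximation and an assumption, not a value taken from this thesis). For EMSA, the Hill equation θ = [P]ⁿ/(Kdⁿ + [P]ⁿ) gives the fraction of DNA bound, with fraction bound usually taken as bound/(bound + free) band intensity and free protein assumed to be approximately total protein. The linearized Hill-plot fit is shown only because it needs no external libraries; nonlinear fitting is generally preferred for noisy data.

```python
import math


def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float, path_cm: float, n_residues: int) -> float:
    """Observed CD signal (mdeg) -> mean residue ellipticity in deg cm^2 dmol^-1."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)


def helix_fraction_222(mre_222: float, mre_full_helix: float = -33000.0) -> float:
    """Rough fractional helicity from [theta] at 222 nm (reference value is an assumption)."""
    return mre_222 / mre_full_helix


def hill_fraction_bound(protein: float, kd: float, n: float) -> float:
    """Fraction of DNA bound at free protein concentration `protein` (Hill equation)."""
    return protein**n / (kd**n + protein**n)


def hill_fit_linear(concs: list[float], fractions: list[float]) -> tuple[float, float]:
    """Estimate (Kd, n) from a Hill plot: log(f/(1-f)) = n*log[P] - n*log(Kd).

    Only points with 0 < f < 1 can be used; least-squares line fit in log space.
    """
    xs = [math.log10(c) for c in concs]
    ys = [math.log10(f / (1.0 - f)) for f in fractions]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope), slope


# Hypothetical numbers: -20 mdeg, 10 uM protein, 0.1 cm cell, 100 residues.
mre = mean_residue_ellipticity(-20.0, 10e-6, 0.1, 100)
print(round(mre), round(helix_fraction_222(mre), 2))  # -20000 0.61

# Self-consistency check on simulated, noise-free data (Kd = 50 nM, n = 2).
concs = [10.0, 25.0, 50.0, 100.0, 200.0]
fractions = [hill_fraction_bound(c, 50.0, 2.0) for c in concs]
kd_est, n_est = hill_fit_linear(concs, fractions)
print(round(kd_est, 3), round(n_est, 3))  # 50.0 2.0
```

A Hill coefficient near 1 indicates non-cooperative binding, while values above 1 indicate positive cooperativity, as might be expected if dimerization through the LZ couples the two half-site binding events.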

1.1.4 Applications in synthetic biology

We designed HinZip as a novel DNA-binding protein to modulate synthetic biology systems. Synthetic biology is a multidisciplinary field that focuses on re-engineering an organism’s genome to introduce functions beyond the host’s natural capabilities.24 These functions range from biosensing of pollutants to producing high-value chemicals and drugs. A noteworthy example is the microbial production of artemisinic acid,25 the precursor of the anti-malarial drug artemisinin. Conventional extraction of artemisinin from the native plant Artemisia annua is time-consuming and inefficient, which makes anti-malarial treatments scarce and costly.26 To address these issues, a genetically engineered strain of S. cerevisiae (brewer’s yeast) was developed to produce artemisinic acid by introducing gene circuits from A. annua and other organisms.25 This synthetic pathway produces high yields of artemisinic acid from fermented yeast, and artemisinin made in this way has been commercially produced for anti-malarial treatments since 2013.25 Similarly, we envision HinZip being used as a modulator for synthetic systems that produce valuable compounds, such as vaccines. The 2009 influenza pandemic prompted efforts toward more efficient vaccine development.27 Since then, synthetic antigen-encoding RNA has emerged as an alternative to killed or weakened viruses as vaccines. Recently, researchers have begun to develop synthetic gene circuits for rapid vaccine production.27 In light of the current coronavirus outbreak, developing HinZip and other proteins as synthetic biology modulators could help advance gene-based vaccine technology.28

1.1.5 Research objectives

Our goal for this project is to construct the hybrid minimalist protein HinZip that binds to our designed cognate DNA with high affinity. We aim to achieve this by employing a well-characterized DNA-binding motif from Hin recombinase. We also envision HinZip’s DNA binding to be orthogonal, meaning that HinZip should not interact with the host’s own biomolecules in a synthetic biology system. We aim to achieve orthogonality by adding FosW to the construct: because a HinZip dimer can recognize a longer, potentially palindromic sequence comprising hix half-sites, this target sequence should favor HinZip dimer binding at a unique site in the host genome (Figure 4a). Depending on the length of the spacers separating the half-sites, the DNA target site can range from 24 to 33+ bp (Table 2 in Appendix 2), which is likely to be unique in the genome. Furthermore, by combining modules from TFs found in bacteria and mammals to mimic a protein structure unique to plants, we hope to minimize any side activity arising from interactions with the host. This should translate into higher efficiency and signal-to-noise ratio once our system is implemented in synthetic biology circuits.
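The uniqueness argument can be quantified with a back-of-the-envelope estimate. In a random sequence with independent, equally frequent bases, a fully specified k-bp site is expected to occur about 2(G − k + 1)/4^k times in a G-bp genome when both strands are searched (once per position for a perfectly palindromic site). The Python sketch below assumes an E. coli chromosome of roughly 4.6 Mb and counts only the 24 specified half-site bases for the dimer site; real genomes are not random, so this is an order-of-magnitude argument, not a genome search.

```python
def expected_random_hits(site_bp: int, genome_bp: int, palindromic: bool = False) -> float:
    """Expected exact matches to a fully specified site in a random genome.

    Assumes independent bases at equal frequency (1/4 each). Both strands are
    searched; a palindromic site reads the same on both strands, so it is
    counted once per position.
    """
    strands = 1 if palindromic else 2
    positions = genome_bp - site_bp + 1
    return strands * positions / 4**site_bp


E_COLI_GENOME_BP = 4_600_000  # approximate E. coli chromosome size

for label, bp in [("one 12-bp half-site", 12), ("two half-sites (24 bp)", 24), ("33-bp site", 33)]:
    print(f"{label}: {expected_random_hits(bp, E_COLI_GENOME_BP):.2g} expected chance matches")
```

Under these assumptions a single 12-bp half-site is expected to occur by chance about once in an E. coli-sized genome (roughly 0.5 matches), whereas a 24-bp two-half-site target is expected on the order of 10⁻⁸ times, which is the intuition behind using dimerization to gain specificity.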

1.2 Materials and Methods

1.2.1 Handling of E. coli cells

1.2.1.1 Preparation of chemically competent E. coli cells

Chemically competent E. coli strains (DH5α, USO, and BL21) were prepared using the transformation and storage solution (TSS) method as previously described.29 E. coli cells were grown in LB (Appendix 1) to early exponential phase, when the optical density of the culture at 600 nm (OD600) was between 0.3 and 0.4. Cells were pelleted by centrifugation at 4000 rpm at 4 °C (J2-HC Centrifuge, Beckman). The pellet was resuspended in ice-cold TSS at 10% of the original culture volume. TSS comprised 10% w/v PEG 8000, 5% v/v DMSO, and 20 mM MgCl2 in LB. The resuspended cells were aliquoted in 100 μL volumes and stored at −80 °C for long-term storage.
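The TSS arithmetic above (resuspension at 10% of the culture volume, with 10% w/v PEG 8000, 5% v/v DMSO, and 20 mM MgCl2) can be written as a small helper. The 1 M MgCl2 stock is an assumed stock concentration, since the protocol does not state how TSS was assembled; LB is added to bring the solution to its final volume.

```python
def tss_recipe(culture_ml: float, mgcl2_stock_molar: float = 1.0, aliquot_ul: float = 100.0) -> dict[str, float]:
    """Component amounts for resuspending a pellet in TSS at 10% of the culture volume.

    TSS composition from the text: 10% w/v PEG 8000, 5% v/v DMSO, 20 mM MgCl2 in LB.
    mgcl2_stock_molar is an assumed stock concentration (hypothetical parameter).
    LB makes up the remaining volume.
    """
    tss_ml = 0.10 * culture_ml
    return {
        "TSS total (mL)": tss_ml,
        "PEG 8000 (g)": 0.10 * tss_ml,                           # 10% w/v = 0.10 g per mL
        "DMSO (mL)": 0.05 * tss_ml,                              # 5% v/v
        "MgCl2 stock (mL)": 0.020 * tss_ml / mgcl2_stock_molar,  # C1*V1 = C2*V2, 20 mM final
        "aliquots": tss_ml * 1000.0 / aliquot_ul,                # number of 100 uL aliquots
    }


for component, amount in tss_recipe(50.0).items():
    print(f"{component}: {amount:g}")
```

For a 50 mL culture, this gives 5 mL of TSS containing 0.5 g PEG 8000, 0.25 mL DMSO, and 0.1 mL of the assumed 1 M MgCl2 stock, enough for fifty 100 μL aliquots.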

1.2.1.2 Transformation of chemically competent E. coli cells

1 μL plasmid DNA (intact plasmid or ligation product) was added to 100 μL chemically competent cells. The cells were incubated for 30 minutes on ice, heat-shocked for 60 seconds at 42 °C, and then placed back on ice for 15 minutes. 1 mL Super Optimal broth with Catabolite repression (SOC; Super Optimal broth, SOB, with 20 mM glucose) was mixed with each cell suspension. The cells were recovered for 1 hour at 37 °C with shaking at 200 rpm and then plated on LB agar containing the corresponding antibiotic(s).

1.2.1.3 Culturing and storage of E. coli cells

DH5 (Invitrogen) and USO (Addgene bacterial strain #18049), the two E. coli strains used in the experiments, were grown in LB media or on LB agar plates, unless otherwise stated.

One colony was picked from each agar plate to initiate an overnight culture in 6 mL LB.

Overnight cultures, when applicable, were also started by inoculating from glycerol stocks of cells in LB. Overnight cultures were grown at 37 C with shaking at 200 rpm using a MaxQ 400

Shaker (Thermo Scientific) for 14-16 hours. Growth media for the overnight cultures contained antibiotics corresponding to the selected plasmid(s). The media contained ampicillin (Amp) for pB1H2 selection and kanamycin (Kan) for pH3U3 or pET28a selection. The final concentration

10

was 50 mg/L for Amp and 30 mg/L for Kan. After cells were successfully grown in antibiotic-

containing media, 50% v/v glycerol was added in equal volume to the cell culture to make

glycerol stocks. The glycerol stocks were stored at -80 C for long-term storage.

1.2.2 Codon optimization

DNA sequences coding for HinZip and its variants were codon-optimized for E. coli using the codon optimization tool from Integrated DNA Technologies.

1.2.3 Modification of DNA

1.2.3.1 Preparation of plasmid DNA

Plasmids were harvested from DH5α cells and purified using a QIAprep Spin Miniprep Kit (Qiagen). The 6 mL overnight culture was centrifuged and the supernatant discarded to obtain the cell pellet. The cells were then lysed and processed according to the manufacturer’s instructions with the following modifications: (1) Wash Buffer PE was incubated on the column for 3 minutes before centrifugation to ensure adequate washing; (2) an additional 1-minute spin was performed after removal of the wash buffer to ensure complete removal of residual ethanol; (3) plasmid DNA was eluted with nuclease-free water instead of Elution Buffer EB; and (4) the water was incubated on the washed column for 5 minutes before the elution spin to improve elution efficiency.

1.2.3.2 Restriction digestion of plasmid

Restriction endonucleases and buffers were purchased from New England Biolabs. 100-300 ng of plasmid DNA was digested according to the manufacturer’s instructions. Where applicable, restriction endonucleases were heat-inactivated at 65 °C for 20 minutes. Reactions were performed in an Applied Biosystems Veriti® 96-Well Thermal Cycler (Life Technologies).


1.2.3.3 Alkaline phosphatase treatment

Plasmids used as vectors in subsequent ligation reactions were treated with alkaline phosphatase after digestion to prevent re-circularization. Alkaline phosphatase and buffer were purchased from New England Biolabs. The phosphatase and buffer were added to the digestion reaction immediately after completion following the manufacturer’s instructions.

1.2.3.4 Agarose gel electrophoresis

Gels containing 1-2% (w/v) agarose were made by mixing agarose (Invitrogen) with 60 mL TAE buffer (Appendix 1); for example, 0.6 g agarose in 60 mL TAE gives a 1% gel. 3 μL SYBR® Safe DNA Gel Stain (Invitrogen) was added so that DNA bands could be visualized under UV light at 302 nm (VWR LM-20E transilluminator, VWR Scientific). 10-60 μL digested DNA vector was loaded into the wells of the agarose gel, with the 1 kb Plus DNA Ladder (Invitrogen) in the reference lane. Gels were electrophoresed at 100 V for 30-60 minutes with an EC 105 Electrophoresis Power Supply (E-C Apparatus Corporation).

1.2.3.5 Gel extraction of digested fragments

After electrophoresis and visualization of the agarose gel, the band corresponding to the desired plasmid fragment was excised with a scalpel. The excised gel piece was processed with a QIAquick Gel Extraction Kit (Qiagen) to extract and purify the DNA fragment. The manufacturer’s instructions were followed with the following modifications: (1) the dissolved gel and the elution water were each incubated on the column for 10 minutes to ensure thorough binding and elution of DNA; (2) DNA was eluted with 20 μL nuclease-free water instead of Elution Buffer EB.


1.2.3.6 Chemical Ligation

T4 DNA ligase and buffer were purchased from New England Biolabs. 30 ng of vector DNA was used in each reaction. Inserts containing the oligonucleotides of interest were added in a 10- to 20-fold molar excess over vector DNA in all ligation reactions to maximize ligation efficiency. The required insert amounts were calculated using the Ligation Calculator (Insilico Online Bioinformatics Resources). Ligations were carried out according to the manufacturer’s instructions with the following exception: reactions were incubated at 14 °C overnight, followed by 2 hours at 16 °C. Reactions were performed in an Applied Biosystems Veriti® 96-Well Thermal Cycler (Life Technologies).
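The molar-excess calculation rests on the fact that, for double-stranded DNA, moles scale with mass divided by length. The Python sketch below shows that relation; it is not a reproduction of the Ligation Calculator tool, the 650 g/mol per base pair is the usual approximate average, and the 5,000 bp vector and 200 bp insert lengths are hypothetical placeholders.

```python
DA_PER_BP = 650.0  # approximate average mass of one base pair of dsDNA (g/mol)


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA to picomoles."""
    return mass_ng * 1000.0 / (length_bp * DA_PER_BP)


def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int, molar_ratio: float) -> float:
    """Insert mass giving the requested insert:vector molar ratio.

    Moles scale with mass / length, so insert_ng = ratio * vector_ng * insert_bp / vector_bp.
    """
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical lengths: 5,000 bp linearized vector and 200 bp insert, 30 ng vector.
for ratio in (10, 20):
    ng = insert_mass_ng(30.0, 5000, 200, ratio)
    print(f"{ratio}:1 -> {ng:.1f} ng insert "
          f"({dsdna_pmol(ng, 200):.3f} pmol insert vs {dsdna_pmol(30.0, 5000):.4f} pmol vector)")
```

With these placeholder lengths, a 10-fold molar excess over 30 ng of vector corresponds to 12 ng of insert, and a 20-fold excess to 24 ng.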

1.2.3.7 Site directed mutagenesis

PCR primers were designed with the intended mutant nucleotide embedded in a 26-nucleotide sequence (Table 1 in Appendix 2). Oligonucleotides were synthesized by Eurofins Genomics. The lyophilized primers were resuspended in the recommended volumes of nuclease-free water to make 100 μM stock solutions, which were diluted to 10 μM before being added to the reactions. All PCRs had a 50 μL total volume and were performed with Phusion® High-Fidelity DNA polymerase (New England Biolabs) following the manufacturer’s instructions. Reactions were performed in an Applied Biosystems Veriti® 96-Well Thermal Cycler (Life Technologies).
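The primer layout and dilution steps can be sketched in Python. This assumes a fully complementary (overlapping) primer pair with the mutation centred in the 26-nt window; the text states only that the mutation sits within a 26-nt sequence, so the pairing scheme, the 40-nt template, and the 0.5 μM final primer concentration in the last line are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, position: int, new_bases: str, length: int = 26) -> tuple[str, str]:
    """Complementary primer pair carrying new_bases at 0-based `position` of template.

    The mutation is centred in a window of `length` nucleotides taken from the
    mutated sequence. Assumes a fully overlapping (complementary) primer pair.
    """
    mutant = template[:position] + new_bases + template[position + len(new_bases):]
    start = max(0, position - (length - len(new_bases)) // 2)
    end = start + length
    if end > len(mutant):
        raise ValueError("template too short for a primer of this length")
    forward = mutant[start:end]
    return forward, reverse_complement(forward)


def stock_volume_ul(stock_um: float, final_um: float, final_volume_ul: float) -> float:
    """C1*V1 = C2*V2: volume of stock needed for a given final concentration."""
    return final_um * final_volume_ul / stock_um


template = "ACGT" * 10  # hypothetical 40-nt template
fwd, rev = mutagenic_primers(template, 20, "GG")
print(fwd, rev)
print(stock_volume_ul(100.0, 10.0, 50.0))  # uL of 100 uM stock per 50 uL of 10 uM working solution
print(stock_volume_ul(10.0, 0.5, 50.0))    # uL of 10 uM primer for an assumed 0.5 uM in a 50 uL PCR
```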

1.2.3.8 Quantification of DNA

DNA concentrations were measured using a NanoDrop 2000 spectrophotometer (Thermo Scientific) following the manufacturer’s instructions.
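The instrument software performs the concentration calculation internally. The Python sketch below mirrors the Beer-Lambert relation it relies on, using the conventional factor that an A260 of 1.0 (1 cm path equivalent) corresponds to about 50 ng/μL of double-stranded DNA, together with the usual purity ratios (A260/A280 near 1.8 and A260/A230 often around 2.0-2.2 for clean DNA). The function names and example absorbances are hypothetical.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor: A260 of 1.0 (1 cm path) ~ 50 ng/uL dsDNA


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """dsDNA concentration from absorbance at 260 nm (1 cm path equivalent)."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratios(a230: float, a260: float, a280: float) -> dict[str, float]:
    """A260/A280 (about 1.8 for pure DNA) and A260/A230 (often about 2.0-2.2)."""
    return {"260/280": a260 / a280, "260/230": a260 / a230}


print(dsdna_ng_per_ul(0.5))        # undiluted sample
print(dsdna_ng_per_ul(0.5, 10.0))  # sample measured after a 10-fold dilution
print(purity_ratios(0.26, 0.5, 0.28))
```

A low A260/A230 ratio after a spin-column prep often points to carry-over of salts or other contaminants, which is one reason the extra drying spin described in the miniprep modifications is useful.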


1.2.3.9 DNA sequencing

All Sanger sequencing of plasmid DNA and PCR products was performed by The Centre for Applied Genomics (TCAG) at The Hospital for Sick Children, Toronto.

1.2.4 Subcloning of constructs into plasmid vectors

1.2.4.1 Subcloning of protein constructs into the pB1H2w2 vector

The HinZip-encoding DNA sequences, flanked by restriction sites that allowed ligation into the multiple cloning site of the pB1H2w2 vector, were synthesized by Integrated DNA Technologies. pB1H2w2 (Addgene plasmid #18038) was a gift from Dr. Scot Wolfe.19 The vector contained the KpnI and XbaI restriction sites (Figure 6), which were used to insert the protein-encoding sequence (Figure 3a in Appendix 2). The protein-coding sequences contained an XhoI restriction site followed by a stop codon at their 3’ ends (Figure 6). The XhoI site upstream of the stop codon was included so that the protein constructs could be subcloned from pB1H2w2 into pET28a(+) (Novagen plasmid #69864) for protein overexpression and purification (Figure 3b in Appendix 2). Site-directed mutagenesis had previously been performed by the Shin group to insert a KpnI restriction site into the pET28a plasmid.30 Both pB1H2w2 and the protein-coding oligonucleotides were digested with KpnI and XbaI, and only the linearized pB1H2w2 was then treated with alkaline phosphatase to minimize self-ligation. The digested DNA fragments were separated from the reaction mixtures and recovered from a 1-2% agarose gel, as described above. The recovered fragments were quantified and ligated, and the ligation product was transformed into DH5α cells for subsequent plasmid preparation. The ligation product was confirmed by Sanger sequencing.
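The two-step scheme can be checked with a small in silico digest. The Python sketch below uses the top-strand cut positions of KpnI (GGTAC^C), XbaI (T^CTAGA), XhoI (C^TCGAG), and BamHI (G^GATCC) on a short hypothetical cassette laid out as in Figure 6 (KpnI, coding sequence, XhoI, stop codon, XbaI). The cassette is not the HinZip sequence, and fragment lengths are counted on the top strand only, ignoring overhangs.

```python
# Recognition sequence and top-strand cut offset (^) for each enzyme.
ENZYMES = {
    "KpnI": ("GGTACC", 5),   # GGTAC^C
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def cut_positions(seq: str, enzyme: str) -> list[int]:
    """Top-strand cut positions (0-based, between seq[pos-1] and seq[pos])."""
    site, offset = ENZYMES[enzyme]
    cuts, start = [], seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def digest_linear(seq: str, enzymes: list[str]) -> list[str]:
    """Top-strand fragments of a linear sequence cut by all listed enzymes."""
    cuts = sorted({pos for enzyme in enzymes for pos in cut_positions(seq, enzyme)})
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical cassette laid out as in Figure 6: KpnI - ORF - XhoI - stop - XbaI.
cassette = "TT" + "GGTACC" + "ATGAAAGGC" + "CTCGAG" + "TAA" + "TCTAGA" + "AA"
print([len(f) for f in digest_linear(cassette, ["KpnI", "XbaI"])])  # KpnI/XbaI step into pB1H2w2
print(digest_linear(cassette, ["KpnI", "XhoI"])[1])  # KpnI/XhoI piece for pET28a(+), no stop codon
```

The second digest illustrates the design choice: because XhoI sits upstream of the stop codon, the KpnI/XhoI fragment carries the coding sequence without the stop, so translation can continue into the vector’s C-terminal tag.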

The DNA sequence coding for HinZip/LA, the HinZip variant with leucine-to-alanine mutations in the LZ, was obtained by first digesting pB1H2w2/HinZip with XhoI and BamHI, whose site lies immediately before the LZ subdomain (Figure 3c in Appendix 2). This digestion removed the leucine-containing LZ. In parallel, MEF/LA, a basic helix-loop-helix/zipper (bHLHZ) TF from previous graduate work in the Shin lab,17,30 was also digested with XhoI and BamHI. The LA insert and the linearized pB1H2w2/HinZip were purified by agarose gel electrophoresis, quantified, and chemically ligated, as described above. The ligation product was confirmed by Sanger sequencing.

The DNA sequence coding for the Hin 55-mer, the truncated variant of HinZip without an LZ, was made by first mutating the sequence immediately following the BamHI restriction site into an XhoI site using site-directed mutagenesis, as described above. The site-directed mutagenesis primers were synthesized by Eurofins Genomics (Table 1 in Appendix 2). The LZ was then excised by restriction digestion with XhoI (Figure 3c in Appendix 2). The digested pB1H2w2/HinZip was recovered by gel electrophoresis and recircularized by chemical ligation. The ligation product was confirmed by Sanger sequencing.

The oligonucleotides encoding the GGASGS extension of the disordered linker between the Hin DBD and the LZ were ordered from Eurofins Genomics (Table 1 in Appendix 2). The forward oligonucleotide contained the coding sequence of the GGASGS extension, flanked by the last five nucleotides of the BamHI recognition sequence at the 5’ end and the first nucleotide of the BamHI sequence at the 3’ end. The reverse oligonucleotide contained the reverse complement of the GGASGS coding sequence with the same flanking nucleotides: the last five nucleotides of the BamHI recognition sequence at the 5’ end and the first nucleotide of the BamHI sequence at the 3’ end (Table 1 in Appendix 2). These oligonucleotides were designed so that the annealed duplex carried BamHI-compatible sticky ends at both ends. Both pB1H2w2/HinZip and pB1H2w2/HinZip/LA were digested with BamHI, and the linearized plasmids were recovered and ligated with the designed insert. Note that the ACTAGS extension was an unintended product of the same ligation reaction. Because a single palindromic restriction site was used, both ends of the duplex carried identical overhangs, so the duplex could be inserted in either orientation; in the reversed orientation, the reverse strand of the duplex was inserted into the forward (coding) strand of the vector (Figure 3c in Appendix 2).
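The oligonucleotide layout described above can be generated programmatically. The sketch below assumes palindromic enzymes that leave 5’ overhangs (true for BamHI, NotI, and EcoRI), takes the top-strand cut position within each recognition site as a parameter, and uses a hypothetical GGASGS coding sequence, since the actual codons are listed only in Table 1 of Appendix 2. It also shows why a single BamHI site lets the duplex ligate in either orientation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def sticky_oligos(insert, left_site, left_cut, right_site, right_cut):
    """Forward/reverse oligos whose duplex carries the 5' overhangs left by two cut sites.

    left_cut / right_cut: index of the top-strand cut inside each recognition site,
    e.g. BamHI G^GATCC -> 1, NotI GC^GGCCGC -> 2, EcoRI G^AATTC -> 1.
    Assumes palindromic sites that leave 5' overhangs.
    """
    forward = left_site[left_cut:] + insert + right_site[:right_cut]
    reverse = revcomp(right_site)[right_cut:] + revcomp(insert) + revcomp(left_site)[:left_cut]
    return forward, reverse


def overhang_len(site, cut):
    """Length of the 5' overhang left by a palindromic 5'-overhang cutter."""
    return len(site) - 2 * cut


def anneals(forward, reverse, left_oh, right_oh):
    """True if the strands are complementary once the single-stranded overhangs are ignored."""
    return forward[left_oh:] == revcomp(reverse[right_oh:])


BAMHI = "GGATCC"
linker = "GGCGGTGCATCTGGTTCT"  # hypothetical codons for G-G-A-S-G-S
fwd, rev = sticky_oligos(linker, BAMHI, 1, BAMHI, 1)
oh = overhang_len(BAMHI, 1)  # 4 nt (GATC)
print(fwd, rev, anneals(fwd, rev, oh, oh))

# Both ends carry the same GATC overhang, so the duplex can enter the BamHI-cut vector
# either way round. Reading the vector's top strand across the ligated insert:
upstream, downstream = "G", "GATCC"    # vector ends left by G^GATCC
intended = upstream + fwd + downstream  # GGATCC + linker + GGATCC
flipped = upstream + rev + downstream   # GGATCC + revcomp(linker) + GGATCC
print(intended)
print(flipped)
```

The same helper covers the NotI/EcoRI target duplexes of section 1.2.4.3, for example `sticky_oligos(target, "GCGGCCGC", 2, "GAATTC", 1)`. Because those two overhangs differ (GGCC versus AATT), such an insert can ligate in only one orientation, which is the usual way to avoid the flipped product seen here.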

(Figure 6 key: KpnI, XhoI, stop codon, XbaI)

Figure 6. Subcloning scheme of the protein-encoding sequences into pB1H2w2 and pET28a(+). KpnI and XbaI were first used to incorporate the protein-encoding DNA into pB1H2w2, which also introduced an XhoI site into the pB1H2w2 construct. KpnI and XhoI were then used to subclone the protein constructs from pB1H2w2 into pET28a(+). The stop codon between XhoI and XbaI was necessary in pB1H2w2 to terminate translation right at the end of the protein-encoding sequence. However, the stop codon was discarded during subcloning into pET28a(+) because the hexa-histidine tag downstream of XhoI needed to be translated as part of the final protein.
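The reasoning about the stop codon can be checked in silico: a construct moved into pET28a(+) without its stop codon must leave the hexa-histidine tag in frame. The sketch below assumes that the sequence immediately downstream of the pET28a(+) XhoI site is six CAC histidine codons followed by TGA (this should be verified against the actual vector map), and the short ORF is hypothetical. In frame, the XhoI site itself (CTC GAG) encodes Leu-Glu.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed sequence after the XhoI site in pET28a(+): 6 x His (CAC) then a TGA stop.
HIS_TAG_ASSUMED = "CAC" * 6 + "TGA"


def first_stop(orf: str):
    """Codon index of the first in-frame stop codon (frame starts at index 0), or None."""
    orf = orf.upper()
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def his_tag_in_frame(orf_through_xhoi: str, downstream: str = HIS_TAG_ASSUMED) -> bool:
    """True if an ORF ending in XhoI (CTCGAG) reads through into the His tag and stops at its end."""
    if not orf_through_xhoi.upper().endswith("CTCGAG"):
        raise ValueError("the ORF should end with the XhoI site CTCGAG")
    fused = orf_through_xhoi + downstream
    return len(orf_through_xhoi) % 3 == 0 and first_stop(fused) == len(fused) // 3 - 1


orf = "ATG" + "GCT" * 3 + "CTCGAG"                 # hypothetical ORF ending in XhoI
print(his_tag_in_frame(orf))                        # True: stop removed, tag translated
print(first_stop(orf + "TAA" + HIS_TAG_ASSUMED))    # 6: stop kept, translation ends before the tag
```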

1.2.4.2 Subcloning of protein constructs into the pET28a(+) vector

Protein-encoding sequences in pB1H2w2 were digested with KpnI and XhoI, and the excised DNA segments were recovered from a 1% agarose gel. The recovered protein-encoding inserts were then ligated into the multiple cloning site of pET28a(+) (Figure 6). In pET28a(+), the XhoI site is positioned immediately before a hexa-histidine tag (Figure 3b in Appendix 2), which was used for protein purification by metal affinity chromatography (discussed in section 1.2.6). Restriction digestion, agarose gel electrophoresis, agarose gel extraction, and chemical ligation of the insert DNA were performed as described in sections 1.2.3.4-1.2.3.6 and 1.2.3.2. The ligation product was confirmed by Sanger sequencing.

1.2.4.3 Subcloning of DNA target sites into the pH3U3 vector

Oligonucleotides for the invHixC DNA targets with different spacers were ordered from Eurofins Genomics. The forward and reverse oligonucleotides were designed in a similar fashion to those described above, so that the annealed target duplexes carried the appropriate sticky ends for NotI and EcoRI at the 5’ and 3’ ends, respectively (Figure 7; Table 2 in Appendix 2). NotI and EcoRI sites are present in pH3U3 (Addgene plasmid #12609, a gift from Dr. Scot Wolfe).21 pH3U3 plasmids were digested with NotI and EcoRI, and the digested vector was recovered by gel electrophoresis. Target and NS DNA fragments were inserted by chemical ligation (Table 2 in Appendix 2). The ligation products were confirmed by Sanger sequencing.

(Figure 7 key: NotI, EcoRI)

Figure 7. Subcloning scheme of the invHixC target sites into pH3U3. NotI and EcoRI were used to incorporate the invHixC DNA segments into pH3U3.

1.2.5 Bacterial one-hybrid (B1H) protocol

1.2.5.1 Construction of DNA-binding partner (bait) and reporter system (prey)

Protein-encoding pB1H2w2 plasmids and target DNA-containing pH3U3 plasmids were used in the B1H assays as the bait and the prey, respectively. The bait and prey plasmids were constructed by subcloning as described above.

1.2.5.2 B1H binding and autoactivation assays

Histidine-deficient (His-) minimal medium (NM) selection plates, together with His- and histidine-supplemented (His+) NM liquid media, were prepared the day before the assay as described previously (Appendix 1).19 E. coli USO cells (ΔrpoZ, ΔhisB, ΔpyrF) were co-transformed with pB1H2w2 coding for HinZip or one of its variants and pH3U3 carrying one of the target or NS sequences. After 18-24 hours of growth on LB agar plates containing Amp and Kan, overnight cultures were started from single colonies of the transformed USO cells. The overnight cultures were grown at 37 °C with shaking for 16-18 hours, after which 1 mL aliquots were taken from each culture to inoculate 3 mL His+ NM cultures. The inoculated His+ NM cultures were grown at 37 °C with shaking for 1 hour and then centrifuged at 1380 × g for 5 minutes in a Centrific Model 228 centrifuge (Fisher Scientific). After the His+ supernatant was decanted, the pellets were resuspended in 1 mL His- NM and centrifuged again at 1380 × g for 5 minutes. This wash step was repeated twice to remove residual histidine from the pellets. After the third wash, the pellets were resuspended in His- NM and diluted 14-fold into 1 mL His- NM. The cell density of each culture in His- NM was determined from its OD600. The cultures were first diluted to OD600 0.1 and then serially diluted 10-fold five times, giving a series from 10⁻¹ to 10⁻⁶ OD600. From each dilution, 5 μL was spotted onto His- NM selection plates containing 0-20 mM 3-AT and onto an LB agar plate. One positive and one negative control were included in the B1H assays. The positive control consisted of USO cells carrying Zif268, a zinc finger transcription factor, on pB1H2w2 and the specific binding sequence of Zif268 (GCGTGGGGGCG) on pH3U3.31 The negative control consisted of USO cells carrying Zif268 on pB1H2w2 and the -11 GC E-box sequence on pH3U3 (Table 2 in Appendix 2). The -11 GC E-box sequence was taken from previous studies in the Shin group and showed weak, non-specific interaction with Zif268.30
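The normalization and dilution steps above are C1V1 = C2V2 arithmetic, treating OD600 as proportional to cell density (an approximation that holds only within the linear range of the spectrophotometer). The sketch below computes how much resuspended culture to bring to OD600 0.1 in a chosen final volume and lists the resulting 10-fold series; the example OD600 reading and the 100 µL per-step transfer volume are illustrative assumptions, not values from the protocol.

```python
def normalize_culture(od_measured: float, od_target: float = 0.1, final_ul: float = 1000.0):
    """Return (culture_uL, medium_uL) needed to reach od_target in final_ul (C1V1 = C2V2)."""
    if od_measured < od_target:
        raise ValueError("culture is below the target OD600; concentrate it instead")
    culture_ul = od_target * final_ul / od_measured
    return culture_ul, final_ul - culture_ul


def serial_dilution(start_od: float = 0.1, steps: int = 5, factor: int = 10):
    """OD600 of each spot: the starting suspension plus `steps` successive dilutions."""
    return [start_od / factor**k for k in range(steps + 1)]


def transfer_volumes(final_ul: float = 100.0, factor: int = 10):
    """Volume carried over and volume of diluent for one serial-dilution step."""
    carry = final_ul / factor
    return carry, final_ul - carry


culture, medium = normalize_culture(0.5)   # e.g. a resuspension measured at OD600 0.5
print(culture, medium)                     # uL of culture and uL of His- NM
print(serial_dilution())                   # 10^-1 ... 10^-6 OD600, one value per spot
print(transfer_volumes())                  # uL carried over and uL of diluent per step
```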

Autoactivation of the target binding sequences was investigated by transforming the USO cells with only pH3U3 plasmids carrying the DNA sequence under investigation. The transformed USO cells were grown on LB plates with only Kan and all other steps described above were done in the absence of Amp.


1.2.6 Protein expression

HinZip and its variants were overexpressed in E. coli and purified using previously described methods with the modifications described below.32 Optimized protein-coding sequences were cloned into pET28a(+) from pB1H2w2 as described in section 1.2.4.2. The pET28a(+) plasmids encoding the various proteins were transformed into E. coli BL21(DE3) pLysS competent cells. The transformed BL21(DE3) pLysS cells were grown in three 500 mL LB cultures containing kanamycin until the mid-log phase of growth (OD600 0.4-0.6), at which point IPTG was added to each culture to 2 mM to induce protein synthesis. Induction was carried out for 5 hours before the cultures were centrifuged. The IPTG concentration can be reduced to 0.5 mM with a 4-hour induction period to avoid stressing the E. coli cells.33 The pellets were sonicated to release the cellular components, and the lysates were purified by affinity chromatography on a cobalt resin column (TALON, Takara Bio) following the manufacturer’s protocol. The cobalt resin binds the hexa-histidine tag fused to the expressed protein, as mentioned in section 1.2.4.2: as the lysate passed through the column, the hexa-histidine tag coordinated the cobalt on the resin, immobilizing the protein while the rest of the lysate flowed through.34 The immobilized protein was then eluted with a high concentration of imidazole. The eluted samples were reduced with 10 mM DTT for 1 hour at 37 °C before being loaded onto an Amicon® Ultra-15 centrifugal filter, and the protein solutions were concentrated by centrifugation to a volume below 5 mL. The concentrated protein solutions were further purified by reversed-phase HPLC (Beckman System Gold) on a semi-preparative C18 column (Vydac) using a water/acetonitrile/0.6% trifluoroacetic acid gradient previously described by the Shin group.30

Fractions from the HPLC were collected based on UV/Vis detection at 280 nm (Beckman System Gold). The presence of the target proteins was confirmed by electrospray ionization mass spectrometry (ESI-MS) (Waters Micromass ZQ, Model MM1). The concentration of each protein was determined by UV/Vis spectrometry (Nanodrop 2000 Spectrophotometer, Thermo Fisher). The protein solutions were divided into 200, 500, and 700 μL aliquots, which were snap-frozen in liquid nitrogen for 1 minute and lyophilized overnight (Modulyod Freeze Dryer, Thermo Fisher). The lyophilized proteins were stored at -80 °C for long-term storage.
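Protein concentrations from an A280 reading rest on the Beer-Lambert law, c = A280 / (ε·l), with ε estimated from the sequence. The sketch below uses the widely used per-residue values of 5500 M⁻¹ cm⁻¹ for Trp, 1490 M⁻¹ cm⁻¹ for Tyr, and 125 M⁻¹ cm⁻¹ per cystine (the values used by ExPASy ProtParam); the sequence and absorbance in the example are hypothetical, not HinZip measurements.

```python
EPS_TRP = 5500      # M^-1 cm^-1 at 280 nm, per Trp
EPS_TYR = 1490      # per Tyr
EPS_CYSTINE = 125   # per disulfide-bonded Cys pair


def extinction_coefficient(seq: str, cystines: int = 0) -> int:
    """Estimated molar extinction coefficient at 280 nm from Trp/Tyr counts (and cystines)."""
    seq = seq.upper()
    return EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y") + EPS_CYSTINE * cystines


def concentration_uM(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert: c = A / (epsilon * l), returned in micromolar."""
    if epsilon <= 0:
        raise ValueError("no Trp or Tyr: A280 cannot report this protein's concentration")
    return a280 / (epsilon * path_cm) * 1e6


eps = extinction_coefficient("MKWYY")     # hypothetical sequence: 1 Trp, 2 Tyr
print(eps)                                # 8480
print(concentration_uM(0.848, eps))       # about 100 uM
```

Because the eluted samples were reduced with DTT, leaving `cystines=0` is the appropriate default for these proteins.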

Because of technical difficulties, not all protein variants could be purified by reversed-phase HPLC. For these variants, fast protein liquid chromatography (FPLC) was performed instead, using a Sephadex G-75 size exclusion column (ÄKTA FPLC, Amersham Biosciences) with water as the mobile phase. To accommodate the sample loop of the FPLC, the eluted fractions from the TALON column described above were concentrated by centrifugation to a total volume of 1 mL instead of below 5 mL. Protein elution was monitored by UV detection at 280 nm (ÄKTA UPC-900). Water alone proved unsuitable as the mobile phase: it eluted the proteins poorly, and the proteins were detected across multiple fractions. To avoid this issue, the elution buffer from the metal affinity chromatography described earlier in this section could be used as the mobile phase for more effective elution.

1.2.7 Circular dichroism (CD)

The lyophilized protein was resuspended in CD buffer (15 mM Na2HPO4, 5 mM KH2PO4, 50 mM NaCl, pH 7.4) to 20 μM and incubated for 1 hour at 37 °C. The 20 μM protein stock was then diluted with CD buffer, with or without DNA duplex, to a final protein concentration of 2 μM in a total volume of 2 mL. A temperature-leap (T-leap) tactic was applied, in which the final protein sample was incubated sequentially at 4 °C overnight, at 37 °C for 30 minutes, and at room temperature for 1 hour prior to CD analysis.35 The purpose of this T-leap tactic is discussed in section 1.3.8. CD was performed on an Aviv 215 spectrometer with a Suprasil, 10 mm path-length cell (Hellma) at 22 °C. CD signal was recorded from 190 to 300 nm in 0.2 nm increments with a sampling time of 0.2 s. Each protein sample was scanned in duplicate, and the blank buffer spectrum was subtracted from the sample spectrum. CD data were not smoothed.
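CD spectra recorded this way are commonly converted to mean residue ellipticity (MRE) before constructs of different lengths are compared: [θ] = θobs(mdeg) / (10 · c · n · l), with c in mol/L, n the number of residues, and l in cm. The sketch below averages duplicate scans, subtracts the buffer blank as described above, converts to MRE, and applies one commonly used empirical two-state helicity estimate at 222 nm, fH = ([θ]222 - [θ]coil) / ([θ]helix - [θ]coil). The reference values of -36000 and +3000 deg cm² dmol⁻¹, the readings, and the 60-residue length are assumptions for illustration.

```python
def average_and_subtract(scans, blank):
    """Average replicate scans point by point, then subtract the buffer blank (mdeg)."""
    n = len(scans)
    return [sum(point) / n - b for point, b in zip(zip(*scans), blank)]


def mean_residue_ellipticity(theta_mdeg, conc_M, n_residues, path_cm=1.0):
    """Convert observed ellipticity (mdeg) to MRE in deg cm^2 dmol^-1 residue^-1."""
    return theta_mdeg / (10.0 * conc_M * n_residues * path_cm)


def fraction_helix(mre_222, helix=-36000.0, coil=3000.0):
    """Empirical two-state helix fraction from MRE at 222 nm (reference values assumed)."""
    return (mre_222 - coil) / (helix - coil)


# Hypothetical duplicate readings at 222 nm (mdeg) and a buffer blank.
theta_222 = average_and_subtract([[-20.5], [-19.5]], [0.0])[0]
mre = mean_residue_ellipticity(theta_222, conc_M=2e-6, n_residues=60, path_cm=1.0)
print(round(mre), round(fraction_helix(mre), 2))
```

For samples containing DNA, the duplex also contributes below about 300 nm, so helicity estimates from protein-DNA spectra should be read with that caveat in mind.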

1.2.8 Electrophoretic mobility shift assay (EMSA)

Forward and reverse oligonucleotides for the fluorescent recognition sequences were designed manually and ordered from Eurofins Genomics (Table 1 in Appendix 2). The forward oligonucleotides were synthesized with 6-carboxyfluorescein (6-FAM) attached to the 5’ ends.

The fluorescent forward oligonucleotides were annealed to a 1.5-fold molar excess of the corresponding unlabeled reverse oligonucleotides in 10 mM Tris-HCl, pH 7. The DNA solutions were heated at 95 °C for 10 minutes and slowly cooled to room temperature over 3 hours. The EMSAs were performed by incubating protein variants at the desired concentrations with the EMSA master mix. Two master mix solutions were compared to determine the more suitable buffer condition; the master mix contained 2 nM 6-FAM-labeled DNA duplex in either 1) Tris EMSA buffer (20 mM Tris, pH 8, 0.15 mM EDTA, 0.5 M NaCl, 10 mM CHAPS, 2 mM DTT)36 or 2) HEPES EMSA buffer (20 mM HEPES, pH 7.5, 50 mM NaCl, 10 mM CHAPS, 1 mM DTT, 5 mM MgCl2, 10% v/v glycerol). BSA (100 μg/mL) and poly dI-dC (2 μg/mL) were also added to the master mixes.37 The proteins were reconstituted in HEPES EMSA buffer and incubated at 37 °C for 1 hour before quantification by UV/Vis spectrophotometry as described previously (Nanodrop 2000 Spectrophotometer, Thermo Fisher). The EMSA binding reactions were assembled by mixing 6 μL protein with 24 μL EMSA master mix for a final volume of 30 μL. The binding reactions were incubated sequentially at 4 °C overnight, at 37 °C for 30 minutes, and finally at room temperature for 1 hour.35 Prior to electrophoresis, 4 μL 30% Ficoll was added to the reactions. Electrophoresis was performed on a pre-equilibrated native PAGE gel (8% polyacrylamide, 0.5× TBE; Appendix 1). The gel was run at 200 V for 5 minutes followed by 100 V for 15 minutes before visualization using GelDoc (Bio-Rad ChemiDoc MP Imaging System).
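Because each 30 µL binding reaction combines 6 µL protein with 24 µL master mix, both components are diluted on mixing. The sketch below makes that arithmetic explicit. It assumes the stated 2 nM DNA concentration refers to the master mix itself (giving 1.6 nM in the reaction); the protein titration points are hypothetical.

```python
PROTEIN_UL = 6.0
MASTER_MIX_UL = 24.0
TOTAL_UL = PROTEIN_UL + MASTER_MIX_UL  # 30 uL binding reaction (before adding Ficoll)


def final_dna_nM(master_mix_dna_nM: float = 2.0) -> float:
    """DNA concentration in the 30 uL reaction, assuming the stated value is for the master mix."""
    return master_mix_dna_nM * MASTER_MIX_UL / TOTAL_UL


def protein_stock_nM(final_nM: float) -> float:
    """Protein stock needed so that 6 uL gives `final_nM` in the 30 uL reaction (5-fold dilution)."""
    return final_nM * TOTAL_UL / PROTEIN_UL


print(final_dna_nM())                                  # DNA in the reaction, nM
for final in (1.0, 10.0, 100.0, 1000.0):               # hypothetical titration points
    print(f"{final:>7.1f} nM final <- {protein_stock_nM(final):>7.1f} nM stock")
```

If 2 nM was instead intended as the final DNA concentration, the master mix would need to contain 2.5 nM duplex.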

1.3 Results and Discussion

1.3.1 Assessing DNA-binding affinity of HinZip and variants in vivo via B1H

Since we constructed HinZip with dimeric, cooperative binding in mind, we designed the DNA full-sites to contain two identical 12 bp hix half-sites, with the left half-site in the normal orientation and the right half-site reversed (Figure 8a). This arrangement places the LZs close to each other for efficient dimerization (Figure 4a). We termed the full-site invHixC (inverse consensus Hix half-sites). We also tested different spacer lengths between the two half-sites to further optimize HinZip’s dimeric binding. The spacer variants ranged from 0 to 9 bp of GA repeats, covering almost one full turn of B-form DNA, which comprises approximately 10.5 bp per turn (Figure 8b). In the B1H, HinZip promoted the most cell growth with the 0 and 2sp invHixC at 10 mM and 20 mM 3-AT, whereas its binding to the 7 and 9sp invHixC produced the least growth (Figure 8c). As described in section 1.1.3, the B1H reporter assay gives cell growth as the output signal, which correlates with protein binding to the target sequence and transcriptional activation of the His3 reporter. Therefore, the growth observed in Figure 8c indicated that HinZip’s binding to the 0 and 2sp invHixC DNA was the strongest, and its binding to the 7 and 9sp invHixC DNA was the weakest. The positive and negative controls gave the expected results: the zinc finger TF Zif268 showed strong binding to its specific DNA target and weak binding to the -11 GC E-box (Table 2 in Appendix 2).30 To determine HinZip’s binding specificity, an NS control was established using HinZip and the -11 GC E-box. This -11 GC E-box was one of the DNA targets of upstream stimulatory factor 1 (USF1), a bHLHZ TF investigated in a previous study from the Shin group.38 The -11 GC E-box contains a 6 bp palindromic full-site, CACGTG (Table 2 in Appendix 2), which is substantially shorter than the invHixC full-sites (24-33+ bp).10,38 The half-sites of the -11 GC E-box also lack a poly A/T tract, which has been shown to be essential for Hin’s specific DNA binding by the N-terminal disordered arm.10 For these two reasons, we hypothesized that the -11 GC E-box would not provide any specific DNA-binding advantage for HinZip. The lack of cell growth in the NS control lane was consistent with our hypothesis and indicated that HinZip’s binding to the invHixC target DNA was specific (Figure 8c).
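The spacer series can be reasoned about geometrically: every base pair added between the half-sites rotates the second half-site about the helix axis and moves it further along the DNA. The sketch below assumes canonical B-DNA values of about 10.5 bp per turn and 3.4 Å rise per base pair and builds the GA-repeat spacers as described; it omits the hix half-site sequences themselves, which are given in Figure 8b.

```python
BP_PER_TURN = 10.5   # assumed helical repeat of B-DNA
RISE_A = 3.4         # assumed rise per base pair, in angstroms
HALF_SITE_BP = 12


def ga_spacer(n_bp: int) -> str:
    """GA-repeat spacer of the requested length (e.g. 5 -> 'GAGAG')."""
    return ("GA" * (n_bp // 2 + 1))[:n_bp]


def full_site_length(n_bp: int) -> int:
    """Length of an invHixC full-site: two 12 bp half-sites plus the spacer."""
    return 2 * HALF_SITE_BP + n_bp


def extra_rotation_deg(n_bp: int) -> float:
    """Rotation of the right half-site relative to the 0sp arrangement, in degrees (0-360)."""
    return (n_bp * 360.0 / BP_PER_TURN) % 360.0


def extra_rise_A(n_bp: int) -> float:
    """Additional distance along the helix axis relative to the 0sp arrangement."""
    return n_bp * RISE_A


for n in (0, 2, 5, 7, 9):  # spacer lengths tested in Figure 8
    print(f"{n}sp  {ga_spacer(n):<9} {full_site_length(n)} bp  "
          f"{extra_rotation_deg(n):6.1f} deg  {extra_rise_A(n):5.1f} A")
```

Under these assumptions, the 9sp site rotates the second half-site by roughly 309° (close to a full turn) and shifts it about 31 Å further along the axis, while the 5sp site places it nearly on the opposite face of the helix (about 171°).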

We continued the in vivo investigation with the 0sp and 9sp invHixC. We chose the 0sp because it produced the strongest binding signal and would therefore be the most sensitive to structural changes in HinZip that affected its binding affinity. We chose the 9sp because it separated the two half-sites the farthest, so it represented possible monomeric binding by HinZip to the invHixC with little-to-no cooperative interaction between the LZs.

(Figure 8 panels a-c. Plate lanes in c: HinZip with the 0sp, 2sp, 5sp, 7sp, and 9sp invHixC; NS; positive control; negative control.)

Figure 8. Investigating optimal spacer lengths from the 0sp to 9sp invHixC for HinZip binding. a) Schematic representation of the invHixC target full-site in the B1H. Each half-site of invHixC was positioned facing the other, and different spacers (0-9 bp) consisting of GA repeats were placed between the two half-sites. b) Sequences of the 0 to 9sp invHixC full-sites. The target half-sites are bolded, and the GA spacers are highlighted yellow. c) B1H of HinZip binding to the 0 to 9sp invHixC. Shown are the 10 mM (left) and 20 mM (right) 3-AT plates. USO cells were transformed with the HinZip construct on pB1H2w2 and the reporter constructs containing the 0 to 9sp invHixC on pH3U3. The cells were plated as 10-fold serial dilutions (10⁻¹ to 10⁻⁶ OD600 from left to right). HinZip showed the strongest binding to the 0 and 2sp invHixC, yielding the most growth at 10⁻⁶ OD600. The positive control (+) is the zinc finger TF Zif268 with its cognate DNA, whereas the negative control (-) is Zif268 with the -11 GC E-box.19,30 The NS control is HinZip with the -11 GC E-box (Table 2 in Appendix 2). Note that the negative control (-) lane is at the bottom of the plate and was only plated from 10⁻² to 10⁻⁵ OD600.

After the two invHixC candidates were selected, we carried out B1H assays with HinZip and two variants, HinZip/LA and the Hin 55-mer, which we designed to minimize cooperative interaction between the two protein monomers. In the FosW zipper motif of HinZip/LA, we mutated each leucine residue to alanine to minimize coiled-coil formation between the LZs,30 whereas in the Hin 55-mer, the zipper motif was completely excised from the protein (Figure 3c in Appendix 2).

(Figure 9 plate lanes: HinZip, HinZip/LA, and Hin 55-mer, each with the 0sp and 9sp invHixC; positive (+) and negative (-) controls.)

Figure 9. HinZip, HinZip/LA, and Hin 55-mer binding to the 0sp and 9sp invHixC. Transcriptional activation was visualized on the 2.5 mM 3-AT plate. USO cells were transformed with HinZip or a HinZip variant and the 0sp or 9sp invHixC on the respective plasmid vectors. Dilutions and controls were as described in Figure 8. HinZip and HinZip/LA showed stronger transcriptional activation from the 0sp invHixC, indicating stronger binding to this DNA target. They also differentiated more clearly between the two spacers than the zipper-less Hin 55-mer, which produced similar growth with the 0sp and 9sp invHixC. Note that the negative control (-) lane is at the bottom of the plate and was only plated from 10⁻² to 10⁻⁵ OD600.

As expected, the Hin 55-mer showed weaker binding to the 0sp invHixC than HinZip. Growth dropped markedly at 10⁻⁴ OD600 for HinZip but already at 10⁻³ OD600 for the Hin 55-mer, a 10-fold difference that we considered significant. We also observed little difference between Hin 55-mer binding to the 0sp and the 9sp invHixC (Figure 9). This was consistent with our expectation that binding by the zipper-less Hin 55-mer would be insensitive to spacer length, as only monomeric binding could occur. On the other hand, we were surprised by the lack of difference between HinZip and HinZip/LA binding to the two invHixC target sites. We expected the Leu-to-Ala mutations in the FosW LZs to disrupt dimerization of the HinZip/LA monomers and thereby reduce the E. coli cell growth driven by HinZip/LA’s DNA binding. This effect of the Leu-to-Ala mutations in FosW was observed in a previous study of MEF from the Shin group, where the B1H showed a growth difference of approximately two orders of magnitude between MEF and MEF/LA.30 One possible explanation for this discrepancy was that the linker between the DBD and the LZ was too short, which could impair the cooperative interaction between the two LZs and thereby diminish the effect of the Leu-to-Ala mutations in the zippers.

1.3.2 Linker extension between the LZ and the DBD

To test whether the linker was too short, we extended it by six amino acid residues, using either GGASGS or ACTAGS (Figure 3 in Appendix 2). As mentioned in the Materials and Methods, the ACTAGS extension was an unintended result of the reversed insertion of the extension duplex, but we included it in subsequent B1H assays because its cysteine could form an intermolecular disulfide bond and thereby covalently dimerize the HinZip monomers. We therefore compared binding to the 0sp and 9sp invHixC by both the GGASGS and ACTAGS variants with that of the original HinZip and HinZip/LA. We expected that extending the disordered linker between the DBD and the LZ would give the LZs more orientational freedom to interact effectively, which would further stabilize dimeric binding of HinZip to the 0sp invHixC. We expected little such stabilization on the 9sp invHixC, since its two half-sites were poorly positioned for LZ interactions. We also expected this binding advantage only for the extended variants of HinZip, as the extended HinZip/LA should benefit very little from enhanced zipper interactions because of its Leu-to-Ala mutations. Furthermore, we anticipated that the ACTAGS variants of HinZip and HinZip/LA could covalently dimerize via a disulfide linkage. This disulfide-mediated dimerization should be independent of FosW’s non-covalent dimerization, so we expected better binding by the ACTAGS variants of both HinZip and HinZip/LA regardless of the Leu-to-Ala mutations. Any enhancement in binding from covalent dimerization of the ACTAGS variants could also help us determine, by comparison, whether the GGASGS variants were dimerizing non-covalently.

(Figure 10 plate lanes, panels a and b: HinZip, HinZip/LA, and their GGASGS and ACTAGS variants with the 0sp and 9sp invHixC; positive (+) control.)

Figure 10. The extended and original constructs of HinZip and HinZip/LA binding to the 0sp and 9sp invHixC. a) Extended HinZip and HinZip/LA binding to the 0 and 9sp invHixC. The extensions did not differentiate HinZip’s binding from HinZip/LA’s, as both yielded identical growth with the 0sp and 9sp invHixC. The ACTAGS extension appeared to provide better binding to the 0sp invHixC, as seen at 10⁻⁶ OD600 (last two columns in a and b). b) Extended HinZip and HinZip/LA compared with their original constructs binding to the 0sp invHixC. This B1H shows that the linker extension had no impact on each variant’s binding to the cognate DNA, as no difference in growth was observed across all the rows. This contradicts the slight binding advantage of the ACTAGS variants suggested by panel a. Note that the 10⁻⁶ spot of HinZip/LA with the 0sp invHixC was the result of a pipetting error. Dilutions and controls are as described in Figure 8.

The results from these B1H assays were surprising. First, the linker extensions did not differentiate HinZip’s and HinZip/LA’s binding to the 0sp and 9sp invHixC as we had expected (Figure 10a). This suggested that the cysteines in the ACTAGS extension may not have formed a disulfide bond, possibly because of the reducing cytoplasmic environment of the host E. coli USO cells. As disulfide bond formation in E. coli proteins is mostly catalyzed in the oxidizing environment of the periplasm,40 the cysteine-containing constructs of HinZip and HinZip/LA may have had difficulty achieving covalent dimerization in the cytoplasm, where DNA binding occurred. Second, the linker extensions did not enhance HinZip’s or HinZip/LA’s binding to the 0sp invHixC. As Figure 10b shows, all constructs produced almost identical transcriptional activation of the reporter system. Because the B1H is a reporter system established in E. coli cells, we could not confirm that all proteins were expressed at the same concentrations, so the observed cell growth cannot be used to quantify binding of the protein constructs to the 0sp invHixC target. Nevertheless, the result in Figure 10b indicated that giving the LZs added freedom did not significantly affect the stability of the HinZip dimer upon DNA binding. Our rationale for this lack of improvement was that the FosW LZs were already interacting efficiently in the original HinZip, and that binding by the extended constructs could be entropically disfavoured, as the zipper interactions were positioned farther from the DBD. Based on this hypothesis, we adjusted the protein construct accordingly.

1.3.3 Linker deletion between the LZ and the DBD

Since our goal for HinZip was to imitate the structure of classes I and II HD-Zip TFs, we explored whether the initial design of the franken-protein could be improved. Our original design fused the LZ to the C-terminal disordered arm of the Hin DBD. This C-terminal disordered arm migrates into the DNA minor groove via electrostatic interactions.10 Unlike the N-terminal disordered arm, this C-terminal disordered arm does not make residue-specific hydrogen bonds to the bases in the minor groove, and instead points the side chains away from the minor groove.

This orientation of the C-terminal disordered arm results in a series of non-specific hydrogen bonds between the minor groove DNA bases and the carbonyl groups of the C-terminal polypeptide backbone. As these hydrogen bonds are not residue-specific, the C-terminal disordered arm contributes only to Hin’s non-specific DNA-binding activity.10 Previous studies have also shown that shortening the C-terminus of Hin’s DBD only reduced the recombinase activity of the native Hin recombinase and did not abolish binding, unlike deletion of the N-terminal warhead.6 Based on these findings, we decided to modify this disordered C-terminus in an effort to differentiate binding by HinZip and its LA mutant.

HinZip design was also made more challenging by the complete lack of 3D structures of HD-Zip TFs. Database entries for class I and II HD-Zips generally show a 9-12 amino-acid linker between the HD and the LZ; however, no structural information is available for this linker region (UniProtKB primary accession number: P92953). To approximate the HD-Zip structure, we used the secondary structure prediction server PSIPRED to predict the secondary structure of an HD-Zip protein sequence.41 The protein used in PSIPRED was ATHB4, a class II HD-Zip TF that regulates shade avoidance and hormone responses in A. thaliana.42 In the prediction (Figure 11a), ATHB4 showed an elongated α-helix that continued from the recognition helix to the LZ at the C-terminus. Unlike the original HinZip design, the PSIPRED prediction showed no disordered C-terminal linker connecting ATHB4’s DBD to its LZ. We therefore reconsidered the original linker design and deleted the entire C-terminal arm to create a more rigid HinZip, which we termed tightHinZip (Figure 11b; Figure 3c in Appendix 2).
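The argument above rests on reading a contiguous run of helix (H) calls in a PSIPRED three-state prediction. The sketch below assumes the prediction and the per-residue confidence digits (0-9) have been copied out as two equal-length strings; the example strings are hypothetical and are not the ATHB4 output shown in Figure 11a.

```python
import re


def helix_segments(pred: str, min_len: int = 1):
    """1-based inclusive (start, end) ranges of contiguous 'H' runs in a 3-state string."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"H+", pred)
            if m.end() - m.start() >= min_len]


def mean_confidence(conf: str, start: int, end: int) -> float:
    """Average PSIPRED-style confidence digit (0-9) over a 1-based inclusive range."""
    digits = [int(c) for c in conf[start - 1:end]]
    return sum(digits) / len(digits)


# Hypothetical prediction: a short helix, a break, then one long uninterrupted helix.
pred = "CC" + "HHHH" + "CC" + "H" * 10 + "CC"
conf = "55" + "7777" + "33" + "9" * 10 + "44"
segments = helix_segments(pred)
longest = max(segments, key=lambda s: s[1] - s[0])
print(segments, longest, mean_confidence(conf, *longest))
```

A continuous helix spanning the recognition helix and the LZ, as reported for ATHB4, would appear as one long segment rather than as two segments separated by coil (C) calls.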

(Figure 11 panels: a) secondary structure prediction of ATHB4, N-terminus to C-terminus; b) overlay of HinZip and tightHinZip.)

Figure 11. Structures of HinZip and tightHinZip. a) Secondary structure prediction showed continuous α-helicity at the C-terminus of ATHB4. The 3-state prediction (Pred) shows a continuous helix comprising helix 3 and the LZ. The confidence (Conf) of this prediction is high, as indicated in the first row of the result (confidence was checked by submitting the prediction to the critical assessment server CASP3). Generated by PSIPRED V4.0.41 b) Comparison of linkers and LZ orientations in HinZip (pink) and tightHinZip (tan). Overlay of the two constructs shows that, with the disordered C-terminal linker between helix 3 and the LZ removed, tightHinZip can no longer extend its zipper around the DNA to interact with the other zipper monomer as HinZip can. This added restriction on the LZs should localize tightHinZip’s dimerization to the same face of the DNA. We therefore expected tightHinZip to favour binding to the 0sp target, in which the two half-sites are positioned closest together. Visualized with Chimera 1.13.1.

As shown in Figure 11b, we expected FosW in tightHinZip to lose the freedom to extend around the DNA and dimerize from different sides (Figure 4a). Instead, the shorter linker should restrict dimerization of the LZs to the same face of the DNA, which should favour tightHinZip binding to the 0sp target site, as noted in the Figure 11b caption. Such a binding preference would be consistent with our goal of mimicking HD-Zip DNA binding, as the target DNA of class I and II HD-Zip TFs, CAATNATTG, separates the half-sites by a 1 bp spacer N, where N can be any of the four DNA bases (Figure 11b; Table 2 in Appendix 2; Figure 3c in Appendix 2).
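The HD-Zip class I/II target CAATNATTG is itself an inverted repeat: ATTG is the reverse complement of CAAT, so the 9 bp site reads the same on both strands and a single-strand scan finds it in either orientation. The sketch below illustrates this with a regular expression on a hypothetical sequence.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Lookahead so that overlapping matches are also reported.
HDZIP_SITE = re.compile(r"(?=(CAAT[ACGT]ATTG))")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_hdzip_sites(seq: str) -> list[tuple[int, str]]:
    """0-based start and sequence of each CAATNATTG match on the given strand."""
    return [(m.start(), m.group(1)) for m in HDZIP_SITE.finditer(seq.upper())]


assert revcomp("CAAT") == "ATTG"           # the two half-sites form an inverted repeat
example = "GGCAATGATTGCC"                  # hypothetical sequence containing one site
print(find_hdzip_sites(example))           # [(2, 'CAATGATTG')]
print(find_hdzip_sites(revcomp(example)))  # the same site, read on the opposite strand
```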

1.3.4 Assessing autoactivation of the B1H reporter system

Before testing the new tightHinZip, we needed to ensure that the signals in our B1H assays contained little-to-no background noise. Of the two B1H plasmids, the protein-encoding pB1H2w2 and the target site-containing pH3U3, we expected background noise to arise from autoactivation of pH3U3. Autoactivation in the B1H likely came from non-specific transcriptional activation of the His3 reporter gene by an endogenous TF in the host’s proteome. Autoactivation can lead to inaccurate interpretation of B1H data, as E. coli survival would no longer depend solely on HinZip’s specific binding, as we intended.19,43

We first assessed the original invHixC reporter system. To differentiate this system from any change we made during this investigation, we termed the system “-12 invHixC” using the number of nucleotides between the cognate site and the weak lac promoter (Figure 13b). Upon comparing the growth of USO cells containing both the prey and bait plasmids and USO cells containing only the prey plasmid of 0sp invHixC, we found that the -12 reporter system showed significant background noise from autoactivation (Figure 12).

(Figure 12 plate rows, -12 0sp invHixC: HinZip, HinZip/LA, Hin 55-mer, USF1(-), No protein, positive control (+).)

Figure 12. B1H testing autoactivation of the -12 0sp invHixC. Shown are the 2.5 mM (left) and 10 mM (right) 3-AT plates. USO cells were transformed with either the -12 0sp invHixC alone (No protein) or the -12 0sp invHixC with USF1 as a negative control (USF1(-)); USF1 binds specifically to the E-box (CACGTG).38 These two populations of transformants were compared with cells containing the -12 0sp invHixC and the HinZip variants. The -12 reporter system showed significant autoactivation, resulting in growth almost identical to that from the proposed specific binding (top three rows). On the more stringent 10 mM 3-AT plate, HinZip and its variants showed significant growth, whereas no growth was observed in the USF1(-) and No protein rows. This indicated that, although the -12 invHixC reporter system showed autoactivation at low 3-AT concentrations, autoactivation cannot fully account for the finding shown in Figure 8, where HinZip showed a binding preference for the 0 and 2sp invHixC targets.

Although the -12 invHixC reporter system showed significant autoactivation (Figure 12), the observations from the previous B1H assays, namely those in Figures 8 and 9, should still be valid. First, at 10 mM 3-AT, autoactivation of the -12 invHixC reporter system was eliminated, while HinZip maintained significant transcriptional activation (Figure 12). This indicated that a significant portion of the signal observed at high concentrations of 3-AT (10 and 20 mM) in Figure 8 stemmed from HinZip’s specific DNA-binding activity. Second, on the same 10 mM 3-AT plate (Figure 12), we observed a significant growth difference between HinZip and the Hin 55-mer, and almost identical growth between HinZip and HinZip/LA. Both observations agreed with the previous B1H assay, in which the zipper-containing HinZip and HinZip/LA grew identically to each other and significantly better than the zipper-less Hin 55-mer (Figure 9). Figure 12 showed a larger difference between the zipper-containing and zipper-less HinZip constructs than Figure 9 did; however, as mentioned in section 1.3.2, the B1H assays were performed in cells. We therefore could not confirm that all of our protein constructs were expressed at the same levels, so some variation between these cell-based assays is expected.
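The logic of this paragraph, reading specific binding only from plates stringent enough to silence the reporter-only control, can be sketched in a few lines of Python. This is a minimal sketch, assuming that plate growth has been scored by eye on a hypothetical semi-quantitative scale (0 = no growth); the function names and scores are illustrative and not part of any published B1H analysis tool.

```python
from typing import Dict, List

# Hypothetical encoding: growth[construct][3-AT concentration in mM] -> score,
# where 0 means no growth and larger integers mean more growth (scored by eye).
Growth = Dict[str, Dict[float, int]]


def stringent_concentrations(growth: Growth, background: str) -> List[float]:
    """3-AT concentrations at which the background (reporter-only) control does not grow."""
    return sorted(conc for conc, score in growth[background].items() if score == 0)


def specific_signal(growth: Growth, background: str) -> Growth:
    """Keep, for every other construct, only the scores from plates that
    suppress autoactivation of the reporter."""
    usable = stringent_concentrations(growth, background)
    return {
        name: {conc: scores[conc] for conc in usable if conc in scores}
        for name, scores in growth.items()
        if name != background
    }
```

For instance, with hypothetical scores in which the No protein control grows on 2.5 mM but not on 10 mM 3-AT, only the 10 mM plate would be retained for comparing HinZip with the Hin 55-mer.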

Autoactivation of the -12 invHixC reporter system may have caused HinZip’s and HinZip/LA’s binding to the -12 invHixC reporter system to appear identical (Figures 9 and 12). It is possible that the Leu-to-Ala mutations in HinZip/LA’s FosW did not affect HinZip/LA’s DNA binding as profoundly as they did in MEF/LA,30 so any small difference between HinZip’s and HinZip/LA’s DNA binding could have been masked by autoactivation of the -12 invHixC reporter system. This hypothesis will need to be tested in future B1H experiments.

To correct for the autoactivation and increase the signal-to-noise ratio of our B1H reporter system, we tested various spacer lengths between the hixC half-sites and the weak lac promoter. In total, four spacer variants were examined (-8, -10, -14, and -16; Figure 13). The -16 reporter system was the least autoactivating and therefore provided the lowest background noise for the B1H assays (Figure 13a). This reporter system was used in the subsequent in vivo studies.


[Figure 13 panels: a) 1 mM 3-AT plate with rows -8, -10, -14, and -16 0sp invHixC, plus + and - controls; b) invHixC-to-lac promoter spacer sequences]

Figure 13. B1H assay for autoactivation. a) The -8 to -16 reporter systems. Shown is the 1 mM 3-AT plate. USO cells were transformed with only the reporter system on pH3U3 and plated on kanamycin-only 3-AT plates. The -16 reporter showed the least growth from autoactivation. b) Sequences of the spacer variants. Nucleotides were deleted from the -12 reporter system to generate the -8 and -10 reporters (represented by dashes). GC repeats were added to the -12 reporter system to generate the -14 and -16 reporters (highlighted red).

1.3.5 Further B1H analysis using the -16 invHixC reporter system

After identifying the least autoactivating reporter plasmid construct, we performed B1H assays with HinZip and tightHinZip binding to the -16 invHixC reporter system to investigate the impact of truncating the C-terminal linker on the proteins’ DNA-binding activities. We also tested spacers of 0, 2, 5, 7, and 9 bp between the hixC half-sites in the -16 reporter system to determine whether the 0sp target remained the best binding site. To our surprise, the assay showed no growth for HinZip and very little growth for tightHinZip with the -16 invHixC reporter system (Figure 4 in Appendix 2). This result may be explained by the high affinity of the N-terminal arm for the minor groove of the cognate DNA. Because the omega subunit of RNA polymerase was fused to the N-terminal arm, and the head of the arm was completely embedded in the minor groove, the omega subunit may have been too restricted to reach the weak lac promoter and initiate transcription of the downstream His3 gene. On this reasoning, we set aside the weak growth from the tightHinZip B1H until further investigation is done.

1.3.6 Examining secondary structure of HinZip variants for proper protein folding

After we purified and lyophilized HinZip from the E. coli BL21(DE3) host cells, the protein was reconstituted in phosphate buffer for CD analysis. We aimed to examine the secondary structure of HinZip in both the absence and the presence of DNA, and in particular its α-helicity. Our interest in determining the α-helicity of HinZip stemmed from previous studies in the Shin group on the binding of E-box sequences by bHLHZ TFs.30 A general trend observed in those studies was that more helical TFs dimerized and bound better to the E-box sequences. Given that HinZip was designed to dimerize via a coiled coil formed by two α-helical LZs, and that its DBD contains an HTH, we proposed that a substantial amount of α-helical secondary structure would aid its DNA-binding affinity and specificity.

[Figure 14: mean residue ellipticity (deg·cm²·dmol⁻¹) vs. wavelength (nm), 190-260 nm; traces: HinZip + no DNA, HinZip + NS DNA]

Figure 14. Representative CD spectra of HinZip. Spectra were recorded for HinZip in the absence of DNA (orange) and after incubation with NS DNA (blue). CD samples contained 2 μM HinZip monomer (1 μM HinZip dimer) with either no DNA (orange) or 2 μM -11 GC E-box DNA duplex, the NS DNA sequence (blue; Table 2 in Appendix 2). HinZip was resuspended in CD buffer (15 mM Na2HPO4, 5 mM KH2PO4, 50 mM NaCl, pH 7.4) to 20 μM and incubated for 1 hour at 37 °C. Each CD data set consisted of duplicate scans from 190 to 300 nm obtained at 22 °C (data from 190 to 260 nm are shown). Duplicate scans were averaged and not smoothed. The blank buffer control was subtracted manually from the averaged data. While the orange curve showed the two distinct local minima at 222 nm and 208 nm characteristic of α-helices, these minima were lost when NS DNA was added to HinZip.

As shown in Figure 14, HinZip in the absence of DNA showed the local minima at 222 nm and 208 nm typically seen in α-helices. In the presence of NS DNA (Table 2 in Appendix 2), HinZip appeared to show increased helicity, demonstrated by the deeper minimum in the 208-222 nm region; however, the HinZip-NS DNA CD spectrum lacked the distinct double minima at 222 nm and 208 nm (Figure 14). The disappearance of the double minima was unexpected, as DNA-binding proteins generally adopt more helical folds when interacting with DNA.44 Because the deeper minimum in the 208-222 nm region indicated increased helicity, we concluded that the loss of the local minima did not accurately reflect HinZip’s secondary structure in the presence of NS DNA. The likely cause of this inaccuracy was the improper blank used in the CD experiment (Figure 14).

An appropriate CD blank solution should contain all the components of the sample except the protein. The blank used for the HinZip-only CD experiment consisted of phosphate buffer alone, which was appropriate for that experiment. However, the same blank did not contain the NS DNA duplex, so it was not appropriate for the HinZip-NS DNA sample. The final HinZip-NS DNA spectrum therefore still contained the CD signal from the DNA duplex after blank subtraction, which may explain why it lacked the distinct double minima at 208 and 222 nm while still suggesting α-helical secondary structure. Because the CD instrument has become unreliable, we could not investigate further; the CD analysis will need to be repeated.
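The blank-subtraction issue can be made concrete with a small processing sketch: duplicate scans are averaged, a matched blank is subtracted point by point, and the result is converted to mean residue ellipticity (MRE) in deg·cm²·dmol⁻¹ using MRE = θobs / (10 · c · l · n), with θobs in mdeg, c the molar protein concentration, l the path length in cm, and n the number of residues. This is a sketch under stated assumptions: the path length and residue count must be supplied by the user (neither is given here), and for the HinZip-NS DNA sample the matched blank is assumed to be buffer plus NS DNA at the same concentration.

```python
from statistics import mean
from typing import List, Sequence


def average_scans(scans: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of replicate scans recorded on the same wavelength grid."""
    return [mean(points) for points in zip(*scans)]


def subtract_blank(sample: Sequence[float], blank: Sequence[float]) -> List[float]:
    """Subtract a matched blank: every sample component except the protein."""
    if len(sample) != len(blank):
        raise ValueError("sample and blank must share the same wavelength grid")
    return [s - b for s, b in zip(sample, blank)]


def mean_residue_ellipticity(theta_mdeg: float, conc_molar: float,
                             path_cm: float, n_residues: int) -> float:
    """Convert observed ellipticity (mdeg) to MRE in deg cm^2 dmol^-1."""
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)
```

For the HinZip-NS DNA sample, subtract_blank would be called with the averaged buffer-plus-DNA scan rather than the buffer-only scan, so that the DNA’s own CD signal is removed before the MRE conversion.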

1.3.7 Assessing DNA binding capability in vitro via EMSA

To assess HinZip’s binding to the 0sp invHixC, we incubated 10 nM to 2000 nM HinZip with 2 nM 6-FAM-labelled oligonucleotide overnight before loading the mixtures onto a polyacrylamide gel for electrophoresis (Table 2 in Appendix 2). Our goal was to observe retardation of DNA mobility upon HinZip’s binding to the target sequence.22 As HinZip’s concentration increased in the EMSA mixtures, we expected the DNA fluorescence to shift from the free state (high mobility) to the protein-bound state (low mobility). However, our EMSA gel did not show this. As Figure 15a shows, there was no retardation of DNA mobility between 10 nM and 200 nM HinZip, and at 1000 nM and 2000 nM all fluorescent bands disappeared from the polyacrylamide gel. Because aggregation of native Hin recombinase has been reported previously, we reasoned that at high concentrations HinZip would aggregate despite the additives (CHAPS or urea) in the binding reactions.11 Aggregated protein could trap the negatively charged DNA in the wells. This may explain the disappearance of fluorescent DNA at high HinZip concentrations, but it does not explain the lack of bound DNA at lower concentrations. Native Hin recombinase and the Hin DBD bind cognate DNA with high affinity (Kd = 3.8 ± 0.8 and 34 ± 9 nM, respectively).11 We therefore expected to observe bound DNA at 10 nM HinZip, as it was designed to dimerize like native Hin recombinase. However, Figure 15a showed otherwise.
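Whether bound DNA should be visible at 10 nM HinZip can be checked with the exact (quadratic) solution for a single-site 1:1 equilibrium, which accounts for the 2 nM probe not being negligible relative to the protein. This is a minimal sketch that treats each HinZip unit as one binding partner and borrows the published Kd values as stand-ins for HinZip, both of which are assumptions.

```python
from math import sqrt


def fraction_dna_bound(protein_nm: float, dna_nm: float, kd_nm: float) -> float:
    """Fraction of DNA in complex for P + D <-> PD, solved exactly.

    The quadratic form accounts for depletion of free protein by the probe,
    which matters when the probe concentration is not negligible.
    """
    b = protein_nm + dna_nm + kd_nm
    complex_nm = (b - sqrt(b * b - 4.0 * protein_nm * dna_nm)) / 2.0
    return complex_nm / dna_nm
```

Under these assumptions, fraction_dna_bound(10, 2, 3.8) gives roughly 0.69 and fraction_dna_bound(10, 2, 34) roughly 0.22, so a visible shifted band would be expected at 10 nM. If only the 5 nM of dimer formed from 10 nM monomer counts as the binding unit, the value with Kd = 34 nM drops to about 0.12.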

Our initial reasoning was that the basic nature of HinZip (pI 9.73) rendered the protein highly positively charged in TBE buffer (pH 8), which could prevent the protein-DNA complex from migrating into the gel toward the anode. To test this idea, we loaded the same samples onto a 0.5% agarose gel with the wells positioned in the middle. This setup could capture unbound DNA migrating toward the anode and protein-bound DNA migrating toward the cathode. However, the agarose gel showed the same result as the polyacrylamide gel (Figure 15b), indicating that the basicity of HinZip was not the issue. We then considered the impact of the T-leap tactic on HinZip’s stability. HinZip was incubated with target DNA at 4 °C overnight, then at 37 °C for 30 minutes, followed by room temperature for 1 hour, to promote proper folding of HinZip and thereby enhance its DNA-binding activity. The same T-leap tactic was applied in previous studies for the same purpose.30,35 However, the incubation times, temperatures, and buffer composition may need to be determined empirically for HinZip to ensure proper protein folding. Loss of properly folded secondary structure would also cause loss of DNA-binding capability, and misfolded HinZip would be highly prone to aggregation, which could explain what we observed in the EMSA on both gels. Other T-leap programs should be tested to determine whether HinZip-bound DNA can be observed.
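Because the T-leap conditions may need to be optimized empirically, it can help to write each program down as data and enumerate variants systematically. The sketch below is a hypothetical illustration, not a protocol from the cited studies: it encodes the program used here, assuming that “overnight” means 16 h and that room temperature is 22 °C, and generates variants of the warm leap step only.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float   # incubation temperature, degrees Celsius
    minutes: float  # incubation time


Program = Tuple[Step, ...]

# Program described in the text. "Overnight" = 16 h and room temperature = 22 C
# are assumptions made only for this illustration.
BASELINE: Program = (Step(4, 16 * 60), Step(37, 30), Step(22, 60))


def total_minutes(program: Program) -> float:
    """Total incubation time of a program."""
    return sum(step.minutes for step in program)


def leap_variants(temps_c: Sequence[float], times_min: Sequence[float]) -> List[Program]:
    """Programs that vary only the warm 'leap' step of BASELINE."""
    cold, _, hold = BASELINE
    return [(cold, Step(t, m), hold) for t, m in product(temps_c, times_min)]
```

Keeping the cold pre-incubation and the final hold fixed isolates the effect of the leap step; other steps could be varied the same way.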

[Figure 15 panels a) and b); lanes: 0, 10, 100, 200, 1000, and 2000 nM HinZip]

Figure 15. EMSA of HinZip binding to 0sp invHixC. a) 6% polyacrylamide gel. Lanes containing 10-200 nM HinZip monomer showed only unbound DNA bands at the bottom of the gel, indicating that no protein-DNA complex was detected. Increasing HinZip concentration appeared to cause the DNA to be retained in the wells, so that no DNA was observed in the gel at high HinZip concentrations (1000 nM and 2000 nM). b) 0.5% agarose gel. The same binding reactions from a) were run on the agarose gel to capture protein-bound DNA travelling in the opposite direction. The agarose gel yielded results identical to those of the polyacrylamide gel. Note that the lane next to the blank lane (0 nM HinZip) was not used on either gel. Brightness observed in the blank and unused lanes of the agarose gel may be light leaking through cracks in the glass screen.

1.3.8 Summary

From the B1H assays discussed in sections 1.3.1 to 1.3.5, we observed a DNA-binding preference of HinZip and HinZip/LA for the -12 0sp invHixC target DNA, in which the two hixC half-sites were positioned closest to each other (Figures 8 and 9). This binding preference was not observed for the zipper-less Hin 55-mer, which produced nearly identical cell growth with the -12 0sp and 9sp invHixC targets (Figure 9). The -12 invHixC reporter system showed significant autoactivation at low 3-AT concentration (2.5 mM), which was eliminated by increasing the 3-AT concentration to 10 mM (Figure 12). Overall, the zipper-containing protein constructs showed higher transcriptional activation than the zipper-less Hin 55-mer (Figures 9 and 12); however, because the -12 invHixC reporter system showed significant autoactivation, the reporter construct needs further examination to minimize background noise (Figure 13).

Preliminary EMSA and CD results were inconclusive. Because an improper blank was used in the HinZip-NS DNA CD experiment, the final spectrum showed loss of the local minima at 208 nm and 222 nm in the presence of NS DNA (Figure 14), which may not accurately reflect HinZip’s secondary structure. EMSAs using polyacrylamide and agarose gels did not show the retardation of DNA mobility expected in the bound state. Instead, fluorescence was detected in the wells (Figure 15), suggesting aggregation of misfolded HinZip.

1.3.9 Future directions

The experimental methods employed in this project still need adjustment; namely, the CD experiment in vitro and the B1H reporter plasmid construction in vivo need to be re-examined. We will ensure that the proper blank solution is used for the CD experiment. The EMSA conditions will also be assessed to resolve HinZip’s possible aggregation issue.

The in vitro analyses will need to be repeated to a) examine proper folding of HinZip’s secondary structure, and b) examine HinZip’s binding to invHixC. Protein aggregation may also not have been the only issue with HinZip’s EMSA, and the basicity of HinZip may still pose problems for the migration of the protein-DNA complex. As mentioned previously, HinZip’s alkaline pI (pI 9.73) renders the protein highly positively charged in TBE (pH 8). This positive charge aids HinZip in binding to the negatively charged DNA but may prevent the protein-DNA complex from migrating into the polyacrylamide gel. In that case, we may employ agarose gel electrophoresis as described above, or raise the pH of the running buffer to reduce the net positive charge of the complex and hence its pull toward the cathode.
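How strongly the running-buffer pH affects HinZip’s charge can be estimated from its sequence with the Henderson-Hasselbalch equation, summing the fractional charge of each ionizable group and bisecting for the pH at which the net charge is zero. This is a rough sketch: the pKa table is an assumption (published tables differ, so a computed pI will not necessarily reproduce the 9.73 quoted above), and the HinZip sequence itself is not reproduced here.

```python
from typing import Dict

# Approximate pKa values (assumption; published tables differ).
PKA_POSITIVE: Dict[str, float] = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE: Dict[str, float] = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1
    counts["Cterm"] = 1
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence: str, lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisect for the pH where the net charge crosses zero (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Evaluating net_charge on the HinZip sequence at, for example, pH 8 and pH 9 would show how much of the positive charge a higher-pH running buffer could remove.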

To improve the B1H, we should first try the -8 invHixC reporter system on the pH3U3 reporter plasmid. As mentioned, our current rationale for HinZip’s lack of transcriptional activation of the -16 reporter system (Figure 4 in Appendix 2) is that the N-terminal arm was embedded deeply in the minor groove; thus, the RNAP ω-subunit fused to the disordered N-terminal arm could not reach the weak lac promoter when the promoter was positioned 16 nucleotides away.

In the autoactivation B1H, the -8 0sp invHixC, though visibly more autoactivating than the -16, was still markedly less autoactivating than the other variants (Figure 13). This construct is worth exploring because the -8 invHixC positions the target sites much closer to the promoter.
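The distance argument can be put in rough numbers using idealized B-DNA geometry. This is a back-of-the-envelope sketch that assumes a rise of 0.34 nm per base pair and about 10.5 bp per helical turn, and treats each spacer as lying between the cognate site and the promoter, as in the reporter names. Under those assumptions, moving from -12 to -16 adds about 1.4 nm along the helix axis and rotates the promoter by roughly 137° around it, whereas -8 brings it about 1.4 nm closer.

```python
# Idealized B-DNA geometry (assumptions): 0.34 nm rise per bp, 10.5 bp per turn.
RISE_NM_PER_BP = 0.34
BP_PER_TURN = 10.5


def axial_distance_nm(n_bp: int) -> float:
    """Distance along the helix axis spanned by n_bp base pairs."""
    return n_bp * RISE_NM_PER_BP


def helical_phase_deg(n_bp: int) -> float:
    """Rotation about the helix axis after n_bp base pairs, folded into [0, 360)."""
    return (n_bp * 360.0 / BP_PER_TURN) % 360.0


if __name__ == "__main__":
    for spacer in (8, 10, 12, 14, 16):
        print(f"-{spacer}: {axial_distance_nm(spacer):.2f} nm, "
              f"{helical_phase_deg(spacer):.0f} degrees")
```

The phase term is included because changing the spacer also changes which face of the helix the promoter presents relative to the bound protein, not just how far away it is.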

We will need to examine more deeply the dimerization of HinZip. Native polyacrylamide gel electrophoresis (Native PAGE) can be used to visualize HinZip’s dimerization in vitro.

Unlike SDS-PAGE, native PAGE does not include the denaturant SDS, so HinZip’s native form is retained.45 Provided that an appropriate T-leap program ensures proper folding of HinZip, we expect dimerization of HinZip monomers to retard protein mobility in native PAGE. From there, we can run native PAGE across a range of HinZip concentrations to determine HinZip’s dimerization constant, and we can add target and NS DNA to the protein samples to determine the effect of DNA on that constant. The thermal shift assay (TSA) is another method we can use to investigate the stability of the HinZip dimer by determining its denaturation temperature.46 TSA is also useful for demonstrating ligand-induced stabilization of the HinZip dimer by DNA. By comparing the denaturation temperature of the HinZip dimer in the absence and presence of DNA, we will be able to determine whether DNA binding stabilizes the HinZip dimer, and hence whether dimerization may depend on DNA binding occurring first. The native PAGE and TSA methods should also be used to study the HinZip variants discussed in this thesis to examine the impact of their structural changes on dimer stability.45,46
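The concentration series proposed for native PAGE could be interpreted with a simple two-state monomer-dimer model, 2M ⇌ D, with dissociation constant Kd = [M]²/[D] and total monomer concentration Ct = [M] + 2[D]. Substituting gives 2[M]² + Kd[M] - KdCt = 0, which is solved below. This is a minimal sketch assuming no higher-order oligomers and no DNA; fitting measured dimer fractions to fraction_dimerized across HinZip concentrations would then yield an estimate of Kd.

```python
from math import sqrt


def monomer_concentration(total_monomer: float, kd: float) -> float:
    """Free monomer [M] for 2M <-> D with Kd = [M]^2/[D] and total = [M] + 2[D]."""
    # Positive root of 2[M]^2 + Kd[M] - Kd*total = 0.
    return (-kd + sqrt(kd * kd + 8.0 * kd * total_monomer)) / 4.0


def fraction_dimerized(total_monomer: float, kd: float) -> float:
    """Fraction of monomer units residing in dimers, 2[D]/total."""
    m = monomer_concentration(total_monomer, kd)
    return (total_monomer - m) / total_monomer
```

When the total monomer concentration equals Kd, half of the monomer units are in dimers; well above Kd nearly all are, and well below it almost none are.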

As a broader outlook, HinZip and its DNA circuitries could be developed for synthetic biology applications. Synthetic biological systems have already benefited the environmental and pharmaceutical industries, but synthetic circuits relying on a designed franken-protein that is orthogonal to the host’s native proteome remain rare.26 The bio-orthogonality built into HinZip’s design could therefore make it a valuable tool for the future of synthetic biology.


1.4 References

(1) Agalou, A.; Purwantomo, S.; Övernäs, E.; Johannesson, H.; Zhu, X.; Estiati, A.; De Kam, R. J.; Engström, P.; Slamet-Loedin, I. H.; Zhu, Z.; Wang, M.; Xiong, L.; Meijer, A. H.; Ouwerkerk, P. B. F. A Genome-Wide Survey of HD-Zip Genes in Rice and Analysis of Drought-Responsive Family Members. Plant Mol. Biol. 2008, 66 (1–2), 87–103. https://doi.org/10.1007/s11103-007-9255-7.

(2) Elhiti, M.; Stasolla, C. Structure and Function of Homodomain-Leucine Zipper (HD-Zip) Proteins. Plant Signal. Behav. 2009, 4 (2), 86–88. https://doi.org/10.4161/psb.4.2.7692.

(3) Ariel, F. D.; Manavella, P. A.; Dezar, C. A.; Chan, R. L. The True Story of the HD-Zip Family. Trends Plant Sci. 2007, 12 (9), 419–426. https://doi.org/10.1016/j.tplants.2007.08.003.

(4) Mason, J. M.; Arndt, K. M. Coiled Coil Domains: Stability, Specificity, and Biological Implications. ChemBioChem 2004, 5 (2), 170–176. https://doi.org/10.1002/cbic.200300781.

(5) Henriksson, E.; Olsson, A. S. B.; Johannesson, H.; Johansson, H.; Hanson, J.; Engström, P.; Söderman, E. Homeodomain Leucine Zipper Class I Genes in Arabidopsis. Expression Patterns and Phylogenetic Relationships. Plant Physiol. 2005, 139 (1), 509–518. https://doi.org/10.1104/pp.105.063461.

(6) Mack, D. P.; Sluka, J. P.; Shin, J. A.; Griffin, J. H.; Simon, M. I.; Dervan, P. B. Orientation of the Putative Recognition Helix in the DNA-Binding Domain of Hin Recombinase Complexed with the Hix Site. Biochemistry 1990, 29 (28), 6561–6567. https://doi.org/10.1021/bi00480a003.

(7) Haykinson, M. J.; Johnson, L. M.; Soong, J.; Johnson, R. C. The Hin Dimer Interface Is Critical for Fis-Mediated Activation of the Catalytic Steps of Site-Specific DNA Inversion. Curr. Biol. 1996, 6 (2), 163–177. https://doi.org/10.1016/S0960-9822(02)00449-9.

(8) Dhar, G.; Sanders, E. R.; Johnson, R. C. Architecture of the Hin Synaptic Complex during Recombination: The Recombinase Subunits Translocate with the DNA Strands. Cell 2004, 119 (1), 33–45. https://doi.org/10.1016/j.cell.2004.09.010.

(9) Dhar, G.; Mclean, M. M.; Heiss, J. K.; Johnson, R. C. The Hin Recombinase Assembles a Swivel That Exchanges DNA Strands. Nucleic Acids Res. 2009, 37 (14), 4743–4756. https://doi.org/10.1093/nar/gkp466.

(10) Feng, J. A.; Johnson, R. C.; Dickerson, R. E. Hin Recombinase Bound to DNA: The Origin of Specificity in Major and Minor Groove Interactions. Science 1994, 263 (5145), 348–355. https://doi.org/10.1126/science.8278807.

(11) Chiu, T. K.; Sohn, C.; Dickerson, R. E.; Johnson, R. C. Testing Water-Mediated DNA Recognition by the Hin Recombinase. EMBO J. 2002, 21 (4), 801–814. https://doi.org/10.1093/emboj/21.4.801.

(12) Haran, T. E.; Joachimiak, A.; Sigler, P. B. The DNA Target of the Trp Repressor. EMBO J. 1992, 11 (8), 3021–3030. https://doi.org/10.1002/j.1460-2075.1992.tb05372.x.

(13) Suda, N.; Itoh, T.; Nakato, R.; Shirakawa, D.; Bando, M.; Katou, Y.; Kataoka, K.; Shirahige, K.; Tickle, C.; Tanaka, M. Dimeric Combinations of MafB, CFos and CJun Control the Apoptosis-Survival Balance in Limb Morphogenesis. Dev. 2014, 141 (14), 2885–2894. https://doi.org/10.1242/dev.099150.

(14) Worrall, J. A. R.; Mason, J. M. Thermodynamic Analysis of Jun-Fos Coiled Coil Peptide Antagonists. FEBS J. 2011, 278 (4), 663–672. https://doi.org/10.1111/j.1742-4658.2010.07988.x.

(15) Nikolaev, Y.; Deillon, C.; Hoffmann, S. R. K.; Bigler, L.; Friess, S.; Zenobi, R.; Pervushin, K.; Hunziker, P.; Gutte, B. The Leucine Zipper Domains of the Transcription Factors GCN4 and C-Jun Have Ribonuclease Activity. PLoS One 2010, 5 (5). https://doi.org/10.1371/journal.pone.0010765.

(16) Mason, J. M.; Schmitz, M. A.; Müller, K. M.; Arndt, K. M. Semirational Design of Jun-Fos Coiled Coils with Increased Affinity: Universal Implications for Leucine Zipper Prediction and Design. Proc. Natl. Acad. Sci. U. S. A. 2006, 103 (24), 8989–8994. https://doi.org/10.1073/pnas.0509880103.

(17) Chen, G.; De Jong, A. T.; Shin, J. A. Forced Homodimerization of the C-Fos Leucine Zipper in Designed BHLHZ-like Hybrid Proteins MaxbHLH-Fos and ArntbHLH-Fos. Mol. Biosyst. 2012, 8 (4), 1286–1296. https://doi.org/10.1039/c2mb05354c.

(18) Lajmi, A. R.; Lovrencic, M. E.; Wallace, T. R.; Thomlinson, R. R.; Shin, J. A. Minimalist, Alanine-Based, Helical Protein Dimers Bind to Specific DNA Sites [2]. J. Am. Chem. Soc. 2000, 122 (23), 5638–5639. https://doi.org/10.1021/ja993025a.

(19) Meng, X.; Brodsky, M. H.; Wolfe, S. A. A Bacterial One-Hybrid System for Determining the DNA-Binding Specificity of Transcription Factors. Nat. Biotechnol. 2005, 23 (8), 988–994. https://doi.org/10.1038/nbt1120.

(20) Mentesana, P. E.; Dosil, M.; Konopka, J. B. Functional Assays for Mammalian G-Protein- Coupled Receptors in Yeast. In Methods in Enzymology; Academic Press Inc., 2002; Vol. 344, pp 92–111. https://doi.org/10.1016/S0076-6879(02)44708-8.

(21) Joung, J. K.; Ramm, E. I.; Pabo, C. O. A Bacterial Two-Hybrid Selection System for Studying Protein-DNA and Protein-Protein Interactions. Proc. Natl. Acad. Sci. U. S. A. 2000, 97 (13).

(22) Heffler, M. A.; Walters, R. D.; Kugel, J. F. Using Electrophoretic Mobility Shift Assays to Measure Equilibrium Dissociation Constants: GAL4-P53 Binding DNA as a Model System. Biochem. Mol. Biol. Educ. 2012, 40 (6), 383–387. https://doi.org/10.1002/bmb.20649.

(23) Pagano, J. M.; Clingman, C. C.; Ryder, S. P. Quantitative Approaches to Monitor Protein-Nucleic Acid Interactions Using Fluorescent Probes. RNA 2011, 17 (1), 14–20. https://doi.org/10.1261/rna.2428111.

(24) El Karoui, M.; Hoyos-Flight, M.; Fletcher, L. Future Trends in Synthetic Biology—a Report. Front. Bioeng. Biotechnol. 2019, 7 (AUG), 1–8. https://doi.org/10.3389/fbioe.2019.00175.

(25) Kung, S. H.; Lund, S.; Murarka, A.; McPhee, D.; Paddon, C. J. Approaches and Recent Developments for the Commercial Production of Semi-Synthetic Artemisinin. Front. Plant Sci. 2018, 9 (January), 1–7. https://doi.org/10.3389/fpls.2018.00087.

(26) Hale, V.; Keasling, J. D.; Renninger, N.; Diagana, T. T. Microbially Derived Artemisinin: A Biotechnology Solution to the Global Problem of Access to Affordable Antimalarial Drugs. Am. J. Trop. Med. Hyg. 2007, 77 (SUPPL. 6), 198–202. https://doi.org/10.4269/ajtmh.2007.77.198.

(27) Dolgin, E. Synthetic Biology Speeds Vaccine Development. Nat. Milestones 2020, 1263 (November), S24.

(28) Corbett, K. S. et al. SARS-CoV-2 mRNA Vaccine Development Enabled by Prototype Pathogen Preparedness. Prepr. bioRxiv 2020, 21 (1), 1–9.

(29) Chung, C. T.; Niemela, S. L.; Miller, R. H. One-Step Preparation of Competent Escherichia Coli: Transformation and Storage of Bacterial Cells in the Same Solution. Proc. Natl. Acad. Sci. U. S. A. 1989, 86 (7), 2172–2175. https://doi.org/10.1073/pnas.86.7.2172.

(30) Inamoto, I.; Sheoran, I.; Popa, S. C.; Hussain, M.; Shin, J. A. Combining Rational Design and Continuous Evolution on Minimalist Proteins That Target the E-Box DNA Site. ACS Chem. Biol. 2021, 16 (1), 35–44. https://doi.org/10.1021/acschembio.0c00684.

(31) Christy, B.; Nathans, D. DNA Binding Site of the Growth Factor-Inducible Protein Zif268. Proc. Natl. Acad. Sci. U. S. A. 1989, 86 (22), 8737–8741. https://doi.org/10.1073/pnas.86.22.8737.

(32) Xu, J.; Chen, G.; De Jong, A. T.; Shahravan, S. H.; Shin, J. A. Max-E47, a Designed Minimalist Protein That Targets the E-Box DNA Site in Vivo and in Vitro. J. Am. Chem. Soc. 2009, 131 (22), 7839–7848. https://doi.org/10.1021/ja901306q.

(33) Mühlmann, M.; Forsten, E.; Noack, S.; Büchs, J. Optimizing Recombinant Protein Expression via Automated Induction Profiling in Microtiter Plates at Different Temperatures. Microb. Cell Fact. 2017, 16, 220. https://doi.org/10.1186/s12934-017-0832-4.

(34) Bornhorst, J. A.; Falke, J. J. [16] Purification of Proteins Using Polyhistidine Affinity Tags.

(35) Xie, Y.; Wetlaufer, D. B. Control of Aggregation in Protein Refolding: The Temperature-Leap Tactic. Protein Sci. 1996, 5 (3), 517–523. https://doi.org/10.1002/pro.5560050314.

(36) Chow, H. K.; Xu, J.; Shahravan, S. H.; De Jong, A. T.; Chen, G.; Shin, J. A. Hybrids of the BHLH and BZIP Protein Motifs Display Different DNA-Binding Activities in Vivo vs. in Vitro. PLoS One 2008, 3 (10). https://doi.org/10.1371/journal.pone.0003514.

(37) Hellman, L. M.; Fried, M. G. Electrophoretic Mobility Shift Assay (EMSA) for Detecting Protein-Nucleic Acid Interactions. Nat. Protoc. 2007, 2 (8), 1849–1861. https://doi.org/10.1038/nprot.2007.249.

(38) Popa, S. C. Investigating the Intrinsically Disordered Regions of USF1 and Other BHLHZ Transcription Factors. 2020.

(39) Feng, J. A.; Johnson, R. C.; Dickerson, R. E. Hin Recombinase Bound to DNA: The Origin of Specificity in Major and Minor Groove Interactions. Science 1994, 263 (5145), 348–355. https://doi.org/10.1126/science.8278807.

(40) Stewart, E. J.; Åslund, F.; Beckwith, J. Disulfide Bond Formation in the Escherichia Coli Cytoplasm: An in Vivo Role Reversal for the Thioredoxins. EMBO J. 1998, 17 (19), 5543–5550. https://doi.org/10.1093/emboj/17.19.5543.

(41) McGuffin, L. J.; Bryson, K.; Jones, D. T. The PSIPRED Protein Structure Prediction Server; 2000; Vol. 16.

(42) Sorin, C.; Salla-Martret, M.; Bou-Torrent, J.; Roig-Villanova, I.; Martínez-García, J. F. ATHB4, a Regulator of Shade Avoidance, Modulates Hormone Response in Arabidopsis Seedlings. https://doi.org/10.1111/j.1365-313X.2009.03866.x.

(43) Janik, K.; Schlink, K. Unravelling the Function of a Bacterial Effector from a Non-Cultivable Plant Pathogen Using a Yeast Two-Hybrid Screen. J. Vis. Exp. 2017, 2017 (119), 1–10. https://doi.org/10.3791/55150.

(44) Suzuki, M.; Gerstein, M. Binding Geometry of α-Helices That Recognize DNA. Proteins Struct. Funct. Bioinforma. 1995, 23 (4), 525–535. https://doi.org/10.1002/prot.340230407.


(45) Matte, A.; Kozlov, G.; Trempe, J.-F.; Currie, M. A.; Burk, D.; Jia, Z.; Gehring, K.; Ekiel, I.; Berghuis, A. M.; Cygler, M. Preparation and Characterization of Bacterial Protein Complexes for Structural Analysis; Academic Press, 2009; Vol. 76, pp 1–42. https://doi.org/10.1016/s1876-1623(08)76001-2.

(46) Elgert, C.; Rühle, A.; Sandner, P.; Behrends, S. Thermal Shift Assay: Strengths and Weaknesses of the Method to Investigate the Ligand-Induced Thermostabilization of Soluble Guanylyl Cyclase. J. Pharm. Biomed. Anal. 2020, 181, 113065. https://doi.org/10.1016/j.jpba.2019.113065.


1.5 Appendix

1.5.1 Appendix 1 – Composition of media and buffers

LB / LB agar: 10 g tryptone; 5 g yeast extract; 10 g NaCl; add ddH2O to 1 L; add 15 g agar for plates
50x TAE: 2 M Tris; 1 M acetic acid; 50 mM EDTA; pH 8.3
SOB media: 20 g tryptone; 5 g yeast extract
5x TBE: 445 mM Tris; 445 mM boric acid; 10 mM EDTA; pH 8.0
10x M9 salt: 67.8 g Na2HPO4; 30 g KH2PO4; 5 g NaCl; 10 g NH4Cl; add ddH2O to 1 L

33x Amino acid solution Prepare the following solutions: Solution 1: Solution 4: 0.99 g Phe 1.04 g Asp 1.1 g Lys 18.7 g Glu 2.5 g Arg Add dd H2O to 100 mL Add dd H2O to 100 mL Solution 5: Solution 2: 14.6 g Gln 0.7 g Val 0.36 g Tyr 0.84 g Ala Add dd H2O to ~90 mL 0.41 g Trp 0.2 g Gly Solution 6: Add dd H2O to 100 mL 0.79 g Ile 0.77 g Leu Solution 3: Add dd H2O to 100 mL 8.4 g Ser 4.6 g Pro 0.96 g Asn 0.71 g Thr Add dd H2O to 100 mL Add the six solutions listed above and add NaOH pellets until all amino acids dissolve. Filter sterilize the final mixture and store at 4°C.


1.5.2 Appendix 2 – Oligonucleotide sequences and supplementary figures

Appendix 2. Table 1. Sequences of oligonucleotides used for site-directed mutagenesis/cloning
Name: Sequence
-12 0sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAATTATCAAAAACAGCGAAGG
-12 0sp invHixC REV primer: 5’AATTCCTTCGCTGTTTTTGATAATTATCAAAAACAGCTCCTGC
-12 2sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAAGATTATCAAAAACAGCGAAGG
-12 2sp invHixC REV primer: 5’AATTCCTTCGCTGTTTTTGATAATCTTATCAAAAACAGCTCCTGC
-12 5sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAAGAGAGTTATCAAAAACAGCGAAGG
-12 5sp invHixC REV primer: 5’AATTCCTTCGCACAAAAACTATTCTCTCAATAGTTTTTGTGCTCCTGC
-12 7sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAAGAGAGAGTTATCAAAAACAGCGAAGG
-12 7sp invHixC REV primer: 5’AATTCCTTCGCTGTTTTTGATAACTCTCTCTTATCAAAAACAGCTCCTGC
-12 9sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAAGAGAGAGAGTTATCAAAAACAGCGAAGG
-12 9sp invHixC REV primer: 5’AATTCCTTCGCTGTTTTTGATAACTCTCTCTCTTATCAAAAACAGCTCCTGC
-8 0sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAATTATCAAAAACAGG
-8 0sp invHixC REV primer: 5’AATTCCCTGTTTTTGATAATTATCAAAAACAGCTCCTGC
-10 0sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAATTATCAAAAACAGCGGG
-10 0sp invHixC REV primer: 5’AATTCCCGCTGTTTTTGATAATTATCAAAAACAGCTCCTGC
-14 0sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAATTATCAAAAACAGCGAAGGCG
-14 0sp invHixC REV primer: 5’AATTCGCCTTCGCTGTTTTTGATAATTATCAAAAACAGCTCCTGC
-16 0sp invHixC FWD primer: 5’GGCCGCAGGAGCTGTTTTTGATAATTATCAAAAACAGCGAAGGCGCG
-16 0sp invHixC REV primer: 5’AATTCGCGCCTTCGCTGTTTTTGATAATTATCAAAAACAGCTCCTGC
Hin_XhoI FWD primer: 5’CCGGTCTCGAGGAGCGCAACTATGCG
Hin_XhoI REV primer: 5’CGCATAGTTGCGCTCCTCGAGACCGG
Linker extension FWD primer: 5’GATCCGGCGGTGCAAGCG
Linker extension REV primer: 5’GATCCGCTTGCACCGCCG
EMSA_0sp invHixC FWD primer: 5’[FAM]AGGAGCACGTGTTTTTGATAATTATCAAAAACACGTGCTCCT
[FAM] refers to the 6-FAM tag added to the EMSA oligonucleotide to provide fluorescence.

Digested sticky ends of EcoRI, NotI, and BamHI are bolded. Oligonucleotides used for cloning the cognate DNA into pH3U3 were ordered with the sticky ends so they could be directly ligated into the reporter plasmid without digestion.
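The requirement that the oligonucleotide pairs anneal into a duplex carrying ready-made sticky ends can be checked programmatically. This minimal sketch assumes, from the sequences in Table 1, that forward oligos begin with the NotI-compatible 5’ overhang GGCC and reverse oligos with the EcoRI-compatible 5’ overhang AATT; the 5’ label and any [FAM] tag must be removed before use. The function names are hypothetical.

```python
NOTI_OVERHANG = "GGCC"   # 5' overhang left by NotI (GC^GGCCGC)
ECORI_OVERHANG = "AATT"  # 5' overhang left by EcoRI (G^AATTC)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneals_with_overhangs(fwd: str, rev: str,
                           fwd_overhang: str = NOTI_OVERHANG,
                           rev_overhang: str = ECORI_OVERHANG) -> bool:
    """True if fwd and rev form a duplex whose 5' ends protrude as the given overhangs."""
    fwd, rev = fwd.upper(), rev.upper()
    if not (fwd.startswith(fwd_overhang) and rev.startswith(rev_overhang)):
        return False
    # The base-paired region must be perfectly complementary.
    return fwd[len(fwd_overhang):] == reverse_complement(rev[len(rev_overhang):])
```

Applied to the -12 0sp and -16 0sp primer pairs in Table 1, the check passes; a single mismatched base in either oligo makes it fail.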

Appendix 2. Table 2. DNA sequences used in B1H, CD, or EMSA
Name: Sequence
0sp invHixC, B1H reporter: TGTTTTTGATAATTATCAAAAACA
2sp invHixC, B1H reporter: TGTTTTTGATAAGATTATCAAAAACA
5sp invHixC, B1H reporter: TGTTTTTGATAAGAGAGTTATCAAAAACA
7sp invHixC, B1H reporter: TGTTTTTGATAAGAGAGAGTTATCAAAAACA
9sp invHixC, B1H reporter: TGTTTTTGATAAGAGAGAGAGTTATCAAAAACA
Positive control, B1H reporter: GCGTGGGGCGatCGAATTCTTTACA
-11 GC E-box, non-specific control, B1H reporter, CD: GCCACGTGCGCCATCGAATTCTTTACA
0sp invHixC, EMSA: [FAM]AGGAGCACGTGTTTTTGATAATTATCAAAAACACGTGCTCCT

Sequences listed here were cognate DNA targets used in HinZip’s CD, EMSA, and B1H experiments. Refer to Figure 13b for details on the -8, -10, -14, and -16 constructs of invHixC.

Cognate sites are bolded, and flanking sequences are highlighted in red. Spacers comprising 0-9 bp of GA repeats are highlighted in yellow. [FAM] refers to the 6-FAM tag added to the EMSA oligonucleotide.
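The invHixC targets in Table 2 follow a simple rule: a 12-bp half-site, a GA-repeat spacer, and the reverse complement of the half-site. The sketch below reconstructs them from that rule; the function name is hypothetical, and the half-site is taken from the 0sp sequence in Table 2.

```python
HALF_SITE = "TGTTTTTGATAA"  # 12-bp left half-site, from the 0sp sequence in Table 2


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inv_hixc(spacer_bp: int) -> str:
    """Inverted repeat of HALF_SITE separated by a GA-repeat spacer of spacer_bp bp."""
    spacer = ("GA" * spacer_bp)[:spacer_bp]
    return HALF_SITE + spacer + reverse_complement(HALF_SITE)
```

With no spacer the site is a perfect palindrome, which is consistent with the two half-sites being bound by a symmetric dimer.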


Appendix 2. Figure 1. N-terminal sequences of HD-Zip TFs. Residues such as glycine (G), arginine (R), and lysine (K) are highly conserved in the N-terminal region of HD-Zip TFs before helix 1. The presence of these residues often suggests a disordered conformation, which concurs with HD-Zip’s general structure across the four classes. Adapted from ref. 5.


[Appendix 2, Figure 2 key. Proteins/protein motifs: Hin recombinase DBD; Hin recombinase catalytic domain; histone-like protein (HU); Fis protein. DNA sequences: hixL; hixR; secondary site; enhancer site.]

Appendix 2. Figure 2. Formation of Hin recombinase’s tetrameric complex during DNA inversion. Two monomers of Hin recombinase bind to each of the two hix target sequences that flank the 1 kb Salmonella DNA segment. The monomers then dimerize via their catalytic domains. The dimers interact with the Fis proteins on the adjacent enhancer sites to form the final tetrameric invertasome complex. A histone-like protein (HU) binds to a secondary site within the inverted DNA to facilitate DNA bending. The invertasome complex facilitates DNA inversion.



Appendix 2. Figure 3. Plasmid maps of pB1H2w2 and pET28a. a) The multiple cloning sites of the pB1H2w2 plasmid. KpnI and XbaI are positioned downstream of the RNAP ω-subunit.

The insertion of the protein coding sequence using these two restriction sites created the protein constructs in our B1H assays where the ω-subunit of RNAP was fused to the N-terminus of the Hin DBD. b) The multiple cloning sites of the pET28a plasmid. KpnI was inserted previously between XhoI and NotI on pET28a for subcloning of the protein coding sequence for protein expression. The insertion of the protein sequence was immediately upstream of a poly-histidine segment. c) Protein sequences for HinZip and variants. The mutated/extended residues are highlighted in red. The residues from restriction sites are highlighted yellow; GT=KpnI, KL=HindIII, GS=BamHI, LE=XhoI.


[Appendix 2, Figure 4 plate rows: tightHinZip with -16 0sp, 2sp, 5sp, 7sp, and 9sp invHixC, plus + and - controls; HinZip with -16 0sp, 2sp, 5sp, 7sp, and 9sp invHixC, plus + and - controls]

Appendix 2. Figure 4. B1H of HinZip and tightHinZip binding to the -16 invHixC reporter system. Shown are the 2.5 mM 3-AT plates. Neither protein showed significant transcriptional activation.
