UNIVERSITY OF CINCINNATI

Date:______

I, ______, hereby submit this work as part of the requirements for the degree of: in:

It is entitled:

This work and its defense approved by:

Chair: ______

DNA RECOGNITION BY THE K50 CLASS HOMEODOMAIN PITX2: SOLUTION

STRUCTURE, MOLECULAR DYNAMICS, AND IMPLICATIONS FOR MUTATIONS

THAT CAUSE RIEGER SYNDROME.

A dissertation submitted to the

Division of Research and Advanced Studies of the University of Cincinnati

In partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY (Ph.D.)

In the Department of Molecular Genetics, Biochemistry and Microbiology of the College of Medicine

2005

by

Beth A. Chaney

B.S. Wilmington College, 2000

Committee Chair: Mark Rance, Ph.D.

ABSTRACT

We have determined the solution structure of a complex containing the K50 class homeodomain Pituitary homeobox 2 (PITX2) bound to its consensus DNA site

(TAATCC). Previous studies have suggested that residue 50 is an important determinant of differential DNA-binding specificity among homeodomains. Although structures of several

homeodomain-DNA complexes have been determined, this is the first structure of a native K50

class homeodomain. The only K50 homeodomain structure determined previously is an X-ray

crystal structure of an altered specificity mutant, Engrailed Q50K (EnQ50K). Analysis of the

NMR structure of the PITX2 homeodomain indicates that the lysine at position 50 makes contacts with two guanines on the antisense strand of the DNA, adjacent to the TAAT core DNA sequence, consistent with the structure of EnQ50K. Our evidence suggests that this side chain may make fluctuating interactions with the DNA. Mutations in the human PITX2 gene are responsible for Rieger syndrome, an autosomal dominant disorder. Analysis of the residues mutated in Rieger syndrome indicates that many of these residues are involved in DNA binding, while others are involved in formation of the hydrophobic core of the protein. We have also performed molecular dynamics (MD) simulations on the PITX2 homeodomain-DNA complex.

The results indicate that motion of the K50 side chain is on a time scale longer than what we can simulate by MD. The results also show a number of long-lived water molecules in the vicinity of

R52 and R53, which form water-bridging interactions with the DNA. A number of water molecules are shown to be in the vicinity of other arginines in contact with the DNA, and near

K50. We also performed molecular dynamics simulations of Rieger mutant complexes. The results of these simulations were compared to the wild-type case, and there are many differences in levels of hydration, water residence times, energy levels, water bridging interactions, and

direct protein-DNA interactions. Overall, the role of K50 in homeodomain recognition is further clarified, and the results indicate that native K50 homeodomains may exhibit differences from altered specificity mutants. These results also provide further insight into how Rieger mutations cause severe phenotypic consequences.
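As a small illustration of the sequence geometry described in the abstract, the sketch below derives the antisense strand of the consensus site TAATCC. It shows that the two cytosines following the TAAT core on the sense strand pair with two guanines on the antisense strand, which are the bases the abstract describes as contacted by the lysine at position 50. The names `COMPLEMENT` and `antisense` are hypothetical helpers introduced here for illustration only; they are not part of the methods used in this work.

```python
# Illustrative sketch only: helper names are assumptions, not from the thesis.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def antisense(site: str) -> str:
    """Return the reverse complement of a DNA site (antisense strand, 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(site))


sense = "TAATCC"         # consensus site quoted in the abstract
anti = antisense(sense)  # complementary strand, read 5'->3'

# The CC dinucleotide after the TAAT core pairs with GG on the antisense strand.
print(sense, anti)
```

Reading the antisense strand 5' to 3', the two guanines come first and are followed by the complement of the TAAT core.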

Acknowledgements

There have been countless people who have helped me get this far. First and foremost, I have to

thank my thesis advisor, Dr. Mark Rance, who allowed me to join his lab and has provided the

best working environment/"home" for me for the past 5 years. His guidance and support have

been invaluable. I also have to thank all of the members of my thesis committee (Dr. Paul

Rosevear, Dr. Gary Dean, Dr. Jun Ma and Dr. John Maggio) who have been a great help when I

needed to determine a path for my thesis to take, and have provided help along that path.

Countless thanks go to Dr. Kimber Clark-Baldwin, who taught me virtually everything I know about labwork and has always been a great second opinion when I was trying to interpret strange lab results. I also have to thank Jamie Titus, who was struggling along in her project at the same time I was, and was always there to commiserate with when our clones kept coming up negative!

And of course, I can't forget Eric Johnson, who has great taste in music and let me copy his

CDs. He was also always there to bring us back down to earth with all of his equations. I would also like to thank Dr. Jack Howarth for computer support and maintaining the NMR facility at the University of Cincinnati. And special thanks to members of the Rosevear Lab who have provided comic relief over the years, especially Neal and Alex, both of whom I greatly miss! I'd

also like to thank the love of my life, Jeff, for putting up with my endless complaining about how

I want to graduate and always being there for me and encouraging me to stick with it. And last,

but most certainly not least, I'd like to dedicate this work to my parents, Jerry and Phyllis

Chaney, whose love and support have kept me going over the years, and without whom, I would

never have made it this far!

TABLE OF CONTENTS

Table of Contents……………………………………………………………………….. 1

List of Figures and Tables………………………………………………………………. 5

Abbreviations…………………………………………………………………………… 8

CHAPTER 1: The PITX2 Homeodomain—a review of the literature 10

Introduction…………………………………………………………………………….. 10

PITX2 and Disease……………………………………………………………………… 10

PITX2 and Development……………………………………………………………….. 14

PITX1 and PITX3……………………………………………………………………… 22

The Homeodomain…………………………………………………………………….. 23

Homeodomain Structure and DNA Recognition………………………………………. 25

Bicoid………………………………………………………………………………….. 36

Non-consensus Site Recognition…………………………………….……………….. 37

Statement of Research Goals…………………………………………………………. 41

CHAPTER 2: Materials and Methods 43

Expression of the PITX2 homeodomain………………………………………………. 43

Purification of the PITX2 homeodomain……………………………………………… 45

Gel Shift Assays………………………………………………………………………. 47

Determination of KD…………………………………………………………………. 47

NMR Structure Determination………………………………………………………... 48

Protein assignments…………………………………………………………... 48

DNA assignments………………………………………………………….…. 49

Structural constraints…………………………………………………………. 49

Data processing and analysis…………………………………………………. 49

Relaxation rates………………………………………………………………. 50

Structure calculations…………………………………………………………. 50

Docking of the protein to the DNA…………………………………………… 52

Structure refinement and analysis…………………………………………….. 56

Molecular Dynamics Calculations…………………………………………………… 57

Preparation of mutant protein files…………………………………………… 57

Molecular dynamics………………………………………………………….. 58

Analysis of production run data……………………………………………… 67

CHAPTER 3: Solution Structure of the K50 Class Homeodomain PITX2 Bound to DNA and Implications for Mutations That Cause Rieger Syndrome 71

Functional Analysis of Purified PITX2 Homeodomain……………………………… 71

KD of the PITX2 Homeodomain Bound to its DNA Consensus Site………………… 72

Analysis of Protein Folding by HSQC………………………………………………. 73

Resonance Assignments…………………………………………………………….. 75

Chemical Shift Indices……………………………………………………………… 77

Aromatic Assignments……………………………………………………………… 79

Side Chain Assignments for Arginine, Asparagine, and Glutamine Residues…….. 80

Chemical Shift Assignments for the DNA bound to the PITX2 homeodomain……. 81

Protein-DNA NOEs………………………………………………………………… 82

Tertiary Structure of the Pitx2 Homeodomain……………………………………… 82

Structure determination…………………………………………………….. 82

Quality of the NMR structure ……………………………………………… 83

Tertiary structure of the PITX2 homeodomain-DNA complex……………. 87

Protein-DNA recognition…………………………………………………… 94

The role of lysine at position 50…………………………………………… 98

Analysis of residues mutated in Rieger syndrome………………………… 103

Concluding Remarks………………………………………………………………. 106

CHAPTER 4: DNA Recognition by the Human PITX2 Homeodomain: Molecular Dynamics Simulation of Wild-Type and Rieger Mutant Complexes 109

Introduction…………………………………………………………………………. 109

Overall Behavior of the Trajectories……………………………………………….. 112

Analysis of the molecular dynamics of the wild-type PITX2 HD-DNA Complex ………………………………………………………………... 114

Hydration and water-mediated protein-DNA contacts…………………….. 116

Properties of Lys50………………………………………………………… 119

Analysis of Mutant Complexes…………………………………………………….. 122

T30P Simulation…………………………………………………………... 123

R31H Simulation…………………………………………………………... 124

V45L Simulation…………………………………………………………... 124

R46W Simulation…………………………………………………………... 127

K50Q Simulation…………………………………………………………... 128

R52C Simulation…………………………………………………………... 128

R53P Simulation…………………………………………………………... 129

Discussion…………………………………………………………………………. 130

CHAPTER 5: Thesis Summary and Future Directions 135

CHAPTER 6: Literature Cited 140

LIST OF FIGURES AND TABLES

FIGURES

Figure 1.1 Symptoms of Rieger syndrome…………………………………… 12

Figure 1.2 Structure of the Engrailed homeodomain as a model for overall homeodomain structure …………………………………………. 27

Figure 2.1 Amino acid sequence of the PITX2 homeodomain used for the structural studies…………………………………………………. 43

Figure 2.2 Optimizing expression conditions for the PITX2 homeodomain…. 44

Figure 2.3 Purification of the PITX2 homeodomain and production of a pure NMR sample………………………………………………… 46

Figure 2.4 DNA sequence of the binding site used in the structural studies…. 46

Figure 3.1 Gel shift assays of the PITX2 homeodomain……………………. 72

Figure 3.2 KD of PITX2 homeodomain……………………………………… 73

Figure 3.3 15N HSQC for the PITX2 homeodomain bound to its consensus DNA site……………………………………………… 74

Figure 3.4 15N-HSQC of the PITX2 homeodomain labeled with backbone and side chain assignments obtained through triple-resonance experiments……………………………………… 75

Figure 3.5 Ensemble of structures of the PITX2 homeodomain/DNA complex…………………………………………………………… 86

Figure 3.6 Ramachandran plot………………………………………………… 87

Figure 3.7 Structure of the PITX2 homeodomain/DNA complex……………. 88

Figure 3.8 Overlay of PITX2 homeodomain and EnQ50K homeodomain structures…………………………………………………………… 92

Figure 3.9 R2 relaxation rate constants for the PITX2 homeodomain………… 93

Figure 3.10 Detailed view of the protein-DNA interface and protein-DNA contacts……………………………………………… 94

Figure 3.11 The K50 side chain may be mobile……………………………….. 101

Figure 3.12 Ribbon diagram of the PITX2 homeodomain/DNA complex showing the positions of the side chains for the residues known to be mutated in Rieger syndrome and related disorders…………… 105

Figure 4.1 RMSD values of the MD snapshots versus the starting NMR structure……………………………………………………….. 113

Figure 4.2 Total energy levels of the MD snapshots as a function of simulation time………………………………………………………. 114

Figure 4.3 NUCPLOT diagram of the average structure of the wild-type protein-DNA complex during the 2 ns trajectory……………………. 115

Figure 4.4 Outline of some of the water molecules at the protein-DNA interface……………………………………………………………… 118

Figure 4.5 Snapshots of a single water molecule’s trajectory during the 2 ns simulation time…………………………………………………. 119

Figure 4.6 Properties of K50 during the MD simulation………………………… 121

Figure 4.7 NUCPLOT diagram of the average structure of the V45L mutant protein-DNA complex during the 2 ns trajectory…………… 126

Figure 4.8 Overlay of the wild-type and R46W mutant complexes……………. 127

Figure 4.9 NUCPLOT diagram of the average structure of the R53P mutant protein-DNA complex during the 2 ns trajectory…………… 130

TABLES

Table 1.1 Mutations found in Rieger syndrome, and their properties………… 20

Table 1.2 Sequence alignment of homeodomains……………………………. 25

Table 1.3 List of DNA sites that PITX2 recognizes………………………….. 40

Table 2.1 Sequences of oligonucleotide duplexes used in gel shift assays…… 47

Table 3.1 List of chemical shifts for the PITX2 homeodomain bound to DNA……………………………………………………………… 76

Table 3.2 Chemical shift indices for the Hα, Cα, and CO atoms of the PITX2 homeodomain………………………………………….. 78

Table 3.3 Assignments of atoms in aromatic groups of the PITX2 homeodomain……………………………………………………… 79

Table 3.4 Chemical shift assignments of the arginine side chains…………… 80

Table 3.5 Chemical shift assignments of the asparagine side chains………… 80

Table 3.6 Chemical shift assignments of the glutamine side chains………… 80

Table 3.7 Chemical shift assignments of the DNA binding site…………….. 81

Table 3.8 Table of protein-DNA NOEs……………………………………… 82

Table 3.9 NMR structure statistics…………………………………………… 84

Table 4.1 Hydration in the WT and mutant trajectories……………………… 118

ABBREVIATIONS

AMBER Assisted Model Building with Energy Refinement

AML Acute Myeloid Leukemia

ANF Atrial Natriuretic Factor gene

Antp Antennapedia

bb backbone

Bcd Bicoid

CANDID combined automated NOE assignment and structure determination

CSI chemical shift indices

CYANA CANDID (see above) + DYANA (dynamics algorithm for NMR applications)

DORV double outlet right ventricle

DSS 2,2-dimethyl-2-silapentane-5-sulfonic acid

DTT dithiothreitol

En Engrailed

EnQ50K Engrailed Q50K mutant

Ftz Fushi tarazu

GCMa Glial Cells Missing homolog 1

HD homeodomain

HMQC heteronuclear multiple quantum coherence

HSQC heteronuclear single quantum correlation

IPTG isopropyl-beta-D-thiogalactopyranoside

MD molecular dynamics

MEF2A Myocyte Enhancing Factor 2A

NMR nuclear magnetic resonance

NOE nuclear Overhauser effect

NOESY nuclear Overhauser enhancement spectroscopy

Nps maximum number of picoseconds a water is within 3.0 Å

Nw maximum number of water molecules within 3.0 Å

PBS phosphate buffer solution

PDB Protein Data Bank

PITX1 Pituitary homeobox protein 1

PITX2 Human Pituitary homeobox protein 2

Pitx2 Mouse, chick, zebrafish Pituitary homeobox protein 2

PITX3 Pituitary homeobox protein 3

PLOD Procollagen gene

PME Particle Mesh Ewald

PMSF phenyl methyl sulfonyl fluoride

POMC Pro-opiomelanocortin gene

RMSD root mean square deviation

SANDER simulated annealing with NMR-derived energy restraints

TCB thrombin cleavage buffer

TOCSY total correlation spectroscopy

Vnd vnd/Nk-2

WT wild-type

CHAPTER 1: The PITX2 Homeodomain—a review of the literature

Introduction

Pituitary homeobox protein 2 (PITX2) is a transcription factor that binds to DNA with its homeodomain region. This protein plays important roles in the development of the heart, and in left-right asymmetry. Mutations in the human PITX2 gene are responsible for Rieger syndrome, which is an autosomal dominant disorder. Many of the mutations found in this gene in Rieger syndrome are single amino acid substitutions within the homeodomain region, which suggests

this domain is very important in development. The homeodomain of PITX2 is a member of the

K50 class of homeodomains, which all have a characteristic lysine residue at position 50 of the

homeodomain. The Drosophila morphogenetic protein Bicoid is another important and well-studied member of this class of homeodomain. Bicoid and PITX2 are both known to

control pattern formation in a dose-dependent manner during embryonic development. The

highly homologous proteins PITX1 and PITX3 are also known to function in a similar manner during embryogenesis, and their homeodomain regions are also highly homologous, including

the lysine residue at position 50 of the homeodomain. Previously, there had been no structure

determined for a K50 class homeodomain. The goal of this research was to determine the

structure of the PITX2 homeodomain bound to DNA and to analyze the molecular dynamics of

this interaction. This chapter covers previous research regarding the importance of PITX2 and

related proteins in development, the mechanisms by which homeodomains bind to DNA, and structural information available from other classes of homeodomain proteins.

PITX2 and Disease

The PITX2 gene was originally identified by positional cloning of the 4q25 locus in patients with Rieger syndrome [Semina et al, 1996a]. Other groups have also cloned this gene

and given it various other names (Ptx2, Otlx2, Brx1, and ARP1) [Kitamura et al, 1997; Arakawa et al, 1998]. Mutations in this gene cause not only Rieger syndrome but also the related and less severe conditions iris hypoplasia and iridogoniodysgenesis syndrome [Kozlowski & Walter,

2000; Kulak et al, 1998]. These are all autosomal dominant disorders, and they all are characterized by anterior segment abnormalities, which are abnormalities of the eye. Rieger syndrome is a dominant haploinsufficient disorder, which indicates that reduction of PITX2 activity by half can cause disorders of development [Semina et al, 1996a]. There are numerous mutations in the highly homologous PITX1 gene that result in Treacher Collins syndrome

[Graham & McGonnell, 1999]. This disorder includes symptoms that are highly variable, such as underdeveloped and malformed facial bones, hearing loss, and strabismus (a turning in of the eyes).

Axenfeld-Rieger syndrome is a heterogeneous disorder, which is characterized by malformations of the eyes, teeth and umbilicus. It is a group of anomalies that includes Rieger syndrome, Axenfeld anomaly and Rieger anomaly. Patients with Axenfeld anomaly have defects of the eye, with abnormal iris tissue. Patients with Rieger anomaly have the abnormalities seen in Axenfeld anomaly, with the addition of iris changes and a displaced pupil. The most important ocular feature of this family of disorders is glaucoma, which develops in approximately 50% of patients [Espinoza et al, 2002]. Iris hypoplasia is the mildest of the disorders, characterized solely by maldevelopment of the iris stroma and early-onset glaucoma.

The pigment epithelium that is visible through the thin stroma gives the iris a striking color of slate gray to chocolate brown [Alward et al, 1998]. Patients with iridogoniodysgenesis syndrome have these defects, along with abnormalities in iridocorneal angle tissue differentiation

[Kulak et al, 1998]. Rieger syndrome is the most extreme member of this family of diseases,

with ocular, facial, dental, and umbilical anomalies (see Figure 1.1). Omphalocele, in which abdominal contents protrude through the base of the umbilical cord, is found in about 5% of patients [Katz et al, 2004]. Dental anomalies occur as abnormally small teeth (microdontia) along with spaces between teeth, misshapen teeth and missing teeth (hypodontia). In older patients, the teeth can become brittle and fall out. Consistent with the role of PITX2 in heart development, patients with Rieger syndrome often exhibit cardiac defects [Gage et al, 1999a;

Kitamura et al, 1999; Lu et al, 1999; Mammi et al, 1998].

Figure 1.1. Symptoms of Rieger Syndrome. A) Dental hypoplasia. B) Omphalocele. C) Glaucoma and craniofacial defects. D) Protrusion of umbilicus [Semina et al, 1996a].

Rieger syndrome is associated with mutations in the PITX2 gene, and can also be associated with gene abnormalities [Riise et al, 2001; Perveen et al, 2000]. Mutations in

PITX2 account for about 40% of the known cases of Rieger syndrome [Hjalt et al, 1999].

Sequencing of DNA from human patients has shown that many mutations in PITX2 result in single amino acid substitutions within the homeodomain region [Priston et al, 2001]. This illustrates the importance of the homeodomain of PITX2 in development. PITX2 with a mutation at position 45 of the homeodomain region (V45L) can bind DNA at slightly lower

levels, and has a 200% increase in transactivation activity [Priston et al, 2001]. It is believed that this mutation alters the homeodomain conformation in a way that affects DNA binding and transactivation differently. Another mutation found in the homeodomain in Rieger syndrome is

K50E [Saadi et al, 2001]. Transient transfection assays with the prolactin promoter and both

PITX2 and another pituitary transcription factor, Pit-1, show a strong synergistic effect on transactivation [Amendt et al, 1998]. The K50E mutation suppresses this synergism [Saadi et al,

2001; Amendt et al, 1998]. It has been found that PITX2 can form homodimers in the absence of

DNA, and the K50E mutant has stronger dimerization activity [Saadi et al, 2003]. The wild-type PITX2 homodimers can bind cooperatively to DNA, but the K50E-WT heterodimers have

greatly reduced cooperativity and transactivation function. This mutation therefore acts in a

dominant negative fashion. A R46W mutation was found specifically in iris hypoplasia [Heon et

al, 1995; Alward et al, 1998]. A R31H mutation was found specifically in iridogoniodysgenesis

syndrome [Chisholm & Chudley, 1983; Walter et al, 1996; Kulak et al, 1998]. Other mutations

found in the homeodomain of PITX2 in Rieger syndrome include L16Q, T30P, and R53P

[Semina et al, 1996a; Semina et al, 1996b; Murray et al, 1992]. Analysis of these mutant

proteins by electrophoretic mobility shift assays has shown that the iris hypoplasia mutant

(R46W) retains most of its DNA-binding activity, while the Rieger syndrome mutants are

nonfunctional [Kozlowski & Walter, 2000]. These results support the hypothesis that differences

in functional amounts of PITX2 may be the basis for the wide spectrum of anomalies in the

Axenfeld-Rieger group of disorders. Other studies have also provided evidence that physical or

functional haploinsufficiency of PITX2 is a pathogenic mechanism for Rieger syndrome

[Flomen et al, 1998; Espinoza et al, 2002]. The R53P mutant also exhibits cytoplasmic staining

in COS-7 cells, which supports the hypothesis that there is a nuclear localization signal within

the third helix of the PITX2 homeodomain. These mutants will be discussed further below and are outlined in Table 1.1. A recent study described a Chinese family in which mutational analysis showed a frameshift mutation that causes the PITX2 protein to be truncated after the homeodomain [Wang et al, 2003]. Affected members of this family show prominent dental abnormalities along with the other symptoms of Rieger syndrome.

PITX2 and Development

PITX2 is a protein that is found in many developing tissues in vertebrate embryos. It is expressed in the brain, heart, pituitary, mandibular and maxillary regions, eye, gut, limb and umbilicus [Semina et al, 1996a; Gage & Camper, 1997; Mucchielli et al, 1997; Hjalt et al, 2000].

It is the first transcriptional marker observed during tooth development [Green et al, 2001].

Three major isoforms of PITX2 have been identified, and these isoforms are produced by alternative splicing and use of different promoters [Semina et al, 1996a; Gage & Camper, 1997;

Arakawa et al, 1998; Gage et al, 1999a; Kitamura et al, 1999]. All of the isoforms contain different N-terminal domains, while the homeodomain and C-terminal domains are identical.

The C-terminal region contains a transcriptional activation domain. Phosphorylation of the C-terminus by PKC enhances the interaction with cellular factors, and increases transcriptional activation [Espinoza et al, 2005]. Studies have shown that tissue and organ development is differentially regulated by PITX2a and PITX2c isoforms. In the chick, Pitx2c plays a crucial role in left-right axis determination and rightward heart looping [Yu et al, 2001]. In zebrafish,

Pitx2a has a greater impact on cardiac symmetry than Pitx2c [Essner et al, 2000]. Experiments in mice have shown that different organs have different dosage requirements for Pitx2c [Liu et al, 2001]. This study showed that lower levels of Pitx2c are required for cardiac atria development, while higher levels are required for duodenum and lung development. Another

PITX2 isoform (PITX2d) has been identified, but it has a truncated homeodomain and does not

bind to DNA [Cox et al, 2002]. This study showed that PITX2d can negatively regulate the

transcriptional activities of PITX2a and PITX2c. One study has shown that in the craniofacial

region where all three isoforms are expressed, it is not the isoform type that controls differential

regulation of , but the dosage of the Pitx2 protein [Liu et al, 2003]. They found that

repression of Bmp4 signaling requires high doses of Pitx2, while maintenance of Fgf8 signaling

only requires low levels of Pitx2.

Experiments in mice have shown that Pitx2 is expressed in the odontogenic epithelium

and is the first transcriptional marker of tooth development [Mucchielli et al, 1997; Green et al,

2001]. In tooth development, the epithelium differentiates into the enamel-secreting ameloblasts, while the mesenchyme cells become the dentin-secreting odontoblasts. Expression of Pitx2 is

restricted to the epithelium, and can be detected as early as embryonic day 8.5 during mouse

tooth development [Mucchielli et al, 1997; St. Amand et al, 2000]. Pitx2 expression remains

specific to the oral epithelium with a progressive restriction to the dental placodes, followed by

high-level expression in the dental lamina and enamel knot. Postnatal expression is still detected

in relatively undifferentiated epithelial tissue in the tooth germs, in the later-developing second

and third molar anlage [Green et al, 2001]. Pitx2 is found in the preameloblasts, and is absent

from the fully differentiated ameloblasts postnatally [Mucchielli et al, 1997].

The internal organs of vertebrates are arranged on both sides of the body's midline with a characteristic asymmetry. At the end of gastrulation, the precursors of the respiratory and digestive organs and the heart are located at the midline. The first visible sign of left-right asymmetry is the right-sided looping of the developing heart. Eventually, all visceral organs show left-right asymmetry, either as single organs (heart, stomach and spleen) or because paired

organs such as the lungs display more lobes on one side than the other. The right lung has three

lobes, while the left lung has two lobes. The liver and gallbladder are positioned to the right of

the midline, while the stomach and spleen are positioned to the left. The apex of the heart points

to the left and the primitive gut coils in a counterclockwise direction. Alterations in left-right specification can lead to severe defects, including left-right reversals of organ position (situs inversus), mirror image symmetry of asymmetric tissues (isomerism), or random and

independent laterality defects in different tissues (heterotaxia). Situs inversus carries minor

medical risk because the organs are normal in structure and in their positions relative to one

another, but the other defects can have severe consequences [Bisgrove & Yost, 2001]. PITX2

plays a role in left-right asymmetry as part of the . It is expressed on

the left side of developing embryos, and when expressed on the right side, the location of the organs is reversed. Nodal signals have been implicated in specification of the germ layer, patterning of the nervous system, and determination of bilateral asymmetry of organs. Nodal signaling on the left side of the developing embryo induces expression of Pitx2 on that side [Kathiriya & Srivastava, 2000]. Members of the Lefty family

of TGF-β proteins have been shown to act as inhibitors of Nodal signaling, and it appears that a delicate balance between Lefty and Nodal proteins regulates the left-sided expression of Pitx2

[Bisgrove & Yost, 2001].

Another study has shown a role for Pitx2 in mediating proliferation of specific cell types as part of the Wnt/Dvl/β-Catenin pathway [Clevers, 2002; Kioussi et al, 2002]. Pitx2 expression in cardiac neural crest cells is decreased in Dvl2-/- mice, and chromatin immunoprecipitation

analysis in a pituitary cell line has shown that Lef1 and β-catenin physically occupy the Pitx2

promoter. In Pitx2-/- mice, the cardiac outflow tract and pituitary glands have lower numbers of

proliferating cells, and transgenic overexpression of Pitx2 leads to increased cell numbers. It was subsequently shown that the cell cycle regulator has bicoid binding sites in its promoter region that are bound and subsequently regulated by Pitx2. This regulation also involves a physical interaction of Pitx2 with β-catenin. These results support a model in which

Wnt signaling activates Pitx2, which then drives cell proliferation in a tissue-specific manner. A more recent study has shown that this pathway partially regulates Pitx2 by controlling the turnover of the unstable Pitx2 mRNA [Briata et al, 2003]. In return, Pitx2 is a mediator of

Wnt/β-catenin-induced mRNA stabilization.

During looping of the heart in chicks, Pitx2 is present in the left atrium, in the ventral portion of the ventricles and in the left-ventral part of the outflow tract [Campione et al, 2001].

Mouse Pitx2 shows a similar developmental expression pattern. Pitx2 null mice show no alteration in heart looping, but they have numerous heart abnormalities, showing that Pitx2 is important for normal heart development [Lu et al, 1999; Lin et al, 1999; Gage et al, 1999a;

Kitamura et al, 1999]. The cardiac ventricles are displaced rightwards, the heart fails to septate the atrium, and there is variable hypoplasia of the ventricles in the mutant mice. These hearts fail to develop tricuspid and mitral valves and a common atrioventricular valve develops

[Kitamura et al, 1999]. Loss of Pitx2 function causes severe cardiovascular defects, such as atrial isomerism, double inlet left ventricle, transposition of the great arteries, persistent truncus arteriosus, and abnormal aortic arch remodeling, which are all conditions found in humans

[Franco & Campione, 2003]. Ectopic expression of either Pitx2c or Pitx2a via retroviral infection to the right side equally randomizes heart looping direction [Yu et al, 2001]. Ectopic

Pitx2c expression in the developing myocardium of mice creates double outlet right ventricle

(DORV) [Franco & Campione, 2003].

Mice that have been genetically engineered to be homozygous for a Pitx2 null allele have, in addition to the above heart defects, arrest of development of the pituitary gland, numerous defects of the eye, and alteration of development of the mandibular and maxillary regions [Gage et al, 1999a; Lu et al, 1999; Lin et al, 1999]. These null mutants display right pulmonary isomerism and altered architecture of the lobes of the left lung [Kitamura et al, 1999;

Hjalt et al, 2000]. These mice die by embryonic day 14.5. Mice that are heterozygous for a null

Pitx2 allele also exhibit defects in embryogenesis [Gage et al, 1999a]. A small number of these mice have anterior chamber defects of the eye and heart defects [Gage et al, 1999a]. They also fail to close the ventral body wall, which is consistent with omphalocele found in some Rieger patients [Gage et al, 1999a]. Rieger syndrome results from haploinsufficiency, which is consistent with some of the defects seen in the heterozygous mice [Gage et al, 1999a; Flomen et al, 1997]. The correct concentration of Pitx2 protein appears to be crucial for normal physiological function of this transcription factor.

Several target genes for PITX2 have been identified previously. It is believed that protein binding to the C-terminus of PITX2 allows for binding of PITX2 to DNA, possibly by masking an inhibitory domain [Amendt et al, 1999]. In the pituitary, the prolactin gene is synergistically activated by Pit-1 and PITX2 [Amendt et al, 1998; Quentien et al, 2002a]. Other PITX2 target genes in the pituitary have also been described [Tremblay et al, 2000]. A number of genes outside of the pituitary have been shown to be regulated specifically by PITX2. It has been shown by chromatin immunoprecipitation with antibodies specific for PITX2 that PITX2 regulates expression of procollagen lysyl hydroxylase (PLOD) and Dlx2 [Hjalt et al, 2001; Green et al, 2001]. Subsequent experiments showed that the Atrial Natriuretic Factor (ANF) gene is also a target of PITX2 [Ganga et al, 2003]. ANF is expressed early in embryonic development

when cells are committed to the cardiac phenotype. It has also been found that PITX2 strongly activates the Gad1 promoter, which is involved in GABAergic neuron differentiation during mammalian neural development [Westmoreland et al, 2001]. The gene PLOD2 encodes a protein that is responsible for hydroxylation of lysines in collagens, which plays a role in creating the extracellular matrix and provides a foundation for the morphogenesis of tissues and organs. The gene Dlx2 encodes a transcription factor that is expressed in the mesenchymal and epithelial cells of the facial region and tooth-forming anlage, and is also expressed in the diencephalon. Dlx2 is a member of the distal-less family of genes, and has been shown to regulate branchial arch development [Qiu et al, 1995; Thomas et al, 2000]. PITX2 and Dlx2 are expressed in the same tissues during early development. PITX2 binds to consensus and nonconsensus bicoid sites in the Dlx2 promoter and activates this promoter 30-fold in CHO cells

[Espinoza et al, 2002]. Another study found a 45-fold activation of this promoter in CHO cells by PITX2 [Green et al, 2001]. PITX2 proteins engineered with mutations found in Rieger syndrome were used to determine if they could transactivate the Dlx2 promoter. A phenotypically less severe mutation (R46W) is able to bind and transactivate the promoter. A more severe mutation (T30P), which presents with the full spectrum of anomalies, is unable to transactivate the Dlx2 promoter. One study looked at five mutations found in Rieger syndrome that are still able to bind to the CE-3 DNA response element from the pituitary POMC gene

[Quentien et al, 2002b]. All five of the mutant proteins (L16Q, T30P, R31H, R46W, R53P) have lost the transactivation function of three different pituitary gene promoters (prolactin, growth hormone, and pit-1). Four of the five mutations tested fail to affect wild-type PITX2 induction of the prolactin promoter, while the fifth (R53P) acts as a dominant negative inhibitor, blocking wild-type PITX2 induction of the prolactin promoter. Small changes in the protein conformation

caused by point mutations in DNA-binding or protein-binding surfaces may alter the types of

protein binding partnerships that form, and may also change which cofactors are recruited to a

target gene promoter [Voss & Day, 2002]. A summary of the point mutations found in the

PITX2 homeodomain in Rieger syndrome, and their biochemical effects is shown in Table 1.1.

Thus, the molecular basis of tooth anomalies in Rieger syndrome appears to be the inability of

PITX2 to activate genes involved in tooth morphogenesis. An additional mutation, V45L has

been found to have an increase in activation function by about 200%, even though DNA binding is lowered [Priston et al, 2001]. Overexpression of Pitx2 in mice has been shown to have very similar phenotypic consequences, with glaucoma and anterior defects [Holmberg et al, 2004].

Therefore, overexpression of PITX2 in development appears to be just as detrimental as underexpression.

Mutation  Disease               Properties
L16Q      Rieger Syndrome       Unstable, no activation, no consensus binding
T30P      Rieger Syndrome       No activation, only binds consensus
R31H      Iridogoniodysgenesis  Reduced activation, only binds consensus site
V45L      Rieger Syndrome       <10-fold reduction in DNA-binding, increased activation
R46W      Iris Hypoplasia       Reduced binding to CE-3 site, reduced activation
K50E      Rieger Syndrome       No DNA binding or activation, dominant negative
K50Q      Rieger Syndrome       Not known
R52C      Rieger Syndrome       Not known
R53P      Rieger Syndrome       No CE-3 binding, no activation, dominant negative

Table 1.1: Mutations found in Rieger Syndrome, and their developmental properties [Semina et al, 1996; Priston et al, 2001; Kulak et al, 1998; Saadi et al, 2001; Heon et al, 1995; Alward et al, 1998; Chisholm & Chudley, 1983; Walter et al, 1996; Murray et al, 1992; Quentien et al, 2002b].

A recent study has indicated that PITX2 interacts with the transcription factor GCMa, which is expressed in the placenta during development, and in the kidney and thymus postnatally

[Schubert et al, 2004]. This study showed that these two proteins bind cooperatively to promoters. Another study found that PITX2 interacts with myocyte enhancer factor 2A (MEF2A) in regulating expression of ANF [Toro et al, 2004]. As mentioned above, the prolactin gene is synergistically activated by Pit-1 and PITX2 [Amendt et al, 1998; Quentien et al, 2002a], and Pitx2 associates with β-catenin to regulate cyclin D2 expression [Kioussi et al, 2002]. Therefore, PITX2 may be important in interacting with other transcription factors to regulate gene expression.

It is believed that PITX2 plays a role in regulation of cell differentiation and cell proliferation in adult vertebrates as well. PITX2 has been isolated as a downstream target of the human acute leukemia ALL1 gene, which has been implicated in the development of human acute leukemia associated with abnormalities at 11q23 [Arakawa et al, 1998]. PITX2 is expressed in normal human bone marrow and leukemic cell lines with a normal ALL1 allele, but is not expressed in the leukemic cell lines in which ALL1 is rearranged. A recent study has shown that expression of PITX2a induces actin-myosin reorganization and increased cell spreading in HeLa cells [Wei & Adelstein, 2002]. As discussed below, the lysine at position 50 of the homeodomain is critical for specificity of PITX2 binding to the bicoid DNA site. When this lysine was mutated, the mutants did not cause the changes in cell spreading and morphology, which suggests that this cellular phenotype requires PITX2a with lysine at position 50 [Wei &

Adelstein, 2002]. Another study has found hypermethylation of the PITX2 promoter region in

86% of patients with acute myeloid leukemia (AML) [Toyota et al, 2001]. Hypermethylation of

CpG-rich promoter regions can result in gene silencing.

PITX1 and PITX3

PITX1 is crucial for proper development of the craniofacial region. This gene is

expressed in the head and oral cavity regions during embryonic development [Crawford et al,

1997]. It is also expressed in the tissues that give rise to the lower body wall, bladder and

hindgut. Pitx1 is expressed in the mesenchyme of the hindlimb bud, but not in that of the

forelimb. One study has shown that together, Pitx1 and Pitx2 are required for formation of the

hindlimb buds in the mouse and, when present in low amounts, they are also important for

development of the femur, tibia, and digit 1 hindlimb structures [Marcil et al, 2003]. Treacher

Collins Syndrome is believed to be caused by mutations in PITX1 [Crawford et al, 1997]. This is

an autosomal dominant disorder that is characterized by craniofacial abnormalities. PITX1 and

PITX2 have 97% similarity in the homeodomain region [Gage & Camper, 1997; Suh et al,

2002]. Along with Pitx2, Pitx1 is involved in regulating tooth development [St. Amand et al,

2000]. Mice deficient in Pitx1 have severe defects in development of the hindlimb, along with cleft palate formation and additional mild pituitary defects [Lanctot et al, 1999; Szeto et al,

1999]. In contrast to Pitx2 and Pitx3, mutations in Pitx1 in mice are recessive [Gage et al,

1999b]. In mice that are mutant for Pitx1, the pelvic girdle is smaller, and the long bones of the

limbs are significantly shorter [Graham & McGonnell, 1999; Lanctot et al, 1999]. The reduced

diameter of the bones is due to impaired ossification and calcification. The joints are also

altered. PITX1 was found to be involved in transcription of the pro-opiomelanocortin (POMC)

gene in the anterior pituitary lobe [Lamonerie et al, 1996]. The first cells to differentiate in the

pituitary are the cells that express this gene. Mutations in Pitx1 cause the aphakia phenotype in

mice. Human congenital aphakia is very rare and is classified as primary, in which no lens

anlage has developed, or secondary, in which a lens has begun to develop but is then expelled in

utero [Semina et al, 1997].

PITX3 was cloned from neuronal tissues and is expressed in the midbrain dopaminergic

neurons [Smidt et al, 1997]. These particular neurons are of interest because they are involved in

the pathogenesis of Parkinson's disease. A recent study has found that Pitx3 activates the mouse

tyrosine hydroxylase promoter [Lebel et al, 2001]. Tyrosine hydroxylase is the rate-limiting

enzyme of dopamine and noradrenaline biosynthesis. Mice deficient in Pitx3 have

microphthalmia, and arrested development of the lens and anterior segment structures [Semina et

al, 2000]. This protein was also found to be involved in cataract and congenital total cataract

when mutated [Semina et al, 2000]. The PITX3 protein has 70% overall identity to other

members of this family [Semina et al, 1998].

The Homeodomain

The homeodomain is a protein domain that has been conserved throughout evolution in organisms such as yeast, Drosophila, and humans [Dave et al, 2000]. The homeodomain consists of 60 amino acids, and the 180 base pair DNA sequence that encodes it is called the homeobox.

Proteins with homeodomains are known to be important in controlling embryonic pattern formation and cell-type specification and differentiation [Dave et al, 2000]. Homeotic mutations were first observed in genetic Drosophila research about 75 years ago [Billeter, 1996]. The homeodomain was first seen in proteins that are involved in specifying segment identity in

Drosophila. Mutations in these proteins in flies cause one body segment to be transformed into another, which is called homeosis. Examples are mutations resulting in transformations of the

third thoracic segment to a second thoracic segment, which leads to four-winged flies (bithorax mutations) [Lewis, 1978], or mutations that result in the replacement of the antennae on a fly's

head by legs (Antennapedia mutations) [Gehring, 1966]. As gene cloning techniques developed,

these mutations were localized to their genes, and it was found that a 180 base pair segment

could be cross-hybridized with many other genes, and the homeobox was discovered [McGinnis

et al, 1984; Scott & Weiner, 1984]. Hundreds of homeobox genes were subsequently discovered.

Homeoboxes have been found at all levels of development: in the establishment of morphogenetic gradients, in the structure formation of groups of body segments, and in defining the unique identity of single segments. This domain is responsible for recognizing specific sequences of DNA, and thereby recruiting the corresponding transcription factors to specific target genes. In the course of evolution, the amino acid sequence of the homeodomain has been conserved to a very high degree. An example is the human Hox-A7 homeodomain, which differs in only 1 out of 60 positions from that of the Antennapedia homeodomain from Drosophila

[Gehring et al, 1994]. The protein sequence of homeodomains tends to be more highly conserved than that of the DNA sequence, which suggests that it is the protein sequence that is being selected for and maintained during evolution [Scott et al, 1989]. A sequence alignment of many common homeodomains is shown in Table 1.2. There are currently over 750 known homeodomain proteins. Mutations in homeodomains have been found in many forms of human disease, including the ones discussed above for the PITX family of homeobox proteins

[Boncinelli, 1997]. Other examples of disease-associated homeobox genes include HOXD13, mutations in which cause synpolydactyly (an abnormality of the hands and feet involving both webbing and duplication of the fingers) [Muragaki et al, 1996; Akarsu et al, 1995], and HOXA9, which is involved in acute myeloid leukemia [Nakamura et al, 1996; Borrow et al, 1996].

              10         20         30
Pitx2 QRRQRTHFTS QQLQQLEATF QRNRYPDMST
Bcd   PRRTRTTFTS SQIAELEQHF LQGRYLTAPR
Antp  RKRGRQTYTR YQTLELEKEF HFNRYLTRRR
En    EKRPRTAFSS EQLARLKREF NENRYLTERR
Ftz   SKRTRQTYTR YQTLELEKEF HFNRYITRRR
Mat2  KPYRGHRFTK ENVRILESWF AKN*PYLDTKG
Vnd   KRKRRVLFTK AQTYELERRF RQQRYLSAPE

              40         50         60
Pitx2 REEIAVWTNL TEARVRVWFK NRRAKWRKRE
Bcd   LADLSAKLAL GTAQVKIWFK NRRRRHKIQS
Antp  RIEIAHALCL TEAQIKIWFQ NRRMKWKKEN
En    RQQLSSELGL NEAQIKIWFQ NKRAKIKKS
Ftz   RIDIANALSL SERQIKIWFQ NRRMKSKKDR
Mat2  LENLMKNTSL SRIQIKNWVS NRRRKEKTIT
Vnd   REHLASLIRL TPTQVKIWFQ NHRYKTKRAQ

Table 1.2: Sequence alignment of homeodomains. En stands for Engrailed, Ftz for Fushi tarazu, Antp for Antennapedia, and Vnd for Vnd/Nk-2, * = there is a 3 residue insertion here that was taken out for alignment. Highlighted residues indicate residues conserved in all 7 homeodomains shown.
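
The conserved positions referred to in the caption of Table 1.2 can be recomputed directly from the aligned sequences. The following is a minimal Python sketch, not part of the original analysis. The dictionary and function names are illustrative. It assumes the sequences exactly as listed in Table 1.2, with the insertion marker (*) in Mat2 dropped. Because the En row lists only 59 residues, position 60 is not compared.

```python
# Sequences as listed in Table 1.2, in blocks of ten residues.
# Assumption: the '*' insertion marker in the Mat2 row is omitted.
HOMEODOMAINS = {
    "Pitx2": "QRRQRTHFTS QQLQQLEATF QRNRYPDMST REEIAVWTNL TEARVRVWFK NRRAKWRKRE",
    "Bcd":   "PRRTRTTFTS SQIAELEQHF LQGRYLTAPR LADLSAKLAL GTAQVKIWFK NRRRRHKIQS",
    "Antp":  "RKRGRQTYTR YQTLELEKEF HFNRYLTRRR RIEIAHALCL TEAQIKIWFQ NRRMKWKKEN",
    "En":    "EKRPRTAFSS EQLARLKREF NENRYLTERR RQQLSSELGL NEAQIKIWFQ NKRAKIKKS",
    "Ftz":   "SKRTRQTYTR YQTLELEKEF HFNRYITRRR RIDIANALSL SERQIKIWFQ NRRMKSKKDR",
    "Mat2":  "KPYRGHRFTK ENVRILESWF AKNPYLDTKG LENLMKNTSL SRIQIKNWVS NRRRKEKTIT",
    "Vnd":   "KRKRRVLFTK AQTYELERRF RQQRYLSAPE REHLASLIRL TPTQVKIWFQ NHRYKTKRAQ",
}


def conserved_positions(alignment):
    """Return the 1-based positions that carry the same residue in every row."""
    rows = [seq.replace(" ", "") for seq in alignment.values()]
    # zip() stops at the shortest row, so positions missing from any
    # sequence are simply not compared.
    return [
        position
        for position, column in enumerate(zip(*rows), start=1)
        if len(set(column)) == 1
    ]


if __name__ == "__main__":
    pitx2 = HOMEODOMAINS["Pitx2"].replace(" ", "")
    for position in conserved_positions(HOMEODOMAINS):
        print(f"{pitx2[position - 1]}{position}")
```

With these seven rows the strictly invariant positions are 16, 20, 25, 40, 48, 51 and 53, which include the hydrophobic core residues and the DNA-contacting residues N51 and R53 discussed in the following section.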

Homeodomain Structure and DNA Recognition

The homeodomain consists of a self-folding, stable protein domain of 60 amino acids,

and previous structures of homeodomains have shown that it consists of a compact three-helix

structure and a flexible N-terminal arm [Gehring et al, 1994; Wolberger, 1993] (See Figure 1.2).

The third helix is called the recognition helix, and it makes specific contacts within the major

groove of the DNA. The overall arrangement of the homeodomain-DNA complex structures that

have been determined is very similar, with helices I and II aligned in an antiparallel

arrangement above the DNA. The recognition helix (helix III) is positioned in the major groove

of the DNA. Homeodomains have evolved different DNA specificities in part by altering the

amino acid residue at position 50, which can interact with base pairs 5 and 6, and to a lesser

extent, base pair 4, in the TAATNN consensus binding site. A previous study has shown that each of 6 different amino acids tested at position 50 confers a different DNA binding specificity

[Wilson DS et al, 1996]. Tucker-Kellogg et al. [1997] and others have emphasized the point that the degree of specificity of a homeodomain for its particular DNA binding consensus site depends on the identity of the amino acid residue in position 50. Most homeodomains contain a glutamine residue at this position, and are therefore referred to as Q50 homeodomains. Q50 homeodomains prefer DNA sequences such as TAATTA and TAATGG. The homeodomain of

Bicoid, which is a Drosophila morphogenetic protein, contains a lysine at position 50, and is the founding member of the K50 class of homeodomains [Hanes & Brent, 1989]. The K50 class of homeodomains recognizes a consensus DNA sequence of TAATCC. Much attention has been focused on the consequences of lysine being located at position 50, largely due to the fact that the most dramatic examples of altered DNA specificity occur when a lysine is either introduced or replaced at position 50. For example, when Q50 in Engrailed is mutated to an alanine, the

Q50A mutant has an affinity and specificity very similar to those of the wild-type protein, but when Q50 is mutated to a lysine, the specificity changes from TAATTA to TAATCC, clearly demonstrating the important role played by the residue in position 50, especially in the case of K50, in defining the specificity of DNA binding [Ades & Sauer, 1994; Fraenkel et al, 1998; Grant et al, 2000].

Percival-Smith et al [1990] investigated wild-type and Q50K mutant Fushi tarazu homeodomains in conjunction with altering the base pairs at positions 5 and 6 in the binding site, and found that differences in KD of ~100-fold are observed when the binding site is not the optimal one. In addition to position 50, position 47 has a role in defining specificity for some homeodomains in correlation with base-pair 4 of the binding site, especially when the residue is phenylalanine or arginine [Tron et al, 2001; Pomerantz & Sharp, 1994].
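
The class-dependent site preferences described above can be illustrated with a short sequence scan. The following Python sketch is illustrative only. It assumes the preferred sites quoted in the text (TAATTA and TAATGG for Q50, TAATCC for K50). The function and dictionary names are hypothetical, and exact string matching is a deliberate simplification of real binding, which tolerates nonconsensus sites.

```python
# Preferred six-base-pair sites for each class, as quoted in the text.
PREFERRED_SITES = {
    "Q50": ("TAATTA", "TAATGG"),
    "K50": ("TAATCC",),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, hd_class):
    """List (start, strand, site) for exact matches to the class's preferred sites.

    start is the 0-based index on the given strand; a '-' hit means the
    site reads 5' to 3' on the complementary strand.
    """
    seq = seq.upper()
    sites = PREFERRED_SITES[hd_class]
    width = len(sites[0])
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window in sites:
            hits.append((start, "+", window))
        elif reverse_complement(window) in sites:
            hits.append((start, "-", reverse_complement(window)))
    return hits


if __name__ == "__main__":
    print(find_sites("ACTAATCCGT", "K50"))
    print(find_sites("GGATTA", "K50"))
```

Both strands are scanned because a TAATCC site on one strand appears as GGATTA on the other. A palindromic site such as TAATTA is reported once, on the given strand.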

Figure 1.2. Structure of the Engrailed homeodomain as a model for overall homeodomain structure [Pabo & Sauer, 1992].

A comparison of the homeodomain's three-dimensional structure with that of other DNA-binding proteins reveals a large amount of structural similarity of the fragment formed by helices

II and III to the earlier described helix-turn-helix motif, which is part of many prokaryotic repressors [Qian et al, 1989; Brennan & Matthews, 1989]. However, the second helix of this motif (helix III) is longer in homeodomains, although its C-terminal end was found to be structurally less stable in free homeodomains and is sometimes called helix IV [Qian et al, 1989;

Billeter et al, 1990; Qian et al, 1993; Tsao et al, 1994]. The first experimental structure of a homeodomain-DNA complex (Antennapedia) revealed that homeodomains interact with DNA in a different way than prokaryotic repressor domains [Otting et al, 1990]. In conventional helix-turn-helix motifs, residues of the turn and of the first helical loop contact the DNA bases. In homeodomains, these contacts are made with residues located about two helical turns away from the N-terminal end of helix III [Harrison, 1991]. The structure of the LFB1/HNF1 homeodomain, which contains a 21-residue-long insertion in the turn between helices II and III, superimposes very well on the Antennapedia homeodomain, which indicates that the structure of the turn is not important for the DNA binding of homeodomains [Ceska et al, 1993; Leiting et al, 1993; Schott et al, 1997]. The relative arrangement of the helices therefore may help to stabilize a global fold so that a helix can fit in the major groove of DNA.

The tertiary structure of the homeodomain was first determined by solution-state NMR studies of the Antennapedia homeodomain from Drosophila [Otting et al, 1988; Qian et al, 1989;

Billeter et al, 1990; Guntert et al, 1991]. This structure consists of 3 α-helices folded into a

compact, globular structure, and an N-terminal extension. The N-terminal extension precedes

helix I, which is separated from helix II by a loose loop. Helix II forms a helix-turn-helix motif

with helix III. Structural information about the contacts the homeodomain makes with DNA

has been elucidated from the X-ray crystal structure of the Drosophila Engrailed protein bound to DNA and the NMR structure of the Antennapedia HD-DNA complex [Clarke et al, 1994;

Kissinger et al, 1990; Fraenkel et al, 1998; Fraenkel & Pabo, 1998]. A number of structural studies have shown that the global structure and method of DNA binding of homeodomains are highly conserved. Intermolecular interactions responsible for the specificity of the DNA recognition are concentrated in amino acid residues of helix III [Kissinger et al, 1990; Otting et al, 1990; Wolberger et al, 1991; Billeter & Wuthrich, 1993; Klemm et al, 1994; Hirsch &

Aggarwal, 1995; Li et al, 1995; Wilson DS et al, 1996]. Because the recognition helix spans the entire major groove, residues from three helical turns can reach the DNA bases (residues 47, 50,

51 and 54) [Qian et al, 1989]. Approximately six DNA base pairs are involved in the recognition site (typically a TAAT core followed by 2 base pairs that differ). The N-terminal arm is known to make additional contacts with bases in the minor groove of the DNA [Billeter, 1996]. The loop between helices I and II makes contacts with the DNA backbone. While previous studies agree on the overall docking arrangement, questions remain concerning the roles of key residues at the protein-DNA interface, hydration, and the extent of side chain motion.

The three helices span residues 10-21, 28-38 and 42-58. The removal of the N-terminal residues 1-6, which are flexibly disordered, causes only localized structural variations and does

not noticeably affect helix I [Qian et al, 1994a]. The structure is stabilized mostly by hydrophobic interactions involving F8, L13, L16, F20, L40, V45, W48, and F49 [Gehring et al,

1994; Qian et al, 1989; Kornberg, 1993]. Gln12 probably stabilizes the N-terminus by hydrogen bonding to residue 9 [Billeter, 1996]. Several regions of homeodomains are involved in nonspecific contacts with DNA: the N-terminus, the loop between helices I and II and residues from all parts of helix III. Many of these interactions involve basic residues that contact the phosphate groups of the DNA. Some examples are the arginines that are highly conserved at positions 31, 52 and 53, and the lysine at position 55. Specific contacts with DNA are limited to a small number of residues at the N-terminus, and residues from the recognition helix.
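
The helix boundaries given above can be used to place the Rieger syndrome mutations of Table 1.1 within the homeodomain fold. The following Python sketch is illustrative. The helix ranges (10-21, 28-38, 42-58) are taken from the text, while the labels and boundaries of the remaining regions, and all names in the code, are assumptions made for the example.

```python
# Helix ranges are those quoted in the text; the other region labels and
# their exact boundaries are illustrative assumptions.
REGIONS = [
    (1, 9, "N-terminal arm"),
    (10, 21, "helix I"),
    (22, 27, "loop I-II"),
    (28, 38, "helix II"),
    (39, 41, "turn II-III"),
    (42, 58, "helix III"),
    (59, 60, "C-terminus"),
]

# Point mutations listed in Table 1.1.
RIEGER_MUTATIONS = ["L16Q", "T30P", "R31H", "V45L", "R46W",
                    "K50E", "K50Q", "R52C", "R53P"]


def mutation_position(mutation):
    """Extract the residue number from a label such as 'K50E'."""
    return int(mutation[1:-1])


def region_of(position):
    """Return the structural region containing a homeodomain position (1-60)."""
    for start, end, name in REGIONS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the 60-residue homeodomain")


if __name__ == "__main__":
    for mutation in RIEGER_MUTATIONS:
        print(mutation, region_of(mutation_position(mutation)))
```

Under these assumptions, L16Q falls in helix I, T30P and R31H in helix II, and the remaining six mutations in the recognition helix (helix III).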

Most models of DNA recognition by proteins rely on intermolecular interactions and the complementarity of the protein and the DNA surface. Asn51 is conserved in nearly all of the known homeodomains [Billeter, 1996]. It has been found to form hydrogen bonds with a conserved adenine in the DNA binding site. It is highly disruptive if this amino acid residue is mutated. No other amino acid side chain at position 51 can give the same high affinity as asparagine. It is believed that this residue contributes mostly to the correct positioning of the recognition helix in the major groove, and not to binding specificity [Billeter, 1996]. The positioning is also likely due to backbone contacts made by other conserved amino acid residues.

Sixteen residues in the α-helices are conserved in homeodomains. The side chains of eight of these amino acid residues are believed to define the conformational relationships of the three helices, as they point to the interior of the protein (see above). The major contacts with the DNA are made by amino acids usually in positions 2 or 3 and 5 of the N-terminal arm, as well as residues in the recognition helix (positions 46, 47, 50, 51, and 54) [Billeter, 1996]. These contacts seem to make up the primary interactions that are responsible for specific binding. The

surrounding residues can also affect the specificity through more indirect effects. Residue 47 interacts only with the β-strand of DNA and always contacts base pair 4 and sometimes base pair

3 [Billeter, 1996]. If the side chain of this residue is hydrophobic, the nucleotide that is contacted in base pair 4 is T and the contacts are hydrophobic. When an asparagine is found at this position, then a C is contacted by a hydrogen bond. Residue 54 is the last residue at the C-terminal end of the helix that is involved in contacting DNA. The role of this residue may be to stabilize the helix or modulate DNA specificity [Billeter, 1996].

In terms of structural stability, the recognition helix of free homeodomains can be divided into a well-structured N-terminal half (residues 42 to 51), and a C-terminal half (residues 52 to

59) that may be less structured, depending on the homeodomain [Tell et al, 1999]. A previous study has shown that the identities of the residues at positions 54 and 56 can alter the structural stability and melting temperature of the homeodomain [Tell et al, 1999]. This study found that when both of these residues are hydrophobic, the stability of the C-terminal half of the recognition helix is the highest. Alternatively, when the residues at positions 54 and 56 are a combination of hydrophobic and hydrophilic residues, the overall stability of the recognition helix drops significantly. This unstable C-terminal half may provide the recognition helix with structural flexibility that allows a better “induced fit”. This higher flexibility could also allow the recognition of a wider spectrum of DNA sequences. Another study has suggested that the salt bridges formed in the homeodomain structure have no essential stabilizing role at room temperature, but instead might be important for improving thermostability [Iurcu-Mustata et al,

2001]. This hypothesis was supported by a correlation between the melting temperatures of several homeodomains and the number of salt bridges and cation-π interactions that connect the secondary structures. CD and fluorescence spectroscopy have been used in the past to probe the

structure and stability of the Bicoid protein (also a K50 protein, discussed below) [Subramaniam et al, 2001]. This study found that both W48 and F8 are necessary for the structural stability of the Bicoid homeodomain. An aromatic residue in position 8, and its interaction with W48, may

be critical for the structure of the homeodomain by bringing helix I and II into a conformation

optimal for DNA binding.

Relatively new NMR techniques have emerged that allow for the estimation of the

location and the lifetime of contacts between atomic groups of macromolecules and the water

molecules that are hydrating their surfaces [Otting et al, 1991]. When these techniques were

applied to the Antennapedia homeodomain-DNA complex, they demonstrated that there are

water molecules next to residues that are critical for specific DNA recognition [Qian et al, 1993].

A molecular dynamics simulation of the complex in a water bath implies the presence of up to

five water molecules in the cavity at the interface between the recognition helix and the DNA

[Billeter et al, 1996]. Water molecules at the protein-DNA interface are also visible in X-ray

crystal structures at high resolution [Hirsch & Aggarwal, 1995; Li et al, 1995; Wilson DS et al,

1996]. In the paired (S50) structure, position 50 forms hydrogen bonds to two water molecules,

which then hydrogen bond to DNA bases [Wilson DS et al, 1996]. This interfacial water is

likely to allow for mobility of the protein side chains at the interface. X-ray structures provide

information about the exact location of water molecules, while NMR data provides information

regarding the lifetime of a water-protein contact [Billeter, 1996]. The current model for the

specific homeodomain-DNA contacts is a fluctuating network of hydrogen bonds between polar

groups of the protein and the DNA, and the water molecules. These are complemented by

hydrophobic contacts. The lifetimes of the interactions are in the nanosecond to microsecond

range [Qian et al, 1993]. Water molecules appear to not only act to improve the

complementarity of the protein-DNA interaction, but also act to reduce entropic costs when a

large number of interactions are required for a highly specific recognition [Billeter et al, 1996].

Both the protein and DNA specifically recognize each other's hydration pattern, and there

appears to be a mobile arrangement of bonding interactions between protein, DNA and water

molecules.

There is very little known about the dynamics of the interaction between the

homeodomain and DNA. The few NMR studies that have been performed looking at side chain

dynamics of DNA-binding proteins have found evidence that there is significant conformational

flexibility at the protein-DNA interface [Slijper et al, 1997; van Heijenoort et al, 1998; Palmer,

1993; Pervushin et al, 1997]. The only studies that looked at NMR relaxation rate measurements

on homeodomain-DNA complexes were backbone 15N data on the Vnd/NK-2 homeodomain

[Fausti et al, 2001], and a study of 2H relaxation for asparagine and glutamine side chains in the

Ftz homeodomain [Pervushin et al, 1997]. The first study found that the motional behavior

primarily reflects the protein’s tertiary structure and stability of the backbone. The molecular

dynamics simulation described above for the Antennapedia homeodomain indicated that

homeodomain-DNA interactions are dynamic and fluctuating [Billeter et al, 1996], while

crystallographic studies indicate that the Antennapedia homeodomain has a well-defined

conformation at the protein-DNA interface [Fraenkel & Pabo, 1998]. Further studies into the

dynamics of the homeodomain-DNA complexes should provide insight into these issues.

Although structures of several homeodomains and homeodomain-DNA complexes have

been determined by X-ray crystallography or NMR spectroscopy [Banerjee-Basu et al, 2003],

including representatives of the wild-type Q50, S50, C50, G50 and I50 classes of homeodomains

[Otting et al, 1988; Li et al, 1995; Cox et al, 1995; Piper et al, 1999; Tejada et al, 1999], the only

experimentally determined K50 homeodomain structure available is an X-ray crystal structure of an altered specificity mutant, Engrailed Q50K (EnQ50K), bound to the TAATCC site [Tucker-

Kellogg et al, 1997]. The latter study found that the side chain of K50 projects into the major groove of the DNA and makes hydrogen bond contacts with the guanines at base pairs 5 and 6 of the complementary strand of the TAATCC binding site. This is the only case in which direct hydrogen bond contacts have been reported for amino acid residue 50 in any homeodomain-

DNA complex structure. Unfortunately, the relevance of the EnQ50K studies, or analyses of other mutants such as Paired S50K [Wilson DS et al, 1996] and Fushi tarazu Q50K [Zhao et al,

2000], to the case of native K50 homeodomains is unclear in the absence of experimental structural data for a native K50 homeodomain. For example, the identity of the amino acid residue at position 54 seems to be constrained by the residue at position 50. A glutamine at position 50 allows for many different residues to be present at position 54, with Met being the most abundant (see Table 1.2). However, Met54 is never found when position 50 is lysine

[Pellizzari et al, 1997]. Determining the biological relevance of studies of single site mutants should take into account possible covariation of residues [Clarke, 1995]. For example, structural studies of an EnQ50A mutant have been conducted [Grant et al, 2000], in order to provide additional information concerning the role of residue 50 in general, and Q50 in particular; however, a phage display selection of Engrailed mutants failed to recover a Q50A mutant – the only Q50A mutant recovered also contained a I47T mutation [Simon et al, 2004]. Another issue concerning the Engrailed Q50K mutant is the observation that it binds to the consensus

TAATCC site with an unusually high affinity, which approaches the picomolar range [Ades &

Sauer, 1994]. There is no evidence that natural K50 class homeodomains have such a high affinity for DNA [Amendt et al, 1998; Ma et al, 1996]. The full-length PITX2 protein has a KD

of 50 nM [Amendt et al, 1998]. A KD was determined for the Q50K mutant of the Fushi tarazu

homeodomain, and this value was found to be 0.63 nM [Percival-Smith et al, 1990], a much

lower affinity than the EnQ50K mutant. The KD for the PITX2 homeodomain alone was found

to be 2.6 +/- 0.38 nM (see Figure 3.2), which is comparable to the Fushi tarazu mutant, and also a much lower affinity than the Engrailed Q50K mutant. Moreover, the X-ray structure of

EnQ50K reveals two distinct conformations with the side chain of K50 contacting either the 5th or 6th position on the anti-sense strand of the DNA. Whether or not natural K50 homeodomains exhibit these two conformations in a static state, and whether or not the side chain of K50 exhibits a fluctuating state were unknown. These considerations underscore the importance of obtaining solution structures of native K50 class homeodomains.
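
The dissociation constants quoted above can be compared on a common free-energy scale using the standard relation ΔG° = RT ln(KD / 1 M). The following Python sketch is illustrative only. The temperature of 298.15 K is an assumption, since the conditions of the individual measurements differ and are not restated here, and the names in the code are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def binding_free_energy_kj(kd_molar, temp_k=298.15):
    """Standard binding free energy, dG = RT ln(KD / 1 M), in kJ/mol.

    temp_k defaults to 25 degrees C, which is an assumption rather than
    the temperature of any particular measurement.
    """
    return GAS_CONSTANT * temp_k * math.log(kd_molar) / 1000.0


# KD values quoted in the text, in nM.
KD_NM = {
    "full-length PITX2": 50.0,
    "PITX2 homeodomain": 2.6,
    "Ftz Q50K homeodomain": 0.63,
}

if __name__ == "__main__":
    for name, kd_nm in KD_NM.items():
        dg = binding_free_energy_kj(kd_nm * 1e-9)
        print(f"{name}: KD = {kd_nm} nM, dG = {dg:.1f} kJ/mol")
```

On this scale a tenfold change in KD corresponds to about 5.7 kJ/mol at 25 °C, so the roughly fourfold difference between the PITX2 and Ftz Q50K homeodomains is small compared with the gap separating either from a picomolar binder.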

The question regarding side chain conformational heterogeneity, referred to above in the context of the observations concerning the K50 side chain in the EnQ50K crystal structure, is broader in scope and of fundamental importance for understanding the full range of interactions that can occur at a protein-DNA interface. Crystallographic studies have generally indicated that there are several conserved and relatively stable contacts at the homeodomain-DNA interface. In

several instances, such as the aforementioned case of K50 in the EnQ50K structure and the case

of Gln50 in the crystal structure of an even-skipped homeodomain complex [Hirsch & Aggarwal,

1995], multiple, significantly populated conformations are observed for the side chain, while the

nearly invariant asparagine in position 51 is observed to make very stable contacts with the

adenine base in position 3 of the consensus TAAT core binding site. On the other hand, NMR

studies [Tsao et al, 1994] and molecular dynamics simulations [Billeter et al, 1996; Gutmanas &

Billeter, 2004] have provided strong indications of a dynamic, fluctuating environment

encompassing some of the key amino acid side chains at the interface, most importantly, the side

chains of asparagine 51 and of the position 50 residue. Billeter and co-workers [1996] proposed that, at least in the case of Antennapedia, the homeodomain achieves specificity through a fluctuating network of short-lived contacts that allow it to recognize DNA without the entropic cost that would result if side chains were immobilized upon DNA binding (as discussed above).

Significant interest has been expressed in the literature [Tucker-Kellogg et al, 1997; Billeter et al,

1996; Gutmanas & Billeter, 2004; Duan & Nilsson, 2002] in obtaining experimental data on native K50 homeodomains in order to shed further light on these fundamental issues.

It is difficult to explain the effects of single amino acid replacements without having the actual three-dimensional structure of the PITX2 homeodomain. While no structure of a natural

K50 homeodomain had been determined prior to our work, computer modeling had been used to create a model of the structure of the PITX2 homeodomain bound to DNA [Banerjee-Basu &

Baxevanis, 1999]. This study used threading analysis to model the PITX2 homeodomain structure on that of the Engrailed homeodomain, which is 35% homologous to the PITX2 homeodomain. The authors found that the structure is most likely stabilized primarily by hydrophobic interactions between residues at the helical interface, and that the key hydrophobic interactions are strictly conserved between the Engrailed and PITX2 homeodomains. The study also applied threading analysis to mutant PITX2 homeodomains and found that the severity of the defect, as determined biochemically, correlated directly with the extent to which the mutation would disrupt the putative structure and the interactions of the homeodomain with DNA.

The threading analysis did not provide a PDB file that we could analyze in detail and is not necessarily indicative of true molecular structure. For this threading analysis, the focus was the role of Rieger mutations in causing disease, and there was no discussion of the role of K50 in determining the DNA-binding affinity and specificity of the homeodomain, which is something

best addressed via an experimentally determined structure rather than a threading model. This

study also did not analyze the K50 Rieger mutants, so there is no indication what the structure of

this side chain was in their analysis.

Proposals have been made on what effects the mutations seen in the PITX2

homeodomain in Rieger syndrome and the related anomalies may have on the structure and

function of this protein [Banerjee-Basu & Baxevanis, 1999; Kozlowski & Walter, 2000]. These

studies proposed that replacing R46 on the hydrophilic face of helix 3 with a hydrophobic and

bulky tryptophan residue may interfere with the stability of the homeodomain-DNA complex.

The arginine at position 46 may contact a sugar residue of the DNA backbone, just as K46 does

in Engrailed. The mutation at position 31 occurs at a position important for DNA binding in

helix 2. The reduction in function of the mutation at position 16 may be due to improper folding

of the homeodomain. A leucine at this position is conserved in homeodomains, and mutating

this residue may disrupt the packing of helix 1 against helix 3. One study did a survey of point

mutations found in 17 homeodomains, and three “hot spot” regions were found: Arg5, Arg31,

and amino acids in the recognition helix (especially Arg 52 and Arg 53) [D’Elia et al, 2001].

The arginines in the recognition helix contact phosphates in the DNA backbone, so mutating them would be expected to disrupt homeodomain function. Arg31 also contacts the phosphate backbone and is believed to form a salt bridge with E42 at the N-terminus of the recognition helix; it is therefore involved in the correct packing of helices II and III.

Bicoid

Bicoid is the founding member of the K50 class of homeodomain proteins, and many of the studies of DNA recognition by K50 proteins have been performed with Bicoid. Head and thorax development in the Drosophila embryo require the maternal determinant Bicoid

[Frohnhofer & Nusslein-Volhard, 1986]. Bicoid mRNA is made during oogenesis and is transported into the egg, where it becomes localized at the anterior tip and diffuses away from there, forming a concentration gradient [Berleth et al, 1988; Driever & Nusslein-Volhard,

1988a]. This protein is expressed as an anteroposterior concentration gradient in the early embryo and is necessary for the expression of many zygotic genes in distinct anterior domains

[Driever & Nusslein-Volhard, 1988b; Struhl et al, 1989]. Increases or decreases in protein levels in different regions of the Drosophila embryo cause a corresponding posterior or anterior shift of anterior anlagen in the embryo [Driever & Nusslein-Volhard, 1988b]. Embryos from strong mutant bcd alleles completely lack head and thorax and instead, they have a second telson at the anterior end [Frohnhofer & Nusslein-Volhard, 1986]. The homeodomain of Bicoid is of the K50 class, and it recognizes DNA sequences found in the enhancer elements of Bcd-responsive genes such as hunchback, knirps, buttonhead, runt, hairy, orthodenticle and even-skipped [Burz et al,

1998; Driever & Nusslein-Volhard, 1989; Gao & Finkelstein, 1998; Hanes & Brent, 1989; La

Rosee et al, 1997; Small et al, 1992; Tsai & Gergen, 1994; Wilson et al, 1996; Wimmer et al,

1995; Yuan et al, 1996]. Bicoid also binds to the 3'UTR of caudal mRNA and prevents cap-dependent translation initiation of this mRNA, thereby preventing Caudal synthesis in response to the Bicoid gradient [Rivera-Pomar et al, 1996; Niessing et al, 2000].

Non-consensus Site Recognition

Although most homeodomains recognize a TAAT core sequence, they have evolved different DNA specificities by altering the amino acid residue at position 50, which can interact with base pairs 5 and 6 (see above). The K50 class of homeodomains, such as Bicoid and

PITX2, recognize a consensus DNA sequence of TAATCC. A previous study has shown that when the Q50 protein Fushi tarazu (Ftz) is mutated to have a lysine residue at position 50, the Ftz(Q50K) protein can recognize the TAATCC consensus sequence in vitro [Zhao et al, 2000]. However, this mutant fails to select natural Bicoid targets in vivo. This indicates that the Ftz(Q50K)

mutant cannot recognize nonconsensus sites, and emphasizes the importance of nonconsensus

site recognition in vivo by Bicoid and PITX2. Bicoid is known to recognize at least three types

of nonconsensus DNA sites, which are TAAGCC, TGATCC, and AAATCC [Driever &

Nusslein-Volhard, 1989; Rivera-Pomar et al, 1995; Yuan et al, 1999]. Other nonconsensus sites

that have been reported to be recognized by Bicoid include TAAGCT and TCATCC [Driever &

Nusslein-Volhard, 1989]. Other studies have found that a single amino acid replacement at

position 50 of the homeodomain is sufficient to switch the DNA specificity of the Bicoid protein

[Hanes & Brent, 1989; Treisman et al, 1989].

Jun Ma's group (Cincinnati Children’s Hospital, Department of Developmental Biology) has performed experiments to look at the interactions between the Bicoid homeodomain and two different types of DNA sites, A1 and X1 [Dave et al, 2000]. A1 has the consensus sequence

TAATCC, while X1 has the nonconsensus sequence TAAGCT. Footprint analysis has shown that there are both shared and distinct contacts using the two different sites, which suggests that

Bicoid binds to these sites with a similar overall structure, but different interactions.

Experiments have indicated that Arg54 of the Bicoid homeodomain recognizes X1 by

specifically contacting the guanine at position 4. Arg54 is believed to contact the adenine of position 3 in the consensus site. In searching all natural K50 homeodomains, Bicoid is the only one with an arginine residue at position 54 of the homeodomain [Banerjee-Basu et al, 2003].

This residue is believed to allow Bicoid to recognize the X3 nonconsensus sequence (TGATCC)

and to allow efficient binding to X1. The PITX2 homeodomain has an alanine at position 54

instead of an arginine. PITX2 has a high affinity for the A1 site and a weaker affinity for X1, and does not bind the X3 (TGATCC) nonconsensus site. When the alanine at position 54 is mutated to

an arginine, the mutant homeodomain binds efficiently to the X3 site, and retains binding affinity

for A1 and X1.

Previous studies have indicated that recognition of nonconsensus DNA sites plays an

important role in mediating the function of Bicoid [Driever & Nusslein-Volhard, 1989; Ma et al,

1999; Rivera-Pomar et al, 1995; Yuan et al, 1999]. Chemical footprint assays have revealed

shared and distinct contacts with the consensus site (also called A1) and the nonconsensus site

TAAGCT (also called X1), and this suggests that the Bicoid homeodomain binds to these sites

with a similar overall structure, but different kinds of interactions [Dave et al, 2000].

Methylation interference studies have shown that when the guanines at positions 5 and 6 of the

antisense strand of A1 (3'-ATTAGG-5') are methylated, this interferes strongly with binding of

the Bicoid homeodomain [Dave et al, 2000]. This study also showed that when the guanine at

position 5 on the antisense strand of X1 (3'-ATTCGA-5') is methylated, Bicoid homeodomain

binding is inhibited, and methylation of position 4 of the sense strand of X1 (TAAGCT) also

interfered with binding. Dave et al [2000] also conducted KMnO4 interference assays to

determine which thymines are important for binding of the Bicoid homeodomain. They

demonstrated that the thymine at position 2 in both A1 (3'-ATTAGG-5') and X1 (3'-ATTCGA-5') is important for Bicoid binding.

Studies have shown that the PITX2 homeodomain can also recognize different DNA sites. Biochemical studies have shown that the PITX2 homeodomain can recognize both the consensus TAATCC site and a non-consensus TAAGCT site [Dave et al, 2000]. Another study has shown that the PITX2 homeodomain can recognize a TAAGCC site from a PITX1 target gene, pro-opiomelanocortin (POMC) [Kozlowski & Walter, 2000]. A recent study has shown by

chromatin immunoprecipitation that the mouse procollagen lysyl hydroxylase PLOD2 gene is a

target gene of PITX2 [Hjalt et al, 2001]. The promoter region of this gene contains multiple

DNA sequences that are recognized by PITX2 in vitro and in cells. Several of these sites are

nonconsensus sequences. A PITX2(T30P) mutant homeodomain, which has a threonine-to-proline mutation, can recognize a TAATCC site with only a small reduction in affinity, but cannot recognize the CE-3 element of POMC, which has a TAAGCC sequence. The sites that

PITX2 recognizes are presented in Table 1.3.

Type            Core Sequence   Source      Sequence
Consensus       TAAT            hb A1       CCAACGTAATCCCCATAG
Non-consensus   TAAG            POMC        CGCTGCTAAGCCGGCCATC
Non-consensus   AAAT            hb X3t      CATCCAAATCCAAGTGCG
Non-consensus   TCAT            plod-2 C    TTTGTTTTCATCCCTAAACAC
Non-consensus   TAGT            plod-2 E    CACTTTTAGTCCCAGGATTT
Non-consensus   TATT            Dlx2        TATTCC
Non-consensus   TTAT            Dlx2        TTATCC
Non-consensus   CAAT            Dlx2        CAATCC

Table 1.3: List of DNA sites that PITX2 recognizes [Dave et al, 2000; Kozlowski & Walter, 2000; Yuan et al, 1999; Hjalt et al, 2001; Green et al, 2001; Espinoza et al, 2002].

A structural study of the MATα2 homeodomain revealed some interesting features of homeodomains when they bind nonspecifically to DNA [Aishima & Wolberger, 2003]. This study found that when this homeodomain nonspecifically binds to DNA, the third helix actually rotates and makes a different set of contacts with the DNA. These nonspecifically bound homeodomains make few of the expected base-specific contacts seen in other structures, yet make many contacts with the phosphate backbone of the DNA. In the nonspecific complex, residues at positions 46 and 47 make contacts with the DNA, while other contacts are lost. This provides evidence that the homeodomain is capable of adjusting itself structurally in order to sample different DNA binding sequences, and it is possible that PITX2 is also capable of adjusting itself structurally to bind different DNA sites.

Statement of Research Goals

The purpose of this research was to determine the structure of the PITX2 homeodomain bound to consensus DNA and then analyze the molecular dynamics of this interaction. The

PITX2 homeodomain is an important member of the K50 class of homeodomain proteins.

PITX2 plays vital roles in the development of the heart, umbilicus, pituitary and craniofacial regions, among others. Mutations in PITX2 are known to cause the autosomal dominant disorder Rieger syndrome and related disorders. The homeodomain of PITX2 has a lysine at position 50, which defines the DNA sequences it is able to recognize. No structure of a native K50 homeodomain protein had been determined prior to the research presented in this thesis.

Analyzing such a structure will be useful in determining what governs the specificity of this class of homeodomains, and what structural characteristics allow for binding to both consensus and nonconsensus binding sites. Previous studies of homeodomains have indicated possible

fluctuating interactions with the DNA, but very little research has been performed to study this possibility.

The primary goal of this research was to determine the structure of the PITX2 homeodomain bound to DNA, and therefore have the first structure of a native K50 class homeodomain. The molecular dynamics of the wild-type PITX2 homeodomain interaction with the DNA was then explored to gain a better understanding of whether fluctuating interactions between K50 and the DNA are present. Molecular dynamics of Rieger mutant PITX2 homeodomain-DNA complexes were also explored, to gain a better understanding of differences in protein-DNA recognition, including differences in specific contacts, and differences in hydration of the protein-DNA interface, between the wild-type and mutant complexes. The results of these analyses are presented in this thesis.

CHAPTER 2: Materials and Methods

Expression of the PITX2 Homeodomain

The expression plasmid pGEX-1λt-pitx2HD was kindly provided by the laboratory of Dr.

Jun Ma (Division of Developmental Biology at the Cincinnati Children's Hospital Research

Foundation). This construct consists of a glutathione S-transferase tag and a thrombin cleavage

site prior to the PITX2 homeodomain sequence. There are two extra residues at the N-terminus

as a result of thrombin cleavage, and six extra residues at the C-terminus that are part of the

expression system. The final protein construct can be viewed in Figure 2.1.

Human PITX2 HD
      1          11         21         31         41         51      60
   GS QRRQRTHFTS QQLQQLEATF QRNRYPDMST REEIAVWTNL TEARVRVWFK NRRAKWRKRE EFIVTD

Figure 2.1: Amino acid sequence of the PITX2 homeodomain used for the structural studies. The extra residues at the N- and C-terminus are part of the expression system.

The PITX2 homeodomain was obtained by growing Escherichia coli strain BL21-Star

(Invitrogen) transformed with pGEX-1λt-pitx2HD in minimal media [0.85 g/L NaOH, 10.5 g/L

K2HPO4, 12 g/L Na2HPO4, 6 g/L KH2PO4, 1 g/L NaCl, 6 mg/L CaCl2, 13.2 mL/L concentrated

(12.2 N) HCl, nucleotides (0.5 g/L adenine, 0.65 g/L guanosine, 0.2 g/L thymine, 0.5 g/L uracil,

0.2 g/L cytosine), vitamins (1 mg/L choline chloride, 1 mg/L pyridoxal phosphate, 100 µg/L riboflavin, 50 mg/L thiamine, 50 mg/L niacin, 1 mg/L biotin), and trace elements (107 µg/L

MgCl2•6H2O, 20 µg/L FeCl2•4H2O, 0.7 µg/L CaCl2•2H2O, 0.26 µg/L H3BO3, 0.16 µg/L

MnCl2•4H2O, 16 ng/L CuCl2•2H2O, 2.4 µg/L Na2MoO4•2H2O, 10 µM FeCl3, 135 mM CaCl2,

50 µM ZnSO4)] containing 150 mg/L ampicillin, 4 g/L glucose, and 1 g/L NH4Cl. Half a liter of

bacterial culture was grown in baffled flasks in an incubator shaker at 37°C until saturation (A600

~ 5.0). This culture was spun down (2000g, 10 min) and resuspended in 1 L of minimal media

enriched with 10% Isogro (Isotec). When preparing labeled samples, 13C-glucose and 15NH4Cl

were used (Isotec). Expression was then induced by adding IPTG to a final concentration of

0.05-1 mM, and growing in an incubator shaker at 20°C for approximately 24 hours. The cells

were harvested by centrifugation (3200g, 30 min), and the cell paste was frozen at -80°C.

Typical wet cell paste yields were between 12g and 16g per liter of cell growth. Figure 2.2

illustrates optimization of expression conditions.


Figure 2.2: Optimizing expression conditions for the PITX2 homeodomain. (A) Narrowing the optimum temperature range and length of induction. In this experiment, the strongest induction with the most soluble protein was seen when inducing for 15 h at 30°C. (B) Determining the optimum temperature, length of induction, and concentration of IPTG. The strongest induction with the most soluble protein was seen when inducing for ~24 h at 20°C with 0.05 mM IPTG.

Purification of the PITX2 Homeodomain

Protein purification was carried out at 4°C to minimize protease degradation. For every

10 g of wet cells, 100 mL of ice-cold PBS buffer (144 mM NaCl, 2.7 mM KCl, 2.96 mM

Na2HPO4, 1.79 mM KH2PO4, pH 7.3) was used to resuspend the cells. Lysozyme (200 mg), 100

mM PMSF in isopropanol (667 µL), and two Complete protease inhibitor cocktail tablets

(Roche) were added for every 100 mL of resuspended cell mixture. This mixture was sonicated

and cell debris was removed by centrifugation (3200g, 25 min). The cleared lysate was applied

to a glutathione sepharose column and washed with PBS buffer, followed by thrombin cleavage

buffer (50 mM trizma hydrochloride, 150 mM NaCl, 2.5 mM CaCl2, pH 8.0). The resin was resuspended in thrombin cleavage buffer (TCB) and transferred to a 50 mL conical tube. The homeodomain was cleaved from the glutathione S-transferase fusion tag using 1 mg thrombin for

3 hours at 4°C with rotation. Nearly complete cleavage was obtained during this time as measured by SDS-PAGE (Figure 2.3). The cleaved protein was then eluted from the resin using

5 bed volumes of TCB. It was loaded onto a SP sepharose fast flow column (2 mL bed volume), washed with washing buffer (10 mM NaH2PO4, 250 mM NaCl, pH 7.0) and eluted with buffer

containing a higher salt concentration (10 mM NaH2PO4, 1 M NaCl, pH 7.0). Fractions

containing the homeodomain were identified by Abs278, pooled, and dialyzed overnight at 4°C in

10 mM NaH2PO4, pH 7.0. Protein yields were ~4.5 mg/L of cell growth. The consensus DNA

duplex (IdtDNA) was added in a 1:1 protein:DNA ratio and the complex was concentrated by

burying the dialysis bag in Spectra/Gel Absorbent (Spectrum, made of polyacrylate-polyalcohol)

or Aquacide (Calbiochem). The sequence of this DNA duplex can be seen in Figure 2.4.

Samples were concentrated to 540µL, and 60µL of D2O was added. Complete protease

inhibitors (1 tablet in 3 mL, add 1 µL), leupeptin (0.3 mM), DTT (2 mM), and Pefabloc (0.2

mM, Roche) were all added to inhibit proteases. Sodium azide (6 mM stock, add 1µL, final concentration 0.06 mM) was added to prevent bacterial growth in the sample.


Figure 2.3: Purification of the PITX2 homeodomain and production of a pure NMR sample. (A) Cleavage of the fusion protein by thrombin. (B) Samples from different stages of the purification process. (C) Pure NMR sample of PITX2 HD.

Figure 2.4: DNA sequence of the binding site used in the structural studies. The DNA sequence consists of the TAATCC binding site surrounded by residues to confer stability to the double strand. Prepared using NUCPLOT.

Gel Shift Assays

Complementary oligonucleotides (Table 2.1) were annealed by heating to 95°C and

slowly cooling to room temperature. This oligonucleotide duplex (7 pmol) was added to 5 pmol

of PITX2 homeodomain in 10 µl of H2O and incubated on ice for 30 min. These samples were

loaded onto a 15% acrylamide gel and stained with Sybr Green I (Invitrogen). The bands were

viewed under UV light and photographed.

Binding Site    Sequence of Oligo
TAATCC          5'- GCT CTA ATC CCC G -3'
                3'- CGA GAT TAG GGG C -5'
TAAGCC          5'- GCT GCT AAG CCG GCC -3'
                3'- CGA CGA TTC GGC CGG -5'
TAGTCC          5'- GCT TTT AGT CCC AGG -3'
                3'- CGA AAA TCA GGG TCC -5'
TATTCC          5'- GCT CTA TTC CCC G -3'
                3'- CGA GAT AAG GGG C -5'

Table 2.1: Sequences of oligonucleotide duplexes used in gel shift assays.

Determination of KD

Measurements of KD were kindly provided by Vrushank Dave and taken by following the procedure of Dave et al [2000]. The DNA probe concentrations used in this analysis were 1, 2,

4, 8, 16, 20 and 40 nM. Quantitative gel shift assays were performed by measuring the bound

and free fractions of the probes with a PhosphorImager as previously described [Zhao et al,

2000]. The data were analyzed using Microsoft Excel (linear regression analysis) to determine

the KD value ( -1/KD = slope of the plot of Bound/Free against Bound DNA).
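The Scatchard relation quoted above (slope = -1/KD) can be illustrated with a short sketch. The data points below are synthetic, generated for a hypothetical KD of 5 nM and a hypothetical binding capacity of 10 nM; they are not measurements from this work, and the function name is an assumption made for illustration only.

```python
def scatchard_kd(bound, free):
    """Estimate KD from a Scatchard plot (Bound/Free against Bound).

    The least-squares slope of the plot equals -1/KD.
    """
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return -1.0 / slope


# Synthetic single-site binding data (hypothetical KD = 5 nM, capacity = 10 nM).
KD_TRUE, BMAX = 5.0, 10.0
free_nM = [1.0, 2.0, 4.0, 8.0, 16.0, 20.0, 40.0]
bound_nM = [BMAX * f / (KD_TRUE + f) for f in free_nM]

kd_estimate = scatchard_kd(bound_nM, free_nM)
print(f"Estimated KD = {kd_estimate:.2f} nM")
```

With real, noisy data the fitted line would not pass through every point, and the quality of the linear fit would need to be checked before interpreting the slope.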

NMR Structure Determination

Determining the structure of a biomolecule or complex by NMR spectroscopy involves four basic steps: 1) identification of appropriate sample conditions (as discussed above); 2) sequence-specific assignment of the 1H, 13C, and 15N resonances; 3) identification of as many

geometrical constraints as possible from the NMR data, such as internuclear distances from

nuclear Overhauser enhancement (NOE) measurements, and dihedral angles from measurement

of scalar couplings; and 4) the calculation and refinement of the three-dimensional structure

from the structural constraints.

All NMR experiments were carried out on Varian Inova 600 and 800 MHz spectrometers.

The sample temperature was set to 295 K. Spectra were referenced to an external DSS standard.

Protein assignments: Protein 1H, 13C, and 15N resonance assignments were obtained primarily

from heteronuclear-edited NMR spectra, using conventional triple resonance 1H- {13C, 15N}

NMR probes. The pulse programming codes were written in-house. Approximately 92% of assignable atoms were assigned. Sequence-specific assignments of the backbone HN, N, C’, Cα and Cβ resonances were obtained from 3D HNCO, HN(CO)CA, HNCA, CBCA(CO)NH and

HNCACB [Grzesiek & Bax, 1992a; Ikura et al, 1990; Kay et al, 1994; Muhandiram & Kay,

1994; Grzesiek & Bax, 1992b; Wittekind & Mueller, 1993; Sattler et al, 1999] spectra.

Assignment of the aliphatic side chain resonances was accomplished using a combination of 3D

15N-edited-TOCSY-HSQC [Zhang et al, 1994; Kay et al, 1992], H(CCO)NH-TOCSY, as well as

HBHA(CBCACO)NH spectra [Grzesiek & Bax, 1992b]. Aromatic 1H and 13C resonances were obtained from a combination of 2D HMQC, 2D HMQC-TOCSY, 3D HMQC-TOCSY and 2D

NOESY-HMQC spectra [Marion et al, 1989; Zerbe et al, 1996]. A HNHA experiment was performed to assign Hα and to obtain coupling constants [Vuister & Bax, 1993].

DNA assignments: Resonance assignments for unlabeled DNA bound to 13C, 15N-labeled protein were obtained using standard assignment methods for DNA [Wuthrich, 1986]. The data was obtained with doubly 13C/15N-filtered NOESY and ω2-filtered TOCSY experiments [Otting &

Wuthrich, 1990; Breeze, 2000].

Structural constraints: The main source of structural information was the proton-proton distance constraints identified from NOESY spectra. Three-dimensional 15N-NOESY-HSQC

experiments [Talluri & Wagner, 1996] using 50-125 ms mixing times were used for

intramolecular restraints in the homeodomains, along with a 13C-NOESY-HSQC experiment

using a 150 ms mixing time [Sattler et al, 1999].

Intramolecular distance restraints for the DNA were obtained from an r⁻⁶ scaling of cross-peak volumes in the NOESY spectra. Upper and lower bounds were calibrated on the cytosine

intraresidue H5-H6 NOE and set to +/- 15% of the calculated distance for base and H1’ protons.

Restraint boundaries to other sugar protons were widened an additional 10% to account for

effects of spin diffusion. Restraints from the longer mixing time 125 ms experiment were

assigned a lower bound of 3 Å and an upper bound of 5 Å.
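The r⁻⁶ calibration described above can be sketched as follows. The reference cytosine H5-H6 distance of 2.46 Å, the example volumes, and the reading of "an additional 10%" as a total margin of ±25% are all assumptions made for illustration; the text does not state the values actually used.

```python
REF_DISTANCE = 2.46  # assumed cytosine H5-H6 distance in angstroms


def noe_distance(volume, ref_volume, ref_distance=REF_DISTANCE):
    """Convert a cross-peak volume to a distance using r^-6 scaling."""
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def restraint_bounds(distance, other_sugar_proton=False):
    """Return (lower, upper) bounds around a calibrated distance.

    +/-15% for base and H1' protons; an extra 10% (read here as +/-25%)
    for other sugar protons to allow for spin diffusion.
    """
    margin = 0.25 if other_sugar_proton else 0.15
    return distance * (1.0 - margin), distance * (1.0 + margin)


# Hypothetical volumes: a peak 64 times weaker than the reference peak.
d = noe_distance(volume=1.0, ref_volume=64.0)
lower, upper = restraint_bounds(d)
print(f"distance = {d:.2f} A, bounds = {lower:.2f}-{upper:.2f} A")
```

Because volume scales as r⁻⁶, a 64-fold weaker peak corresponds to only a two-fold longer distance, which is why the calibration is relatively insensitive to errors in peak integration.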

Intermolecular restraints between the protein and DNA were obtained from 2D 13C(ω1)-edited, [13C,15N](ω2)-filtered NOESY spectra [Breeze, 2000; Stuart et al, 1999; Lee et al, 1994].

The NOEs were assigned manually and only unambiguously assigned peaks were used as

restraints in the docking calculation. Weak peaks were assigned an upper distance limit of 6.0 Å,

while medium peaks had an upper distance limit of 5.0 Å, and stronger peaks an upper distance

limit of 4.0 Å.

Data processing and analysis: Raw NMR data was processed using NMRPipe [Delaglio et al,

1995]. Linear prediction was used in the t1 dimension for 2D spectra and in the t1 and t2

dimensions for 3D spectra, using squared sinebell window functions for apodization and zero

filling in all dimensions. Spectra were viewed and analyzed using the Sparky graphical interface

[Goddard & Kneller]. This program was used to pick peaks and integrate them using a

Lorentzian function.

Relaxation Rates: R2 rate constants were determined on a Varian 600 MHz spectrometer using a

standard Carr-Purcell-Meiboom-Gill (CPMG) spin-echo experiment [Farrow et al, 1994; Skelton

et al, 1993]. The data was recorded at 21.8°C. The R2 relaxation rate constant was extracted from this data by following the procedure “Extracting R1 and R2 Relaxation Rate Constants from

Varian NMR Data” that was written for our lab by Dr. Mark Rance.
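The in-house procedure itself is not reproduced here. As a generic illustration only, an R2 rate constant can be extracted from CPMG peak intensities by assuming a mono-exponential decay, I(t) = I0 exp(-R2 t), and fitting ln(I) against the relaxation delay. The delays, intensities, and the R2 value of 12 s⁻¹ below are synthetic and hypothetical, and the function name is an assumption.

```python
import math


def fit_r2(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R2 * t) by linear regression of ln(I) on t.

    Returns (R2 in s^-1, I0).
    """
    y = [math.log(i) for i in intensities]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(y) / n
    stt = sum((t - mean_t) ** 2 for t in delays_s)
    sty = sum((t - mean_t) * (yi - mean_y) for t, yi in zip(delays_s, y))
    slope = sty / stt
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)


# Synthetic, noise-free decay with a hypothetical R2 of 12 s^-1.
delays = [0.01, 0.03, 0.05, 0.07, 0.09, 0.13]
peaks = [1000.0 * math.exp(-12.0 * t) for t in delays]

r2, i0 = fit_r2(delays, peaks)
print(f"R2 = {r2:.2f} s^-1, I0 = {i0:.1f}")
```

A log-linear fit is used here for simplicity; with noisy data a nonlinear least-squares fit of the exponential is generally preferable because taking logarithms distorts the error distribution of weak peaks.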

Structure calculation: Referenced chemical shift assignments and peak intensities from Sparky

were entered into the structure calculation program CYANA [Guntert et al, 1997; Herrmann et

al, 2002]. CYANA consists of an automated NOE assignment program, CANDID, which

automatically assigns all NOESY cross peaks, taking into account nearness of chemical shift,

network anchoring, ambiguous distance constraints, and constraint combination [Herrmann et al,

2002]. The structure is then calculated using the DYANA algorithm, which calculates structures using torsion angle dynamics [Guntert et al, 1997]. Calibration constants for peak intensities

versus upper distance limits were determined automatically by CYANA. Peak lists from 50 ms,

80 ms, and 125 ms 15N-NOESY-HSQC, and 150 ms 13C-NOESY-HSQC experiments were

entered into CYANA, along with a list of the chemical shift assignments (pitx2.prot), all

prepared using SPARKY. Chemical shift tolerances of 0.025 ppm were used for the protons, and 0.4 ppm for the 15N and 13C chemical shifts. The script prepared to enter all of this information is called “pitx2.cya”:


pitx2.cya

# Combined automatic assignment and structure calculation with CANDID

peaks     := 50ms2,80ms2,125ms2,13Cms3    # names of peak lists
prot      := pitx2                         # names of proton lists
tolerance := 0.025, 0.025, 0.4             # chemical shift tolerances (ppm);
                                           # order: 1H(a), 1H(b), 13C/15N(a), 13C/15N(b)
# cal     := 9.0E7, 3.0E8, 1.0E8           # calibration constants (will be determined
                                           # automatically by CANDID, if commented out)

subroutine ANNEAL                          # subroutine for structure calculation
  var n
  ./init                                   # re-initialize
  read upl cycle$cycle.upl                 # NOE upper distance limits from CANDID
  read cco pitx2c.cco append               # coupling constants
  read aco set.aco append                  #CA shift-derived dihedral angles
  hbond O 53 HN 56                         # hydrogen bond
  hbond O 54 HN 57
  hbond O 55 HN 58
  hbond O 42 HN 45
  hbond O 43 HN 46
  hbond O 44 HN 47
  hbond O 45 HN 48
  hbond O 46 HN 49
  hbond O 47 HN 50
  hbond O 48 HN 51
  hbond O 49 HN 52
  hbond O 50 HN 53
  hbond O 51 HN 54
  hbond O 52 HN 55
  hbond O 10 HN 13
  hbond O 11 HN 14
  hbond O 12 HN 15
  hbond O 13 HN 16
  hbond O 14 HN 17
  hbond O 15 HN 18
  hbond O 16 HN 19
  hbond O 17 HN 20
  hbond O 28 HN 31
  hbond O 29 HN 32
  hbond O 30 HN 33
  hbond O 31 HN 34
  hbond O 32 HN 35
  hbond O 33 HN 36
  hbond O 34 HN 37
  seed=5671                                # random number generator seed
  n=30                                     # number of start conformers
  if (def('nproc')) n=nint(real(n)/nproc)*nproc   # adapt to a multiple of number of CPUs
  distance stat
  calc_all structures=n steps=10000        # structure calculation
  overview cycle$cycle structures=20 cor   # write overview file and coordinates
end

candid peaks=$peaks prot=$prot calculation=ANNEAL

Simulated annealing was performed using the default parameters. The command “cyana” can be typed in a terminal to start the program, and “pitx2” can then be typed to run the “pitx2.cya” script. The 20 lowest energy conformers were retained after structure calculation and used for docking to DNA.

Docking of the protein to the DNA: The protein was docked to the DNA using the AMBER all-atom force field with the Generalized Born solvation model [Case et al, 1996; Pearlman et al, 1995]. The 20 CYANA structures with the lowest values of target function were docked onto canonical B-form DNA. This was chosen as the starting DNA structure, since NOESY spectra for the complex indicated the DNA to be close to B-form. Starting structures of the complex were generated by systematically placing PITX2 in varying orientations relative to the DNA, with helix 3 approximately 50 Å from the DNA, in MOLMOL. For each of the 20 lowest-energy structures, 5 different orientations of the protein relative to the DNA were selected, yielding 100 starting conformers. In the first orientation, the third helix directly faced the major groove of the DNA. The other 4 orientations were generated by rotating the first

orientation by 45 and 90 degrees in either direction.
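The bookkeeping behind the 100 starting conformers can be sketched as below. The orientations were actually generated in MOLMOL; the rotation axis is not stated in the text, so rotation about the z axis is an assumption here, as are the function name and the example coordinate.

```python
import math
from itertools import product

ANGLES_DEG = [0, 45, -45, 90, -90]  # reference orientation plus four rotations
N_STRUCTURES = 20                    # lowest-target-function CYANA conformers


def rotate_z(point, angle_deg):
    """Rotate an (x, y, z) point about the z axis (axis choice is an assumption)."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


# One (structure index, angle) pair per starting conformer.
starting_conformers = list(product(range(1, N_STRUCTURES + 1), ANGLES_DEG))
print(len(starting_conformers))

# Example: rotate a hypothetical atom position by 90 degrees.
rotated = rotate_z((1.0, 0.0, 0.0), 90)
```

Sampling several starting orientations per conformer reduces the chance that the docked result merely reflects the initial placement of the protein.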

The protein was docked onto the DNA by a 20 ps simulated annealing calculation (T =

600 K, time-step = 1 fs) using an altered version of a procedure described previously for docking

of TFIIIA to DNA [Wuttke et al, 1997]. DNA base-pairing was maintained by incorporating

Watson-Crick hydrogen-bonding restraints. These Watson-Crick DNA restraints were implemented as lower and upper bound restraints on base-paired heteroatom-heteroatom (2.7 to 3.1 Å) and heteroatom-proton distances (1.67 to 2.07 Å) (WatCrick.dist), and had a final force constant of 50 kcal mol⁻¹ Å⁻². The intramolecular protein (pitx2.dist) and DNA restraints (dnaDist.dist, dnaNOE.dist) had final force constants of 20 kcal mol⁻¹ Å⁻². Protein (helix.dist) and DNA angle restraints (dnaAng.dist) had a final force constant of 32 kcal mol⁻¹ Å⁻². Protein-DNA intermolecular restraints (docking.dist) had a final force constant of 32 kcal mol⁻¹ Å⁻².

Protein restraints were applied to prevent the protein conformation from being altered too much from the structure calculated by CYANA. DNA restraints were applied to prevent fraying of the

DNA, and to maintain the structure close to B-form.
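How a lower and upper bound restraint with a given force constant penalizes violations can be sketched with a flat-bottomed harmonic function. This is a simplified illustration under stated assumptions, not AMBER's exact functional form (which, for example, becomes linear beyond outer bounds); the function name is hypothetical.

```python
def restraint_energy(r, lower, upper, k):
    """Flat-bottomed harmonic penalty in kcal/mol.

    Zero between the bounds, k * (violation)^2 outside them.
    """
    if r < lower:
        return k * (lower - r) ** 2
    if r > upper:
        return k * (r - upper) ** 2
    return 0.0


# Watson-Crick heteroatom-heteroatom restraint: 2.7 to 3.1 A, k = 50 kcal mol^-1 A^-2.
inside = restraint_energy(2.9, 2.7, 3.1, 50.0)
stretched = restraint_energy(3.3, 2.7, 3.1, 50.0)
print(inside, round(stretched, 3))
```

In this simplified form, a base pair stretched 0.2 Å beyond the upper bound incurs a penalty of about 2 kcal/mol, whereas any distance inside the bounds is unpenalized.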

The format of all distance restraint files entered into the AMBER docking calculation consists of the residue number of the first atom, the residue name, the atom name, the residue number of the second atom, the residue name, the atom name, and finally the upper distance limit. An example of a line from one of the distance restraint files is:

43 THR HN 46 ARG+ HB2 4.14

The format of all angle restraint files entered into the AMBER docking calculation consists of the residue number, the residue name, the angle name, followed by the lower and upper bounds on this angle. An example of a line from one of the angle restraint files is:

69 GUA GAMMA -20.0 140.0
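The two restraint file formats described above are whitespace-delimited and can be read with a few lines of code. The sketch below parses the two example lines given in the text; the class and function names are hypothetical and are not part of AMBER.

```python
from typing import NamedTuple


class DistanceRestraint(NamedTuple):
    res1: int
    resname1: str
    atom1: str
    res2: int
    resname2: str
    atom2: str
    upper: float


class AngleRestraint(NamedTuple):
    res: int
    resname: str
    angle: str
    lower: float
    upper: float


def parse_distance(line):
    """Parse a seven-field distance restraint line."""
    r1, n1, a1, r2, n2, a2, upl = line.split()
    return DistanceRestraint(int(r1), n1, a1, int(r2), n2, a2, float(upl))


def parse_angle(line):
    """Parse a five-field angle restraint line."""
    res, name, angle, lo, hi = line.split()
    return AngleRestraint(int(res), name, angle, float(lo), float(hi))


dist = parse_distance("43 THR HN 46 ARG+ HB2 4.14")
ang = parse_angle("69 GUA GAMMA -20.0 140.0")
print(dist)
print(ang)
```

A parser of this kind is useful mainly as a sanity check, for example to confirm that every line has the expected number of fields before the files are converted.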

To start the docking calculation, a parameter file has to be created. This is accomplished

by loading the PDB file into tleap as follows:

% tleap -f leaprc.ff99
> complex = loadpdb DNA+pitx2.pdb
> saveamberparm complex DNA+pitx2.parm7 DNA+pitx2.x
> quit

The distance and angle restraint files were then converted to the format that AMBER

requires by running makeDIST_RST and makeANG_RST commands as shown in the following

two examples:

% makeDIST_RST -upb docking.dist -pdb DNA+pitx2.pdb -rst RST.dist

% makeANG_RST -pdb DNA+pitx2.pdb -con dnaAng.dist -lib $AMBERHOME/src/nmr_aux/prepare_input/tordef.lib > RST.dnaang

The force constants were then set to the values described above by editing each RST file. The values rk2 and rk3 on line 5 of each file were set to the chosen force constants. All of the

converted restraints were then combined into one restraint file as follows:

% cat RST.pitx2 RST.dnadist RST.dnaNOE RST.wc RST.dist RST.ang RST.dnaang > RST
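The editing of rk2 and rk3 could also be scripted rather than done by hand. The following Python sketch assumes that the force constants appear in each converted file in the form rk2=<number> and rk3=<number>; the sample text and the name set_force_constants are hypothetical, and real RST files should be inspected before anything like this is applied to them.

```python
import re

# Matches "rk2" or "rk3", an equals sign, and the number that follows it.
_RK_PATTERN = re.compile(r"\b(rk[23])\s*=\s*[-+0-9.]+")


def set_force_constants(text, value):
    """Return the text with every rk2 and rk3 value replaced by `value`."""
    return _RK_PATTERN.sub(lambda m: f"{m.group(1)}={value:.1f}", text)


# Hypothetical fragment of a converted restraint entry.
sample = "  rk2=20.0, rk3=20.0, ir6=1,"
print(set_force_constants(sample, 32.0))
```

Only the rk2 and rk3 fields are touched; other fields on the same line, such as the distance bounds, are left unchanged.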

With the restraint and parameter files prepared, a short minimization step was performed using the file “RSTmin.in”:

RSTmin.in

energy minimization for Pitx2 with restraints

&cntrl
imin=1, maxcyc=100, ncyc=50, ntpr=20, ntb=0,
/
&ewald
eedmeth=5,
/

This calculation takes about 1 minute to run, and can be executed using the following command:

% sander -O -i RSTmin.in -o RSTmin.out -c DNA+pitx2.x -p DNA+pitx2.parm7 -r RST.min.x

After this minimization, the calculation can be run to dock the protein to the DNA using the restraint RST file prepared above. The temperature was increased from 0 to 600 K over the first 4 ps, held at 600 K for 2 ps, then slowly cooled to 0 K over 14 ps. The weights of the force constants were linearly increased from 0.1 to 1 during the course of the calculation. The file is

“RSTanneal.in” and is executed in the same way as the minimization step, using the appropriate input file names:

RSTanneal.in

simulated annealing protocol, 20 ps

&cntrl
nstlim=20000, pencut=-0.001, nmropt=1,
ntpr=200, ntt=1, ntwx=200,
cut=12.0, ntb=0, vlimit=10,
igb=1, saltcon=0.2, offset=0.13,
/
&ewald
/
#
#Simple simulated annealing algorithm:
#
#from steps 0 to 4000: heat the system to 600K
#from steps 4001-6000: hold at 600K
#from steps 6001-20000: final cooling
#
&wt type='TEMP0', istep1=0,istep2=4000,value1=0., value2=600., /
&wt type='TEMP0', istep1=4001, istep2=6000, value1=600.0, value2=600.0, /
&wt type='TEMP0', istep1=6001, istep2=20000, value1=600.0, value2=0.0, /
&wt type='TAUTP', istep1=0,istep2=20000,value1=0.1, value2=20.0, /
&wt type='REST', istep1=0,istep2=20000,value1=0.1, value2=1.0, /
&wt type='END' /
LISTOUT=POUT
DISANG=RST

This file also outlines the changes in temperature that are introduced during the calculation, as described above. This calculation takes approximately 13 hours to complete (all calculations performed with a 2.4 GHz Intel Xeon processor). An explanation of what the values in the input files represent can be found at http://amber.scripps.edu/tutorial/dna_NMR/nmr_dna_tutorial.htm.
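The temperature and restraint-weight ramps defined in “RSTanneal.in” can be tabulated with a short script. The following Python sketch assumes that each ramp is interpolated linearly between its first and last step, which matches the linear changes described in the text; the names TEMP0_RAMPS, REST_RAMP and ramp_value are hypothetical and the code is not part of AMBER.

```python
# (first step, last step, starting value, final value), copied from the
# TEMP0 and REST entries of RSTanneal.in.
TEMP0_RAMPS = [
    (0, 4000, 0.0, 600.0),
    (4001, 6000, 600.0, 600.0),
    (6001, 20000, 600.0, 0.0),
]
REST_RAMP = [(0, 20000, 0.1, 1.0)]


def ramp_value(step, ramps):
    """Linearly interpolate the value scheduled for a given MD step."""
    for first, last, start, end in ramps:
        if first <= step <= last:
            fraction = (step - first) / (last - first)
            return start + fraction * (end - start)
    raise ValueError(f"step {step} is outside the schedule")


for step in (0, 2000, 5000, 20000):
    print(step, ramp_value(step, TEMP0_RAMPS), ramp_value(step, REST_RAMP))
```

With 20,000 steps covering 20 ps, steps 0 to 4000 correspond to the 4 ps heating phase, steps 4001 to 6000 to the 2 ps hold, and the remaining steps to the 14 ps cooling phase.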

Structure refinement and analysis: The 20 structures with the lowest total energy values were subjected to restrained energy minimization by the SANDER module of the AMBER 7.0 package [Case et al, 1996; Pearlman et al, 1995]. Each conformer was subjected to a conjugate-gradient energy minimization calculation with solvent included. New parameter files were created, adding water to a radius of 8.0 Å with the “solvateBox” command in tleap, and neutralizing with sodium ions:

% tleap -f leaprc.ff99
> complex = loadpdb DNA+pitx2_docked.pdb
> solvateBox complex WATBOX216 8.0
> charge complex
> addIons2 complex Na+ 0
> saveamberparm complex DNA+pitx2_docked.parm7 DNA+pitx2_docked.x
> quit

Two minimization steps were carried out, using the files called “min.in” and “min_all.in”:

min.in

Minimization with Cartesian restraints for the solute
&cntrl
imin=1, maxcyc=200, ntpr=5, ntr=1,
&end
END
END

min_all.in

Minimization of the entire molecular system
&cntrl
imin=1, maxcyc=200, ntpr=5,
&end

The evaluation of the structure, i.e., analysis of geometry, stereochemistry, and energy distributions in the models, was performed using the program PROCHECK [Laskowski et al,

1993]. Restraint violations were determined and analyzed using the program AQUA [Laskowski

et al, 1996]. Graphics were prepared using MOLMOL [Koradi et al, 1996].

Molecular Dynamics Calculations

Preparation of Mutant Protein Files

MD simulations were begun using the NMR solution structure of the PITX2

homeodomain bound to its DNA site [Protein Data Bank (PDB) code: 1yz8]. The average

structure of the 20 conformers was used as the starting point. Mutations of the protein were made within the program MOLMOL [Koradi et al, 1996] by replacing the wild-type residue with the mutant one and saving the molecule as a new PDB file to use in MD experiments. The

“SelectRes” command is used in MOLMOL to select the residue to be mutated, and the

“ChangeRes” command is used to switch it to the mutant.

Molecular Dynamics

The SANDER and LEaP modules of the AMBER7 suite of programs and the 1999

version of the AMBER force field were used in setting up and performing the simulations [Case

et al, 1996; Pearlman et al, 1995; Wang et al, 2000]. All of the complexes were put through the same protocol, which was loosely based on that of Gutmanas & Billeter [2004]. The complexes

were initially minimized in vacuo with 100 steps of steepest descent, followed by 900 steps of

the conjugate gradient method, with a 30 Å cutoff on nonbonded interactions. This calculation

takes about 9 minutes to run, and the input file is called “min.in”, with the execution script

illustrated in the commented out (#) section (this will be shown for each of the MD scripts

presented below):

min.in

pitx2: initial minimisation prior to MD
&cntrl
imin = 1, maxcyc = 1000, ncyc = 100,
ntb = 0, igb = 0, cut = 30
/

# sander -O -i min.in -o pitx2_min.out -c pitx2_vac.crd -p pitx2_vac.parm7 -r pitx2_vac_init_min.rst

# ambpdb -p pitx2_vac.parm7 < pitx2_vac_init_min.rst > pitx2_vac_init_min.pdb

These minimized structures were then solvated in a box of water, with a minimal distance of 8 Å

from the solute to the border (performed in tleap, as described above for the docking

calculation). A total of 4558 water molecules were added. Seventeen sodium ions were added to

make the system’s charge neutral (“addions” command in tleap). The simulations had a total of

15703 atoms. The solvated complexes were then put through a step of energy minimization.

This step consisted of 1000 steps of steepest descent, with 50 kcal/(mol Å²) restraints on all heavy atoms of the solute, and a 10 Å cutoff for nonbonded interactions. This calculation takes about

13 minutes to run, and the input file is “min2.in”:

min2.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
50.0
RES 1 94
END
END

# sander -O -i min2.in -o pitx2_min2.out -c pitx2.crd -p pitx2.parm7 -r pitx2_min2.rst -ref pitx2.crd

A 25 ps MD simulation was then performed, with position restraints of 50 kcal/(mol Å²) and a 10 Å cutoff, as above. A constant pressure of 1 atm was used, and the SHAKE algorithm was applied with a timestep of 2 fs. The system was heated from 100 K to 300 K during the first 2 ps, and kept at a constant temperature afterwards. This calculation takes about 8 hours to run, and the input file is “eq1.in”:

eq1.in

Pitx2: 25ps MD with res
&cntrl
imin = 0, irest = 0, ntx = 1,
ntb = 2, ntp = 1,
cut = 10, ntr = 1,
ntc = 2, ntf = 2,
tempi = 100.0, temp0 = 300.0, tautp = 2,
ntt = 1, gamma_ln = 1,
nstlim = 12500, dt = 0.002,
ntpr = 100, ntwx = 100, ntwr = 1000
/
Keep solute fixed
50.0
RES 1 94
END
END

# sander -O -i eq1.in -p pitx2.parm7 -c pitx2_min2.rst -r pitx2_eq1.rst -x pitx2_eq1.crd -o pitx2_eq1.out -ref pitx2_min2.rst

Another minimization step was performed, as above, with the position restraints relaxed to 25 kcal/(mol Å²). This calculation takes about 40 minutes to run, and the input file is “eq2.in”:

eq2.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
25.0
RES 1 94
END
END

# sander -O -i eq2.in -o pitx2_eq2.out -c pitx2_eq1.rst -p pitx2.parm7 -r pitx2_eq2.rst -ref pitx2_eq1.rst

Next, a 3 ps MD simulation was performed. The position restraints were kept at 25 kcal/(mol Å²), and Particle Mesh Ewald (PME) summation was introduced. This calculation takes about 30 minutes to run, and the input file is called “eq3.in”:

eq3.in

Pitx2: 3ps MD with res
&cntrl
imin = 0, irest = 0, ntx = 1,
ntb = 2, ntp = 1,
cut = 10, ntr = 1,
ntc = 2, ntf = 2,
tempi = 100.0, temp0 = 300.0, tautp = 2,
ntt = 1, gamma_ln = 1.0,
nstlim = 1500, dt = 0.002,
ntpr = 100, ntwx = 100, ntwr = 1000
/
&ewald
/
Keep solute fixed
25.0
RES 1 94
END
END

# sander -O -i eq3.in -p pitx2.parm7 -c pitx2_eq2.rst -r pitx2_eq3.rst -x pitx2_eq3.crd -o pitx2_eq3.out -ref pitx2_eq2.rst

Then, 5 more minimization steps were performed, relaxing the position restraints from 20 to 0 kcal/(mol Å²). These calculations take about 30 minutes each to run, and the input files are called “eq4.in”, “eq5.in”, “eq6.in”, “eq7.in”, and “eq8.in”:

eq4.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
20.0
RES 1 94
END
END

# sander -O -i eq4.in -o pitx2_eq4.out -c pitx2_eq3.rst -p pitx2.parm7 -r pitx2_eq4.rst -ref pitx2_eq3.rst

eq5.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
15.0
RES 1 94
END
END

# sander -O -i eq5.in -o pitx2_eq5.out -c pitx2_eq4.rst -p pitx2.parm7 -r pitx2_eq5.rst -ref pitx2_eq4.rst

eq6.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
10.0
RES 1 94
END
END

# sander -O -i eq6.in -o pitx2_eq6.out -c pitx2_eq5.rst -p pitx2.parm7 -r pitx2_eq6.rst -ref pitx2_eq5.rst

eq7.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, ntr = 1, cut = 10
&end
Group input for restrained atoms
5.0
RES 1 94
END
END

# sander -O -i eq7.in -o pitx2_eq7.out -c pitx2_eq6.rst -p pitx2.parm7 -r pitx2_eq7.rst -ref pitx2_eq6.rst

eq8.in

Minimization with Cartesian restraints for the solute
&cntrl
imin = 1, maxcyc = 1000, ncyc = 1000,
ntb = 1, igb = 0, cut = 10
&end

# sander -O -i eq8.in -o pitx2_eq8.out -c pitx2_eq7.rst -p pitx2.parm7 -r pitx2_eq8.rst
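The five relaxation stages differ only in the restraint weight and in the restart file each one reads. The following Python sketch lays out that chain; it is an illustration of the bookkeeping only, and the function name relaxation_stages and the dictionary keys are hypothetical.

```python
# Restraint weights for eq4.in through eq8.in, as listed in the input files.
WEIGHTS = [20.0, 15.0, 10.0, 5.0, 0.0]


def relaxation_stages(first_index=4, previous="pitx2_eq3.rst", weights=WEIGHTS):
    """List each stage with its input file, starting and output restart files."""
    stages = []
    for offset, weight in enumerate(weights):
        name = f"eq{first_index + offset}"
        restart = f"pitx2_{name}.rst"
        stages.append({
            "input": f"{name}.in",
            "start": previous,          # coordinates read with -c
            "restart": restart,         # coordinates written with -r
            "weight": weight,
            "restrained": weight > 0,   # the last stage runs without restraints
        })
        previous = restart
    return stages


for stage in relaxation_stages():
    print(stage["input"], stage["start"], "->", stage["restart"], stage["weight"])
```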

A 2 ps MD run was then performed at constant pressure, heating the system from 100 K to 300 K during the first 2 ps, and keeping the temperature constant afterwards. This calculation takes

about 45 minutes to run, and the input file is called “eq9.in”:

eq9.in

Pitx2: 2ps MD
&cntrl
imin = 0, irest = 0, ntx = 1,
ntb = 2, ntp = 1,
cut = 10,
ntc = 2, ntf = 2,
tempi = 100.0, temp0 = 300.0, tautp = 2,
ntt = 1, gamma_ln = 1.0,
nstlim = 1000, dt = 0.002,
ntpr = 100, ntwx = 100, ntwr = 1000
/
END

# sander -O -i eq9.in -p pitx2.parm7 -c pitx2_eq8.rst -r pitx2_eq9.rst -x pitx2_eq9.crd -o pitx2_eq9.out

A final MD simulation was run for 100 ps at constant pressure, and a constant temperature of 300 K. This calculation runs overnight, and the input file is called “eq10.in”:

eq10.in

Pitx2: 100ps MD
&cntrl
imin = 0, irest = 0, ntx = 1,
ntb = 2, ntp = 1,
cut = 10,
ntc = 2, ntf = 2,
temp0 = 300.0, tautp = 2,
ntt = 1, gamma_ln = 1.0,
nstlim = 50000, dt = 0.002,
ntpr = 100, ntwx = 100, ntwr = 1000
&end
END
END

# sander -O -i eq10.in -p pitx2.parm7 -c pitx2_eq9.rst -r pitx2_eq10.rst -x pitx2_eq10.crd -o pitx2_eq10.out -ref pitx2_eq9.rst

The trajectory length for the production run was 2 ns for each complex. Periodic boundary conditions were used, at constant volume and a constant temperature of 300 K. The SHAKE algorithm was used, with a timestep of 2 fs. A cutoff of 10 Å was used for long-range interactions, along with PME summation. This calculation takes about 11 days to run, and is run in separate steps of 200 ps each. The input file is called “pitx2_prod.in”:

pitx2_prod.in

Constant pressure constant temperature production run
&cntrl
nstlim=100000, dt=0.002,
ntx=5, irest=1,
ntpr=500, ntwr=5000, ntwx=5000,
temp0=300.0, ntt=1, tautp=2.0,
ntb=1, ntp=0,
ntc=2, ntf=2, cut=10,
nrespa=2,
/
&ewald
/
END
END
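The step counts in the MD input files can be checked against the simulation lengths stated in the text, since the simulated time is the number of steps multiplied by the timestep. The following Python sketch does this arithmetic for the 2 fs timestep; the dictionary and function names are hypothetical.

```python
# Number of MD steps (nstlim) in each input file, copied from the listings.
NSTLIM = {
    "eq1.in": 12500,
    "eq3.in": 1500,
    "eq9.in": 1000,
    "eq10.in": 50000,
    "pitx2_prod.in": 100000,
}
DT_FS = 2  # timestep of 0.002 ps expressed in femtoseconds


def length_ps(nstlim, dt_fs=DT_FS):
    """Simulated time in picoseconds for a given number of steps."""
    return nstlim * dt_fs / 1000


lengths = {name: length_ps(steps) for name, steps in NSTLIM.items()}
segments_for_2ns = 2000 / lengths["pitx2_prod.in"]
print(lengths, segments_for_2ns)
```

Each execution of “pitx2_prod.in” therefore covers 200 ps, and ten consecutive executions give the 2 ns production trajectory.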

To run the entire 2 ns in separate 200 ps steps, the script “run_pitx2_2000ps.x” is used:

run_pitx2_2000ps.x

#!/bin/csh
set AMBERHOME="/usr/local/amber7"
set MDSTARTJOB=1
set MDENDJOB=10
set MDCURRENTJOB=$MDSTARTJOB
set MDINPUT=0

echo -n "Starting Script at: "
date
echo ""

while ( $MDCURRENTJOB <= $MDENDJOB )
  echo -n "Job $MDCURRENTJOB started at: "
  date
  @ MDINPUT = $MDCURRENTJOB - 1
  $AMBERHOME/exe/sander -O -i pitx2_prod.in \
    -o pitx2_prod$MDCURRENTJOB.out \
    -p pitx2.parm7 \
    -c pitx2_prod$MDINPUT.rst \
    -r pitx2_prod$MDCURRENTJOB.rst \
    -x pitx2_prod$MDCURRENTJOB.crd \
    -ref pitx2_prod$MDINPUT.rst
  gzip -9 -v pitx2_prod$MDCURRENTJOB.crd
  echo -n "Job $MDCURRENTJOB finished at: "
  date
  @ MDCURRENTJOB = $MDCURRENTJOB + 1
end
echo "ALL DONE"

To make this script executable, the following command was typed in the terminal window:

% chmod +x run_pitx2_2000ps.x

To run this script in the background, the following command was typed in the terminal window:

% nohup ./run_pitx2_2000ps.x >& run.log &
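The file naming used by the run script can be made explicit by building the argument list for each 200 ps segment. The following Python sketch only constructs the argument lists and does not launch sander; the function name segment_command is hypothetical. It also shows that the first segment reads a restart file named pitx2_prod0.rst, which must therefore exist before the script is started.

```python
def segment_command(job, sander="sander"):
    """Argument list for production segment `job` (1-based), restarting from job - 1."""
    previous = job - 1
    return [
        sander, "-O",
        "-i", "pitx2_prod.in",
        "-o", f"pitx2_prod{job}.out",
        "-p", "pitx2.parm7",
        "-c", f"pitx2_prod{previous}.rst",
        "-r", f"pitx2_prod{job}.rst",
        "-x", f"pitx2_prod{job}.crd",
        "-ref", f"pitx2_prod{previous}.rst",
    ]


commands = [segment_command(job) for job in range(1, 11)]
print(" ".join(commands[0]))
```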

Analysis of Production Run Data

PDB files were prepared of the average structure during the MD run for each complex, and for individual steps throughout the trajectory (“ambpdb” command). These were compared

and analyzed in MOLMOL [Koradi et al, 1996]. The average structures during the trajectories

of the MD simulations were calculated using the PTRAJ module of the AMBER package [Case

et al, 1996; Pearlman et al, 1995]. The input file for this calculation is called “ptraj.in”:

ptraj.in

trajin pitx2_prod1.crd.gz
trajin pitx2_prod2.crd.gz
trajin pitx2_prod3.crd.gz
trajin pitx2_prod4.crd.gz
trajin pitx2_prod5.crd.gz
trajin pitx2_prod6.crd.gz
trajin pitx2_prod7.crd.gz
trajin pitx2_prod8.crd.gz
trajin pitx2_prod9.crd.gz
trajin pitx2_prod10.crd.gz
trajout pitx2_2000ps.crd
rms first out pitx2_2000ps.rmsfit @P,O3',O5',C3',C4',C5',CA time 10
center :1-94
image familiar
solvent byres :WAT
closest 1050 :1-2012 first
average avg.pdb pdb

# ptraj pitx2.parm7 < ptraj.in

For calculation of the average structure, only the 1050 water molecules closest to the complex

were retained (“closest” in ptraj.in). For these structures, the program NUCPLOT was used to

get a full list of protein-DNA and water-mediated contacts [Luscombe et al, 1997]. In addition,

the PTRAJ and CARNAL modules in AMBER were used to characterize hydrogen bonds and

water contacts during the course of the trajectories [Case et al, 1996; Pearlman et al, 1995]. The input file for the PTRAJ calculation is called “hbond.in”:

hbond.in

trajin pitx2_2000ps.crd

# ptraj pitx2.parm7 < hbond.in > hbond44.out

# specify the electron pair DONOR # donor mask :46@O

# specify the ACCEPTOR(s)

acceptor WAT O H1 acceptor WAT O H2

# calculate the waters in the first and second solvation shells # (0-3.5A and 3.5-5.0A) and output to watershell.list

watershell :46 watershell44.list

# do the Hbond search/output

hbond solventacceptor O H1 solventacceptor O H2 solventneighbor 2 series hbond hbond distance 2.8 angle 35 donor acceptor neighbor 2 series \

The input file for the CARNAL calculation is called “carnal_hbond.in” and is edited for each

residue analyzed:

carnal_hbond.in

# HBOND analysis of ligand- interaction
FILES_IN
PARM p1 pitx2.parm7;
STREAM s1 pitx2_2000ps.crd;
FILES_OUT
HBOND h2 pitx2_50_dna TABLE LIST;
DECLARE
GROUP g1 (RES 52);
GROUP g2 (RES 85,86);
OUTPUT
HBOND h2 DONOR g2 ACCEPTOR g1 STATS;
END

RMSD values of the trajectory versus the initial structure were calculated (“rms first out” command in ptraj.in). A Perl script was used to pull out energy versus time data (% process_mdout.perl *.out). To calculate the distance between the K50 NZ and the O6 and N7 atoms of the guanines G83 and G84, the PTRAJ module was used, with the “distance” command using the input file “ptraj_calc_K50_distance.in”:

ptraj_calc_K50_distance.in

trajin pitx2_prod1.crd.gz
trajin pitx2_prod2.crd.gz
trajin pitx2_prod3.crd.gz
trajin pitx2_prod4.crd.gz
trajin pitx2_prod5.crd.gz
trajin pitx2_prod6.crd.gz
trajin pitx2_prod7.crd.gz
trajin pitx2_prod8.crd.gz
trajin pitx2_prod9.crd.gz
trajin pitx2_prod10.crd.gz

distance K50to85 :52@NZ :85@N7 out K50to85dist time 10
distance K50to86 :52@NZ :86@N7 out K50to86dist time 10
distance K50to85O :52@NZ :85@O6 out K50to85distO time 10
distance K50to86O :52@NZ :86@O6 out K50to86distO time 10

# ptraj pitx2.parm7 < ptraj_calc_K50_distance.in

# xmgrace K50to85dist K50to86dist
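Beyond plotting, a distance series can be summarized as the fraction of saved frames in which the two atoms lie within a chosen cutoff. The following Python sketch assumes that each output file holds two whitespace-separated columns (time and distance); the sample values and the 3.5 Å cutoff are illustrative only and are not results from this work, and the function names are hypothetical.

```python
def read_series(text):
    """Parse two-column (time, distance) text, skipping blank and header lines."""
    series = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "@")):
            continue
        time, value = line.split()[:2]
        series.append((float(time), float(value)))
    return series


def fraction_within(series, cutoff):
    """Fraction of frames whose distance is at or below the cutoff."""
    if not series:
        raise ValueError("empty series")
    close = sum(1 for _, distance in series if distance <= cutoff)
    return close / len(series)


# Made-up values for illustration: time (ps) and distance (angstroms).
sample = "10.0 2.9\n20.0 3.4\n30.0 5.1\n40.0 3.0\n"
print(fraction_within(read_series(sample), cutoff=3.5))
```

For a real file, the text would be read with open(...).read() before being passed to read_series.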


To calculate the side chain angles for the K50 side chain, the “dihedral” command was used in the PTRAJ module with the input file called “ptraj_calc_K50_angles.in”:

ptraj_calc_K50_angles.in

trajin pitx2_prod1.crd.gz
trajin pitx2_prod2.crd.gz
trajin pitx2_prod3.crd.gz
trajin pitx2_prod4.crd.gz
trajin pitx2_prod5.crd.gz
trajin pitx2_prod6.crd.gz
trajin pitx2_prod7.crd.gz
trajin pitx2_prod8.crd.gz
trajin pitx2_prod9.crd.gz
trajin pitx2_prod10.crd.gz

dihedral chi_1 :52@C :52@CA :52@CB :52@CG out chi_1
dihedral chi_3 :52@CB :52@CG :52@CD :52@CE out chi_3
dihedral chi_4 :52@CG :52@CD :52@CE :52@NZ out chi_4
dihedral chi_2 :52@CA :52@CB :52@CG :52@CD out chi_2

# ptraj pitx2.parm7 < ptraj_calc_K50_angles.in

# xmgrace chi_1 chi_3 chi_4 chi_2
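For reference, a dihedral angle is defined by four atomic positions and can be computed from the three bond vectors that connect them. The following Python sketch implements the standard atan2 formula; it is a generic illustration rather than the PTRAJ implementation, and the coordinates in the example are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates with the central bond along the z axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, trans)
```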

CHAPTER 3: Solution Structure of the K50 Class Homeodomain PITX2 Bound to DNA and Implications for Mutations that Cause Rieger Syndrome

Functional Analysis of Purified PITX2 Homeodomain

After expressing and purifying the PITX2 homeodomain as described in Chapter 2, it was important to verify that the protein was functional, that is, that it binds the consensus TAATCC site and some of the nonconsensus DNA sequences. Gel shift assays were performed, one of which is shown in Figure 3.1. The lanes containing protein show a shifted band where the protein binds the DNA, which is absent from the lanes without protein. DNA binding was also supported by the solubility of the protein: at high concentrations, the free protein irreversibly aggregates and precipitates out of solution, whereas in the presence of DNA the protein remains soluble at 1 mM concentrations. A sample of the PITX2 homeodomain was also sent to the University of Michigan for N-terminal analysis, which verified that the first four residues were correct and that secondary thrombin cleavage was not occurring. These results indicate that the PITX2 homeodomain expressed and purified here is functional, and that structure determination can proceed.

1- No protein TAATCC
2- With protein TAATCC
3- No protein TAAGCC
4- With protein TAAGCC
5- No protein neg control
6- With protein neg control
7- No protein TAGTCC
8- With protein TAGTCC
9- No protein TATTCC
10- With protein TATTCC

Figure 3.1: Gel shift assays of the PITX2 homeodomain. Gel shift assays were performed as described in Chapter 2 to examine the DNA-binding activity of the PITX2 homeodomain. The shifted bands in the lanes with protein indicate the protein is functioning properly in that it binds consensus and nonconsensus DNA binding sites as determined previously [Dave et al, 2000; Yuan et al, 1999; Hjalt et al, 2001; Espinoza et al, 2002].

KD of the PITX2 Homeodomain Bound to its DNA Consensus Site

The KD of the PITX2 homeodomain alone bound to its consensus site had not been determined previously. This value was determined by Vrushank Dave, using protein we prepared. The Engrailed Q50K mutant binds to the consensus TAATCC site with an unusually high affinity, near the picomolar range [Ades & Sauer, 1994]. There is no evidence that natural

K50 class homeodomains have such a high affinity for DNA [Amendt et al, 1998; Ma et al,

1996]. The KD for the Q50K mutant of the Fushi tarazu homeodomain was found to be 0.63 nM [Percival-Smith et al, 1990], which is a much lower affinity than that of the EnQ50K mutant. The KD for the PITX2 homeodomain alone was found to be 2.6 ± 0.38 nM (Figure 3.2), which is comparable to that of the Fushi tarazu mutant, and also a much lower affinity than that of the Engrailed Q50K mutant.

[Figure 3.2, panels A and B. Panel A: gel shift assay, lanes 1 to 7, showing bound and free probe. Panel B: plot of DNA Bound/Free against DNA Bound (nM).]

Figure 3.2: KD of the PITX2 homeodomain (Prepared by Vrushank Dave). Gel shift assay for Scatchard analysis to determine the affinity of recombinant Pitx2 homeodomain for the bicoid consensus DNA element. (A) The DNA probe concentrations used in this analysis were 1, 2, 4, 8, 16, 20 and 40 nM for lanes 1 to 7, respectively at a fixed protein concentration. (B) The KD value obtained was 2.6 ± 0.38 nM (mean ± standard deviation) from three independent DNA binding curves obtained from quantitative gel shift assays by measuring the bound and free fractions of the probes with a PhosphorImager. The data were analyzed using Microsoft Excel (linear regression analysis) to determine the KD value ( -1/KD = slope of the plot of Bound/Free against Bound DNA).
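The relation used in the caption, -1/KD = slope of Bound/Free plotted against Bound, can be illustrated with an ordinary least-squares fit. The following Python sketch is not the Excel analysis used for Figure 3.2; the data points are synthetic values generated from an assumed KD of 2.6 nM and an assumed maximum of 2.0 nM bound, purely to show that the fit recovers the KD, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y against x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard_kd(bound, bound_over_free):
    """KD from a Scatchard plot, using slope = -1/KD."""
    slope, _ = fit_line(bound, bound_over_free)
    return -1.0 / slope


# Synthetic, noise-free points: Bound/Free = (Bmax - Bound) / KD.
assumed_kd, assumed_bmax = 2.6, 2.0
bound = [0.5, 1.0, 1.5]
ratio = [(assumed_bmax - b) / assumed_kd for b in bound]
kd = scatchard_kd(bound, ratio)
print(round(kd, 3))
```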

Analysis of Protein Folding by HSQC

The first NMR experiment, performed on a 15N-labeled sample, was a Heteronuclear Single Quantum Correlation (HSQC) experiment. This 2D experiment reports the chemical shifts of the N-H amide groups in proteins, and can be used to assess the folding of the protein. The chemical shift of each nucleus depends on its local environment, which is determined by the conformation of the protein. When the protein is well-folded, the chemical shift dispersion is good. When the protein is not well-folded, the peaks tend to be clustered together. The HSQC for the PITX2 homeodomain is shown in Figure 3.3. The good chemical shift dispersion indicates that the protein is well-folded under these experimental conditions when bound to DNA. The signal-to-noise ratio is appropriate for a complex of this size.

Approximately 59-60 resonances can be fully or partially resolved for the backbone amide groups. Eight of the nine Asn and Gln side chains are visible. Eleven of the twelve arginine side chains are visible outside of this spectral view.

Figure 3.3: 15N HSQC for the PITX2 homeodomain bound to its consensus DNA site.

Resonance Assignments

Assignment of the resonances of the atoms in the PITX2 homeodomain and its DNA consensus binding site was performed using experiments described in Chapter 2. These assignments are listed in Table 3.1. A 15N-HSQC labeled with the backbone assignments is presented in Figure 3.4.

Figure 3.4: 15N-HSQC labeled with backbone and side-chain assignments obtained through triple-resonance experiments. All red peaks, and arginine NH-QH peaks are folded in, with actual resonances listed in the following tables.

Residue N HN CA CB CO HA HB CG HG CD HD
G-1 119.3 5.783 47.76 177.5 3.815
S-2 114.7 8.171 58.71 63.9 174.4 4.508 3.915
Q1 122.4 8.551 55.7 29.6 175.6 4.435 2.116,2.009 33.84 2.388
R2 123.8 8.462 55.85 31.57 176.3 4.464 1.827 27.23 1.720 43.7 3.293
R3 125.2 9.065 56.38 31.31 175.7 4.260 1.871,1.770 26.92 1.711 44.48 3.304
Q4 122.2 8.465 55.54 29.58 175.2 4.259 1.989 34.06 2.369
R5 124.2 8.622 56.45 32.84 175.2 4.272 1.593 34.14 1.694 2.366
T6 125.6 8.03 64.96 69.15 172.6 3.826 3.502 20.68 0.460
H7 125.7 8.669 53.64 29.9 174.7 4.778 3.180,3.089
F8 126.3 8.681 58.62 40.53 176.7 4.758 2.762,3.127
T9 113.7 9.151 60.39 70.97 175.6 4.530 4.823 21.8 1.393
S10 116.7 9.185 62.14 62.31 177.2 4.177 3.974
Q11 121 8.409 59.61 27.97 178.8 4.005 2.147,1.981 33.83 2.453
Q12 119.3 7.789 59.48 27.96 177.7 3.831 1.637 35 2.670
L13 117.1 8.293 57.93 41.6 179 3.609 1.799,1.524 26.97 1.660 25.38 0.940,0.799
Q14 118.1 8.149 59.26 28.52 179.1 3.982 2.180 34.15 2.497,2.362
Q15 119.8 7.57 58.49 29.53 179.8 4.174 2.006,1.865 35.46 2.352,2.211
L16 123.5 8.302 58.53 37.81 178.1 3.506 0.4867,-1.204 26.17 1.147 22.86 -0.811,0.4104
E17 119.2 8.247 59.03 29.16 179.1 4.358 1.951,2.177 34.51 2.350,2.517
A18 120.6 7.944 55.19 18 181.6 4.175 1.540
T19 113.5 7.932 66.99 68.32 176.1 4.088 4.570 21.83 1.515
F20 125.5 8.829 61.76 39.07 175.6 4.367 3.354,3.230
Q21 112.4 7.908 57.52 28.16 177.2 3.793 2.259,2.099 33.73 2.762,2.664
R22 116.8 7.393 57.1 31.6 176.6 4.358 1.980,1.906 27.56 1.777,1.656 43.53 3.237
N23 119.4 8.245 52.58 38.4 172.9 4.475 2.736
R24 121.5 8.39 57.6 31.06 174.9 3.788 1.281,0.9161 27.24 -0.283 42.7 1.648,1.699
Y25 114.1 7.722 55.74 39.12 4.647 2.543,3.127
P26 62.67 31.07 177.4 4.402 1.582,1.415 25.63
D27 124.2 8.457 52.51 40.62 175.6 4.475 3.224,2.825
M28 119.4 8.674 60.34 32.14 177.5 3.843 2.567,2.471 32.12 2.111
S29 113.4 8.376 61.83 62.47 177.5 4.244 3.890
T30 120.9 8.402 67.08 67.61 176.5 4.001 4.173 20.95 1.151
R31 121 8.944 61.14 32.1 178.6 3.859 2.092 1.553 2.438
E32 118.8 8.595 59.8 29.32 178.8 3.962 2.305,2.028 36.68 2.571,2.187
E33 121.2 7.629 59.62 29.29 178.7 3.988 2.166,2.009 36.41 2.309,2.009
I34 119.8 8.402 64.77 38.6 179.6 3.614 1.887 28.37 1.158,0.916 14.96, 10.916,0.8326
A35 124.3 8.538 55.71 16.97 178.7 3.659 1.365
V36 118.4 7.773 66.52 31.61 180.1 3.811 2.245 22.83, 21.153,0.961
W37 120.3 8.099 59.11 29.89 178 4.622 3.447
T38 104.5 7.999 61.65 71.14 174.9 4.379 4.265 22.37 1.207
N39 119 7.913 54.81 37.29 174.1 4.527 3.239,2.889
L40 120.2 8.626 53.08 47.04 175.9 5.039 1.687,1.458 23.86 1.102 25.96 0.897
T41 106.5 7.071 59.49 71.46 175.9 4.705 4.709 22.3 1.336
E42 124.1 9.445 61.56 28.39 177.8 3.616 1.930,2.362 37.3 2.624,2.111
A43 119.7 8.462 55.59 18.63 180.1 3.995 1.487
R44 116.9 7.656 59.07 31.67 180.1 4.296 2.249,2.143 27.9 1.980 43.46 3.405
V45 120.1 7.999 67.25 32.56 177.3 3.757 2.435 22.21, 21.150,1.035
R46 121.6 9.423 60.7 30.56 179.6 3.930 2.269,2.104 26.16 1.703,1.560 43.5 3.347
V47 120.8 8.097 66.92 32.43 176 3.626 2.273 24.32 1.107
W48 122.9 8.419 62.61 28.23 180.3 4.876 3.449
F49 118.3 9.039 63.67 39.42 177.5 3.714 3.280
K50 121.4 7.901 60.5 33.41 177.5 3.933 1.960,1.722 30.2 1.374,1.472 30.19 2.247
N51 119.4 8.669 56.22 38.35 177.8 4.487 2.846
R52 125.6 8.836 56.68 27.66 180.3 3.664 0.663 23.9 -0.620, -0.209 2.474
R53 120.3 8.607 61.17 31.6 178.2 4.313 2.242 1.989 43.07 2.452
A54 122 7.374 55.62 17.49 180.1 4.338 1.595
K55 120.1 7.703 59.58 32.95 178.2 4.055 1.815 25.03 1.498
W56 121.3 8.345 60.31 29.59 178.3 4.582 3.411
R57 117.6 8.457 59.59 31.26 178.7 3.859 1.949 28.98 2.166 44.08 3.385
K58 118.3 7.758 58.09 32.98 177.9 4.205 1.958 25.23 1.644 29.16 1.496
R59 116.9 7.959 56.26 30.98 176.3 4.339 1.863 27.09 1.624 43.18 3.103
E60 119.5 8.068 56.84 30.01 176.4 3.986 1.820 36.61 2.013
E61 120.1 7.877 57.36 30.07 175.9 3.869 1.782,1.491 36.47 2.075,1.918
F62 118.9 7.722 56.96 39.19 175.2 4.571 3.034
I63 123.5 7.794 60.97 38.56 175.7 4.121 1.786 27.32 1.4,1.104,0.832 12.62 0.831
V64 125.9 8.281 62.25 32.88 176.3 4.197 2.088 21.1 0.994,0.823
T65 118.8 8.335 61.27 70.09 173.5 4.435 4.295 21.33 1.171
D66 128.2 7.994 55.95 42.23 4.418 2.680,2.573

Table 3.1: List of chemical shifts for the PITX2 homeodomain bound to DNA.

Chemical Shift Indices

The values of the resonance frequencies of Hα, Cα, and C’ atoms can be used to get a general idea of secondary structure by using chemical shift indices (CSI) [Wishart et al, 1992; Spera & Bax, 1991; Luginbuhl et al, 1995]. Chemical shifts are highly characteristic of the chemical environment and vary based on secondary structure. Table 3.2 shows the analyses for these atoms. If a chemical shift is greater than a range defined for that amino acid type, a "1" is assigned to it. If the chemical shift is below the range, a "-1" is assigned. If it is within the range, a "0" is assigned. A strip of four or more "1"s for C’ or Cα, or "-1"s for Hα, is indicative of a helical structure. Table 3.2 indicates that the secondary structure of the PITX2 homeodomain is very similar to that of other homeodomains in that it consists of three helices.
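The assignment rule and the search for helical strips can be written out explicitly. The following Python sketch is an illustration only: the reference range passed to chemical_shift_index and the list of indices are hypothetical values chosen for the example, not the published reference ranges or the data of Table 3.2, and the function names are assumptions.

```python
def chemical_shift_index(shift, low, high):
    """Return 1 above the reference range, -1 below it, and 0 within it."""
    if shift > high:
        return 1
    if shift < low:
        return -1
    return 0


def find_runs(indices, target, min_len=4):
    """Return (start, end) positions of runs of `target` at least `min_len` long."""
    runs = []
    start = None
    for i, value in enumerate(indices):
        if value == target:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    if start is not None and len(indices) - start >= min_len:
        runs.append((start, len(indices) - 1))
    return runs


# Hypothetical Halpha indices for nine consecutive residues.
ha_indices = [0, -1, -1, -1, -1, -1, 0, -1, -1]
print(find_runs(ha_indices, target=-1))
```

For C’ and Cα the same search would be run with target=1, in keeping with the rule stated above.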

Residue CO HA CA
G-1 1 -1 1
S0 1 0 1
Q1 -1 0 -1
R2 0 0 -1
R3 -1 -1 0
Q4 0 -1 -1
R5 -1 0 1
T6 -1 -1 1
H7 0 0 -1
F8 1 1 1
T9 0 1 -1
S10 1 -1 1
Q11 1 -1 1
Q12 1 -1 1
L13 1 -1 1
Q14 1 -1 1
Q15 1 -1 1
L16 1 -1 1
E17 1 0 1
A18 1 -1 1
T19 1 -1 1
F20 0 -1 1
Q21 1 -1 1
R22 0 0 1
N23 -1 -1 -1
R24 -1 -1 1
Y25 0 -1
P26 1 0 -1
D27 -1 -1
M28 1 -1 1
S29 1 -1 1
T30 1 -1 1
R31 1 -1 1
E32 1 -1 1
E33 1 -1 1
I34 1 -1 1
A35 1 -1 1
V36 1 -1 1
W37 1 -1 1
T38 0 0 -1
N39 -1 -1 1
L40 -1 1 -1
T41 0 1 -1
E42 1 -1 1
A43 1 -1 1
R44 1 -1 1
V45 0 -1 1
R46 1 -1 1
V47 -1 -1 1
W48 1 1 1
F49 1 -1 1
K50 1 -1 1
N51 1 -1 1
R52 1 -1 1
R53 1 0 1
A54 1 0 1
K55 1 -1 1
W56 1 -1 1
R57 1 -1 1
K58 1 -1 1
R59 0 0 0
E60 0 -1 0
E61 0 -1 1
F62 0 -1 -1
I63 -1 1 -1
V64 -1 1 -1
T65 -1 0 -1
D66 -1 1

Table 3.2: Chemical shift indices for the Hα, Cα, and CO atoms of the PITX2 homeodomain. The residues highlighted in yellow indicate where the 3 helices are found to be in the Antennapedia homeodomain.

Aromatic Assignments

Assignment of the atoms in aromatic groups was performed using NMR experiments described in Chapter 2. The 2D HMQC-TOCSY with a 20 ms mixing time was the most useful in making assignments. Previous structural studies of homeodomains have indicated interactions between aromatic groups play an important structural role in the hydrophobic core of the protein

[Subramaniam et al, 2001]. Therefore, it was important to assign these atoms in order to obtain an accurate tertiary structure. The assignments for these aromatic groups are listed in Table 3.3.

Residue NE1 CD1/2 HD1/2 CE1/2 HE1/2 CZ HZ CE3 HE3 CZ2/3 HZ2/3 CH2 HH2
H7 119.8 7.120 137.0 8.315
F8 7.351 7.292 7.075
F20 126.7 7.585 132.2 7.384/7.446 122.0 7.170
Y25 133.6 7.065 117.8 6.793
W37 109.0 126.4 7.128 10.210 122.0 7.127 114.7/120.7 7.448/7.572 124.7 7.204
W48 109.3 128.7 6.966 9.301 7.429 113.7 7.125/7.091 7.217
F49 132.2 7.799 7.769 7.594
W56 110.6 127.5 7.473 10.350 130.5 7.041 114.7/119.8 7.256/7.694 122.0 7.169
F62 131.8 7.093 130.5/130 7.318/7.318 7.295

Table 3.3: Assignments of atoms in aromatic groups of the PITX2 homeodomain.

Side Chain Assignments for Arginine, Asparagine, and Glutamine Residues

Side chains of arginine, asparagine and glutamine residues were assigned by looking at

NOESY spectra and matching large NOESY peaks to previous assignments. These assignments are shown in Tables 3.4, 3.5 and 3.6.

Residue Nε Hε
R2 83.8 7.689
R3 84.7 7.478
R5 84.5 7.256
R22 83.0 7.652
R24 83.6 6.359
R31 83.2 8.186
R44 84.6 7.522
R46 80.5 9.330
R52 89.9 9.852
R53 86.4 7.495
R57 85.6 7.452
R59 85.8 7.396

Table 3.4: Chemical shift assignments of the arginine side-chains.

Residue ND2 HD21 HD22
N23
N39 112.3 7.520 6.878
N51 123.6 8.906 8.405

Table 3.5: Chemical shift assignments of the asparagine side-chains.

Residue NE2 HE21 HE22
Q1 108.6 6.583 7.514
Q4 113.0 6.893 7.634
Q11 112.0 7.663 6.787
Q12 110.7 6.716 7.982
Q14 111.7 6.835 7.472
Q15 112.0 6.787 7.663
Q21 113.5 7.910 7.083

Table 3.6: Chemical shift assignments of the glutamine side-chains.

Chemical Shift Assignments for the DNA Bound to the PITX2 Homeodomain

Assignments for the protons of the DNA were made as described in Materials and Methods (Chapter 2). These assignments are listed in Table 3.7.

Residue TCH3 2'H 2"H 1'H 6H 8H 3'H C5H A2H 4'H 5'H
G67 2.665 2.757 5.988 7.966 4.834 4.237 4.168
C68 2.159 2.544 6.068 7.517 5.356 4.248 4.209
T69 1.590 6.061 7.426 4.164 4.195
C70 2.435 2.229 5.896 7.531 5.555 4.159
T71 1.738 2.249 2.363 6.078 7.618 4.049 5.564
A72 1.805 2.409 5.679 7.758 4.131 5.253
A73 2.313 2.883 6.173 7.438
T74 1.091 2.139 2.445 6.101 7.038 4.129 4.222
C75 2.461 2.619 5.891 7.532 5.471 4.211
C76 2.245 2.508 5.906 7.577 5.612 4.209
C77 2.151 2.486 6.079 7.536 4.781 5.610 4.149 4.161
C78 1.936 2.350 5.763 7.460 5.587 4.060 4.107
G79 2.362 2.605 6.170 7.932 4.669 4.062 4.178
C80 1.979 2.444 5.892 7.605 4.696 5.892 4.121 4.067
G81 2.733 2.777 5.800 7.870 4.978 4.043 4.352
G82 2.637 2.743 5.803 7.423 4.923 4.427 4.352
G83 2.245 2.694 5.926 7.665 4.378 4.065
G84 2.608 2.939 5.810 7.768 5.012 4.192 4.462
A85 2.462 2.999 5.952 8.062 5.008 4.427 4.458
T86 1.344 1.810 2.427 6.157 6.981 4.748 4.136 4.429
T87 1.538 2.096 2.410 5.678 7.178 4.912 4.147
A88 2.647 2.821 5.912 8.413 5.022
G89 2.110 2.175 6.145 7.402 4.912 4.045 4.465
A90 2.638 2.848 5.895 7.739
G91 2.638 2.747 5.932 7.631 4.917 4.424
C92 2.442 2.647 5.827 7.402 5.377 4.311

Table 3.7: Chemical shift assignments of the DNA binding site.

Protein-DNA NOEs

NOEs between the protein and the DNA were obtained and assigned as described in Materials and Methods (Chapter 2). The assigned NOEs, with the upper distance limits used, are listed in Table 3.8.

DNA           Protein       Upper Distance Limit (Å)
T71 Q5'       R52 QB        5.00
A72 4'H       R3 QD         4.00
A72 Q5'       R3 Hε         5.00
A72 Q5'       T6 HG         5.00
A72 Q5'       V47 QG1       6.00
A72 8H        W48 HA        6.00
A73 2'H       V47 QG1       5.00
A73 8H        R44 Hε        6.00
A73 2"H       V47 QG1       5.00
T74 Q5'       R2 HA         6.00
T74 Q5'       R2 QD         6.00
T74 4'H       R44 QB        6.00
T74 2"H       R44 QD        6.00
T74 6H        V47 QG1       6.00
T74 TCH3      V47 QG1       6.00
G81 Q5'       Y25 HE        6.00
G82 Q5'       R31 QG        5.00
G82 4'H       Y25 HE        5.00
G83 4'H       F49 QB        6.00
G83 Q5'       F49 QB        6.00
G83 4'H       R53 HG2       6.00
G83 Q5'       K50 HG2       6.00
G83 Q5'       K50 HG1       5.00
G84 8H        R46 QD        6.00
G84 Q5'       K50 HG1       6.00
G89 8H        R5 Hε         4.00
G89 8H        R5 HN         5.00

Table 3.8: Table of protein-DNA NOEs
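
The restraint list in Table 3.8 maps naturally onto a simple tabular data structure. The Python sketch below is illustrative only and was not part of the original analysis: it transcribes the entries of Table 3.8 and tallies them by protein residue. The names NOES and count_by_residue are hypothetical.

```python
from collections import Counter

# Each entry: (DNA nucleotide, DNA atom, protein residue, protein atom,
# upper distance limit in Å), transcribed from Table 3.8.
NOES = [
    ("T71", "Q5'", "R52", "QB", 5.0),
    ("A72", "4'H", "R3", "QD", 4.0),
    ("A72", "Q5'", "R3", "Hε", 5.0),
    ("A72", "Q5'", "T6", "HG", 5.0),
    ("A72", "Q5'", "V47", "QG1", 6.0),
    ("A72", "8H", "W48", "HA", 6.0),
    ("A73", "2'H", "V47", "QG1", 5.0),
    ("A73", "8H", "R44", "Hε", 6.0),
    ("A73", '2"H', "V47", "QG1", 5.0),
    ("T74", "Q5'", "R2", "HA", 6.0),
    ("T74", "Q5'", "R2", "QD", 6.0),
    ("T74", "4'H", "R44", "QB", 6.0),
    ("T74", '2"H', "R44", "QD", 6.0),
    ("T74", "6H", "V47", "QG1", 6.0),
    ("T74", "TCH3", "V47", "QG1", 6.0),
    ("G81", "Q5'", "Y25", "HE", 6.0),
    ("G82", "Q5'", "R31", "QG", 5.0),
    ("G82", "4'H", "Y25", "HE", 5.0),
    ("G83", "4'H", "F49", "QB", 6.0),
    ("G83", "Q5'", "F49", "QB", 6.0),
    ("G83", "4'H", "R53", "HG2", 6.0),
    ("G83", "Q5'", "K50", "HG2", 6.0),
    ("G83", "Q5'", "K50", "HG1", 5.0),
    ("G84", "8H", "R46", "QD", 6.0),
    ("G84", "Q5'", "K50", "HG1", 6.0),
    ("G89", "8H", "R5", "Hε", 4.0),
    ("G89", "8H", "R5", "HN", 5.0),
]


def count_by_residue(noes):
    """Return a Counter mapping each protein residue to its number of NOEs."""
    return Counter(residue for _, _, residue, _, _ in noes)


counts = count_by_residue(NOES)
for residue, n in counts.most_common():
    print(residue, n)
print("total restraints:", len(NOES))
```

Grouping the list this way makes it easy to see which residues contribute the most intermolecular restraints, for example V47 and K50 in the third helix.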

Tertiary Structure of the PITX2 Homeodomain

Structure Determination. Assignments of the protein backbone and side chain 1H, 13C, and 15N resonances were obtained from heteronuclear spectra. Restraint data derived for the PITX2 homeodomain-DNA complex are summarized in Table 3.9. Analysis of 15N and 13C heteronuclear-edited NOESY spectra recorded at various mixing times provided 1259 intramolecular distance restraints comprising 513 intraresidue, 338 sequential, 300 medium-range (2-5 residues apart) and 108 long-range (>5 residues apart) NOE contacts. Torsional restraints for 55 φ and 43 ψ angles were obtained from a 3D HNHA experiment and from Cα chemical shifts [Spera & Bax, 1991; Luginbuhl et al, 1995]. Overall, there are 19 restraints per residue, on average, for intramolecular protein NOEs. All of these protein restraints were used for structure calculation with the program CYANA [Guntert et al, 1997; Herrmann et al, 2002]. After the final round of structure calculation, the 20 structures with the lowest CYANA target function were used for docking to DNA and energy minimization. The final average CYANA target function for the 20 structures was 2.05 Å2.

A total of 292 distance restraints between protons within the DNA were obtained from 13C/15N-filtered NOE spectra. A series of 2D 13C(ω1)-edited, [13C,15N](ω2)-filtered NOESY spectra provided 27 unambiguous intermolecular restraints between the protein and the DNA. These restraints were entered into the program AMBER for docking and energy minimization, as described in Materials and Methods.
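
As a consistency check on the restraint counts quoted above and in Table 3.9, the totals can be recomputed from their components. The short Python sketch below is illustrative only; the variable names are hypothetical and the numbers are copied from the text.

```python
# Restraint counts as reported in the text and in Table 3.9.
protein_noes = {
    "intraresidue": 513,
    "sequential": 338,
    "medium_range": 300,
    "long_range": 108,
}
dihedral_restraints = {"phi": 55, "psi": 43}
dna_noes = 292
intermolecular_noes = 27

protein_distance_total = sum(protein_noes.values())
dihedral_total = sum(dihedral_restraints.values())
grand_total = (
    protein_distance_total + dihedral_total + dna_noes + intermolecular_noes
)

print(protein_distance_total)  # 1259
print(dihedral_total)          # 98
print(grand_total)             # 1676
```

The recomputed values agree with the 1259 protein distance constraints, 98 dihedral constraints, and 1676 total constraints listed in Table 3.9.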

Quality of the NMR structure. The structure of the PITX2-DNA complex was calculated by a restrained molecular dynamics docking and energy minimization procedure starting from the coordinates of the PITX2 protein calculated from CYANA and canonical B-form DNA, as described in Materials and Methods. The 20 structures with the lowest total energies were selected for conformer analysis. These structures exhibited mean AMBER energies of –6268 kcal mol-1 and mean van der Waals and electrostatic energies of –399 and –3974 kcal mol-1, respectively. The mean AMBER energies given represent the intra-protein interaction energy.

Table 3.9. NMR structure statistics

NMR constraints
  Protein
    Distance constraints                        1259
      Intraresidue                               513
      Sequential                                 338
      Medium-range                               300
      Long-range                                 108
    Dihedral constraints                          98
      Phi                                         55
      Psi                                         43
  DNA                                            292
  Protein-DNA (intermolecular)                    27
  Total                                         1676

CYANA target function value (Å2)a               2.05 +/- 0.39

Number of violations (average per conformer)
  Distance violations (>0.30 Å)                 0
  Dihedral angle violations (>5.0°)             1

AMBER energies (kcal/mol)b
  Mean AMBER energy                             -6268 +/- 250
  Van der Waals                                 -399 +/- 33
  Electrostatic                                 -3974 +/- 336

Ramachandran plot (%)c
  Residues in most favored regions              80.1
  Residues in additional allowed regions        14.8
  Residues in generously allowed regions        2.3
  Residues in disallowed regions                2.8

RMSD from the mean structure (Å)
  Protein (bb, residues 3-58)                   1.38
  All heavy atoms (residues 3-58)               1.95
  Protein (bb, all residues)                    1.85
  DNA (residues 68-78, 81-91)                   1.30
  Complex (residues 3-58, 68-78, 81-91)         1.81

A total of 30 conformers were calculated and the 20 structures with the smallest residual CYANA target function values were subjected to docking and energy minimization. a The value given for the CYANA target function corresponds to the value before energy minimization (the CYANA target function is not defined after energy minimization, since the conformers no longer have ECEPP standard geometry). b The value given represents the intra-protein interaction energy. c For residues 3-58.


The superposition of the structures (Figure 3.5) demonstrates a well-defined tertiary structure for PITX2 bound to DNA. The structures have no distance violations greater than 0.3

Å, and only 1 angle violation greater than 5 degrees. Analysis of Ramachandran plots for the ensemble indicates that the structures generally show favorable backbone conformations within allowed conformational space, with 80.1% of the residues 3-58 within the most favored regions,

14.8% in additionally allowed regions, 2.3% in generously allowed regions, and 2.8% in disallowed regions for the 20 conformers (Table 3.9, Figure 3.6). The N- and C-termini are largely disordered. When superimposed, residues 3-58 have an average root-mean-square deviation (RMSD) from the mean structure of 1.38 Å for backbone (N, Cα, C’ and O), 1.95 Å for all heavy atoms, and 1.85 Å for the backbone when all residues are included. The global RMSD for all DNA heavy atoms (nucleotides 68-78, 81-91) is 1.30 Å. The RMSD for the entire complex (residues 3-58; nucleotides 68-78, 81-91) is 1.81 Å.
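
For readers unfamiliar with the quantity, the "RMSD from the mean structure" reported above can be sketched as follows. This is a minimal pure-Python illustration, not the software used for this work. It assumes the conformers have already been superimposed and that each conformer is a list of (x, y, z) coordinates for the same atoms in the same order; the function names and the toy data are hypothetical.

```python
import math


def mean_structure(conformers):
    """Average the coordinates of each atom over all conformers."""
    n_conf = len(conformers)
    n_atoms = len(conformers[0])
    return [
        tuple(sum(conf[i][k] for conf in conformers) / n_conf for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    squared = sum(
        (p[k] - q[k]) ** 2 for p, q in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def average_rmsd_from_mean(conformers):
    """Mean over conformers of the RMSD to the average structure."""
    mean = mean_structure(conformers)
    return sum(rmsd(conf, mean) for conf in conformers) / len(conformers)


# Toy data: two conformers of a two-atom "molecule", offset by 2 Å along y.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
print(average_rmsd_from_mean(toy))  # each conformer lies 1 Å from the mean
```

In practice the atom selection (for example backbone atoms of residues 3-58) and the superposition step determine the reported value, which is why several RMSDs are listed in Table 3.9.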

Figure 3.5: Ensemble of structures of the PITX2 homeodomain/DNA complex. (a) Ensemble of 20 structures showing the protein backbone N, Cα, and C’ atoms and the DNA backbone. Helix 1 is colored pink, helix 2 green, helix 3 purple, and the DNA strands are coral. Superimposition was performed using backbone atoms from protein and DNA. (b) Alternate view of the structure, rotated by approximately 90 degrees.


Figure 3.6. Ramachandran plot, generated by PROCHECK [Laskowski et al., 1993, 1996], for the 20 structures in the ensemble, for residues 3-58; 80.1% of the residues are in the most favored regions, 14.8% in the additionally allowed regions, 2.3% in generously allowed regions, and 2.8% in disallowed regions.

Tertiary structure of the PITX2 homeodomain-DNA complex. The overall tertiary structure of the PITX2 homeodomain is similar to that of other homeodomains, supporting previous findings that this tertiary structure is conserved among homeodomains [Gehring et al, 1990; Gehring et al, 1994; Scott et al, 1989; Billeter, 1996]. The tertiary structure of the PITX2 homeodomain comprises three α-helices (Figure 3.7). Helix 1 (residues 10-20) is followed by a loop region, and then helix 2 (residues 28-37) runs antiparallel to helix 1. Helix 2 and helix 3 (residues 42-58) form a helix-turn-helix motif. Helix 3 is approximately perpendicular to helices 1 and 2, and fits into the major groove of the DNA. The N-terminus of the homeodomain makes contacts within the minor groove of the DNA.


Figure 3.7. Structure of the PITX2 homeodomain-DNA complex. (a) Mean structure of the homeodomain. Helix 1 is colored pink, helix 2 green and helix 3 is purple. The DNA strands are colored coral. (b) Ribbon diagram of the mean structure of the PITX2 homeodomain-DNA complex.

The helices of the PITX2 homeodomain are held together by a core of eight tightly packed hydrophobic amino acids (F8, L13, L16, F20, L40, V45, W48, and F49). These amino acids are either invariant (W48 and F49) or highly conserved in all homeodomains [Gehring et al, 1994; Qian et al, 1989; Kornberg, 1993]. In a threading analysis performed previously for the PITX2 homeodomain [Banerjee-Basu & Baxevanis, 1999], it was hypothesized that the tertiary structure of the PITX2 homeodomain would be similar to that of other homeodomains, mainly because many of the hydrophobic and aromatic amino acids present in other homeodomains are also present in the PITX2 homeodomain. The threading analysis threaded the PITX2 homeodomain sequence onto the Engrailed homeodomain structure, so the resulting tertiary structure is necessarily very close to that of Engrailed. The threading analysis did not provide a PDB file that we could analyze in detail, and it is not necessarily indicative of the true molecular structure. The focus of that analysis was the role of Rieger mutations in causing disease; there was no discussion of the role of K50 in determining the DNA-binding affinity and specificity of the homeodomain, a question best addressed with an experimentally determined structure rather than a threading model. That study also did not analyze the K50 Rieger mutants, so there is no indication of the conformation of this side chain in their analysis. In the absence of an experimentally determined structure, the threading model was most useful for visualizing some of the intramolecular interactions that stabilize the tertiary structure, and the predicted interactions are consistent with our experimental data. While we cannot compare our PITX2 tertiary structure directly to the threaded structure, we can compare it to EnQ50K and other homeodomain structures. In our experimentally determined structure, the first helix is closer to the second helix, as measured from the backbone nitrogen of L16 to the backbone nitrogen of I34, than in the EnQ50K, Antennapedia, wild-type Engrailed, Fushi tarazu,

vnd/NK-2, and MATα2 homeodomains [Tucker-Kellogg et al, 1997; Billeter et al, 1993; Kissinger et al, 1990; Qian et al, 1994b; Gruschus et al, 1997; Wolberger et al, 1991]. As far as this distance is concerned, PITX2 is an outlier compared to the other six homeodomains. The distance is 9.60-10.70 Å for the Antennapedia conformers, 9.43 Å for the EnQ50K crystal structure, 9.54 Å for wild-type Engrailed, 9.30-11.10 Å for Fushi tarazu, 8.67 Å for vnd/NK-2, and 10.9 Å for MATα2. For the 20 PITX2 conformers, however, the distance is only 7.55-8.58 Å, which is on average about 1.8 Å shorter. In view of our RMSD of 1.38 Å for the protein backbone atoms (residues 3-58), this result is still significant when compared with the ranges of distances seen in the structures of the other homeodomains. The difference is especially significant given that the RMSD for helices 1 and 2 alone is only 0.78 Å. The range of L16-I34 distances for all of the other homeodomains together is 8.67-11.10 Å, and the PITX2 distance range lies completely outside of it.
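
The comparison of L16-I34 distances can be reproduced approximately from the values quoted in the text. The Python sketch below is illustrative only. The text does not state how the average difference was computed, so using the midpoint of each reported range is an assumption made here, and all names are hypothetical.

```python
# L16 N to I34 N distances in Å, as quoted in the text. Single values are
# stored as (value, value) so that every entry is a (low, high) range.
OTHER_HOMEODOMAINS = {
    "Antennapedia": (9.60, 10.70),
    "EnQ50K": (9.43, 9.43),
    "Engrailed (wild type)": (9.54, 9.54),
    "Fushi tarazu": (9.30, 11.10),
    "vnd/NK-2": (8.67, 8.67),
    "MATalpha2": (10.90, 10.90),
}
PITX2 = (7.55, 8.58)


def midpoint(rng):
    """Midpoint of a (low, high) range; an assumed stand-in for the mean."""
    low, high = rng
    return (low + high) / 2.0


other_mean = sum(midpoint(r) for r in OTHER_HOMEODOMAINS.values()) / len(
    OTHER_HOMEODOMAINS
)
difference = other_mean - midpoint(PITX2)

# Pooled range of the six reference homeodomains.
pooled = (
    min(low for low, _ in OTHER_HOMEODOMAINS.values()),
    max(high for _, high in OTHER_HOMEODOMAINS.values()),
)
pitx2_outside = PITX2[1] < pooled[0]

print(f"difference of midpoints: {difference:.2f} Å")
print(f"pooled range: {pooled[0]:.2f}-{pooled[1]:.2f} Å")
print(f"PITX2 range entirely below pooled range: {pitx2_outside}")
```

Under this midpoint assumption the difference comes out at about 1.75 Å, in line with the quoted value of roughly 1.8 Å, and the upper end of the PITX2 range (8.58 Å) falls below the lower end of the pooled range (8.67 Å).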

In addition to the narrower distance between the first and second helices in the PITX2 structure, there are several other differences between the PITX2 and EnQ50K structures. In particular, the third helix of PITX2 is positioned about 0.5 Å lower (closer to the N-terminus of helix 1 and C-terminus of helix 2) than in EnQ50K (Figure 3.8). This difference in orientation of the three helices causes slightly different contacts to be made between the first and third helices, and may provide an explanation for the decreased stability of this homeodomain. Unlike other homeodomains that are stable in the free form [Tsao et al, 1994; Qian et al, 1994b; Damante et al, 1994; Carra & Privalov, 1997; Otting et al, 1988; Yamamoto et al, 1992], the PITX2 homeodomain is unstable in the absence of DNA in that it irreversibly aggregates at micromolar concentrations, which suggests a possible lack of stable tertiary structure in the free form. This may be due to slightly different hydrophobic interactions within the core of the protein, and to the absence of other stabilizing interactions such as the salt bridge linking residues 19 and 30, which is present in most homeodomains [Iurcu-Mustata et al, 2001] but is not possible in PITX2.

One difference seen here is that F49, which is nearly invariant among homeodomains, points slightly upwards towards the loop region of the homeodomain, instead of pointing towards the interior of the protein. The altered orientation of the first helix in relation to the third would cause a steric clash with F49 if this side chain adopted the orientation seen in other homeodomains. While there is still an interaction involving F49 and F20 within the hydrophobic core of the PITX2 homeodomain, the orientations of the side chains themselves are different. This differing orientation may lessen the strength of the interaction between the first and third helices, which may affect the stability of the protein in the absence of DNA. This difference in orientation may be due to any number of differing residues between the two homeodomains (see Table 1.2). One possibility is a proline residue that is found in the loop region between helices 1 and 2 in PITX2, but is not present in Engrailed or Fushi tarazu.


Figure 3.8. Overlay of PITX2 homeodomain and EnQ50K homeodomain structures. Cyan corresponds to the structure of the PITX2 homeodomain, and black corresponds to the structure of the Engrailed mutant homeodomain. (a) Helices 1 and 2 are approximately 1.8 Å closer to each other in PITX2 than in other homeodomains. (b) Alternate view, rotated by approximately 90 degrees. Helix 3 is about 0.5 Å lower in PITX2 than in EnQ50K.

In the PITX2 structure, the N- and C-terminal segments -2 to 3 and 60 to 68 (Figure 2.1) appear disordered (Figure 3.5), which is to be expected given the lack of medium-range and long-range constraints for these residues. Relaxation analysis (Figure 3.9) indicates that residues -2 to 2 and 59 to 66 are more mobile in solution, explaining the observed disorder and lack of restraint information for these regions.

[Plot of 15N R2 (1/sec) versus residue number]

Figure 3.9: R2 relaxation rate constants for the PITX2 homeodomain. Low R2 rate values correspond to a higher mobility in that area of the protein. The N- and C-termini are seen to be very mobile, which corresponds to the disorder seen in these regions in the tertiary structure. Residues in the core of the protein that appear more mobile are found in the loop regions.

Our study also reveals structural information about the DNA when it is bound to the protein. Distance restraints obtained from the experiments described above for assigning the DNA were entered into AMBER during the docking procedure. Visual inspection of the structure of the PITX2 homeodomain-DNA complex indicates that there is a slight widening of the minor groove of the DNA compared to B-form DNA, and a concomitant narrowing of the major groove. Previous structures of protein-DNA complexes have indicated that changes in DNA structure are possible upon protein binding [Jones et al, 1999]. A more thorough, quantitative analysis of the DNA structure when PITX2 is bound will not be possible until a high-resolution structure of the DNA is determined using labeled DNA [Fernandez et al, 1999].

Protein-DNA recognition. Analysis of the filtered NOESY experiments produced 27 usable distance restraints between the protein and the DNA. These include contacts that have been seen in other biochemical and structural studies of homeodomains. Many of the residues that interact with the DNA are arginines, including R3 and R5 at the N-terminus, R31 in the second helix, and R46, R52 and R53 in the third helix (Figure 3.10). Other residues that were found to make DNA contacts are Y25 and F49. A number of NOESY peaks were also seen between K50 and the DNA, and these contacts are discussed further below.

NOE contacts labeled in the panels of Figure 3.10:

(a) T6 HG – A72 Q5'; R2 HA – T74 Q5'; R2 QD – T74 Q5'; R3 QD – A72 4'H; R3 Hε – A72 Q5'; R5 Hε – G89 8H; R5 HN – G89 8H

(b) Y25 HE – G81 Q5'; Y25 HE – G82 4'H; R31 QG – G82 Q5'

(c) V47 QG1 – A72 Q5'; V47 QG1 – A73 2'H; V47 QG1 – A73 2"H; V47 QG1 – T74 6H; V47 QG1 – T74 7H; W48 HA – A72 8H; R44 Hε – A73 8H; R44 QB – T74 4'H; R44 QD – T74 2"H

(d) R46 QD – G84 8H; F49 QB – G83 4'H; F49 QB – G83 Q5'; K50 HG2 – G83 Q5'; K50 HG1 – G83 Q5'; K50 HG1 – G84 Q5'; R53 HG2 – G83 4'H; R52 QB – T71 Q5'

Figure 3.10. Detailed view of the protein-DNA interface and protein-DNA contacts. Side chains of the protein are illustrated in cyan. On the DNA, blue corresponds to guanine residues, green to cytosine, pink to adenine, and purple to thymine. (a) View of the protein-DNA NOE contacts between the N-terminus of the PITX2 homeodomain and the minor groove of the DNA. (b) View of the protein-DNA NOEs between Y25, R31, and the DNA. (c) and (d) Views of protein-DNA NOE contacts between residues in the third helix and the major groove of the DNA.

A detailed picture of the protein-DNA interface is shown in Figure 3.10. This figure illustrates the orientations of some of the side chains that are important in DNA binding, particularly within the third helix. Figure 2.4 outlines the numbering of the DNA used in the following discussion. Figure 3.10a illustrates the protein-DNA NOE contacts seen within the N-terminal arm. NOE contacts were seen in the minor groove between R2, R3 and R5 and DNA residues A72, T74 and G89. Although the NOESY-derived distance constraints indicate contact between residues R3 and R5 and the minor groove, relaxation data (Figure 3.9) indicate that this region of the N-terminus retains some degree of mobility; similar results were reported for the Even-skipped homeodomain, based on refined atomic B-factors [Hirsch & Aggarwal, 1995]. Broad linewidths were observed for the backbone NH resonances of His7 and Phe8, which are indicative of slow timescale motions in this region of the homeodomain and could render NOEs from these residues to the DNA undetectable.

In the second helix, R31 has an NOE contact to G82 Q5’, as can be seen in Figure 3.10b. HBPLUS analysis [McDonald & Thornton, 1994] indicates that R31 is making a hydrogen bond contact with the phosphate backbone of this nucleotide. In the loop between helices 1 and 2, Y25 Hε is making NOE contacts with G81 and G82. In the third helix, V47 Qγ1 (Q refers to a pseudoatom representation) is making conserved NOE contacts to A72, A73 and T74. Residue W48 has an NOE contact between Hα and A72 8H. R44 is making contacts with DNA nucleotides A73 and T74. HBPLUS analysis indicates that R44 is making a hydrogen bond contact to the backbone phosphate of T74. Residues 44, 47 and 48 are illustrated in Figure 3.10c. In the third helix, R46 and R52 appear to be making conserved contacts with the DNA backbone. R46 extends upwards, and R52 extends downwards to make these contacts (Figure 3.10d). R46 Qδ has an NOE contact with G84 8H. R52 has an NOE contact with T71 Q5’. R53 makes an NOE contact with G83 4’H. All of these NOEs could be due to the close proximity of the atoms while the side chains form hydrogen bonds with backbone phosphate groups. NOEs are also seen between F49 Qβ and G83. K50 will be discussed further below, but as can be seen in Figure 3.10d, there are NOE contacts between the K50 side chain and atoms from G83 and G84.

Other residues that were found to be in close contact with the DNA, but for which no NOEs were seen in the NMR data, are N51, K55, R57, and K58. N51 is nearly invariant among homeodomains [Kornberg, 1993] and is found here to make the same highly conserved interaction within the major groove with base A73. This residue has been shown in crystal structures to form a pair of hydrogen bonds with this adenine at the N7 and N6 positions, while NMR studies have indicated possible rapidly interchanging conformations [Tsao et al, 1994; Qian et al, 1993]. NMR studies have shown this close interaction, but NOEs are not seen, possibly due to line-broadening effects [Qian et al, 1993]. While NOEs are not seen between K55, R57 or K58 and the DNA, HBPLUS analysis of the complex indicates that interactions are possible. K55 may be forming a hydrogen bond with the phosphate of T71. R57 may be contacting the phosphate of G84. K58 may be contacting the phosphate of C70. Owing to the usual sensitivity limitations of the edited/filtered NMR experiments employed to identify intermolecular NOEs, it is quite likely that a number of anticipated NOEs fall at or below the threshold for detection.

The role of lysine at position 50. No previous structures have been described for any native K50 class homeodomains. However, the X-ray crystal structure of the Q50K mutant of the Engrailed homeodomain bound to DNA has been reported [Tucker-Kellogg et al, 1997], and the side chain of K50 was found to project into the major groove of the DNA, making hydrogen bond contacts with the O6 and N7 atoms of the guanines at base pairs 5 and 6 of the complementary strand of the TAATCC binding site. Our structure of the PITX2 homeodomain marks the first experimentally determined structure of a native K50 class homeodomain, and is important for validating results seen in the studies of non-native proteins. When binding to the consensus site, the position of K50 is very similar to that seen in the EnQ50K structure, with the side chain of K50 extending outward and making contacts with the two guanines adjacent to the

TAAT core sequence on the antisense strand (Figure 3.10d). The Nζ of the K50 side chain is likely making hydrogen bond contacts to the O6 and N7 atoms of G83 and G84, according to analysis by HBPLUS [McDonald & Thornton, 1994].

NMR allows one to elucidate information about the mobility of the protein backbone and side chains. A key finding in the present study was that the side chain of K50 potentially mediates recognition by fluctuating between multiple conformations. The conformational heterogeneity can be seen in Figure 3.11a. This preliminary evidence is based on averaging of

NOEs and broadening of resonances for this residue. The averaging of NOEs was dealt with as ambiguous distance constraints within the structure calculation in CYANA, and these constraints were satisfied in all structures of the family. When peaks from an H(CCO)NH-TOCSY experiment are compared between the K50 and K58 side chains (Figure 3.11b), peaks are easily seen for the K58 side chain resonances, but only the HA resonance is seen for the K50 side chain. The extra peaks in the K50 strip of Figure 3.11b are peaks from another residue on a different nitrogen plane that are strong enough to show up as residual peaks on this plane. The broadening of resonances for this side chain made it difficult to assign using typical heteronuclear-edited NMR spectra. Instead, assignments were made using NOESY spectra and eliminating assignments from nearby residues, until only K50 resonances were left. In principle, it is possible that the line-broadening of K50 side chain resonances could be caused by ring current effects from aromatic bases in the DNA, or by mobility of other nearby protons in the

DNA binding site. However, no anomalous line-broadening was observed for DNA proton resonances in the vicinity of the K50 side chain. In addition, results similar to those reported here have been seen in other DNA-binding proteins in which side chain mobility appears to cause line broadening of resonances (vide infra) [Tsao et al, 1994; Qian et al, 1993; Foster et al,

1997; Nishikawa et al, 2001]. These results, in combination with the multiple conformations observed for K50 in EnQ50K, provide compelling evidence that K50 is mobile. Backbone dynamics (Figure 3.9) did not show anything unusual for the backbone of K50, although this does not rule out motion of the side chain. Some degree of side chain mobility at the protein-DNA interface would be expected to confer an entropic advantage for binding to the DNA. It has been estimated previously that the entropic cost of keeping a lysine side chain static during binding is 3 kcal mol-1 [Doig & Sternberg, 1995]. This possible entropic component cannot be assessed until a detailed thermodynamic study is performed for this complex. This hypothesis of K50 side chain mobility will be explored further in the future, but for now, it is complementary to the data for the EnQ50K mutant [Tucker-Kellogg et al, 1997].

The crystal structure indicates that there are two alternate conformations for the K50 side chain, one in which the side chain points to base pairs 5 and 6, and one in which the side chain is oriented slightly more towards base pair 5. It must be pointed out that this X-ray structure was solved at cryogenic temperatures, so it is possible that a subset of the conformational populations was frozen out. It is also possible that these results indicate two static conformations for this side chain of EnQ50K, rather than a dynamic fluctuation between two conformations, as indicated by the B-factors for the K50 side chain in the mutant. The B-factors in this case provide no evidence for distinguishing between these possibilities. The B-factors are low for the side chain of K50 in the 1.9 Å crystal structure of EnQ50K, ranging from 20.8 to 23.6 Å2, which are the lowest values in the protein aside from the aromatic ring of F49. B-factors of about 20 Å2 indicate uncertainties of about 0.5 Å. Typically, B-factors of 60 Å2 or greater in high-resolution crystal structures indicate possible mobility of a side chain. So, according to the crystal results, the side chain position of K50 is well-defined in the crystal, in contrast to the possible mobility of the K50 side chain seen in our results. The true nature of the side chain may involve a combination of the states revealed by the two different experimental approaches, so that the K50 side chain has two predominant conformations and fluctuates between these alternatives.
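
The correspondence between a B-factor of about 20 and a positional uncertainty of about 0.5 Å follows from the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square atomic displacement. The Python sketch below is illustrative only; it assumes that this relation is the one intended in the text, and the function name is hypothetical.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement in Å for an isotropic B-factor in Å^2.

    Assumes the standard relation B = 8 * pi^2 * <u^2>.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B-factor values mentioned in the text.
for b in (20.0, 20.8, 23.6, 60.0):
    print(f"B = {b:4.1f} Å^2 -> rms displacement = {rms_displacement(b):.2f} Å")
```

With this relation, the K50 side chain B-factors of 20.8 to 23.6 correspond to displacements of roughly 0.5 Å, whereas a B-factor of 60 corresponds to nearly 0.9 Å.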


Figure 3.11. The K50 side-chain may be mobile. (a) View of the 20 conformers, with only the K50, G83, and G84 backbone and side-chain atoms shown to illustrate the extent of disorder of the K50 side-chain, implying possible mobility of this side-chain in interacting with the DNA. K50 atoms are shown in blue, and G83 and G84 atoms are shown in pink. Backbone atoms are bolder than side-chain atoms. (b) Strips from an H(CCO)NH-TOCSY spectrum showing proton resonances for the side-chains of K58 and K50. Line-broadening of resonances in the K50 side-chain is indicative of possible motion of this side-chain.

Although a more detailed characterization of the side chain dynamics in the PITX2 HD-DNA interface must await future data, substantial support for our observation of flexibility in the K50 side chain already exists from studies of related systems. Significant broadening of side chain resonances at the protein-DNA interface was observed in studies of homeodomain-DNA complexes of Antennapedia [Qian et al, 1993] and NK-2 [Tsao et al, 1994]. Moreover, flexibility in lysine side chains appears to be a significant feature of various modes of protein-DNA interactions. Foster and co-workers [Foster et al, 1997] have reported clear indications of substantial conformational fluctuations in lysine side chains in the interface of the zinc-finger protein TFIIIA with its DNA binding site, including the observation of broadened resonances and multiple NOE contacts that strongly suggest rapid conformational averaging. Significant line-broadening effects were also reported for a lysine side chain in NMR studies of the telomeric DNA complex of trf1 [Nishikawa et al, 2001]. In addition to NMR studies, molecular dynamics simulations of wild-type [Billeter et al, 1996] and a Q50K mutant [Gutmanas & Billeter, 2004] of the Antennapedia homeodomain bound to DNA provide further evidence in support of a dynamic homeodomain-DNA interface. For example, the Q50K simulations indicated that the side chain of K50 exhibited very pronounced mobility, with several arrangements of the lysine side chain torsion angles allowing for frequent contacts, both hydrogen bonds and hydrophobic interactions, with base pairs 5 and 6 in the TAATCC binding site. In this case, the lysine in the Q50K mutant provides both entropic and enthalpic contributions to protein-DNA affinity. A general observation arising from the known structures of homeodomain-DNA complexes is that the region of position 50 is not in intimate contact with the bases of the major groove. Such a relatively unrestrained arrangement allows for relatively long-range contacts to be formed in multiple, possibly isoenergetic ways.

Previous studies have shown that the lysine at position 50 is critical for binding to the TAATCC DNA binding site [Kornberg, 1993]. In contrast, homeodomains with a glutamine at position 50 bind to TAATGG sites with a higher affinity. The glutamine at position 50 appears to have a more modest role. When this residue is mutated to an alanine, the Q50A mutant has an affinity and specificity very similar to those of the wild-type protein, but when it is mutated to a lysine, the specificity changes [Fraenkel et al, 1998]. These studies, along with the current results, indicate that the interaction between K50 and the two guanines at positions 5 and 6 is vital to the affinity and specificity of the protein. The current model for specific homeodomain-DNA interactions consists of a fluctuating network of hydrogen bonds formed between polar groups of the protein and the DNA, and the interfacial water [Billeter et al, 1996]. These interactions are further complemented by hydrophobic contacts. The possible fluctuating hydrogen bonding interactions between K50 and the DNA and the resulting strict specificity of this class of homeodomains are consistent with this model. Investigation of side chain-base interactions has shown that lysine-guanine interactions are very common [Mandel-Gutfreund et al, 1995]. K50 homeodomains may have such a strong specificity for the TAATCC site because the orientation of the lysine is in an ideal position for the charged group to make hydrogen-bonding contacts with the two guanines.

In contrast, these hydrogen bonds cannot be made with cytosines, which are in these positions for the Q50 binding site TAATGG [Mandel-Gutfreund et al, 1995]. The N7 of guanine is the most electronegative region of the major groove [Saenger, 1984], and the favorable interactions that the lysine can make with both guanines in a mobile model may explain why K50 homeodomains are so specific for the TAATCC binding site, rather than other binding sites.

Analysis of residues mutated in Rieger syndrome. There have been 9 missense mutations found in the PITX2 homeodomain in Rieger syndrome and related disorders [Semina et al,

1996a; Priston et al, 2001; Kulak et al, 1998; Saadi et al, 2001; Heon et al, 1995; Alward et al,

1998; Chisholm & Chudley, 1983; Walter et al, 1996; Murray et al, 1992; Quentien et al, 2002b].

These mutations, along with their known biochemical effects, are listed in Table 1.1. The consequences of these mutations vary. Some mutations cause a total loss of DNA binding, while other mutants can still bind DNA, albeit with a decreased affinity. These consequences are directly reflected in the severity of the disease. A model of the PITX2 homeodomain structure was created previously by threading analysis, which allowed predictions to be made regarding the role of Rieger syndrome mutations in PITX2 dysfunction, though it is not necessarily an indication of the true molecular structure [Banerjee-Basu & Baxevanis, 1999].

The orientations of the side chains altered in Rieger syndrome patients are shown in

Figure 3.12. Analyzing these orientations provides insights into the role of each side chain, and how mutations in these positions could alter the structure and function of the protein. Future studies will focus on analyzing the mutant proteins by NMR spectroscopy. The side chain of highly conserved L16 points towards the interior hydrophobic core of the protein, and is probably involved in stabilizing both the formation of this core and the overall tertiary structure of the protein; the L16Q mutation would therefore be expected to destabilize or disrupt this hydrophobic core. The side chain of T30 extends outward from the second helix, away from the

DNA, so it does not appear to play a role in DNA recognition. Biochemical studies have shown that this mutant can still bind consensus DNA, but no longer activates transcription of a reporter gene [Amendt et al, 1998]. This residue may perform an activation function by interacting with other proteins, which could easily be disrupted by the effects of the proline mutation. An interesting observation is that in many homeodomains, residue 30 is involved in a salt bridge to residue 19, whereas this is not possible for PITX2. The side chain of R31, as described above, appears to contact the DNA backbone phosphate of G82. Therefore mutating this residue, even to another positively charged residue, may disrupt this interaction with the DNA and may disrupt a possible salt bridge with E42. The histidine side chain at this position in the mutant may not have favorable steric interactions with the DNA. The side chain of V45 points towards the

interior of the protein from the third helix. Like L16, this side chain appears to be involved in formation of the hydrophobic core of the protein. Unlike the L16Q mutant, the V45L mutant has the unusual characteristic of having a greatly heightened activation function, while having a reduced DNA-binding ability. It is possible that this mutant affects the protein in a way that alters these two functions separately, with a different fold of the protein that allows for a more efficient interaction with other proteins. For example, altered interactions of the PITX2 homeodomain with the C-terminal tail of the full-length PITX2 protein could have differential effects on DNA binding and activation [Amendt et al, 1999]. The DNA-binding functions of

R46, K50, R52 and R53 were discussed in detail above. Mutating these residues would disrupt many favorable interactions with the DNA, and biochemical studies have indicated these mutations interfere with DNA binding. Overall, these results are similar to the threading analysis, but provide a more direct and detailed understanding of the roles of these residues.

Figure 3.12. Ribbon diagram of the PITX2 homeodomain/DNA complex showing the positions of the side-chains for the residues known to be mutated in Rieger Syndrome and related disorders.

Many of the residues in the PITX2 homeodomain found to be altered in Rieger syndrome are involved in contacting the DNA. Other residues are involved in forming the hydrophobic core of the protein, which stabilizes the global fold. The analysis of mutations causing structural changes could be very relevant for the understanding and prediction of dysfunctions caused by mutations in homeodomains, as several homeodomains are known to be involved in various diseases [Boncinelli, 1997; Muragaki et al, 1996; Nakamura et al, 1996; D’Elia et al, 2001;

Borrow et al, 1996].

Concluding Remarks:

The structure previously determined for the Engrailed Q50K mutant [Tucker-Kellogg et al, 1997] provided some interesting insights into the possible role of lysine at position 50. The presence of hydrogen bonds between position 50 and the DNA had not been seen previously.

But many questions remained unanswered concerning the role of lysine in a native K50 homeodomain. For example, the Engrailed mutant has a dissociation constant of 0.0088 nM

[Tucker-Kellogg et al, 1997], representing an unusually high affinity for homeodomain-DNA interactions. Previous studies have indicated that proteins with excessively high affinities for

DNA or RNA can cause functional defects [Watanabe & Lambowitz, 2004; Monsalve et al,

1998]. The unusually high affinity of EnQ50K for DNA suggests that it may have properties that make it different from natural K50 homeodomains. Unlike the Engrailed mutant, the native

K50 class homeodomains PITX2 and Bicoid have properties that make them unstable in free forms, and have affinities within the normal nanomolar range [Amendt et al, 1998; Ma et al,

1996]. When DNA is not present, these proteins will irreversibly aggregate and precipitate out of solution at micromolar concentrations. These differences in biochemical properties between

the mutant and natural K50 proteins suggest the importance of understanding the structural properties of lysine at position 50 in the context of a native K50 class protein.

But the question still remains as to what causes these differences. The authors of the

EnQ50K structure found that the mutant bound to DNA more tightly and specifically than did the native protein [Tucker-Kellogg et al, 1997]. They hypothesized that this was due to very specific hydrogen bonds between the K50 side chain and the guanines at positions 5 and 6 on the anti-sense strand. In our study, we found that the native K50 homeodomain PITX2 has a slightly different tertiary structure, with helix 1 being closer to helix 2 than in other homeodomains, including the EnQ50K mutant. Helix 3 is angled about 0.5 Å closer to the N-terminus of helix 1 and C-terminus of helix 2 than EnQ50K. This appears to cause a difference in the way that helix

1 and helix 3 can interact, and previous studies have shown that this interaction between the helices stabilizes the global fold of the homeodomain [Gehring et al, 1994; Qian et al, 1989;

Kornberg, 1993]. Another Q50K mutant, this time of Fushi tarazu, is unable to bind non-consensus DNA sites that PITX2 and Bicoid are able to recognize [Zhao et al, 2000]. It is currently unknown whether the Engrailed mutant can bind non-consensus sites. These differences in affinity and specificity may involve any of the differing residues between these homeodomains. Positions 50 and 54 have been shown to be involved in recognizing non-consensus DNA sites [Pellizzari et al, 1997], and it is possible that other residues are also involved. Within the third helix, position 52 of Engrailed is a lysine. In PITX2, Bicoid and

Fushi tarazu, this residue is an arginine. We do not know whether having lysine residues at both positions 50 and 52 could contribute to the unnaturally tight binding of EnQ50K, but this is a possibility.

The current study of the solution structure of the PITX2 homeodomain reveals possible fluctuating interactions between position 50 and the DNA. It is possible that this mobile side chain may allow the protein to sample multiple DNA binding sites, and bind to the non-consensus sites, though at a slightly lower affinity. It will be interesting in the future to determine if other natural K50 class proteins share similar properties with PITX2. Future studies will focus on analyzing Rieger mutants of the PITX2 homeodomain, and analyzing the structural features of this protein when bound to non-consensus DNA binding sites. This will allow a greater understanding of the roles of specific residues in consensus and non-consensus DNA binding, and a greater understanding of how proteins can recognize multiple DNA sites to activate transcription of genes.

CHAPTER 4: DNA Recognition by the Human PITX2 Homeodomain: Molecular Dynamics Simulation of Wild-Type and Rieger Mutant Complexes

Introduction

Transcription, replication, recombination, and repair of genes are key processes essential to cellular function. These activities all require the non-covalent interaction of DNA-binding proteins with DNA. While a large number of structural, functional and thermodynamic studies have been reported, molecular recognition between protein and DNA is complex and not yet fully understood. Among the mechanisms that affect protein-DNA recognition are hydrogen bonding, hydration, conformational changes, electrostatic effects, and changes in dynamics. While much has been learned about some of these mechanisms, others require further study, particularly changes in dynamics. Molecular dynamics (MD) simulations are a useful tool for examining the protein-DNA interface. These simulations are routinely performed on proteins, nucleic acids, and macromolecular complexes to record trajectories with a length in the nanosecond range [Hansson et al, 2002]. MD simulations have also been used in the past to study homeodomain-DNA interactions [Billeter et al, 1996; Duan & Nilsson, 2002; Gutmanas &

Billeter, 2004]. These studies looked at the importance of water at the protein-DNA interface and the role of residue 50 in discriminating between different DNA sequences. Another recent application of MD simulations has been in examining the properties of mutant proteins that are involved in human disease [Wu et al, 2003].

While MD simulations have been performed on Q50 homeodomains in which the glutamine was replaced with a lysine residue [Gutmanas & Billeter, 2004; Duan & Nilsson,

2002], these simulations have not been performed on a native K50 homeodomain structure.

Preliminary data has indicated that K50 may be mobile, based on line broadening, which indicates possible motion on a µs-ms timescale (See Chapter 3). Previous MD simulations on an

Antp Q50K mutant have indicated that the lysine may fluctuate between the two guanine residues adjacent to the TAAT core, preferring to spend 1 ns at each guanine before switching

[Gutmanas & Billeter, 2004]. NMR studies [Tsao et al, 1994] and molecular dynamics simulations [Billeter et al, 1996; Gutmanas & Billeter, 2004] have provided strong indications of a dynamic, fluctuating environment encompassing some of the key amino acid side chains at the interface, most importantly, the side chains of asparagine 51 and of the position 50 residue.

Billeter and co-workers [1996] proposed that, at least in the case of Antennapedia, the homeodomain achieves specificity through a fluctuating network of short-lived contacts that allow it to recognize DNA without the entropic cost that would result if side chains were immobilized upon DNA binding. It will be interesting to look at MD data on a native K50 structure, to determine how K50 behaves in its natural context (refer to Table 1.2).

The study by Wu et al [2003] used the Generalized Born model to represent solvent effects, rather than an explicit water model. In this study, we used explicit water in comparing mutant complexes to the wild-type complex and in looking at differences in hydration.

Hydration water is predominantly on the macromolecular surface, but a small number of water molecules may be located in the interior of complexes. In looking at protein-DNA complexes at an atomic level, previous studies have shown that contacts between DNA and protein can be explained in terms of direct hydrogen bonds, water-mediated hydrogen bonds, van der Waals, electrostatic, and hydrophobic contacts [Jayaram & Jain, 2004; Reddy et al, 2001]. Water molecules are observed quite often at the interface between protein and DNA in crystal structures, the classic case being the trp repressor-operator complex [Janin, 1999; Otwinowski et al, 1988; Schwabe, 1997]. These studies presented the idea of water molecules acting as extensions of protein side chains in mediating interactions with DNA. NMR studies of the Antp

homeodomain indicate that at least two side chains at the protein-DNA interface are close to water molecules [Fraenkel and Pabo, 1998]. A molecular dynamics simulation of the complex in a water bath implies the presence of up to five water molecules in the cavity at the interface between the recognition helix and the DNA [Billeter et al, 1996]. Water molecules at the protein-DNA interface are also visible in X-ray crystal structures at high resolution [Hirsch &

Aggarwal, 1995; Li et al, 1995; Wilson et al, 1995]. In the paired (S50) structure, position 50 forms hydrogen bonds to two water molecules, which then hydrogen bond to DNA bases

[Wilson et al, 1995]. This interfacial water is likely to allow for mobility of the protein side chains at the interface. Water-mediated contacts between protein and DNA have been seen in many other protein-DNA complexes [Kosztin et al, 1997; Davey et al, 2002; Chai et al, 2003;

Chiu et al, 2002]. In homeodomains, these water-mediated contacts are considered an essential contributor to specificity [Billeter, 1996; Wolberger, 1993]. Molecular dynamics simulations have the capability of determining where water-bridging interactions are occurring, and what the residence times of water molecules at the protein-DNA interface are. The water molecules allow for amino acids that are otherwise out of reach of the bases to contribute to a network of hydrogen bonds that is believed to be important for the specificity of DNA recognition

[Schwabe, 1997]. Both the protein and DNA specifically recognize each other's hydration pattern, and there appears to be a fluctuating network of bonding interactions between protein,

DNA and water molecules. This may reduce the entropic cost of forming a rigid interface.

The mutations found in the PITX2 homeodomain in Rieger syndrome have been presented in previous chapters. A previous study presented a threading analysis of some of the

Rieger mutant PITX2 homeodomains [Banerjee-Basu & Baxevanis, 1999]. This study was able to predict the structures of the mutants, but was unable to provide detailed information about

differences in protein-DNA recognition, including differences in specific contacts, and differences in hydration of the protein-DNA interface. In addition, this threading analysis found that the structures of the mutants did not vary significantly from wild-type, so the differences in binding and activation may be due to the properties mentioned above. MD study of these mutants can provide further insight into the specific structural role of each residue and into the cause of Rieger syndrome and other diseases involving DNA-binding proteins.

In this study, MD simulations were performed on the wild-type PITX2 homeodomain-

DNA complex, and the Rieger mutant complexes in H2O. The results indicate that motion of the

K50 side chain is on a time scale longer than what we can see by MD simulations, which is in contrast to results seen for a Q50K mutant. The results also show differences in DNA recognition between wild-type and mutant complexes. The role of hydration in this differential

DNA recognition is discussed.

Overall Behavior of the Trajectories

All of the trajectories were analyzed to ensure that the simulations proceeded properly, without huge variations in overall energy or, especially in the case of the wild-type complex, large variations in RMSD over the course of the simulations. This analysis provides a validation of the calculations. The RMSDs of the complexes versus the simulation time are shown in

Figure 4.1. These RMSDs are for both the protein and the DNA in the complex. The RMSDs of all of the complexes stay very close to the starting structure. The wild-type complex only drifts from the starting complex by about 0.55 Å, while the mutant complexes differ by no more than

0.75 Å. The RMSDs go up and then level off, indicating that the structures settle into a stable structure and then this stable structure remains intact for the remainder of the simulations. The stability of the energy of the systems versus simulation time was analyzed for all of the

trajectories, and this is shown in Figure 4.2. The total energies of each of the systems remain quite low during the course of the simulations. The energies fluctuate slightly during the simulations due to thermal fluctuations. It is interesting to note that while the wild-type and most of the mutant complexes have similar total energies, the energy of the R52C complex is significantly higher. The energy of the V45L complex is also slightly higher. These results will be discussed further below.

[Figure 4.1 plot: RMSD (Å) versus time (ps).]

Figure 4.1: RMSD values of the MD snapshots versus the starting NMR structure for all of the residues of both the protein and the DNA, using the protein backbone and the heavy atoms of the DNA bases, as a function of simulation time. The color scheme is as follows: wild-type complex is black, T30P is red, R31H is green, V45L is blue, R46W is magenta, K50Q is brown, R52C is yellow, and R53P is violet.
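
As a minimal sketch of how an RMSD curve of the kind plotted in Figure 4.1 can be computed, the Python fragment below evaluates the RMSD of each snapshot against a reference structure. It assumes that every snapshot has already been superimposed on the reference, and that coordinates are supplied as lists of (x, y, z) tuples in Å for the same atoms in the same order. The function names are illustrative only and are not taken from the software used in this work.

```python
import math


def rmsd(reference, snapshot):
    """Root-mean-square deviation between two equally sized coordinate lists.

    Each argument is a list of (x, y, z) tuples. No superposition is done
    here; the snapshot is assumed to be pre-aligned to the reference.
    """
    if len(reference) != len(snapshot) or not reference:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, snapshot):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))


def rmsd_series(reference, trajectory):
    """RMSD of every frame in a trajectory (a list of coordinate lists)."""
    return [rmsd(reference, frame) for frame in trajectory]
```

Plotting the values returned by `rmsd_series` against the snapshot times would give a curve of the same kind as those shown for each complex.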

[Figure 4.2 plot: total energy Etot (kcal/mol) versus time (ps).]

Figure 4.2: Total energy levels of the MD snapshots as a function of simulation time. The color scheme is the same as in Figure 4.1.

Analysis of the Molecular Dynamics of the Wild-Type PITX2 HD-DNA Complex

As discussed above, there have been no molecular dynamics simulations performed previously for any native K50 homeodomain structure. The NMR structure of the PITX2 homeodomain is the first structure to be solved of a native K50 class homeodomain (See Chapter

3), and single amino acid substitutions in this domain are known to cause Rieger syndrome.

Therefore, to gain a better understanding of how these mutations cause disease, we performed

MD simulations on the wild-type and mutant complexes. To form a basis for comparison of the mutants, the results of the wild-type simulation were analyzed first. The average structure of the wild-type PITX2 HD-DNA complex during the simulation was analyzed using the program

NUCPLOT [Luscombe et al, 1997] to obtain a list of protein-DNA contacts seen in this complex during the MD simulation. A summary of these contacts can be seen in Figure 4.3. These contacts are discussed in great detail in Chapter 3. The unique nature of K50 must be

emphasized again, in that it forms hydrogen bonds with the O6 and N7 atoms of the two guanines on the antisense strand, adjacent to the TAAT core DNA binding site. R31, R46, R52 and R53, which are all mutated in Rieger syndrome, are also shown to make contacts with the

DNA in the wild-type complex.

Figure 4.3: NUCPLOT diagram of the average structure of the wild-type protein-DNA complex during the 2 ns trajectory, showing the contacts between protein side-chains and specific DNA bases. Atom names correspond with the nomenclature used in the PDB file. NZ corresponds to Nζ, NH1 and NH2 to Nε1 and Nε2, OG1 to Oδ1, ND2 to Nδ2, and NE2 to Nε2.

Hydration and water-mediated protein-DNA contacts.

When a protein recognizes and binds to a DNA molecule, it is recognizing not only the

DNA itself, including its charge and sequence properties, but it is also recognizing the hydration pattern of the water that is on the surface of the DNA in its natural solution-state environment.

In examining the molecular dynamics of a molecular system, water plays an important role in mediating the interplay between various members of the system. In a protein-DNA interaction, water occupies the various crevices of the protein and the DNA and, even more importantly, can enter the protein-DNA interface and mediate contacts between the protein and the DNA. These contacts are referred to as water bridges, and are vitally important in mediating the binding of the protein to the DNA.

Any MD analysis of a protein-DNA complex must take into account the role of hydration, in order to gain a full understanding of the dynamics of the complex. The hydration of the wild-type complex can be seen in Table 4.1. The term Nw refers to the average number of water molecules in the vicinity (within 3 Å) of the given residue and side chain. The term Nps refers to the maximum number of picoseconds that a given water molecule resides in the vicinity of the residue. This reflects water residence times. These numbers can be used to describe the presence of long-lived water molecules and increased hydration of the complex. The positions of the various side chains mutated in Rieger syndrome, and discussed below are illustrated in

Figure 3.12 for the NMR structure of PITX2. It was decided to investigate the hydration of these side chains mutated in Rieger syndrome for each simulation, because these are residues that all serve important roles in the homeodomain’s structure and function, so we wanted to determine if altering one residue can cause differences in hydration at the other residues. A view of the

positions of the side chains of interest can be seen in Figures 3.12, 4.4, and 4.6. Hydration at positions 16 and 45 was not included in the table, as these residues are shielded within the hydrophobic core, and there was virtually no hydration seen near these residues in any of the simulations. The results for the wild-type complex show that the water molecules in the vicinity of R52 and R53 have a long maximum lifetime in this trajectory. Further analysis by NUCPLOT

[Luscombe et al, 1997] indicates that there are many possible water bridges involving these two residues. R52 makes water-mediated contacts to the phosphates of T71 and A72 in this trajectory. The R52 side chain interacts with the DNA backbone and is not within the major groove. This allows for more water molecules to be in the vicinity of this side chain. Water bridges are also present between R53 Nε1 and Nε2, and T86 (O4 and N3 atoms) and A85 (N1 and

N6 atoms). The R53 side chain is in the major groove, and less water can access this side chain, which would explain why there are fewer water molecules near this side chain than near R52. K50 has a relatively small number of water molecules in close proximity, and these have much shorter maximum lifetimes. Because K50 is believed to make direct hydrogen bonds with the O6 and

N7 atoms of G84 and G83, water-mediated contacts probably do not play as large a role as in Q50 homeodomains. T30 is located in helix 2. Biochemical studies have indicated that this residue may be involved in activation, rather than DNA binding. It is relatively accessible to water, and these water molecules tend to have short residence times. R31 is also located in helix

2, and makes contacts with the DNA backbone. A water bridge was seen in the NUCPLOT analysis between this residue’s Nε1 and the phosphate of G81. R46 is located at the major groove of the DNA and makes contacts with the phosphate backbone. It has a small number of water molecules in contact with it, and these molecules appear to have short residence times. A water bridge is seen between R46 Nε1 and the phosphate of DNA residue G83, and this bridge is short-lived. A summary of some of the water at the interface is shown in Figure 4.4, in which water molecules are present between the protein side chains and the DNA. The trajectory of a single water molecule is shown in Figure 4.5, to illustrate the pathway a water molecule can follow during a simulation. Within the 2 ns trajectory, this particular water molecule starts well away from the complex and ends up within the major groove between the protein and DNA, before leaving again.

Table 4.1. Hydration in the WT and mutant trajectories

Trajectory (b)  WT           T30P         R31H         V45L         R46W         K50Q         R52C         R53P
Residue (a)     Nw     Nps   Nw     Nps   Nw     Nps   Nw     Nps   Nw     Nps   Nw     Nps   Nw     Nps   Nw     Nps
50              2.81   230   6.34   880   3.20   1050  3.56   570   1.52   510   2.39   360   2.98   600   5.74   1360
30              5.81   110   6.91   20    4.69   140   6.20   110   5.28   100   5.98   80    8.12   220   7.64   90
31              3.89   790   5.98   1350  2.89   1950  2.84   350   7.31   1850  5.29   1130  5.21   1540  3.43   1890
46              2.58   110   4.27   510   3.89   1950  2.86   210   8.63   120   4.08   180   2.80   890   3.37   1310
52              8.01   1980  10.22  1740  8.54   1130  7.56   1990  7.51   200   8.44   1970  4.36   370   6.96   1740
53              3.71   1990  8.18   700   2.46   480   7.44   990   2.83   430   3.63   360   3.55   460   6.26   110

(a) Water molecules are within 3.0 Å of an atom from the protein side chains listed in this column.
(b) For each complex, the average number (Nw) of water molecules in the vicinity of the given protein side chain and the maximum number of picoseconds (Nps) that a given water molecule resides in this vicinity are given.
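
The two hydration quantities reported in Table 4.1 can be illustrated with a short Python sketch. It assumes that, for each saved snapshot, the identifiers of the water molecules lying within the 3 Å cutoff of a side chain have already been collected into a set. It also assumes that Nps means the longest uninterrupted stay of any single water molecule, and that a stay of n consecutive snapshots counts as n times the snapshot spacing. The function name, the input layout, and the `ps_per_frame` parameter are hypothetical and are not taken from the analysis software used here.

```python
def hydration_stats(frames, ps_per_frame):
    """Return (Nw, Nps) for one side chain.

    frames       -- list of sets; each set holds the IDs of the water
                    molecules within the cutoff in one snapshot
    ps_per_frame -- assumed time between saved snapshots, in ps
    """
    if not frames:
        raise ValueError("at least one frame is required")

    # Nw: average number of nearby water molecules per snapshot.
    n_w = sum(len(frame) for frame in frames) / len(frames)

    # Nps: longest run of consecutive snapshots in which the same water
    # molecule stays within the cutoff, converted to picoseconds.
    longest = 0
    current = {}  # water ID -> length of its ongoing run, in frames
    for frame in frames:
        current = {water: current.get(water, 0) + 1 for water in frame}
        if current:
            longest = max(longest, max(current.values()))
    return n_w, longest * ps_per_frame
```

For example, `hydration_stats([{1, 2}, {1}, {1, 3}, {3}], 10)` returns an average of 1.5 waters and a maximum residence of 30 ps, because water 1 is present in three consecutive snapshots.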

Figure 4.4: Outline of some of the water molecules at the protein- DNA interface that are in close contact with residues R31, K50 and R53, and form water-bridging interactions between the protein (particularly R31 and R53) and the DNA.

Figure 4.5: Snapshots of a single water molecule’s trajectory during the 2 ns simulation time, taken every 200 ps. The color scale starts at black, then goes to cyan, red, blue, green, grey, yellow, orange, pink, and purple. After 1.4 ns, this water molecule spends about 200 ps at the protein-DNA interface before leaving again.

Properties of Lys50.

The position of the K50 side chain in the average structure of the wild-type PITX2 HD-

DNA complex during the trajectory can be seen in Figure 4.6a. As can be seen, the lysine side chain extends upwards and interacts with the two guanines adjacent to the TAAT core DNA binding site (G83 and G84). An MD simulation performed by Gutmanas & Billeter [2004] of a

Q50K mutant of the Antp homeodomain indicated that over the first nanosecond of the trajectory, the lysine at position 50 prefers to form a hydrogen bond with base pair 5 (G84 in this study), while during the second nanosecond, it prefers to form a hydrogen bond with base pair 6

(G83 in this study). These results indicate that K50 fluctuates between the two base pairs, remaining at each for 1 ns. Our previous NMR results (see Chapter 3) show line broadening for the K50 side chain resonances, which indicates motion on a timescale longer than this (µs-ms).

This was one reason we wanted to closely re-create the MD simulations run by Gutmanas and

Billeter [2004], so we could determine if we obtained similar results for the native complex. In fact, we did not see the same clear switch in the lysine side chain binding to one guanine, and then the other. Throughout the entire trajectory, we saw hydrogen bonding to both G83 and

G84, with possible hydrogen bonding to G83 N7 and O6 present almost the entire time.

Hydrogen bonding to G84 O6 is present for 100% of the simulation, but hydrogen bonding to

G84 N7 is only present for 1.5% of the simulation. As can be seen in Figure 4.6b, K50 Nζ appears to be closer to the G83 N7, and this distance is in the range for hydrogen bonding interactions (3-4 Å), while the distance to G84 N7 is significantly greater (~4.5-5.0 Å). But when looking at the distance between Nζ and the O6 atoms of both guanines, the distances are essentially the same for G83 and G84 (~3.0 Å), which indicates no preference for one guanine over the other. To determine if the K50 side chain itself is mobile, we analyzed the angles of the

K50 side chain throughout the simulation (Figure 4.6c). While there are fluctuations in the angles, there are no large-scale changes in the chi angles throughout the 2 ns simulation, such as those seen by Gutmanas and Billeter [2004]. One explanation for these results is that K50 in the

PITX2 homeodomain is behaving similarly to the Q50K mutant, only on a slower timescale that cannot be seen in these simulations; therefore, the movement shows up as multiple populations of the side chain. This hypothesis is supported by previous data, which has shown line broadening for the K50 side chain [see Chapter 3]. Here, the side chain appears to have a slight preference for hydrogen bonding to G83 for the 2 ns trajectory. Because hydrogen bonds can be seen to both guanines during certain steps of the trajectory, it is possible that the lysine side chain is forming transient simultaneous hydrogen bonds to both guanines. This formation of two hydrogen bonds by one lysine side chain has been seen in the past [Mandel-Gutfreund et al,

1995], but would probably require the side chain to be less mobile than what we see with K50.

The actual state of the K50 side chain will be further clarified in the future with NMR experiments examining side chain dynamics.
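
A hydrogen-bond occupancy of the kind quoted above (the percentage of the simulation in which a contact is present) can be sketched in Python as follows. This is only an illustration under stated assumptions: a contact is counted purely on a donor-acceptor distance criterion, the 3.5 Å default cutoff is an assumed value within the 3-4 Å range mentioned in the text, and no angular criterion is applied. The function names are hypothetical and do not reflect the analysis program actually used.

```python
import math


def distance_series(donor_xyz, acceptor_xyz):
    """Per-snapshot distance (Å) between two atoms.

    Each argument is a list of (x, y, z) tuples, one per snapshot,
    e.g. K50 NZ and a guanine O6 or N7.
    """
    return [math.dist(d, a) for d, a in zip(donor_xyz, acceptor_xyz)]


def occupancy_percent(distances, cutoff=3.5):
    """Percentage of snapshots in which the distance is within the cutoff.

    The cutoff is an assumption made for this sketch.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    within = sum(1 for d in distances if d <= cutoff)
    return 100.0 * within / len(distances)
```

Applying `occupancy_percent` separately to the distance series for each acceptor atom allows the contacts to G83 and G84 to be compared on the same footing.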

[Figure 4.6 panels: (a) structure view; (b) distance (Å) versus time (ps); (c) angle (°) versus time (ps), with time axis ticks at 500, 1000, 1500 and 2000 ps.]

Figure 4.6: Properties of K50 during the MD simulation. (a) Positioning of the K50 side-chain in the average structure of the wild-type protein-DNA complex during the 2 ns trajectory. (b) Distance (Å) between K50 NZ and the N7 atoms of G83 (black) and G84 (red), and between K50 NZ and the O6 atoms of G83 (green) and G84 (blue) during the course of the 2 ns simulation. (c) Dihedral angles of the K50 side-chain during the course of the simulation. χ1 (black), χ2 (blue), χ3 (red), and χ4 (green).
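
For reference, a side-chain dihedral angle such as those plotted in panel (c) can be computed from four atomic positions as sketched below in Python. For lysine, χ1 to χ4 are defined by the atom quadruples N-CA-CB-CG, CA-CB-CG-CD, CB-CG-CD-CE and CG-CD-CE-NZ. The helper names are illustrative, and the code is a generic geometric calculation rather than the routine used to produce the figure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in the range -180 to 180,
    defined by four points given as (x, y, z) tuples."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    # atan2 form: stable near 0 and 180 degrees and keeps the sign.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))
```

Evaluating `dihedral_deg` on the four relevant atoms in every snapshot gives a time series for each χ angle, in which a rotamer transition would appear as a jump of roughly 120°.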

Analysis of Mutant Complexes

Many differences were seen between the wild-type complex and the mutant complexes.

Differences were seen in level of hydration, water residence time, energy levels, side chain position, and protein-DNA contacts. These differences are illustrated in Table 4.1 and Figures

4.7-4.9, and described below for each complex. A simulation was not performed for the L16Q or

K50E complexes, as these are the only complexes in which it is known that there is no DNA binding or activation seen with the consensus DNA site in biochemical studies [Saadi et al,

2001]. The complexes focused upon are the ones that are believed to still have some DNA binding activity, albeit at a lower level. The threading analysis performed previously for the

PITX2 homeodomain [Banerjee-Basu & Baxevanis, 1999] found that the mutant complexes, except for L16Q, all had similar threading scores to the wild-type complex, and therefore, they hypothesized that the overall tertiary structure was similar to wild-type for each of the mutants.

Therefore, differences in structure that lead to the differences in function may be localized to differences in protein-DNA interactions and hydration of the protein-DNA interface, and these are the things that are focused upon in the following discussion.

T30P Simulation.

Position 30 is in helix 2. Previous biochemical studies of the T30P mutant have indicated that this mutant can still bind DNA almost as well as wild-type, but the activation function is severely reduced [Amendt et al, 1998]. In the current analysis, we found a greater level of hydration around residues 30, 31, 46, 50, and 52, compared to the case of the wild-type homeodomain. The residence times were greater for residues 31, 46 and 50. Analysis of the protein-DNA contacts shows that contacts between R44 Nε1 and the phosphate backbone of T74, and between R53 Nε1 and the N7 atom of A85 are missing. There is a new possible non-bonded contact between R31 Cδ and the phosphate of G82. There are new possible water-mediated contacts between R46 and the phosphate backbone between G81 and G83. The addition of new contacts between R31 and the DNA, and new possible water-mediated contacts may compensate for the loss of other contacts, and this may explain why DNA-binding affinity is not lowered for this complex.

R31H Simulation.

R31 is in helix 2 and makes a contact with the DNA backbone (Figure 4.3). Replacement of this arginine with a histidine creates a much bulkier side chain at the protein-DNA interface, which likely creates steric hindrance. Analysis of the protein-DNA contacts indicates that the histidine is capable of contacting the phosphate of G82, which is to be expected based on its positive charge, but this interaction is only present in a small number of the snapshots of the trajectory (~25%). Levels of hydration are very similar in this complex as compared to the wild-type complex (Table 4.1). In contrast to the wild-type complex, maximum residence times for the water molecules are much higher near residues 31, 46, and 50. These residence times are lower for residues 52 and 53. Interestingly, while residence times are lower for R52, there is the addition of many possible water-mediated contacts between R52 and the phosphate of residue

C70. Analysis of protein-DNA contacts indicates a new contact between R52 Nε1 and T71 O3’.

The contact between N51 and A73 N7 is removed and there is a new non-bonded contact between N51 Nδ2 and C8 of A72. This slightly different positioning of the protein on the DNA may explain the lowering of DNA binding affinity. The increase in water residence times at the protein-DNA interface may be due to a destabilization of the protein-DNA interaction, with water-mediated interactions becoming more important at stabilizing the interaction with DNA.

V45L Simulation.

Position 45 is located in helix 3 and points towards the interior of the homeodomain, making up part of the hydrophobic core of the protein. The extension of the side chain by a methylene group is likely to create steric hindrance within the core of this protein, causing a destabilization of the homeodomain. Results of this simulation show that this extension of the side chain causes the backbone of the third helix to be pushed 1.15 Å outwards from helices 1 and 2. Levels of hydration are very similar to wild-type for the V45L mutant complex, with the exception of R53 where the level is significantly higher. Maximum water residence times are raised for positions 46 and 50, and lowered for positions 31 and 53. This complex has a much lower number of water-mediated contacts. Water-mediated contacts seen between K55 and the phosphate backbone of C70 are missing. All water-mediated contacts between R53 and T86 are missing. Instead, there are new water bridges between R53 and the phosphate of G84. The total energy for the complex during the simulation is raised compared to most of the other complexes.

The direct protein-DNA contacts for this complex can be seen in Figure 4.7. There are new contacts between W48 Cζ3 and the phosphate of A72, and N51 Nδ2 and A72 C8. The contact between R44 and the T74 phosphate is missing. All of the contacts between R53 and residues 85 and 86 are missing, and there is a new contact between R53 and the phosphate of G84. There is a contact missing between R52 and the phosphate of A72. The contact between R57 Nε and the phosphate of G84 is missing, as well as the contact between K50 Nζ and G83 O6. There is a new contact between R31 Nε1 and the phosphate of G81. The many differences in protein-DNA contacts likely explain the lowering in DNA binding seen for this mutant complex. What is interesting about this mutant is that while DNA binding is lowered, activation is greatly increased [Priston et al, 2001]. The lowered DNA binding is consistent with the many altered protein-DNA contacts described above. But how is activation increased when DNA binding is lowered? One possibility is that enough favorable interactions with the DNA remain to support a reduced level of binding, but that this highly destabilized interaction causes the complex to have a higher energy (see Figure 4.2) and exposes residues that would otherwise be buried to possible interactions with transcriptional activators. This hypothesis will be explored further in the future, in structural and functional studies of this mutant in the full-length PITX2 protein.

Figure 4.7: NUCPLOT diagram of the protein-DNA contacts in the average structure of the V45L mutant protein-DNA complex during the 2 ns trajectory. Atom nomenclature is described in the caption for Figure 4.3.

R46W Simulation.

R46 is located in the third helix of the PITX2 homeodomain and makes contacts with the DNA backbone of G82 and G83 (Figure 4.3). Figure 4.8 illustrates an overlay of the wild-type and R46W average structures during the simulation, with position 46 highlighted. In the wild-type case, the positively charged arginine extends towards the negatively charged DNA backbone to make contacts. In the case of the mutant, replacement of the arginine with a tryptophan causes a bulky, hydrophobic side chain to be present at the protein-DNA interface, which disrupts the favorable interaction the arginine had with the DNA. Levels of hydration are higher for positions 31 and 46, with maximum water residence times being higher for positions 31 and 50, and lower for positions 52 and 53. In analyzing protein-DNA contacts, contacts between R2 Nε2 and the phosphate of A88, and R53 Nε1 and A85 N7 are missing. There are new contacts between Y25 OH and G82 O4’, and between M28 N and the phosphate of G82. The contact between R31 and the phosphate of G82 is shifted to G81. The contact between position 46 and G82 is obviously missing.

Figure 4.8: Overlay of wild-type and R46W mutant complexes, illustrating the different positions of the two side chains at position 46 and showing how tryptophan would cause an unfavorable steric interaction at the protein-DNA interface.

K50Q Simulation.

In the K50Q simulation, water residence times are higher for residues 31 and 46. Maximum water residence times are higher for residue 31 and lower for residue 53. Analysis of protein-DNA contacts shows a new contact between R3 Nε and Nε2 and T87 O2, with the contact between the phosphate of A88 and R2 Nε2 missing. The direct hydrogen bond between N51 and A73 is also missing. There are new contacts between Y25 Cε2 and G82 O3’, M28 N and the phosphate of G82, and the contact between R31 and the phosphate of G82 is shifted to G81. There are new water-mediated contacts between the phosphate of G89 and R3 and R5. There is a new water-mediated contact between R3 Nε2 and T87. In this simulation, it appears that Q50 Oε1 is making water-mediated contacts with G83 and G84. It is unknown what the biochemical properties of this mutant are.

R52C Simulation.

Position 52 is located in the third helix and makes contacts with the phosphates of residues T71 and A72 (Figure 4.3). Replacement of this arginine with a cysteine disrupts this interaction. In our MD simulation for the R52C mutant, levels of hydration are higher for positions 30 and 31, and lower for residue 52. Maximum water residence times are higher for residues 30, 31, 46 and 50, and lower for residues 52 and 53. The total energy of the system for this complex is much higher than for the other complexes (Figure 4.2). Analysis of protein-DNA contacts indicates that the contact between R2 Nε2 and the phosphate of A88 is missing. The contact between R53 Nε1 and N7 of A85 is missing as well. The loss of contacts with no compensation by addition of new contacts may explain the higher energy of this complex.

R53P Simulation.

R53 is located in helix 3 and makes contacts with A85 N7 and T86 O4 in the wild-type complex simulation. In the simulation for R53P, levels of hydration are higher for residues 30, 46, 50 and 53, and lower for residue 52. Maximum water residence times are higher for residues 31, 46, and 50 and much lower for residue 53. The protein-DNA contacts for this mutant complex are summarized in Figure 4.9. There are new contacts between K50 O and A73 N6, and between A54 and the bases of C70. The contact between R3 Nε2 and T87 O2 is missing, along with the contact between R44 and A73. There are new contacts involving Y25, M28 and R31 contacting the DNA backbone between residues 80 and 83. There are new possible water bridges between R46 and the phosphate of G82. There is also a new nonbonded contact between D27 and the phosphate of G82.

Figure 4.9: NUCPLOT diagram of the protein-DNA contacts in the average structure of the R53P mutant protein-DNA complex during the 2 ns trajectory. Atom nomenclature is described in the caption for Figure 4.3.

Discussion

Specific interactions between protein and DNA depend largely on the complementarity of the binding surfaces of each macromolecule, which includes favorable intermolecular interactions, such as direct hydrogen bonds, water-mediated hydrogen bonds, van der Waals contacts, electrostatic interactions, and hydrophobic contacts. There is evidence that many of these interactions are not static in homeodomains, but that there is movement of protein side chains in making interactions with the DNA [Billeter et al, 1996; Tsao et al, 1994; Gutmanas & Billeter, 2004]. Some degree of side chain mobility at the protein-DNA interface would be expected to confer an entropic advantage for binding to the DNA, as the side chain would not be immobilized. In addition to consideration of mobility of the interacting surfaces, the role of water in protein-DNA interactions must be considered. Water molecules appear to act not only as a foundation to improve the complementarity of the interacting surfaces, but also as an assistant to reduce the entropic expense that arises when a highly specific macromolecular interaction requires a large number of interactions [Billeter et al, 1996]. Homeodomains provide good model systems for exploring these aspects of protein-DNA interactions.

The simulations characterized in this study may shed some light on the mechanisms that govern the specificity of homeodomains, and the structural roles that mutated residues play in causing human disease. Previous studies have hypothesized that motion plays a role in the way that K50 recognizes the DNA consensus site [Gutmanas & Billeter, 2004; Duan & Nilsson, 2002]. The molecular dynamics study that made this hypothesis examined a Q50K mutant of the Antp homeodomain and proposed that the K50 side chain spends 1 ns at each guanine adjacent to the TAAT core sequence before switching to the other guanine [Gutmanas & Billeter, 2004]. Results for a mutant may not apply to the native case. As shown with our mutants in this study, differences in residues at one position can cause many differences at other residues throughout the protein sequence. Therefore, the residue differences between Antennapedia and PITX2 could cause many differences in the molecular dynamics of a lysine at position 50. In our previous study in which we presented the NMR solution structure of the PITX2 homeodomain, we saw line broadening of the side chain resonances for K50, which indicates possible motion on a µs-ms timescale. Our current results in this MD study indicate that motion of the K50 side chain is on a scale longer than the 2 ns trajectory presented here. In addition, hydration levels were greater for the K50 side chain in the current study than for the Q50K mutant. Because the K50 side chain makes direct hydrogen bonds with the guanines adjacent to the TAAT core DNA sequence, water-mediated contacts do not appear to play as large a role as they do in Q50 homeodomains. For the wild-type Q50 homeodomain Antennapedia, water molecules spend a maximum residence time of 1.5 ns near position 50, and are shown to make water bridges between the Q50 side chain and the DNA [Gutmanas & Billeter, 2004; Duan & Nilsson, 2002]. For the K50 homeodomain PITX2, the maximum water residence time is only 230 ps, which is comparable to the 367 ps seen for the Q50K mutant of Antennapedia [Gutmanas & Billeter, 2004]. Therefore, one of the major differences between Q50 homeodomains and K50 homeodomains appears to be the importance of water in mediating the specificity of the homeodomain.

Nine missense mutations are found within the PITX2 homeodomain in Rieger syndrome and closely related disorders. To try to elucidate how these mutations could affect the hydration of the PITX2 homeodomain and particularly the specific interactions with DNA, we have used MD simulations over a 2 ns trajectory. Some of the mutants can still bind DNA, but at significantly lower levels than the wild-type, because many of the protein-DNA contacts are lost (R31H, V45L, R46W, R52C, R53P). It appears that with some mutations, the mutant complexes still bind DNA by using the DNA contacts that are still present, and supplement these interactions with further water-mediated contacts with the DNA (R31H, R53P). Other mutant complexes have no way to compensate for the lost contact, and do not bind the TAATCC binding site at all (K50E). The V45L mutation affects a residue that is not involved in DNA binding, but is involved in forming the hydrophobic core of the protein. The results of this study show that this mutant loses many favorable interactions with the DNA, probably because of instability in the tertiary structure, which is then unable to dock properly onto the DNA. We will want to study the V45L mutant further in the future, as this mutant has the interesting biochemical properties of having less DNA binding, but a greatly heightened activation function. In our study, the total energy of this system was raised in relation to most of the other complexes, and this complex had a great number of missing and altered interactions with the DNA and with water-bridging molecules. Overall, mutant complexes appear to have lowered DNA binding levels due to a lower number of favorable interactions with the DNA. The complexes that are still able to bind DNA often do so by having a greater number of water-mediated contacts with the DNA.

During the 2 ns simulation time, the majority of the water molecules had residence times that were shorter than 100 ps. Some of the water molecules had much longer residence times, as long as 99.5% of the simulation time for water molecules near residues 52 and 53 in the wild-type simulation. Figure 4.5 illustrates the path of a single water molecule that spends around 200 ps in the protein-DNA interface of the wild-type simulation, showing how far a single water molecule can travel during the 2 ns trajectory. These MD data are in agreement with earlier NMR measurements for the Antennapedia homeodomain, in which a few water molecules have long residence times, in the nanosecond range [Qian et al, 1993]. The coupling of direct protein-DNA interactions and water-mediated protein-DNA interactions is interesting. It seems that each is vitally important to the binding of the protein to the DNA, and when mutations occur that alter the protein, differences in both the direct and water-mediated contacts are seen throughout the protein and the DNA. Throughout the simulation time of the wild-type and mutant complexes, there is no detectable change in any of the global structural characteristics of the complex, such as RMSD from starting structure and energy levels. But there are many differences between wild-type and mutant complexes when looking at the level of particular side chains in terms of their orientations, interactions with DNA, and hydration levels and residence times.
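
The residence-time quantities discussed above can be made concrete with a short sketch. It assumes that, for one water molecule and one residue, the trajectory has already been reduced to a per-frame record of whether the water lies within a chosen distance of the residue. The function names, the frame spacing, and the example record are hypothetical; the analysis software actually used for this work may define residence differently (for example, by tolerating brief excursions out of the shell).

```python
def max_residence_time(in_shell, frame_spacing_ps=1.0):
    """Longest continuous stay of one water molecule near a residue.

    in_shell: sequence of booleans, one per saved frame, True when the
              water is within the chosen distance of the residue.
    frame_spacing_ps: time between saved frames (an assumed value).
    """
    longest = current = 0
    for inside in in_shell:
        current = current + 1 if inside else 0
        longest = max(longest, current)
    return longest * frame_spacing_ps


def residence_fraction(in_shell):
    """Fraction of all frames in which the water is near the residue."""
    return sum(in_shell) / len(in_shell) if in_shell else 0.0


# Hypothetical 10-frame record sampled every 20 ps.
record = [False, True, True, True, False, True, True, False, False, True]
print(max_residence_time(record, frame_spacing_ps=20.0))  # 60.0
print(residence_fraction(record))  # 0.6
```

Under this definition, a water molecule described as resident for 99.5% of the simulation corresponds to a `residence_fraction` of 0.995, while the maximum residence times quoted in picoseconds correspond to `max_residence_time`.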

In conclusion, MD simulations suggest that lysine at position 50 in the PITX2 homeodomain contacts the two guanines on the antisense DNA strand adjacent to the TAAT core DNA binding site, using motion on a time scale greater than the 2 ns trajectory utilized here. Missense mutations in this homeodomain that lead to Rieger syndrome and closely related disorders cause differences in direct protein-DNA contacts, levels of hydration, and water residence times. There are also differences involving water-mediated contacts between the protein and the DNA.

CHAPTER 5: Thesis Summary and Future Directions

PITX2 is a transcription factor that is found in many developing tissues in vertebrate embryos. It has been shown to be expressed in the brain, heart, pituitary, mandibular and maxillary regions, eye, gut, limbs, and umbilicus [Semina et al, 1996; Gage et al, 1997; Mucchielli et al, 1997; Hjalt et al, 2000]. Mutations in PITX2 are known to be a cause of Rieger syndrome [Xia et al, 2004; Priston et al, 2001; Lines et al, 2004; Phillips, 2002; Kulak et al, 1998], and many of these mutations result in single amino acid substitutions within the PITX2 homeodomain [Priston et al, 2001; Lines et al, 2004; Phillips, 2002; Kulak et al, 1998]. The PITX2 homeodomain is a member of the K50 class of homeodomains, which have a lysine at position 50 and recognize a consensus DNA sequence of TAATCC [Hanes & Brent, 1989]. The only K50 homeodomain structure determined previously is an X-ray crystal structure of an altered specificity mutant, Engrailed Q50K (EnQ50K) [Tucker-Kellogg et al, 1997]. An issue concerning EnQ50K is the observation that it binds to the consensus TAATCC site with an unusually high affinity, which approaches the picomolar range [Ades & Sauer, 1994]. A KD was determined for the Q50K mutant of the Fushi tarazu homeodomain, and this value was found to be 0.63 nM [Percival-Smith et al, 1990], which is a much lower affinity than the EnQ50K mutant. The KD for the PITX2 homeodomain alone was found to be 2.6 ± 0.38 nM.
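
To put the dissociation constants quoted above on a common scale, each can be converted to a standard binding free energy using ΔG° = RT ln(KD), with KD expressed relative to a 1 M standard state. The sketch below is an illustrative calculation only: the temperature of 298.15 K is an assumption, since the measurement temperatures are not stated here, and the function and variable names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a KD given in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


kd_ftz_q50k = 0.63e-9  # Fushi tarazu Q50K homeodomain, value quoted in the text
kd_pitx2 = 2.6e-9      # PITX2 homeodomain, value quoted in the text

# Positive difference: PITX2 binding is less favorable than Ftz Q50K binding.
ddg = binding_free_energy(kd_pitx2) - binding_free_energy(kd_ftz_q50k)
print(round(ddg, 2))  # 0.84 (kcal/mol at the assumed temperature)
```

Under these assumptions, the roughly fourfold difference in KD between the two proteins corresponds to less than 1 kcal/mol of binding free energy, whereas a picomolar KD would differ from a nanomolar one by about 4 kcal/mol.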

In this thesis, the solution structure of the PITX2 homeodomain bound to its consensus DNA site (TAATCC) has been determined by NMR spectroscopy. Although structures of several homeodomain/DNA complexes have been determined, this is the first structure of a native K50 class homeodomain. Analysis of the NMR structure of the PITX2 homeodomain indicates that the lysine at position 50 makes contacts with two guanines on the antisense strand of the DNA, adjacent to the TAAT core DNA sequence, consistent with the structure of EnQ50K. Our evidence suggests that this side chain may make fluctuating interactions with the DNA, which is complementary to the crystal data for EnQ50K. There are differences in the tertiary structure between the native K50 structure and that of EnQ50K, which may explain differences in affinity and specificity between these proteins. The information provided in this thesis will form the basis for many future studies focused in three different areas: nonconsensus DNA binding, side chain dynamics of the K50 and other side chains, and analysis of Rieger mutant proteins.

Studies have shown that the PITX2 homeodomain can recognize DNA sites that deviate from the consensus site, and these sites are physiologically relevant [Dave et al, 2000; Yuan et al, 1999; Hjalt et al, 2001; Espinoza et al, 2002]. A list of these sites is presented in Table 1.3. While it is unknown whether the EnQ50K mutant can bind nonconsensus DNA sites, a Q50K mutant of the Fushi tarazu homeodomain is unable to bind any nonconsensus DNA binding sites tested [Zhao et al, 2000], which indicates that studies of nonconsensus DNA site recognition by K50 class homeodomains must focus on the native members. Future structural studies will focus on analyzing the structures of the PITX2 homeodomain bound to multiple nonconsensus DNA sites. It will be interesting to determine how the PITX2 homeodomain adjusts itself structurally to bind multiple DNA sites. It will also be interesting to determine what allows the PITX2 homeodomain to bind nonconsensus sites, while the Q50K mutant of the Fushi tarazu homeodomain is unable to bind any nonconsensus DNA sites. The structure presented in this thesis of the PITX2 homeodomain bound to the consensus DNA site will be used as the basis for these comparisons. As described in Chapter 1, amino acids at positions 47, 50, 51 and 54 are important in DNA recognition by homeodomains, and it may be the combination of residues at these positions that allows these homeodomains to recognize a multitude of DNA sites [Gruschus et al, 1997; Clarke, 1995]. Position 54 may be involved in recognition of variant DNA sites [Dave et al, 2000; Gruschus et al, 1997]. It has been shown in the past that there is a functional interaction between residues at positions 47 and 51 [Pomerantz & Sharp, 1994]. A functional interaction has also been shown to exist between amino acids at positions 50 and 54 [Pellizzari et al, 1997]; in particular, methionine is never present at position 54 when a lysine is at position 50. Position 47 is usually hydrophobic, position 51 is almost always asparagine, and position 54 varies [Billeter, 1996]. Bicoid and PITX2 have different combinations of residues at these positions when compared to each other (see Table 1.2), yet both recognize nonconsensus sites. Bicoid and PITX2 also have different combinations of residues at these positions when compared to other well-studied homeodomains, which suggests that each of these two homeodomains can recognize nonconsensus sites in part because its particular combination of residues allows it to do so. As has been shown by Dave et al [2000], Bicoid and PITX2 do not recognize the same nonconsensus sites, and each has its own unique properties in this sense. It will be interesting in the future to compare structures of these two homeodomains bound to nonconsensus sites to determine if there are any common characteristics.

We have performed molecular dynamics (MD) simulations on the solution structure of PITX2 bound to its consensus DNA site. This is the first molecular dynamics simulation of a native K50 homeodomain. The results show a number of long-lived water molecules in the vicinity of R52 and R53, which form water-bridging interactions with the DNA. A number of water molecules are also shown to be in the vicinity of other arginines in contact with the DNA, and near K50. The results indicate that motion of the K50 side chain is on a time scale longer than what we can see by MD simulations, which is in contrast to results seen for a Q50K mutant previously. The line-broadening seen in these structural studies and described in Chapter 3 for the K50 side chain also indicates that there is possibly motion of this side chain in interacting with the DNA. Future studies will focus on performing NMR dynamics experiments to analyze the possible motional properties of the K50 side chain and the timescale of this motion, to determine exactly how this side chain may fluctuate in interacting with positions 5 and 6 of the consensus site, and how this motion may be involved in its DNA recognition function and in the thermodynamics of the homeodomain-DNA interaction. The dynamics of the arginine side chains that are so important in contacting the DNA will also be examined, which will provide information on motion at the protein-DNA interface, and may paint a clearer picture of how motion of side chains is important in binding and recognition of DNA binding sites. This motion may also be important in recognition of nonconsensus sites.

Mutations in the human PITX2 gene are responsible for Rieger syndrome, an autosomal dominant disorder. Analysis of the residues mutated in Rieger syndrome indicates that many of these residues are involved in DNA binding, while others are involved in formation of the hydrophobic core of the protein. In this thesis, we analyzed the structural roles of residues that are mutated in Rieger syndrome, and performed molecular dynamics simulations of mutant complexes. The results of these simulations were compared to the wild-type case, and there are many differences in levels of hydration, water residence times, energy levels, water bridging interactions, and direct protein-DNA interactions. These results provide further insight into the mechanism by which K50 homeodomains bind DNA, and how Rieger mutations cause severe phenotypic consequences. Future studies will focus on making single site mutations to create Rieger mutant homeodomains, and then performing structural studies on these mutants to determine more directly how these mutations alter the structure of the homeodomain and its interaction with DNA. These studies could provide insight into how these mutants cause disease, and into possible treatment options for not only this disease, but also the many others that are caused by mutations in homeodomains. We will initially focus on the mutant proteins that are known to be more stable and have some DNA-binding activity (see Table 1.1). In particular, we would like to analyze the V45L mutant, which has the unusual property of having a lowered DNA binding activity, but a greatly heightened activation function [Priston et al, 2001]. We would like to determine how this mutation changes the structure of the homeodomain itself in such a way as to produce this phenotype. It may be necessary to determine the structure of the full-length PITX2 protein, and examine this mutant in the context of the full-length protein, which would include the transcriptional activation domains.

Unlike other homeodomains that are stable in the free form [Tsao et al, 1994; Qian et al, 1994b; Damante et al, 1994; Carra & Privalov, 1997; Otting et al, 1988; Yamamoto et al, 1992], the PITX2 homeodomain is unstable in the absence of DNA in that it irreversibly aggregates at micromolar concentrations, which suggests a possible lack of stable tertiary structure in the free form. This may be due to slightly different hydrophobic interactions within the core of the protein, and the absence of other stabilizing interactions such as the salt bridge linking residues 19 and 30, which can be present in most homeodomains [Iurcu-Mustata et al, 2001], but is not possible in PITX2. Because some of the Rieger mutant proteins have lower or no DNA binding activity, we would like to determine experimental conditions that would make the free PITX2 protein more stable at high concentrations, and possibly determine the structure of the protein in the free form. This would allow us to examine how the homeodomain changes its structure in binding the DNA. Experimental conditions that stabilize the free PITX2 homeodomain are also likely to stabilize some of the Rieger mutants with lowered DNA binding, which would allow for efficient structure determination of these proteins.

CHAPTER 6: Literature Cited

Ades, S.E., & Sauer, R.T. Differential DNA-binding specificity of the Engrailed homeodomain:

the role of residue 50. Biochemistry 33, 9187-9194 (1994).

Aishima, J., & Wolberger, C. Insights into nonspecific binding of homeodomains from a

structure of MATα2 bound to DNA. Proteins 51, 544-551 (2003).

Akarsu, A.N., Akhan, O., Sayli, B.S., Sayli, U., Baskaya, G., & Sarfarazi, M. A large Turkish

kindred with syndactyly type II (synpolydactyly). 2. Homozygous phenotype? J. Med. Genet.

32, 435-441 (1995).

Alward, W. L. M., Semina, E.V., Kalenak, J.W., Heon, E., Sheth, B.P., Stone, E.M., & Murray,

J.C. Autosomal dominant iris hypoplasia is caused by a mutation in the Rieger syndrome

(RIEG/PITX2) gene. Am. J. Ophthalmol. 125, 98-100 (1998).

Amendt, B. A., Sutherland, L. B., Semina, E. V. & Russo, A. F. The molecular basis of Rieger

syndrome. J. Biol. Chem. 273, 20066-20072 (1998).

Amendt, B. A., Sutherland, L. B. & Russo, A. F. Multifunctional role of the Pitx2 homeodomain

protein C-Terminal tail. Mol. Cell. Biol. 19, 7001-7010 (1999).

Arakawa, H., Nakamura, T., Zhadanov, A.B., Fidanza, V., Yano, T., Bullrich, F., Shimizu, M.,

Blechman, J., Mazo, A., Canaani, E., & Croce, C.M. Identification and characterization of

the ARP1 gene, a target for the human acute leukemia ALL1 gene. Proc. Natl. Acad. Sci. 95,

4573-4578 (1998).

Banerjee-Basu, S. & Baxevanis, A. D. Threading analysis of the Pitx2 homeodomain: Predicted

structural effects of mutations causing Rieger syndrome and iridogoniodysgenesis. Hum.

Mut. 14, 312-319 (1999).

Banerjee-Basu, S., Moreland, T., Hsu, B.J., Trout, K.L., & Baxevanis, A.D. The homeodomain

resource: 2003 update. Nuc. Acids Res. 31, 304-306 (2003).

Berleth, T., Burri, M., Thoma, G., Bopp, D., Richstein, S., Frigerio, G., Noll, M., & Nusslein-

Volhard, C. The role of localization of bicoid RNA in organizing the anterior pattern of the

Drosophila embryo. EMBO J. 7, 1749-1756 (1988).

Billeter, M., Qian, Y., Otting, G., Muller, M., Gehring, W.J., & Wuthrich, K. Determination of

the three-dimensional structure of the Antennapedia homeodomain from Drosophila in

solution by 1H nuclear magnetic resonance spectroscopy. J. Mol. Biol. 214, 183-197 (1990).

Billeter, M., & Wuthrich, K. Model studies relating nuclear magnetic resonance data with the

three-dimensional structure of protein-DNA complexes. J. Mol. Biol. 234, 1094-1097 (1993).

Billeter, M. Homeodomain-type DNA recognition. Prog. Biophys. Mol. Biol. 66, 211-225

(1996).

Billeter, M., Guntert, P., Luginbuhl, P. & Wuthrich, K. Hydration and DNA recognition by

homeodomains. Cell 85, 1057-1065 (1996).

Bisgrove, B. W. & Yost, H. J. Classification of left-right patterning defects in zebrafish, mice,

and humans. Am. J. Med. Genet. 101, 315-323 (2001).

Boncinelli, E. Homeobox genes and disease. Curr. Op. Genet. Dev. 7, 331-337 (1997).

Borrow, J., Shearman, A.M., Stanton, V.P. Jr., Becher, R., Collins, T., Williams, A.J., Dube, I.,

Katz, F., Kwong, Y.L., Morris, C., Ohyashiki, K., Toyama, K., Rowley, J., & Housman, D.E.

The t(7;11)(p15;p15) translocation in acute myeloid leukaemia fuses the genes for

nucleoporin NUP98 and class I homeoprotein HOXA9. Nat. Genet. 12, 159-167 (1996).

Breeze, A.L. Isotope-filtered NMR methods for the study of biomolecular structure and

interactions. Prog. NMR Spectros. 36, 323-372 (2000).

Brennan, R.G., & Matthews, B.W. Structural basis of DNA-protein recognition. Trends

Biochem. Sci. 14, 286-290 (1989).

Briata, P., Ilengo, C., Corte, G., Moroni, C., Rosenfeld, M.G., Chen, C., & Gherzi R. The Wnt/β-

catenin-->Pitx2 pathway controls the turnover of Pitx2 and other unstable mRNAs. Mol.

Cell 12, 1201-1211 (2003).

Burz, D. S., Rivera-Pomar, R., Jackle, H. & Hanes, S. D. Cooperative DNA-binding by Bicoid

provides a mechanism for threshold-dependent gene activation in the Drosophila embryo.

EMBO J. 17, 5998-6009 (1998).

Campione, M., Ros, M.A., Icardo, J.M., Piedra, E., Christoffels, V.M., Schweickert, A., Blum,

M., Franco, D., & Moorman, A.F. Pitx2 expression defines a left cardiac lineage of cells:

evidence for atrial and ventricular molecular isomerism in the iv/iv mice. Dev. Biol. 231,

252-264 (2001).

Carra, J.H., & Privalov, P.L. Energetics of folding and DNA binding of the MAT alpha 2

homeodomain. Biochemistry 36, 526-535 (1997).

Case, D.A., Pearlman, D.A., Caldwell, J.W., Cheatham, T.E., Ross, W.S., Simmerling, C.L.,

Darden, T.A., Merz, K.M., Stanton, R.V., Cheng, A.L., Vincent, J.J., Crowley, M., Tsui, V.,

Radmer, R.J., Duan, Y., Pitera, J., Massova, I., Seibel, G.L., Singh, U.C., Weiner, P.K., &

Kollman, P.A. AMBER7, University of California, San Francisco (1996).

Ceska, T.A., Lamers, M., Monaci, P., Nicosia, A., Cortese, R., & Suck, D. The X-ray structure

of an atypical homeodomain present in the rat liver transcription factor LFB1/HNF1 and

implications for DNA binding. EMBO J. 12, 1805-1810 (1993).

Chai, J., Wu, J.W., Yan, N., Massague, J., Pavletich, N.P., & Shi, Y. Features of a Smad3 MH1-

DNA complex. J. Biol. Chem. 278, 20327-20331 (2003).

Chisholm, E.A., & Chudley, A.E. Autosomal dominant iridogoniodysgenesis with associated

somatic anomalies: four-generation family with Rieger's syndrome. Br. J. Ophthalmol. 67,

529-534 (1983).

Chiu, T.K., Sohn, C., Dickerson, R.E., & Johnson, R.C. Testing water-mediated DNA

recognition by the Hin recombinase. EMBO J. 21, 801-814 (2002).

Clarke, N.D. Covariation of residues in the homeodomain sequence. Protein Sci. 4, 2269-2278

(1995).

Clarke, N.D., Kissinger, C.R., Desjarlais, J., Gilliland, G.L. & Pabo, C.O. Structural studies of

the engrailed homeodomain. Protein Sci. 3, 1779-1787 (1994).

Clevers, H. Inflating numbers by Wnt. Mol. Cell 10, 1260-1261 (2002).

Cox, M., van Tilborg, P.J., de Laat, W., Boelens, R., van Leeuwen, H.C., van der Vliet, P.C., &

Kaptein, R. Solution structure of the Oct-1 POU homeodomain determined by NMR and

restrained molecular dynamics. J. Biomol. NMR 6, 23-32 (1995).

Cox, C. J., Espinoza, H.M., McWilliams, B., Chappell, K., Morton, L., Hjalt, T.A., Semina,

E.V., & Amendt, B.A. Differential regulation of gene expression by PITX2 isoforms. J. Biol.

Chem. 277, 25001-25010 (2002).

Crawford, M. J., Lanctot, C., Tremblay, J.J., Jenkins, N., Gilbert, D., Copeland, N., Beatty, B., &

Drouin, J. Human and murine PTX1/Ptx1 gene maps to the region for Treacher Collins

Syndrome. Mamm. Genome 8, 841-845 (1997).

Damante, G., Tell, G., Leonardi, A., Fogolari, F., Bortolotti, N., DiLauro, R., & Formisano, S.

Analysis of the conformation and stability of rat TTF-1 homeodomain by circular dichroism.

FEBS Lett. 354, 293-296 (1994).

Dave, V., Zhao, C., Yang, F., Tung, C. & Ma, J. Reprogrammable recognition codes in Bicoid

homeodomain-DNA interaction. Mol. Cell. Biol. 20, 7673-7684 (2000).

Davey, C.A., Sargent, D.F., Luger, K., Maeder, A.W., & Richmond, T.J. Solvent mediated

interactions in the structure of the nucleosome core particle at 1.9Å resolution. J. Mol. Biol.

319, 1097-1113 (2002).

Delaglio, F., Grzesiek, S., Vuister, G.W., Zhu, G., Pfeifer, J., & Bax, A. NMRPipe: A

multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277-

293 (1995).

D'Elia, A.V., Tell, G., Paron, I., Pellizzari, L., Lonigro, R., & Damante, G. Missense mutations

of human homeoboxes: a review. Hum. Mutat. 18, 361-374 (2001).

Doig, A.J., & Sternberg, M.J.E. Side-chain conformational entropy in protein folding. Protein

Sci. 4, 2247-2251 (1995).

Driever, W. & Nusslein-Volhard, C. A gradient of Bicoid protein in Drosophila embryos. Cell

54, 83-93 (1988a).

Driever, W. & Nusslein-Volhard, C. The Bicoid protein determines position in the Drosophila

embryo in a concentration-dependent manner. Cell 54, 95-104 (1988b).

Driever, W. & Nusslein-Volhard, C. The Bicoid protein is a positive regulator of hunchback

transcription in the early Drosophila embryo. Nature 337, 138-143 (1989).

Duan, J., & Nilsson, L. The role of residue 50 and hydration water molecules in homeodomain

DNA recognition. Eur. Biophys. J. 31, 306-316 (2002).

Espinoza, H. M., Cox, C. J., Semina, E. V. & Amendt, B. A. A molecular basis for differential

developmental anomalies in Axenfeld-Rieger syndrome. Hum. Mol. Genet. 11, 743-753

(2002).

Espinoza, H.M., Ganga, M., Vadlamudi, U., Martin, D.M., Brooks, B.P., Semina, E.V., Murray,

J.C., & Amendt, B.A. Protein Kinase C phosphorylation modulates N- and C-terminal

regulatory activities of the PITX2 homeodomain protein. Biochemistry (2005).

Essner, J. J., Branford, W. W., Zhang, J. & Yost, H. J. Mesendoderm and left-right brain, heart

and gut development are differentially regulated by pitx2 isoforms. Development 127, 1081-

1093 (2000).

Farrow, N.A., Muhandiram, R., Singer, A.U., Pascal, S.M., Kay, C.M., Gish, G., Shoelson, S.E.,

Pawson, T., Forman-Kay, J.D., & Kay, L.E. Backbone dynamics of a free and a

phosphopeptide-complexed Src Homology 2 domain studied by 15N NMR relaxation.

Biochemistry 33, 5984-6003 (1994).

Fausti, S., Weiler, S., Cuniberti, C., Hwang, K.J., No, K.T., Gruschus, J.M., Perico, A.,

Nirenberg, M., & Ferretti, J.A. Backbone dynamics for the wildtype and a double

H52R/T56W mutant of the vnd/NK-2 homeodomain from Drosophila melanogaster.

Biochemistry 40, 12004-12012 (2001).

Fernandez, C., Szyperski, T., Billeter, M., Ono, A., Iwai, H., Kainosho, M., & Wuthrich, K.

Conformational changes of the BS2 operator DNA upon complex formation with the

Antennapedia homeodomain studied by NMR with 13C/15N-labeled DNA. J. Mol. Biol.

292, 609-617 (1999).

Flomen, R.H., Gorman, P.A., Vatcheva, R., Groet, J., Barisic, I., Ligutic, E., Sheer, D., &

Nizetic, D. Rieger Syndrome locus: a new reciprocal translocation t(4;12)(q25;q15) and a

deletion del(4)(q25q27) both break between markers D4S2945 and D4S193. J. Med. Genet.

34, 191-195 (1997).

Flomen, R. H., Vatcheva, R., Gorman, P.A., Baptista, P.R., Groet, J., Barisic, I., Ligutic, I., &

Nizetic, D. Construction and analysis of a sequence-ready map in 4q25: Rieger syndrome

can be caused by haploinsufficiency of RIEG, but also by breaks ~90kb

upstream of this gene. Genomics 47, 409-413 (1998).

Foster, M.P., Wuttke, D.S., Radhakrishnan, I., Case, D.A., Gottesfeld, J.M., & Wright, P.E.

Domain packing and dynamics in the DNA complex of the N-terminal zinc fingers of

TFIIIA. Nat. Struct. Biol. 4, 605-608 (1997).

Fraenkel, E. & Pabo, C. O. Comparison of X-ray and NMR structures for the Antennapedia

homeodomain-DNA complex. Nat. Struct. Biol. 5, 692-697 (1998).

Fraenkel, E., Rould, M. A., Chambers, K. A. & Pabo, C. O. Engrailed Homeodomain-DNA

Complex at 2.2 Å Resolution: A detailed view of the interface and comparison with other

Engrailed structures. J. Mol. Biol. 284, 351-361 (1998).

Franco, D., & Campione, M. The role of Pitx2 during cardiac development. Trends Cardiovasc.

Med. 13, 157-163 (2003).

Frohnhofer, H. G. & Nusslein-Volhard, C. Organization of anterior pattern in the Drosophila

embryo by the maternal gene bicoid. Nature 324, 120-125 (1986).

Gage, P. J. & Camper, S.A. Pituitary homeobox 2, a novel member of the bicoid-related family

of homeobox genes, is a potential regulator of anterior structure formation. Hum. Mol. Genet.

6, 457-464 (1997).

Gage, P. J., Suh, H. & Camper, S.A. Dosage requirement of Pitx2 for development of multiple

organs. Development 126, 4643-4651 (1999a).

Gage, P. J., Suh, H. & Camper, S. A. The bicoid-related Pitx gene family in development.

Mamm. Genome 10, 197-200 (1999b).

Ganga, M., Espinoza, H.M., Cox, C.J., Morton, L., Hjalt, T.A., Lee, Y., & Amendt, B.A. PITX2

isoform specific regulation of ANF expression: synergism and repression with Nkx2.5. J.

Biol. Chem. 278, 22437-22445 (2003).

Gao, Q., & Finkelstein, R. Targeting gene expression to the head: the Drosophila orthodenticle

gene is a direct target of the Bicoid morphogen. Development 125, 4185-4193 (1998).

Gehring, W. Cell heredity and changes of determination in cultures of imaginal discs in

Drosophila melanogaster. J. Embryol. Exp. Morphol. 15, 77-111 (1966).

Gehring, W. J., Affolter, M. & Burglin, T. Homeodomain proteins. Annu. Rev. Biochem. 63,

487-526 (1994).

Gehring, W.J., Muller, M., Affolter, M., Percival-Smith, A., Billeter, M., Qian, Y.Q., Otting, G.,

& Wuthrich, K. The structure of the homeodomain and its functional implications. Trends

Genet. 6, 323-329 (1990).

Goddard, T.D., & Kneller, D.G. SPARKY 3, University of California, San Francisco.

Graham, A. & McGonnell, I. Limb development: Farewell to arms. Curr. Biol. 9, 368-370

(1999).

Grant, R. A., Rould, M. A., Klemm, J. D. & Pabo, C. O. Exploring the role of glutamine 50 in

the homeodomain-DNA interface: crystal structure of Engrailed (Gln50-->Ala) complex at

2.0 Å. Biochemistry 39, 8187-8192 (2000).

Green, P. D., Hjalt, T.A., Kirk, D.E., Sutherland, L.B., Thomas, B.L., Sharpe, P.T., Snead, M.L.,

Murray, J.C., Russo, A.F., & Amendt, B.A. Antagonistic regulation of Dlx2 expression by

PITX2 and Msx2: implications for tooth development. Gene Expr. 9, 265-281 (2001).

Gruschus, J.M., Tsao, D.H., Wang, L.H., Nirenberg, M., & Ferretti, J.A. Interactions of the

vnd/NK-2 homeodomain with DNA by nuclear magnetic resonance spectroscopy: basis of

binding specificity. Biochemistry 36, 5372-5380 (1997).

Grzesiek, S., & Bax, A. Improved 3D triple-resonance NMR techniques applied to a 31-kDa

protein. J. Magn. Reson. 96, 432-440 (1992a).

Grzesiek, S., & Bax, A. Correlating backbone amide and side-chain resonances in larger

proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114, 6291-6293

(1992b).

Guntert, P., Mumenthaler, C., & Wuthrich, K. Torsion angle dynamics for NMR structure

calculation with the new program DYANA. J. Mol. Biol. 273, 283-298 (1997).

Guntert, P., Qian, Y.Q., Otting, G., Muller, M., Gehring, W., & Wuthrich, K. Structure

determination of the Antp (C39-->S) homeodomain from nuclear magnetic resonance data in

solution using a novel strategy for the structure calculation with the programs DIANA,

CALIBA, HABAS and GLOMSA. J. Mol. Biol. 217, 531-540 (1991).

Gutmanas, A., & Billeter, M. Specific DNA recognition by the Antp homeodomain: MD

simulations of specific and nonspecific complexes. Proteins: Struct. Funct. Bioinfo. 57, 772-

782 (2004).

Hanes, S. D. & Brent, R. DNA specificity of the Bicoid activator protein is determined by

homeodomain recognition helix residue 9. Cell 57, 1275-1283 (1989).

Hansson, T., Oostenbrink, C., & van Gunsteren, W.F. Molecular dynamics simulations. Curr.

Opin. Struct. Biol. 12, 190-196 (2002).

Harrison, S.C. A structural taxonomy of DNA-binding domains. Nature 353, 715-719 (1991).

Heon, E., Sheth, B.P., Kalenak, J.W., Sunden, S.L., Streb, L.M., Taylor, C.M., Alward, W.L.,

Sheffield, V.C., & Stone, E.M. Linkage of autosomal dominant iris hypoplasia to the region

of the Rieger Syndrome locus (4q25). Hum. Mol. Genet. 4, 1435-1439 (1995).

Herrmann, T., Guntert, P., & Wuthrich, K. Protein NMR structure determination with automated

NOE assignment using the new software CANDID and the torsion angle dynamics algorithm

DYANA. J. Mol. Biol. 319, 209-227 (2002).

Hirsch, J.A., & Aggarwal, A.K. Structure of the even-skipped homeodomain complexed to

AT-rich DNA: new perspectives on homeodomain specificity. EMBO J. 14, 6280-6291 (1995).

Hjalt, T. A. & Murray, J. C. The human BARX2 gene: genomic structure, chromosomal

localization, and single nucleotide polymorphisms. Genomics 62, 456-459 (1999).

Hjalt, T. A., Semina, E. V., Amendt, B. A. & Murray, J. C. The Pitx2 protein in mouse

development. Dev. Dyn. 218, 195-200 (2000).

Hjalt, T. A., Amendt, B. A. & Murray, J. C. PITX2 regulates procollagen lysyl hydroxylase

(PLOD) gene expression: implications for the pathology of Rieger syndrome. J. Cell Biol.

152, 545-552 (2001).

Holmberg, J., Liu, C., & Hjalt, T.A. PITX2 gain-of-function in Rieger syndrome eye model.

Am. J. Path. 165, 1633-1641 (2004).

Ikura, M., Kay, L.E., & Bax, A. A novel approach for sequential assignment of H-1, C-13, and

N-15 spectra of larger proteins: heteronuclear triple-resonance 3-dimensional NMR

spectroscopy: application to calmodulin. Biochemistry 29, 4659-4667 (1990).

Iurcu-Mustata, G., Van Belle, D., Wintjens, R., Prevost, M., & Rooman, M. Role of salt bridges

in homeodomains investigated by structural analyses and molecular dynamics simulations.

Biopolymers 59, 145-159 (2001).

Janin, J. Wet and dry interfaces: The role of solvent in protein-protein and protein-DNA

recognition. Structure 7, R277-R279 (1999).

Jayaram, B., & Jain, T. The role of water in protein-DNA recognition. Annu. Rev. Biophys.

Biomol. Struct. 33, 343-361 (2004).

Jones, S., van Heyningen, P., Berman, H.M., & Thornton, J.M. Protein-DNA interactions: a

structural analysis. J. Mol. Biol. 287, 877-896 (1999).

Kathiriya, I. S. & Srivastava, D. Left-right asymmetry and cardiac looping. Am. J. Med. Genet.

97, 271-279 (2000).

Katz, L.A., Schultz, R.E., Semina, E.V., Torfs, C.P., Krahn, K.N., & Murray, J.C. Mutations in

PITX2 may contribute to cases of omphalocele and VATER-like syndromes. Am. J. Med.

Genet. 130A, 277-283 (2004).

Kay, L.E., Keifer, P., & Saarinen, T. Pure absorption gradient enhanced heteronuclear single

quantum correlation spectroscopy with improved sensitivity. J. Am. Chem. Soc. 114, 10663-

10665 (1992).

Kay, L.E., Xu, G.Y., & Yamazaki, T. Enhanced-sensitivity triple-resonance spectroscopy with

minimal H2O saturation. J. Magn. Reson. Ser. A 103, 129-133 (1994).

Kioussi, C., Briata, P., Baek, S.H., Rose, D.W., Hamblet, N.S., Herman, T., Ohgi, K.A., Lin, C.,

Gleiberman, A., Wang, J., Brault, V., Ruiz-Lozano, P., Nguyen, H.D., Kemler, R., Glass,

C.K., Wynshaw-Boris, A., & Rosenfeld, M.G. Identification of a Wnt/Dvl/β-Catenin ->

Pitx2 pathway mediating cell-type-specific proliferation during development. Cell 111, 673-

685 (2002).

Kissinger, C. R., Liu, B., Martin-Blanco, E., Kornberg, T. B. & Pabo, C. O. Crystal structure of

an Engrailed homeodomain-DNA complex at 2.8 Å: a framework for understanding

homeodomain-DNA interactions. Cell 63, 579-590 (1990).

Kitamura, K., Miura, H., Yanazawa, M., Miyashita, T. & Kato, K. Expression patterns of Brx1

(Rieg gene), Sonic hedgehog, Nkx2.2, Dlx1 and Arx during zona limitans intrathalamica and

embryonic ventral lateral geniculate nuclear formation. Mech. Dev. 67, 83-96 (1997).

Kitamura, K., Miura, H., Miyagawa-Tomita, S., Yanazawa, M., Katoh-Fukui, Y., Suzuki, R.,

Ohuchi, H., Suehiro, A., Motegi, Y., Nakahara, Y., Kondo, S., & Yokoyama, M. Mouse

Pitx2 deficiency leads to anomalies of the ventral body wall, heart, extra- and periocular mesoderm

and right pulmonary isomerism. Development 126, 5749-5758 (1999).

Klemm, J.D., Rould, M.A., Aurora, R., Herr, W. & Pabo, C.O. Crystal structure of the Oct-1

POU domain bound to an octamer site: DNA recognition with tethered DNA-binding

modules. Cell 77, 21-32 (1994).

Koradi, R., Billeter, M., & Wuthrich, K. MOLMOL: a program for display and analysis of

macromolecular structures. J. Mol. Graph. 14, 51-55 (1996).

Kornberg, T. B. Understanding the Homeodomain. J. Biol. Chem. 268, 26813-26816 (1993).

Kosztin, D., Bishop, T.C., & Schulten, K. Binding of the estrogen receptor to DNA. The role of

waters. Biophys. J. 73, 557-570 (1997).

Kozlowski, K. & Walter, M. A. Variation in residual PITX2 activity underlies the phenotypic

spectrum of anterior segment developmental disorders. Hum. Mol. Genet. 9, 2131-2139

(2000).

Kulak, S. C., Kozlowski, K., Semina, E. V., Pearce, W. G. & Walter, M. A. Mutation in the

RIEG1 gene in patients with iridogoniodysgenesis syndrome. Hum. Mol. Genet. 7, 1113-

1117 (1998).

La Rosee, A., Hader, T., Taubert, H., Rivera-Pomar, R., & Jackle, H. Mechanism and Bicoid-

dependent control of hairy stripe 7 expression in the posterior region of the Drosophila

embryo. EMBO J. 16, 4403-4411 (1997).

Lamonerie, T., Tremblay, J.J., Lanctot, C., Therrien, M., Gauthier, Y., & Drouin, J. Ptx1, a

bicoid-related homeobox transcription factor involved in transcription of the pro-opiomelanocortin gene.

Genes Dev. 10, 1284-1295 (1996).

Lanctot, C., Moreau, A., Chamberland, M., Tremblay, M. L. & Drouin, J. Hindlimb patterning

and mandible development require the Ptx1 gene. Development 126, 1805-1810 (1999).

Laskowski, R.A., MacArthur, M.W., Moss, D.S., & Thornton, J.M. PROCHECK: a program to

check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283-291 (1993).

Laskowski, R.A., Rullmann, J.A.C., MacArthur, M.W., Kaptein, R., & Thornton, J.M. AQUA

and PROCHECK-NMR: programs for checking the quality of protein structures solved by

NMR. J. Biomol. NMR 8, 477-486 (1996).

Lebel, M., Gauthier, Y., Moreau, A. & Drouin, J. Pitx3 activates mouse tyrosine hydroxylase

promoter via a high-affinity binding site. J. Neurochem. 77, 558-567 (2001).

Lee, W., Revington, M.J., Arrowsmith, C., & Kay, L.E. A pulsed field gradient isotope-filtered

3D 13C HMQC-NOESY experiment for extracting intermolecular NOE contacts in molecular

complexes. FEBS Lett. 350, 87-90 (1994).

Leiting, B., De Francesco, R., Tomei, L., Cortese, R., Otting, G., & Wuthrich, K. The three-

dimensional NMR-solution structure of the polypeptide fragment 195-286 of the

LFB1/HNF1 transcription factor from rat liver comprises a nonclassical homeodomain.

EMBO J. 12, 1797-1803 (1993).

Lewis, E.B. A gene complex controlling segmentation in Drosophila. Nature 276, 565-570

(1978).

Li, T., Stark, M.R., Johnson, A.D., & Wolberger, C. Crystal structure of the MATa1/MATα2

homeodomain heterodimer bound to DNA. Science 270, 262-269 (1995).

Lin, C.R., Kioussi, C., O’Connell, S., Briata, P., Szeto, D., Liu, F., Izpisua-Belmonte, J.C., &

Rosenfeld, M.G. Pitx2 regulates lung asymmetry, cardiac positioning and pituitary and tooth

morphogenesis. Nature 401, 279-282 (1999).

Liu, C., Liu, W., Lu, M., Brown, N. A. & Martin, J. F. Regulation of left-right asymmetry by

thresholds of Pitx2c activity. Development 128, 2039-2048 (2001).

Liu, W., Selever, J., Lu, M., & Martin, J.F. Genetic dissection of Pitx2 in craniofacial

development uncovers new functions in branchial arch morphogenesis, late aspects of tooth

morphogenesis and cell migration. Development 130, 6375-6385 (2003).

Lu, M.-F., Pressman, C., Dyer, R., Johnson, R. L. & Martin, J. F. Function of Rieger syndrome

gene in left-right asymmetry and craniofacial development. Nature 401, 276-278 (1999).

Luginbuhl, P., Szyperski, T., & Wuthrich, K. Statistical basis for the use of 13C chemical shifts

in protein structure determination. J. Magn. Reson. Ser. B. 109, 229-233 (1995).

Luscombe, N.M., Laskowski, R.A., & Thornton, J.M. NUCPLOT: A program to generate

schematic diagrams of protein-nucleic acid interactions. Nucl. Acids Res. 25, 4940-4945

(1997).

Ma, X., Yuan, D., Diepold, K., Scarborough, T., & Ma, J. The Drosophila morphogenetic

protein Bicoid binds DNA cooperatively. Development 122, 1195-1206 (1996).

Ma, X., Yuan, D., Scarborough, T. & Ma, J. Contributions to gene activation by multiple

functions of Bicoid. Biochem. J. 338, 447-455 (1999).

Mammi, I., De Giorgio, P., Clementi, M. & Tenconi, R. Cardiovascular anomaly in Rieger

Syndrome: Heterogeneity or contiguity? Acta Ophthalmol. Scand. 76, 509-512 (1998).

Mandel-Gutfreund, Y., Schueler, O., & Margalit, H. Comprehensive analysis of hydrogen bonds

in regulatory protein-DNA complexes: in search of common principles. J. Mol. Biol. 253,

370-382 (1995).

Marcil, A., Dumontier, E., Chamberland, M., Camper, S.A., & Drouin, J. Pitx1 and Pitx2 are

required for development of hindlimb buds. Development 130, 45-55 (2003).

Marion, D., Driscoll, P.C., Kay, L.E., Wingfield, P.T., Bax, A., Gronenborn, A., & Clore, M.

Overcoming the overlap problem in the assignment of H-1-NMR spectra of larger proteins

by use of 3-dimensional heteronuclear H-1-N-15 Hartmann-Hahn multiple quantum

coherence and nuclear Overhauser multiple quantum coherence spectroscopy-application to

interleukin-1-beta. Biochemistry 28, 6150-6156 (1989).

McDonald, I.K., & Thornton, J.M. Satisfying hydrogen bonding potential in proteins. J. Mol.

Biol. 238, 777-793 (1994).

McGinnis, W., Garber, R. L., Wirz, J., Kuroiwa, A. & Gehring, W.J. A homologous

protein-coding sequence in Drosophila homeotic genes and its conservation in other metazoans. Cell

37, 403-408 (1984).

Monsalve, M., Calles, B., Mencia, M., Rojo, F., & Salas, M. Binding of phage Φ29 protein p4 to

the early A2c promoter: recruitment of a repressor by the RNA polymerase. J. Mol. Biol.

283, 559-569 (1998).

Mucchielli, M., Mitsiadis, T.A., Raffo, S., Brunet, J., Proust, J., & Goridis, C. Mouse

Otlx2/RIEG expression in the odontogenic epithelium precedes tooth initiation and requires

mesenchyme-derived signals for its maintenance. Dev. Biol. 189, 275-284 (1997).

Muhandiram, D.R., & Kay, L.E. Gradient-enhanced triple-resonance 3-dimensional NMR

experiments with improved sensitivity. J. Magn. Reson. Ser. B 103, 203-216 (1994).

Muragaki, Y., Mundlos, S., Upton, J., & Olsen, B.R. Altered growth and branching patterns in

synpolydactyly caused by mutations in HOXD13. Science 272, 548-551 (1996).

Murray, J.C., Bennett, S.R., Kwitek, A.E., Small, K.W., Schinzel, A., Alward, W.L., Weber,

J.L., Bell, G.I., & Buetow, K.H. Linkage of Rieger Syndrome to the region of the epidermal

growth factor gene on chromosome 4. Nat. Genet. 2, 46-49 (1992).

Nakamura, T., Largaespada, D.A., Lee, M.P., Johnson, L.A., Ohyashiki, K., Toyama, K., Chen,

S.J., Willman, C.L, Chen, I.M., Feinberg, A.P., Jenkins, N.A., Copeland, N.G., &

Shaughnessy, J.D. Jr. Fusion of the nucleoporin gene NUP98 to HOXA9 by the chromosome

translocation t(7;11)(p15;p15) in human myeloid leukaemia. Nat. Genet. 12, 154-158 (1996).

Niessing, D., Driever, W., Sprenger, F., Taubert, H., Jackle, H., & Rivera-Pomar, R.

Homeodomain position 54 specifies transcriptional versus translational control by Bicoid.

Mol. Cell 5, 395-401 (2000).

Nishikawa, T., Okamura, H., Nagadoi, A., Konig, P., Rhodes, D., & Nishimura, Y. Solution

structure of a telomeric DNA complex of human TRF1. Structure 9, 1237-1251 (2001).

Otting, G., Qian, Y.Q., Muller, M., Affolter, M., Gehring, W., & Wuthrich, K. Secondary

structure determination for the Antennapedia homeodomain by nuclear magnetic resonance

and evidence for a helix-turn-helix motif. EMBO J. 7, 4305-4309 (1988).

Otting, G., Qian, Y.Q., Billeter, M., Muller, M., Affolter, M., Gehring, W.J., & Wuthrich, K.

Protein-DNA contacts in the structure of a homeodomain-DNA complex determined by

nuclear magnetic resonance spectroscopy in solution. EMBO J. 9, 3085-3092 (1990).

Otting, G., & Wuthrich, K. Heteronuclear filters in two-dimensional [1H,1H]-NMR

spectroscopy: combined use with isotope labeling for studies of macromolecular

conformation and intermolecular interactions. Quart. Rev. Biophys. 23, 39-96 (1990).

Otting, G., Liepinsh, E., Farmer, B.T. 2nd, & Wuthrich, K. Protein hydration studied with

homonuclear 3D 1H NMR experiments. J. Biomol. NMR 1, 209-215 (1991).

Otwinowski, Z., Schevitz, R.W., Zhang, R.G., Lawson, C.L., Joachimiak, A., Marmorstein,

R.Q., Luisi, B.F., & Sigler, P.B. Crystal structure of trp repressor/operator complex at

atomic resolution. Nature 335, 321-329 (1988).

Palmer, A.G. 3rd. Dynamic properties of proteins from NMR spectroscopy. Curr. Opin.

Biotechnol. 4, 385-391 (1993).

Pearlman, D.A., Case, D.A., Caldwell, J.W., Ross, W.S., Cheatham, T.E., deBolt, S., Ferguson,

D., Seibel, G., & Kollman, P.A. AMBER, a computer program for applying molecular

mechanics, normal mode analysis, molecular dynamics and free energy calculations to

elucidate the structures and energies of molecules. Comp. Phys. Commun. 91, 1-41 (1995).

Pellizzari, L., Tell, G., Fabbro, D., Pucillo, C., & Damante, G. Functional interference between

contacting amino acids of homeodomains. FEBS Lett. 407, 320-324 (1997).

Percival-Smith, A., Muller, M., Affolter, M., & Gehring, W.J. The interaction with DNA of

wild-type and mutant fushi tarazu homeodomains. EMBO J. 9, 3967-3974 (1990).

Perveen, R., Lloyd, C., Clayton-Smith, J., Churchill, A., van Heyningen, V., Hanson, I., Taylor,

D., McKeown, C., Super, M., Kerr, B., Winter, R., & Black, G.C.M. Phenotypic variability

and asymmetry of Rieger syndrome associated with PITX2 mutations. Invest. Ophthalmol.

Vis. Sci. 41, 2456-2460 (2000).

Pervushin, K., Wider, G. & Wuthrich, K. Deuterium relaxation in a uniformly 15N-labeled

homeodomain and its DNA complex. J. Am. Chem. Soc. 119, 3842-3843 (1997).

Piper, D. E., Batchelor, A. H., Chang, C., Cleary, M. L. & Wolberger, C. Structure of a HoxB1-

Pbx1 heterodimer bound to DNA: role of the hexapeptide and a fourth homeodomain helix

in complex formation. Cell 96, 587-597 (1999).

Pomerantz, J. L. & Sharp, P. A. Homeodomain determinant of major groove recognition.

Biochemistry 33, 10851-10858 (1994).

Priston, M., Kozlowski, K., Gill, D., Letwin, K., Buys, Y., Levin, A.V., Walter, M.A., & Heon,

E. Functional analyses of two newly identified PITX2 mutants reveal a novel molecular

mechanism for Axenfeld-Rieger syndrome. Hum. Mol. Genet. 10, 1631-1638 (2001).

Qian, Y. Q., Billeter, M., Otting, G., Muller, M., Gehring, W.J., & Wuthrich, K. The structure of

the Antennapedia homeodomain determined by NMR spectroscopy in solution: comparison

with prokaryotic repressors. Cell 59, 573-580 (1989).

Qian, Y.Q., Otting, G., Billeter, M., Muller, M., Gehring, W., & Wuthrich, K. Nuclear magnetic

resonance spectroscopy of a DNA complex with the uniformly 13C-labeled Antennapedia

homeodomain and structure determination of the DNA-bound homeodomain. J. Mol. Biol.

234, 1070-1083 (1993).

Qian, Y. Q., Resendez-Perez, D., Gehring, W. J. & Wuthrich, K. The des(1-6) Antennapedia

homeodomain: comparison of the NMR solution structure and the DNA-binding affinity

with the intact Antennapedia homeodomain. Proc. Natl. Acad. Sci. USA 91, 4091-4095 (1994a).

Qian, Y.Q., Furukubo-Tokunaga, K., Resendez-Perez, D., Muller, M., Gehring, W.J., &

Wuthrich, K. Nuclear magnetic resonance solution structure of the fushi tarazu

homeodomain from Drosophila and comparison with the Antennapedia homeodomain. J.

Mol. Biol. 238, 333-345 (1994b).

Qiu, M., Bulfone, A., Martinez, S., Meneses, J.J., Pedersen, R.A., & Rubenstein, J.L. Null

mutation of Dlx-2 results in abnormal morphogenesis of proximal first and second branchial

arch derivatives and abnormal differentiation in the forebrain. Genes Dev. 9, 2523-2538

(1995).

Quentien, M., Manfroid, I., Moncet, D., Gunz, G., Muller, M., Grino, M., Enjalbert, A., &

Pellegrini, I. Pitx factors are involved in basal and hormone-regulated activity of the human

prolactin promoter. J. Biol. Chem. 277, 44408-44416 (2002a).

Quentien, M., Pitoia, F., Gunz, G., Guillet, M., Enjalbert, A., & Pellegrini, I. Regulation of

Prolactin, GH, and Pit-1 gene expression in anterior pituitary by Pitx2: an approach using

Pitx2 mutants. Endocrinology 143, 2839-2851 (2002b).

Riise, R., Storhaug, K. & Brondum-Nielsen, K. Rieger syndrome is associated with PAX6

deletion. Acta Ophthalmol. Scand. 79, 201-203 (2001).

Rivera-Pomar, R., Lu, X., Perrimon, N., Taubert, H. & Jackle, H. Activation of posterior gap

gene expression in the Drosophila blastoderm. Nature 376, 253-256 (1995).

Rivera-Pomar, R., Niessing, D., Schmidt-Ott, U., Gehring, W. J. & Jackle, H. RNA binding and

translational suppression by bicoid. Nature 379, 746-749 (1996).

Saadi, I., Semina, E.V., Amendt, B.A., Harris, D.J., Murphy, K.P., Murray, J.C., & Russo, A.F.

Identification of a dominant negative homeodomain mutation in Rieger syndrome. J. Biol.

Chem. (2001).

Saadi, I., Kuburas, A., Engle, J.J., & Russo, A.F. Dominant negative dimerization of a mutant

homeodomain protein in Axenfeld-Rieger syndrome. Mol. Cell. Biol. 23, 1968-1982 (2003).

Saenger, W. Principles of Nucleic Acid Structure, Springer-Verlag, New York (1984).

Sattler, M., Schleucher, J., & Griesinger, C. Heteronuclear multidimensional NMR experiments

for the structure determination of proteins in solution employing pulsed field gradients.

Prog. NMR Spectros. 34, 93-158 (1999).

Schott, O., Billeter, M., Leiting, B., Wider, G., & Wuthrich, K. The NMR solution structure of

the non-classical homeodomain from the rat liver LFB1/HNF1 transcription factor. J. Mol.

Biol. 267, 673-683 (1997).

Schubert, S.W., Kardash, E., Khan, M.A., Cheusova, T., Kilian, K., Wegner, M., &

Hashemolhosseini, S. Interaction, cooperative promoter modulation, and renal colocalization

of GCMa and PITX2. J. Biol. Chem. 279, 50358-50365 (2004).

Schwabe, J.W.R. The role of water in protein-DNA interactions. Curr. Opin. Struct. Biol. 7,

126-134 (1997).

Scott, M.P., & Weiner, A.J. Structural relationships among genes that control development:

sequence homology between the antennapedia, ultrabithorax and fushi tarazu loci of

Drosophila. Proc. Natl. Acad. Sci. USA 81, 4115-4119 (1984).

Scott, M. P., Tamkun, J. W. & Hartzell, G. W. The structure and function of the homeodomain.

Biochim. Biophys. Acta 989, 25-48 (1989).

Semina, E.V., Reiter, R., Leysens, N.J., Alward, W.L.M., Small, K.W., Datson, N.A.,

Siegel-Bartelt, J., Bierke-Nelson, D., Bitoun, P., Zabel, B.U., Carey, J.C., & Murray, J.C. Cloning

and characterization of a novel bicoid-related homeobox transcription factor gene, RIEG,

involved in Rieger syndrome. Nat. Genet. 14, 392-398 (1996a).

Semina, E.V., Datson, N.A., Leysens, N.J., Zabel, B.U., Carey, J.C., Bell, G.I., Bitoun, P.,

Lindgren, C., Stevenson, T., Frants, R.R., van Ommen, G., & Murray, J.C. Exclusion of

epidermal growth factor and high-resolution physical mapping across the Rieger syndrome

locus. Am. J. Hum. Genet. 59, 1288-1296 (1996b).

Semina, E. V., Reiter, R. S. & Murray, J. C. Isolation of a new homeobox gene belonging to the

Pitx/Rieg family: expression during lens development and mapping to the aphakia region on

mouse chromosome 19. Hum. Mol. Genet. 6, 2109-2116 (1997).

Semina, E. V., Ferrell, R.E., Mintz-Hittner, H.A., Bitoun, P., Alward, W.L.M., Reiter, R.S.,

Funkhauser, C., Daack-Hirsch, S., & Murray, J.C. A novel homeobox gene PITX3 is mutated

in families with autosomal-dominant cataracts and ASMD. Nat. Genet. 19, 167-170 (1998).

Semina, E. V., Murray, J. C., Reiter, R., Hrstka, R. F. & Graw, J. Deletion in the promoter region

and altered expression of Pitx3 homeobox gene in aphakia mice. Hum. Mol. Genet. 9, 1575-

1585 (2000).

Simon, M.D., Sato, K., Weiss, G.A., & Shokat, K.M. A phage display selection of engrailed

homeodomain mutants and the importance of residue Q50. Nucl. Acids Res. 32, 3623-3631

(2004).

Skelton, N.J., Palmer, A.G., Akke, M., Kordel, J., Rance, M., & Chazin, W.J. Practical aspects

of two-dimensional proton-detected 15N spin relaxation measurements. J. Magn. Reson. Ser. B

102, 253-264 (1993).

Slijper, M., Boelens, R., Davis, A.L., Konings, R.N., van der Marel, G.A., van Boom, J.H., &

Kaptein, R. Backbone and side chain dynamics of lac repressor headpiece (1-56) and its

complex with DNA. Biochemistry 36, 249-254 (1997).

Small, S., Blair, A., & Levine, M. Regulation of even-skipped stripe 2 in the Drosophila embryo.

EMBO J. 11, 4047-4057 (1992).

Smidt, M.P., van Schaick, H.S., Lanctot, C., Tremblay, J.J., Cox, J.J., van der Kleij, A.A.,

Wolterink, G., Drouin, J., & Burbach, J.P. A homeodomain gene Ptx3 has highly restricted

brain expression in the mesencephalic dopaminergic neurons. Proc. Natl. Acad. Sci. USA 94,

13305-13310 (1997).

Spera, S., & Bax, A. Empirical correlation between protein backbone conformation and Cα and

Cβ 13C chemical shifts in protein structure determination. J. Am. Chem. Soc. 113, 5490-5492

(1991).

St. Amand, T. R., Zhang, Y., Semina, E.V., Zhao, X., Hu, Y., Nguyen, L., Murray, J.C., &

Chen, Y. Antagonistic signals between BMP4 and FGF8 define the expression of Pitx1 and

Pitx2 in mouse tooth-forming anlage. Dev. Biol. 217, 323-332 (2000).

Struhl, G., Struhl, K. & Macdonald, P.M. The gradient morphogen Bicoid is a concentration-

dependent transcriptional activator. Cell 57, 1259-1273 (1989).

Stuart, A.C., Borzilleri, K.A., Withka, J.M., & Palmer, A.G. Compensating for variations in the

1H-13C scalar coupling constants in isotope-filtered NMR experiments. J. Am. Chem. Soc.

121, 5346-5347 (1999).

Subramaniam, V., Jovin, T.M., & Rivera-Pomar, R.V. Aromatic amino acids are critical for

stability of the Bicoid homeodomain. J. Biol. Chem. 276, 21506-21511 (2001).

Suh, H., Gage, P. J., Drouin, J. & Camper, S. A. Pitx2 is required at multiple stages of pituitary

organogenesis: pituitary primordium formation and cell specification. Development 129,

329-337 (2002).

Szeto, D. P., Rodriguez-Esteban, C., Ryan, A.K., O’Connell, S.M., Liu, F., Kioussi, C.,

Gleiberman, A.S., Izpisua-Belmonte, J.C., & Rosenfeld, M.G. Role of the Bicoid-related

homeodomain factor Pitx1 in specifying hindlimb morphogenesis and pituitary development.

Genes Dev. 13, 484-494 (1999).

Talluri, S., & Wagner, G. An optimized 3D NOESY-HSQC. J. Magn. Reson. Ser. B 112, 200-

205 (1996).

Tejada, M.L., Jia, Z., May, D., & Deeley, R.G. Determinants of the DNA-binding specificity of the

Avian homeodomain protein, AKR. DNA Cell Biol. 18, 791-804 (1999).

Tell, G., Acquaviva, R., Formisano, S., Fogolari, F., Pucillo, C., & Damante, G. Comparative

stability analysis of the thyroid transcription factor 1 and Antennapedia homeodomains:

evidence for residue 54 in controlling the structural stability of the recognition helix. Int. J.

Biochem. Cell Biol. 31, 1339-1353 (1999).

Thomas, B.L., Liu, J.K., Rubenstein, J.L., & Sharpe, P.T. Independent regulation of Dlx2

expression in the epithelium and mesenchyme of the first branchial arch. Development 127,

217-224 (2000).

Toro, R., Saadi, I., Kuburas, A., Nemer, M., & Russo, A.F. Cell-specific activation of the Atrial

Natriuretic Factor promoter by PITX2 and MEF2A. J. Biol. Chem. 279, 52087-52094

(2004).

Toyota, M., Kopecky, K.J., Toyota, M.O., Jair, K., Willman, C.L., & Issa, J.J. Methylation

profiling in acute myeloid leukemia. Blood 97, 2823-2829 (2001).

Treisman, J., Gonczy, P., Vashishtha, M., Harris, E. & Desplan, C. A single amino acid can

determine the DNA binding specificity of homeodomain proteins. Cell 59, 553-562 (1989).

Tremblay, J.J., Goodyer, C.G., & Drouin, J. Transcriptional properties of Ptx1 and Ptx2

isoforms. Neuroendocrinology 71, 277-286 (2000).

Tron, A.E., Bertoncini, C.W., Palena, C.M., Chan, R.L., & Gonzalez, D.H. Combinatorial

interactions of two amino acids with a single base pair define target site specificity in plant

dimeric homeodomain proteins. Nucl. Acids Res. 29, 4866-4872 (2001).

Tsai, C., & Gergen, J.P. Gap gene properties of the pair-rule gene runt during Drosophila

segmentation. Development 120, 1671-1683 (1994).

Tsao, D.H., Gruschus, J.M., Wang, L.H., Nirenberg, M., & Ferretti, J.A. Elongation of helix III

of the NK-2 homeodomain upon binding to DNA: a secondary structure study by NMR.

Biochemistry 33, 15053-15060 (1994).

Tucker-Kellogg, L., Rould, M.A., Chambers, K.A., Ades, S.E., Sauer, R.T., & Pabo, C.O.

Engrailed (Gln50→Lys) homeodomain-DNA complex at 1.9 Å resolution: structural basis for

enhanced affinity and altered specificity. Structure 5, 1047-1054 (1997).

van Heijenoort, C., Penin, F., & Guittet, E. Dynamics of the DNA binding domain of the fructose

repressor from the analysis of linear correlations between the 15N-1H bond spectral densities

obtained by nuclear magnetic resonance spectroscopy. Biochemistry 37, 5060-5073 (1998).

Voss, T.C., & Day, R.N. Editorial: Pitx-2 mutants and somatolactotroph gene regulation—

deciphering the combinatorial code. Endocrinology 143, 2836-2838 (2002).

Vuister, G.W., & Bax, A. Quantitative J correlations: a new approach for measuring

homonuclear three-bond J(HNHα) coupling constants in 15N-enriched proteins. J. Am. Chem.

Soc. 115, 7772-7777 (1993).

Walter, M.A., Mirzayans, F., Mears, A.J., Hickey, K., & Pearce, W.G. Autosomal-dominant

iridogoniodysgenesis and Axenfeld-Rieger syndrome are genetically distinct. Ophthalmology

103, 1907-1915 (1996).

Wang, J., Cieplak, P., & Kollman, P.A. How well does a restrained electrostatic potential (RESP)

model perform in calculating conformational energies of organic and biological molecules?

J. Comput. Chem. 21, 1049-1074 (2000).

Wang, Y., Zhao, H., Zhang, X., & Feng, H. Novel identification of a four-base-pair deletion

mutation in Pitx2 in a Rieger syndrome family. J. Dent. Res. 82, 1008-1012 (2003).

Watanabe, K., & Lambowitz, A.M. High-affinity binding site for a group II intron-encoded

reverse transcriptase/maturase within a stem-loop structure in the intron RNA. RNA 10,

1433-1443 (2004).

Wei, Q. & Adelstein, R. S. Pitx2a expression alters actin-myosin cytoskeleton and migration of

HeLa cells through Rho GTPase signaling. Mol. Biol. Cell 13, 683-697 (2002).

Westmoreland, J. J., McEwen, J., Moore, B. A., Jin, Y. & Condie, B. G. Conserved function of

Caenorhabditis elegans UNC-30 and mouse Pitx2 in controlling GABAergic neuron

differentiation. J. Neurosci. 21, 6810-6819 (2001).

Wilson, D.S., Guenther, B., Desplan, C., & Kuriyan, J. High resolution crystal structure of a

paired (pax) class homeodomain dimer on DNA. Cell 82, 709-719 (1995).

Wilson, D.S., Sheng, G., Jun, S., & Desplan, C. Conservation and diversification in

homeodomain-DNA interactions: A comparative genetic analysis. Proc. Natl. Acad. Sci.

USA 93, 6886-6891 (1996).

Wilson, J.E., Connell, J.E., Schlenker, J.D., & Macdonald, P.M. Novel genetic screen for genes

involved in posterior body patterning in Drosophila. Dev. Genet. 19, 199-209 (1996).

Wimmer, E.A., Simpson-Brose, M., Cohen, S.M., Desplan, C., & Jackle, H. Trans- and cis-

acting requirements for blastodermal expression of the head gap gene buttonhead. Mech.

Dev. 53, 235-245 (1995).

Wishart, D.S., Sykes, B.D., & Richards, F.M. Chemical shift index: a fast and simple method

for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry

31, 1647-1651 (1992).

Wittekind, M., & Mueller, L. HNCACB, a high-sensitivity 3D NMR experiment to correlate

amide-proton and nitrogen resonances with the α-carbon and β-carbon resonances in

proteins. J. Magn. Reson., Ser. B 101, 201-205 (1993).

Wolberger, C., Vershon, A.K., Liu, B., Johnson, A.D., & Pabo, C.O. Crystal structure of a MAT

alpha 2 homeodomain-operator complex suggests a general model for homeodomain-DNA

interactions. Cell 67, 517-528 (1991).

Wolberger, C. Transcription factor structure and DNA binding. Curr. Op. Struct. Biol. 3, 3-10

(1993).

Wu, J.H., Gottlieb, B., Batist, G., Sulea, T., Purisima, E.O., Beitel, L.K., & Trifiro, M. Bridging

structural biology and genetics by computational methods: An investigation into how the

R774C mutation in the AR gene can result in complete Androgen Insensitivity

Syndrome. Hum. Mut. 22, 465-475 (2003).

Wuthrich, K. NMR of Proteins and Nucleic Acids. New York: John Wiley & Sons, 1986.

Wuttke, D.S., Foster, M.P., Case, D.A., Gottesfeld, J.M., & Wright, P.E. Solution structure of

the first three zinc fingers of TFIIIA bound to the cognate DNA sequence: determinants of

affinity and sequence specificity. J. Mol. Biol. 273, 183-206 (1997).

Yamamoto, K., Yee, C.C., Shirakawa, M., & Kyogoku, Y. Characterization of the bacterially

expressed Drosophila engrailed homeodomain. J. Biochem. Tokyo 111, 793-797 (1992).

Yu, X., St Amand, T.R., Wang, S., Li, G., Zhang, Y., Hu, Y., Nguyen, L., Qiu, M., & Chen, Y.

Differential expression and functional analysis of Pitx2 isoforms in regulation of heart

looping in the chick. Development 128, 1005-1013 (2001).

Yuan, D., Ma, X. & Ma, J. Sequences outside the homeodomain of Bicoid are required for

protein-protein interaction. J. Biol. Chem. 271, 21660-21665 (1996).

Yuan, D., Ma, X. & Ma, J. Recognition of multiple patterns of DNA sites by Drosophila

homeodomain protein Bicoid. J. Biochem. 125, 809-817 (1999).

Zerbe, O., Szyperski, T., Ottinger, M., & Wuthrich, K. Three-dimensional 1H-TOCSY-relayed

ct-[13C,1H]-HMQC for aromatic spin system identification in uniformly 13C-labeled proteins.

J. Biomol. NMR 7, 99-106 (1996).

Zhang, O., Kay, L.E., Olivier, J.P., & Forman-Kay, J.D. Backbone 1H and 15N resonance

assignments of the N-terminal SH3 domain of drk in folded and unfolded states using

enhanced-sensitivity pulsed field gradient NMR techniques. J. Biomol. NMR 4, 845-858

(1994).

Zhao, C., Dave, V., Yang, F., Scarborough, T. & Ma, J. Target selectivity of Bicoid is dependent

on nonconsensus site recognition and protein-protein interaction. Mol. Cell. Biol. 20, 8112-

8123 (2000).
