THE DOUBLESEX TRANSCRIPTION FACTOR:

STRUCTURAL AND FUNCTIONAL STUDIES OF A SEX-DETERMINING FACTOR

by

JAMES ROBERT BAYRER

Submitted in partial fulfillment of the requirements

For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Michael A. Weiss

Department of Pharmacology

CASE WESTERN RESERVE UNIVERSITY

January, 2006

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of

______

candidate for the Ph.D. degree *.

(signed)______(chair of the committee)

______

______

______

______

______

(date) ______

*We also certify that written approval has been obtained for any proprietary material contained therein.

Table of Contents

List of Tables iii

List of Figures iv

Acknowledgements vii

List of Abbreviations viii

Abstract xi

Chapter I 1

"Introduction, review of the literature, and statement of purpose"

Chapter II 60

"Expression, crystallization and preliminary X-Ray and NMR characterization of the

Drosophila Doublesex"

Chapter III 83

"Dimerization of Doublesex is mediated by a cryptic UBA Domain: Implications for sex-specific regulation"

Chapter IV 126

"Sex-specific gene regulation: Intersexual Drosophila development due to misfolding of a novel UBA domain"

Chapter V 163

"Hydrogen exchange reveals a stable dimeric core of Doublesex CTD"

Chapter VI 190

"Summary and future directions"

Appendix I 226

"Residue environments"

Appendix II 254

"Three helix domains"

Appendix III 271

"Doublesex CTD sequence homologs"

Appendix IV 272

"UBA Domains and electrostatics"

Appendix V 275

“Expression and purification of intact Doublesex”

Appendix VI 277

"Expression and purification of intact Intersex"

Appendix VII 281

"Characterization of the putative Intersex-binding groove"

Appendix VIII 292

"Yeast One-hybrid control studies"

Appendix IX 296

"Residual energy in dimeric systems"

Appendix X 297

"Salt bridge and hydrogen bond interactions in the Doublesex CTD"

Appendix XI 301

"Dimerization-coupled folding of a sex-specific UBA domain in a transcription factor"

Bibliography 315

List of Tables

Table I-1 DNA binding and oligomerization properties of Doublesex 37

Table II-1 X-ray data-collection and analysis statistics 71

Table II-2 Triple-resonance experiments 72

Table III-1 X-ray data collection and refinement statistics 98

Table III-2 Model-building and refinement statistics 99

Table III-3 Yeast two-hybrid analyses 100

Table V-1 Hydrogen exchange parameters for native CTDF-p 179

Table V-2 Hydrogen exchange parameters for G398A CTDF-p 180

List of Figures

Figure I-1 Sex determining hierarchy in Drosophila melanogaster 38

Figure I-2 DSXF homologs 40

Figure I-3 Larval segmentation and imaginal disc location 42

Figure I-4 Segmental origins and dimorphic features of the genital disc 44

Figure I-5 Dsx alternative mRNA splicing and organization 46

Figure I-6 Regulation of sxl transcripts 48

Figure I-7 Organization of the yp fbe region and co-regulator binding sites 50

Figure I-8 DM GMSA of the dsxA binding site 52

Figure I-9 DM domain (residues 41-81) solution structure 54

Figure I-10 Courtship ritual of Drosophila melanogaster 56

Figure I-11 Common α-helical dimerization motifs 58

Figure II-1 Crystals of CTDF-p 73

Figure II-2 X-ray diffraction pattern of native CTDF-p 75

Figure II-3 Harker section from anomalous scattering Patterson map 77

Figure II-4 1D 1H NMR spectra of native CTDF and 398A CTDF-p 79

Figure II-5 1H-15N HSQC spectra of CTDF and CTDF-p 81

Figure III-1 Sexual differentiation cascade in D. melanogaster and domain

organization of DSX 100

Figure III-2 Sequence alignment of DSX homologs in insect alleles 102

Figure III-3 Comparison of CTDF-p secondary structure with that predicted previously

104

Figure III-4 Structure of CTDF-p 106

Figure III-5 Unusual structure and packing of bent helix α2 108

Figure III-6 Alignment of CTDF-p and Classical UBA Domains 110

Figure III-7 Stereo diagrams of salt bridge interactions in CTDF-p 112

Figure III-8 Stereo view of protomeric UBA mini-core 114

Figure III-9 Hydrophobic core and key dimer interface residues 116

Figure III-10 Dimerization interface and potential Ub-binding surface 118

Figure III-11 Surface representation of CTDF-p dimer color-coded according to extent

of sequence conservation among insect dsx alleles 120

Figure III-12 Ribbon model depicting the environments of L373, M377, and I395

122

Figure III-13 Ribbon stereo comparison of CUE and DSX CTDF-p dimerization motifs

124

Figure IV-1 Sexual differentiation cascade in D. melanogaster and domain

organization of DSX 139

Figure IV-2 Structure of CTDF-p 141

Figure IV-3 Biochemical and Cell-Based Studies of CTDF 143

Figure IV-4 Serial Dilution of CTDF-p 145

Figure IV-5 Schematic of yeast one-hybrid system 147

Figure IV-6 CD Studies 149

Figure IV-7 Comparison of 1H-15N HSQC spectra of CTDF-p and CTDF 151

Figure IV-8 The distal portion of the female-specific tail of DSXF is flexible in

solution 153

Figure IV-9 NMR Studies of native CTDF-p and G398A substitution 155

Figure IV-10 Sequence Conservation of female and male CTD sequences 157

Figure IV-11 Surface representation of CTDF-p dimer color-coded according to

extent of sequence conservation among insect dsx alleles 159

Figure IV-12 Environment of G398 and G398D mutant side chain in hypothetical isolated CTDF-p protomer 161

Figure V-1 Thermal denaturation of native CTDF-p monitored by CD 180

Figure V-2 Protection factors for slowly exchanging amides for native and 398A

substitution 182

Figure V-3 Stereo diagram of Group I residues 184

Figure V-4 Stereo diagram of Group II and III residues 186

Figure V-5 CD-detected guanidine hydrochloride titration of native CTDF-p in D2O

at 30 °C 188

Figure VI-1 Schematic of putative DSX binding sites in promoter regions of

suspected DSX-regulated 212

Figure VI-2 Crystal contacts suggest potential tetramer surfaces 214

Figure VI-3 Fluorescence studies of the CTDF-p and 216

Figure VI-4 NMR footprinting studies of the CTDF-p and Ub 218

Figure VI-5 Reverse footprint of 15N-Ub with unlabeled CTDF-p 220

Figure VI-6 Model of di-Ub binding a UBA monomer 222

Figure VI-7 Organization of a sex-specific transcription complex 224

Acknowledgements

I would first like to thank my committee for their support and assistance throughout my

studies at Case: Tony Berdis, Mike Weiss, John Mieyal, George Dubyak, and Peter

Harte. Within the Weiss Lab, Nelson Phillips imparted invaluable hands-on education and support. Dr. Wan and Narendra Narayana were great company on our long synchrotron trips and always willing to assist with crystallography. Dr. Li has wonderful, unbridled support and enthusiasm for science and for all students. Rupi Singh has provided great personal and professional support, and given me confidence for my next phase of training. Dr. Hua and especially Yanwu Yang have been very kind in help with set-up and interpretation of our NMR data. I especially want to thank Wei Zhang, my partner-in-crime and friend during this thesis work.

I am very grateful to my advisor, Mike Weiss, who has served as my mentor in personal and professional growth. His scientific rigor and expectations serve as a constant challenge to better myself, and have instilled a spirit of independence and continual learning that will serve me throughout the rest of my career.

Finally, I would like to thank my wife and my family for their love and support throughout my training. I’ve had some great days and more than a few truly lousy ones at the bench, but their constant love and encouragement kept me going. Thank you!

List of Abbreviations

CCD, charge coupled device
CD, circular dichroism
CTD, C-terminal domain
CUE, coupling of ubiquitin to ER degradation
DSX, Doublesex
fbe, fat body enhancer
FRU, Fruitless
HPLC, high-performance liquid chromatography
HSQC, heteronuclear single-quantum coherence
HX, hydrogen exchange
IPTG, isopropyl-β-D-thiogalactoside
IX, Intersex
NMR, nuclear magnetic resonance
SAD, single-wavelength anomalous dispersion/diffraction
SEC, size-exclusion chromatography
SeMet, selenomethionine
SXL, Sex Lethal
TRA, Transformer
Ub, ubiquitin
UBA, ubiquitin-associated domain
Y1H, yeast one-hybrid
Y2H, yeast two-hybrid
yp, yolk protein

THE DOUBLESEX TRANSCRIPTION FACTOR: STRUCTURAL AND FUNCTIONAL STUDIES OF A SEX-DETERMINING FACTOR

Abstract

by

JAMES ROBERT BAYRER

Doublesex (DSX) is a transcription factor responsible for the regulation of sexual

differentiation in Drosophila. Alternate splicing gives rise to male- and female-specific

isoforms. A potent modulator of the yolk protein gene (yp), the male isoform (DSXM)

represses transcription of yp whereas the female isoform (DSXF) is an activator. DSX

contains two recognized domains, an N-terminal DNA-binding domain (shared between

isoforms) and a C-terminal domain (CTD) that is responsible for oligomerization and

presumably transcriptional regulation. The CTDs contain sex-specific C-terminal

sequences with opposite gene-regulatory properties. We have solved the crystal structure of the core CTD dimerization domain to a resolution of 1.6 Å using single-wavelength

anomalous dispersion (SAD) phasing. The crystal structure reveals a novel dimeric

arrangement of ubiquitin-associated (UBA) folds. To our knowledge this is its first

report in a transcription factor, and the first structure of a dimeric UBA domain.

Dimerization is mediated by a non-canonical hydrophobic interface extrinsic to the

putative Ub-binding surface. The unexpected observation of a UBA fold in DSX extends

the repertoire of α-helical dimerization elements in transcription factors.

Intersexual development of XX Drosophila (karyotypic females) is associated with mutation G398D, encoded in female-specific exon 4. G398 lies within the CTD. We demonstrate that the intersexual mutation blocks dimerization, leading to non-native aggregation. Analysis of diverse substitutions at 398 indicates that only alanine (found in

the male isoform) is tolerated. The structure of the dimer suggests that side chains larger

than alanine would clash at the interface; dimerization of the G398D variant would be

further destabilized by an uncompensated buried charge. The instability of the variant

monomer suggests that folding and dimerization are coupled. Such coupling ― although

a general feature of leucine zippers and other helical dimerization elements ― is novel

among UBA domains. We extend these studies by hydrogen exchange analysis,

demonstrating long-lived resonances in the dimer interface. We further present evidence

for ubiquitin-binding by CTDF, suggesting a previously unappreciated role for ubiquitin

in sex determination. Our results define a new role for the UBA fold in transcription and

rationalize the impaired sexual differentiation of a model organism.


CHAPTER I

INTRODUCTION

REVIEW OF THE LITERATURE

STATEMENT OF PURPOSE

INTRODUCTION

Conserved mechanisms of sexual differentiation

Sexual differentiation in metazoans is triggered by a wide array of master signals. In mammals, for example, the presence of a gene on the Y chromosome (sex determining

region of the Y chromosome, sry) leads to male differentiation. In birds, the trigger may

be the dosage of Z chromosomes (males are ZZ, females ZW; Smith and Sinclair 2004).

The nematode Caenorhabditis elegans and the fruit fly genus Drosophila employ

chromosomal counting mechanisms (Cline and Meyer 1996). Other insects, however,

utilize male (or sometimes female) determining factors (Marin and Baker 1998; Saccone

et al. 2002).

Despite these divergent upstream signals, commonalities of sex determination are being increasingly recognized. A highly conserved zinc-binding module, the DM domain

(named for Doublesex and MAB-3) has been shown to function in sex determination in a host of organisms. In Drosophila, the DM-containing doublesex (dsx) resides at the end of a sex determining RNA-splicing hierarchy (reviewed below). Animals lacking dsx function develop as intersexes, showing rudimentary differentiation of both male and female characteristics. C. elegans lacking functional mab-3 develop as hermaphrodites.

Amazingly, substitution of the male isoform of dsx (dsxM) (but not the female isoform, dsxF) can rescue development of male-specific structures in chromosomally male C. elegans (Raymond et al. 1998). Deletion of the mouse DM-containing gene results in XY gonadal dysgenesis and azoospermia; female mice appear unaffected (Raymond et al. 2000). DM genes in humans are arrayed along chromosomal arm 9p in a manner suggestive of HOX genes. Deletions in this region are associated with human sex reversal, gonadoblastoma formation, and severe mental retardation (Raymond et al. 1999; Ounap et al. 2004).

While the DM domain is highly conserved, flanking regions within these proteins generally are not (Volff and Schartl 2002). In insects, the C-terminal regions of the proteins are sex-specific, owing to alternative mRNA splicing. As a whole, the DM domain and the C-terminal dimerization and activation domains are highly conserved in insects, whereas only the DM domain is conserved more broadly (Figure I-1, p. 38). Alternative splicing of dsx gives rise to isoforms with identical DNA-binding properties but opposing gene regulatory activity. Dsx thus functions as a genetic switch between two fates: male and female. This important ramification in development is dictated by thirty amino acids in the female and 152 amino acids in the male.

In order to understand the mechanism of this genetic switch, we have endeavored to solve the crystal structure of the C-terminal DSXF dimerization and activation domain (CTDF).

Of key interest is the relationship between the domain's three-dimensional structure and

that of other known domains, particularly in light of poor sequence conservation outside

of insects. The long-term goal of complete structural characterization of DSX and transcriptional co-activators will enable the design of novel dsx alleles that can be directed to disrupt dimer structure or co-factor interactions. These alleles, targeted by homologous recombination, will allow for a synergy between structural and developmental biology that we predict will lead to a deepening of our understanding in this fundamental field.

In the remaining sections of this chapter, we review the sexual differentiation hierarchy and its role in dosage compensation, along with our current knowledge of the molecular and biochemical properties and functions of dsx. A brief introduction into the life cycle of Drosophila and development of the genital disc (where functionality of dsx is most striking) is provided. As the CTDF is predicted to be predominantly alpha helical, we

conclude with a brief review of common helical dimerization motifs.


REVIEW OF THE LITERATURE

A. Drosophila development, phenotypes, and sex determination hierarchy

Life-cycle of Drosophila and development of the genital disc

The life cycle of Drosophila melanogaster takes place over ca. 10-12 days at 25 °C. The

fertilized egg undergoes a series of syncytial divisions until it reaches the blastoderm stage, ca. 3.5 hrs after fertilization. At around 17 hrs post fertilization the first of three

larval instars is formed. During larval development, groups of cells begin organizing into

17 distinct imaginal discs that will eventually form the adult structure (Figure I-3, p. 42).

These imaginal discs can be identified by their diploid nuclei and cellular division,

appearing as islands of undifferentiated epithelial-like cells. Typical larval cells increase

in size but do not undergo cellular division. Their nuclei become polytene (the

chromosomes undergo replication without nuclear division). A third type of cell group, the

abdominal histoblast nests, is fated to produce all outer abdominal segments in the adult

save the eighth segment, which is contributed by the genital disc. While the imaginal

disc cells divide during the larval stages, the abdominal histoblast nests are quiescent

until the pupal stage. The first and second instar stages last about 1 day, while the third

larval instar persists for ca. 3 days. Upon pupal formation (2.5-3 days post fertilization),

most of the larval tissue undergoes histolysis. The imaginal discs at this point reorganize

and differentiate into their final structures as the adult Drosophila begins to take shape.

After ca. 4 days of metamorphosis, the juvenile fly emerges.


The genital imaginal disc begins forming within ca. 12 hrs of embryonic development, consisting of 12-15 cells and continues to grow during the larval instar stages. At the end of the first larval instar, the disc has increased in size to ca. 64 cells. By the third larval instar, the genital disc is clearly sexually dimorphic (Figure I-4, p. 44). In females, embryonic abdominal segment A8 has overgrown segment A9; the converse is true in males. Segment A10 gives rise to the sexually dimorphic analia. The genitalia and analia are collectively known as the terminalia.

The homeotic (HOX) genes abdominal-A, abdominal-B, and caudal (abd-A, abd-B, and cad, respectively) give rise to the segment identity in the developing disc. While abd-A is responsible for the formation of the internal female genitalia and cad the analia, abd-B is the major determinant of the external genitalia. Ectopic expression of abd-B is sufficient to transform part of the head into genitalia, whereas deletion of abd-B results in the genitalia developing as leg or antennal structures (Postlethwait et al. 1972; Estrada and

Sanchez-Herrero 2001). As the external genitalia are highly sexually dimorphic, and can be transformed by expression of either isoform of dsx, it is likely that dsx serves to integrate pathways of position/segmentation and sex (Keisman et al. 2001).

Information for this preceding section was in part derived from the following texts and websites: Genetic Analysis of Animal Development (Wilkins 1993), Imaginal Discs: the

Genetic and Cellular Logic of Pattern Formation (Held 2002), Development of the

genitalia in Drosophila melanogaster (Estrada et al. 2003), and The Interactive Fly

(http://www.sdbonline.org/fly/aimain/1aahome.htm).

Sexually dimorphic features of Drosophila and the intersexual phenotype

Sexual dimorphism in Drosophila melanogaster can be seen grossly in many tissues,

despite the overall similar appearance of the organism (to human observers). The easiest

observable difference is in the abdominal pigmentation. In female flies, a posterior stripe

of dark pigmentation is observable in segments 1 to 6. A similar posterior stripe of pigmentation is found in males in segments 1 to 4, but segments 5 and 6 are completely colored. Aside from the abdominal pigmentation, sexual dimorphism is exhibited more subtly in other tissues. Males have sex combs on their forelegs (9-14 sex comb teeth), whereas females have a transverse row of 6-8 bristles. Drosophila melanogaster has sexually dimorphic anal plates, and male flies have external genitalia consisting of a genital arch, phallus apparatus, and apodeme. Male and female flies exhibit differing internal genitalia as well, with male flies having an ejaculatory bulb, vas deferens, and male accessory glands. Female flies have ovaries, a spermathecal duct, a uterus, and the female accessory glands.

True null alleles of dsx turn both chromosomal sexes into indistinguishable intersexes with rudimentary forms of both male and female genitalia (Baker and Ridge, 1980).

Abdominal pigmentation is intermediate with male-like patterning in the posterior three-fourths of segment 5 and complete male-like pigmentation of segment 6. On the forelegs,

5-7 leg bristles can be found in an intermediate position. The anal plates of the intersexes are fused. A male genital arch is present, as is a reduced phallus apparatus; however, the apodeme is absent. The internal genitalia show both male and female differentiation.

Control of sexual differentiation - the sex-determining hierarchy

Doublesex (DSX) is a downstream component of the well-described sex-determining

hierarchy of Drosophila melanogaster (Figure I-1, p. 38). In this genus sex is determined

by the ratio of X chromosomes to autosomes. When the ratio is 2:2 (XX genotype), the

sex lethal gene (encoding an RNA splicing factor) is activated, leading to the expression

of Transformer (TRA), a second RNA-splicing factor. TRA, with TRA2, regulates the

female splicing isoform of DSX (DSXF). When the ratio of X chromosomes to

autosomes is 1:2 (XO genotype), sex lethal is not expressed, and the male isoform of

DSX (DSXM) is produced by default. Messenger RNAs encoding DSXM and DSXF contain the same first three exons; the C-terminal region of DSXF is encoded by exon

four whereas the C-terminal region of DSXM is encoded by exons five and six (Figure I-5A,

p. 46; Burtis and Baker 1989). Thus, as a consequence of sex-specific splicing,

male and female isoforms are identical for the first 397 residues, but differ for the remainder of the proteins.
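
The hierarchy described above can be summarized as a simple truth table. The following is a minimal sketch only, assuming a purely Boolean cascade; the function name, the dictionary keys, and the decision to reject any X:A ratio other than the two discussed in the text are illustrative assumptions, not part of the original studies.

```python
from fractions import Fraction

# Hypothetical names throughout; this is an illustrative truth table,
# not a quantitative model of the pathway.
SHARED_EXONS = [1, 2, 3]   # common to both dsx mRNAs
FEMALE_EXONS = [4]         # encodes the DSXF C-terminal region
MALE_EXONS = [5, 6]        # encode the DSXM C-terminal region


def sex_determination(x_chromosomes, autosome_sets=2):
    """Walk the hierarchy X:A ratio -> SXL -> TRA -> DSX isoform."""
    ratio = Fraction(x_chromosomes, autosome_sets)
    if ratio == Fraction(1, 1):      # 2:2, XX genotype
        sxl_on = True
    elif ratio == Fraction(1, 2):    # 1:2, XO genotype
        sxl_on = False
    else:
        # Only the two ratios discussed in the text are modeled.
        raise ValueError("unmodeled X:A ratio: %s" % ratio)

    tra_on = sxl_on                  # TRA expression follows sex lethal activation
    if tra_on:                       # TRA, with TRA2, directs female splicing
        isoform, exons = "DSXF", SHARED_EXONS + FEMALE_EXONS
    else:                            # the male isoform is produced by default
        isoform, exons = "DSXM", SHARED_EXONS + MALE_EXONS
    return {"sxl_on": sxl_on, "tra_on": tra_on, "isoform": isoform, "exons": exons}


if __name__ == "__main__":
    for n_x in (2, 1):
        print(n_x, sex_determination(n_x))
```

The sketch treats each step as all-or-none, which matches the qualitative description here but ignores the mixed splicing products discussed later for tra.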

The first step in the sex-determining cascade, activation of SXL in XX individuals, requires a chromosomal counting mechanism initially and autoregulation later in development. A number of transcriptional regulators encoded on the X chromosome are expressed early in embryogenesis. These factors, known as numerators, combine to initiate transcription of sxl at its early promoter, Pe. This transcript of sxl encodes an initial burst of functional protein. Soon thereafter, around nuclear cycle 14, transcription of sxl from its early promoter is turned off. Transcription then switches to a second,

maintenance promoter, Pm, creating a second sxl transcript found in both sexes. This

transcript contains an exon encoding a stop codon that must be spliced out for functional

protein to be produced. This exon is spliced out by SXL remaining from the Pe-produced transcripts in female (XX) individuals, allowing continued expression of functional SXL.

Since male (XO) individuals lack the initial SXL protein, only non-functional, truncated

SXL protein is produced. Thus the sex-specific regulation of sxl is a self-propagating all-or-none event (Figure I-6, p. 48).
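
The self-propagating character of this switch can be illustrated with a short sketch. It assumes, for illustration only, that each round of Pm expression is spliced productively whenever functional SXL from the previous round is present; the function names and the number of rounds are hypothetical.

```python
def splice_pm_transcript(functional_sxl_present):
    """Return the protein product of one Pm-derived sxl transcript.

    The stop-codon-containing exon is removed only when functional SXL
    is already present; otherwise a truncated protein results.
    """
    return "functional" if functional_sxl_present else "truncated"


def simulate_sxl(early_pe_burst, rounds=4):
    """Follow SXL status over successive rounds of Pm expression.

    early_pe_burst is True for XX embryos (numerators activate Pe)
    and False for XO embryos.
    """
    sxl_present = early_pe_burst
    products = []
    for _ in range(rounds):
        product = splice_pm_transcript(sxl_present)
        products.append(product)
        # Functional SXL from this round drives splicing in the next round.
        sxl_present = product == "functional"
    return products


if __name__ == "__main__":
    print("XX:", simulate_sxl(True))
    print("XO:", simulate_sxl(False))
```

In this toy model the initial state is never lost: an embryo that begins with Pe-derived SXL remains "on" indefinitely, and one that begins without it remains "off".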

The role of SXL as the branch-point in sex-determination in Drosophila was determined partly through the efforts of Cline and coworkers in 1979. Two mutations were identified, sxlf1 and sxlM1, a recessive loss-of-function allele and a dominant gain-of-function allele, respectively. The recessive allele caused female-specific lethality while the dominant

allele resulted in male-specific lethality. Mosaic individuals were identified, allowing the

sex-determining effects of SXL to be discovered: sxlM1 feminized male cells, whereas

sxlf1 masculinized female cells.

What are the numerator genes and are there autosomal denominator genes?

Numerator genes are responsible for the activation of sxl in genetically female

individuals; therefore, loss of these genes should result in sex-specific lethality in females only. Since denominator genes act against sxl activation, their loss should result in sex-specific lethality in males alone. A set of genes has been identified that fulfills the requirements for numerator genes. These include the sisterless family of genes, sisterlessA (sisA), sisterlessB (sisB, also known as scute), and sisterlessC (sisC), along

with runt and sxl itself. The sisterless family and runt act at the level of transcription to

affect SXL expression by binding in the Pe promoter (Figure I-6A, p. 48). SisA and SisB both encode basic leucine zipper proteins (bZIPs) that are expressed widely in the

embryo. The SisC protein is not well understood. Mutations in SisC affect sxl

transcription in the middle section of the embryo only; however, these mutations

synergize with mutations in the other sisterless proteins to affect sxl expression across the

embryo as a whole. runt encodes a protein homologous to the viral transcription factor

PEBP2, and like SisC is more restricted in its effects.

Sxl, while not involved in its own transcription, meets the criteria as a numerator gene by

its presence on the X chromosome. Since maintenance of the sex lethal signal requires

pre-made protein, residence on the X chromosome ensures two doses in XX individuals,

and the continued production of functional SXL transcribed from the sxl Pm promoter.

The scute gene is known to encode a basic helix-loop-helix (bHLH) protein and is involved in neurogenesis in both sexes. Due to a special upstream regulatory sequence, described by Wrischnik et al. in 2003, scute also functions as a numerator gene.

Interestingly, the amino acid sequence is plastic in regard to sex determination -- housefly

Scute was able to rescue sex determination in fruit flies lacking endogenous Scute

(Wrischnik et al. 2003).


The transcription factor Deadpan is the autosomal denominator element

Deadpan (dpn), another bHLH protein that is involved sex-non-specifically in neuronal

functions, also acts as a weak denominator gene (Younger-Shepherd et al. 1992). Studies

by Hoshijima et al. demonstrated that DPN binds directly to a response element within

the sxl Pe, impairing transcription even in the presence of the pro-transcription complex of da/sis-b.

Multiple roles for Sex Lethal in Drosophila development

SXL autoregulates its expression in female Drosophila. As mentioned previously, SXL protein produced from the Pe transcript is required for female-specific splicing of the sxl

Pm transcript. SXL forms an RNA-splicing complex on the Pm pre-mRNA, recruiting,

among other factors, SNF, VIR, and FL(2)D. This complex removes a male-specific exon

that would otherwise encode a stop codon, which causes early termination of SXL translation and yields non-functional protein. It should be noted that this is the default state in male individuals (Figure I-6B, p. 48).

SXL also plays an important role in dosage compensation, the effects of which give rise to sex-specific lethality and hence its name. In Drosophila, dosage compensation for the

X chromosome is achieved not through X-inactivation, as for female mammals, but rather through X-hyperactivation in male flies. The primary determinant for this activation is the gene male-specific lethal2 (msl2). Once again, sex-specific RNA splicing controls protein expression. The default splicing of the msl2 transcript, presumably by housekeeping splicing machinery, removes an intron containing a stop codon. When SXL is

present, this intron remains in place and non-functional MSL2 is produced. Thus, male

flies that carry the gain-of-function mutant allele sxlM1 retain the stop codon-containing

intron in msl2 transcripts, and perish from a lack of X chromosome dosage

compensation. Conversely, female flies defective in SXL production, such as those

homozygous for sxlf1, are unable to inactivate msl2 expression and die from

hyperactivation of both X chromosomes. Intuitively, one would expect that female flies

defective in both sxl and msl2 would not exhibit this lethality, as msl2 would continue to

be produced with the stop codon-containing intron, and the MSL complex would not

form. However, the msl2 mutant (and msl mutations in general) does not rescue sxl null

females due to alternate dosage compensation control of runt by SXL and DA.
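
The sex-specific lethality of the two sxl alleles follows from the msl2 splicing logic, which can be tabulated in a few lines. This is a deliberately simplified sketch: the function names and outcome strings are hypothetical, and it models only the SXL-msl2 axis, so it does not capture the SXL- and DA-dependent control of runt that prevents msl mutations from rescuing sxl null females.

```python
def msl2_functional(sxl_functional):
    """Default splicing removes the stop-codon intron (functional MSL2).

    Functional SXL keeps the intron in place (non-functional MSL2).
    """
    return not sxl_functional


def dosage_outcome(x_chromosomes, sxl_functional):
    """Predict viability from X chromosome count and SXL status."""
    hyperactivation = msl2_functional(sxl_functional)
    if x_chromosomes == 1:
        # A single X must be hyperactivated to match two-X expression.
        if hyperactivation:
            return "viable"
        return "lethal: no X dosage compensation"
    if x_chromosomes == 2:
        # Two X chromosomes must not be hyperactivated.
        if hyperactivation:
            return "lethal: both X chromosomes hyperactivated"
        return "viable"
    raise ValueError("only one or two X chromosomes are modeled")


if __name__ == "__main__":
    cases = {
        "wild-type male": (1, False),
        "sxlM1 male (gain of function)": (1, True),
        "wild-type female": (2, True),
        "sxlf1 female (loss of function)": (2, False),
    }
    for label, args in cases.items():
        print(label, "->", dosage_outcome(*args))
```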

The final function of SXL in Drosophila development is in the sex-specific splicing of the RNA-splicing factor Transformer (tra), the next component of the sex-determining hierarchy. This SXL function appears to be more passive than its roles in autoregulation

and dosage compensation; SXL binds to a 3' splice site on the tra pre-mRNA, preventing

access by the RNA-splicing machinery and thus promotes splicing at a site further

downstream on the tra pre-mRNA in female individuals. Interestingly, this "roadblock"

type splicing results in a mixture of male (default) and female (SXL-dependent) splicing

mRNAs being produced. The male mRNA results in a non-functional, truncated protein.

Tra functions along with its partner transformer2 (tra2, expressed in both males and

females) to promote the female splicing isoform of dsx. Tra accomplishes this by binding to the upstream of two 3' splice sites, preventing the default (male) splicing of the

dsx mRNA. Importantly, tra also functions to block the male splicing of fruitless (fruM), which prevents the differentiation of the male-specific muscle of Lawrence and the hardwiring of male sexual orientation.

B. Cellular and molecular functions of Doublesex

Revising the Sex-differentiation model: Doublesex acts both cell-autonomously and instructively during Drosophila development

Drosophila larvae contain groups of cells referred to as imaginal discs. These discs undergo rapid growth and differentiation upon pupation, forming the adult fly, while the larval tissue is almost completely histolyzed (see life cycle of Drosophila, p. 4).

The genital imaginal disc is a compound disc composed of segments A8, A9, and A10

(Figure I-4, p. 44). Historically, the first two segments were mapped to gender-specific development (Bryant et al. 1978). That is, the A8 segment grows to form the female genitalia whereas the A9 segment forms the male genitalia. The A10 segment produces the analia, which has sex-specific morphology. Because the A8 segment in male flies was not seen to contribute to adult structures, it became known as the “repressed female genital primordium” or RFP. The converse situation exists in female flies, with A9 being known as the “repressed male genital primordium” (RMP). The analia producing segment is known as the anal primordium (AP). Early work suggested that doublesex, residing at the end of the sex-determination cascade, works to repress the inappropriate primordium during development in a cell-autonomous manner (Baker and Ridge 1980).

Individuals null for dsx function have inappropriate development of both male and female genital primordia. Thus, only the segments corresponding to the sex of the fly (A8 for female, A9 for male) were permitted to develop. This "permissive" and "non-permissive" role of dsx in development was predominant in the field until cloning of the gene allowed for more detailed studies showing active transcriptional regulation (Burtis and Baker

1989; Coschigano and Wensink 1993; An and Wensink 1995a).

This model was in part supported by studies of mosaic individuals composed of both XX and XO cells as well as cell culture assays (Nothiger et al. 1987; Epper and Nothiger

1982). Since cells rely on chromosomal counting to determine sex, and not on hormonal cues, loss of an X chromosome in a cell of an XX individual will switch that cell from female to male identity. An examination of such individuals with mosaic cells in the genital primordium demonstrated the side-by-side growth and development of male and female genitalia. Furthermore, when the genital imaginal disc was fragmented and allowed to grow in culture, some fractions led to adult structures whereas others did not. It was therefore assumed that the non-developing fractions derived from the repressed primordium (RMP in XX cells, RFP in XO cells; Epper and Nothiger 1982).

Recent work has forced a revision of this classical model (Keisman and Baker 2001;

Keisman et al. 2001; Sanchez et al. 2001). An instructional (rather than permissive/non-permissive) role for DSX in the genital disc was shown by Keisman and

Baker, who noted the sex-specific deployment of dachshund (dac), previously known to be necessary for proper leg development (reviewed in Mann and Carroll 2002).

Expression of dac was shown to be sexually dimorphic -- central expression in female

genital discs, lateral in male. In females, dac expression is correlated with expression of wingless (wg), a member of the Wnt signaling pathway. Female genital discs of

wgCX3/wgCX4 individuals (hypomorphic for wg function) showed a complete loss of dac

expression. In male genital discs, loss of wg function was correlated with increased dac

expression centrally. Keisman and Baker thus concluded that wg activates dac

expression in female cells, but decreases its expression in male cells (Keisman and Baker

2001). An examination of decapentaplegic (dpp, related to TGFβ and known to

antagonize wg function) demonstrated additional sex-specific regulation. The expression

of dpp is similar in male and female genital discs showing a lateral distribution. Loss-of-

function analysis in dppd12/dppd14 genital discs is correlated with a loss of dac expression in male discs, whereas dac expression in female discs is unchanged. Furthermore, ectopic expression of dpp in female genital discs is associated with repression of dac expression.

The specific wg and dpp responses could be mediated by two different pathways: positional (bithorax control; Casares et al. 1997) or the sex-determining hierarchy (Keisman and Baker 2001). Keisman and Baker ectopically expressed tra cDNA (capable of transforming an XO individual into a pseudofemale; McKeown et al. 1988; Waterbury et al. 1999) in cells within a male genital disc. Cells in clones expressing the tra cDNA adopted a female identity. When these clones reside in the lateral portion of the disc, expression of dac is lost. Neighboring cells not expressing tra cDNA retain dac expression. Conversely, centrally located tra+ clones ectopically express dac. Thus, feminization of cells within the male genital disc leads to aberrant

expression of dac (Keisman and Baker 2001). The converse experiment was performed

using disruption of tra-2 expression via dsRNA-mediated interference, previously shown

to transform XX individuals into pseudomales (Fortier and Belote 2000). The behavior of these masculinized clones depends on location: cells in the RMP begin to grow as in a male genital disc, while clones in the female primordium generally mimic the non-responsive state of the RFP. Although their results suggest the sex-determining hierarchy determines dac expression in the genital disc, they do not prove this differential regulation is due to DSX. For this determination, dac expression was examined in the context of dsx loss-of-function alleles (dsxD+R3/dsxM+R15). dac expression in male discs showed ectopic central expression and normal lateral expression, suggesting DSXM is

required to repress central dac expression but not required for its lateral expression.

Expression in the female genital disc was ambiguous, but suggested DSXF is not required

to repress dac expression laterally, nor enhance expression centrally. The authors

conclude that DSX regulates wg and dpp signaling to modulate dac expression. Without

DSX, both wg and dpp are free to activate dac expression. In the presence of DSXF, dpp

becomes a repressor of dac and possibly potentiates wg-mediated dac expression. DSXM, however, turns wg into a repressor of dac expression without affecting dpp activation of dac (Keisman and Baker 2001).

In a complementary study, Guerrero and co-workers again demonstrated DSX control of wg and dpp (Sanchez et al. 2001). DSXF expression in RMP blocks dpp expression. As

in Keisman and Baker's work, expression of DSXM (mediated by tra2 mutant cells) in the

RMP led to overgrowth of the cells and ectopic expression of dpp. The authors suggest

expression of DSXF represses dpp expression in the RMP, blocking its activation by

hedgehog (Hh). An examination of dsx null XO genital discs shows expression of the wg

target gene distal-less (dll) in the RFP, whose expression is blocked in the presence of

functional DSXM. DSXF thus serves to block Hh mediated activation of dpp in the RMP,

whereas DSXM serves to block wg function in the RFP. It is further shown that the

activity of DSXF is continuously required for repression of dpp in the RMP during larval

development and during pupation for proper genital disc growth (Sanchez et al. 2001).

DSXM, however, was shown to be required only during a brief period after the end of the

first larval instar to yield irreversible development of the RMP in XX individuals.

An A/P organizer directs genital imaginal disc growth

Unusual growth of some tra+ cells in male genital discs and tra2- cells in female genital discs led Keisman and Baker to suggest that some aspects of genital disc development are organized from a central region, analogous to development in other imaginal discs (Keisman and Baker 2001). This idea broke with previous models of complete cellular autonomy in sex determination. Baker and coworkers reasoned that since sex determination is cell autonomous, female clones in the male primordium should adopt the repressed state characteristic of that primordium in females. As predicted, these female clones do not contribute normally to male structures; however, the clones frequently grow substantially and contribute to a morphologically normal male genital primordium in the larval genital disc. Occasionally, female clones in the male primordium are associated with severe reductions in the size of the corresponding genital primordium (A9

segment) in the larval genital disc. Baker and co-workers tested the wg and dpp cells organized along the A/P border in the genital disc by selectively masculinizing or feminizing the cells through targeted tra cDNA (to feminize male cells) or tra2-IR (to

masculinize female cells). Switching of cell sex within the A/P region (but not outside)

directed development of the genital disc to the new sex of the organizer cells (Keisman et al. 2001).

DSX has thus been shown to act at two separate levels. First, DSX directs the morphogenic signaling specifying sex to the genital disc (Keisman and Baker 2001; Sanchez et al. 2001). Second, these morphogenic signals are interpreted by DSX in a

cell-autonomous manner, directing individual cell growth and proliferation (Keisman et

al. 2001). Estrada and Sanchez-Herrero have shown that the homeogene abdominal B (Abd-B) is required

for the proper differentiation of the Drosophila genital disc (Estrada and Sanchez-Herrero

2001). As previously mentioned, in knock-out flies the genital disc develops into leg, or

more rarely, antenna tissue. Additionally, work by Carroll and co-workers demonstrated

that sexually dimorphic pigmentation (the result of bric-a-brac, bab) was influenced by both Abd-B and DSXF (Kopp et al. 2000). These results give insight into possible

mechanisms of DSX activity, and point towards a central role of integrating positional

and sex information during fly development.

A new model for genital disc development

Using cell-specific markers, cells originating from the RMP in female flies were shown

to develop into the parovaria, female accessory glands originally thought to be derived

from the anal primordium (A10). Likewise, cells originating from the RFP in male flies

were shown to develop into a miniature 8th tergite (Keisman et al. 2001). Thus, RFP and RMP were shown to be misnomers: the A8 and A9 segments are not repressed in either sex, but rather develop into discrete adult structures.

More recent results further refine this model, demonstrating that cells outside the

recognized imaginal disc can be recruited into the genital disc. Ahmad and Baker

demonstrate branchless (bnl), the Drosophila fibroblast growth factor (FGF) gene, is

deployed sex-specifically in the male genital disc (Ahmad and Baker 2002). Cells in the

male genital disc expressing bnl recruit embryonic mesodermal cells expressing a

fibroblast growth factor receptor breathless (btl) to the male disc, where they lose

mesodermal markers and transform into epithelial-like cells. The expression of bnl, and

subsequent recruitment of the btl-positive mesodermal cells, can be blocked by

expression of DSXF. This work demonstrated yet another role of DSXF in genital disc development, but more fundamentally, the male genital disc is capable of growing through expansion of imaginal precursor cells as well as through addition of larval cells.

Doublesex and behavior

Courtship behavior in Drosophila is a well-established sequence of events occurring in a

specific order necessary for successful mating (Figure I-10, p. 56; O'Dell 2003; Greenspan and Ferveur 2000). Courtship begins with orientation and following, where the male approaches a receptive female. The next step involves tapping of the female with the male's foreleg. The male then produces a species-specific courtship song by

asymmetrically extending and vibrating one wing. The courtship culminates with licking of the female genitalia with the proboscis and then copulation. The courted female can display receptive or rejective behaviors. A receptive female parts her wings to expose her dorsal abdomen to allow mounting, followed by opening of the vaginal plate to allow for genital contact. Rejective behaviors depend on the sexual history of the female. A virgin female will typically avoid contact by raising her abdomen. A fertilized female will lower her abdominal tip and extrude her ovipositors (and occasionally her eggs as well).

The role of the sex determining hierarchy in behavior is complex, involving both dsx and

fruitless (fru), an alternatively spliced BTB-containing transcription factor, as well as the

nuclear receptor dissatisfaction (dsf). Male splicing of fru (frum) is required for male courtship activity as well as development of a male specific structure, the muscle of

Lawrence (MOL) (Usui-Aoki et al. 2000). Male and female individuals mutant for dsf display different phenotypes. Female individuals are less receptive towards male advances and exhibit an egg-laying defect. Male individuals, however, are bisexual and exhibit copulation defects (Finley et al. 1997;Finley et al. 1998). Both fru and dsf are expressed in the adult CNS. Masculinization of female neurons by expression of frum is sufficient to induce male-like courtship behaviors (Manoli et al. 2005). The role of dsx in behavior is less clear. While dsx is expressed in the Drosophila CNS, and male- and female-specific behaviors map to different regions of the brain, courtship phenotypes are limited. Male Drosophila expressing a transgenic copy of dsxf are courted more

frequently than their wild type peers, are slower to reject these advances, and form courtship

chains (Waterbury et al. 1999). XY dsx null individuals also elicit higher levels of

courtship from wild type males. Both cases may be due to higher levels of female

pheromone production (Waterbury et al. 1999;Villella and Hall 1996). An interesting

and subtle defect in XY dsx- flies is the lack of a component of the male courtship song (sine song), indicating at least some active role for DSXM in normal male courtship (Villella

and Hall 1996). Female deficits in courtship due to dsx mutations are less clear, perhaps

in part due to their less active role in courtship (O'Dell 2003).

Examination of the plasticity of sexual identity in Drosophila has led to opposite conclusions (Belote and Baker 1987;Arthur et al. 1998). Studies using a heat-shock promoter-tra transgene indicated that XX and XO individuals raised at non-permissive temperatures had male courtship behavior permanently programmed during puparium formation (Arthur et al. 1998). Contradictory results showing some plasticity of courtship behavior utilized a temperature-sensitive variant of tra. The use of tra in both

studies confounds the results with respect to dsx as fru splicing is also affected. It is

interesting to note, however, that while male behavior could be programmed during a

brief period after which continuous expression of tra has no effect, female behavior

requires its continued expression (Arthur et al. 1998). This result suggests that DSXF is required for courtship in the adult female fly.


C. Other down-stream factors required for sexual differentiation

Intersex: an obligate sex-specific co-activator

DSXF requires Intersex (IX) for proper gene regulation and female development. That

the two worked either sequentially in a pathway or in parallel was clear from earlier

studies of mutant flies. Individuals null for dsx are phenocopies of ix nulls. The cloning of ix in 2002 revealed a small protein (188 amino acids) lacking an obvious DNA-binding motif (Garrett-Engele et al. 2002). IX was shown to associate with DSXF, but not DSXM, by yeast two-hybrid assays, gel mobility shift assays (GMSA), and co-immunoprecipitation. These biochemical results are in accord with prior genetic analysis demonstrating that ix, while required for female differentiation, is dispensable for male development (Waterbury et al. 1999).

IX is expressed equally in both sexes and shows homology to mammalian transactivators

(Garrett-Engele et al. 2002). More recent studies demonstrated the existence of a human homolog of IX as part of the Mediator complex (Sato et al. 2003). These findings have led to speculation that IX may function with DSXF as a transactivating domain, perhaps acting as a bridge to Mediator, or perhaps "filling in" for a "missing" native transactivating domain found in DSXM to effect gene regulation (Siegal and Baker 2005).

Hermaphrodite

Hermaphrodite (her) functions as both an early activator of sxl and in parallel with DSXF in the regulation of some sexually dimorphic tissue (Pultz and Baker 1995;Li and Baker

1998). Like ix, her is expressed equally in both sexes although the sexual differentiation function in males remains unclear (female nulls are phenocopies of dsxf nulls); splicing is

sex-non-specific. HER contains four putative Zn-finger regions and is thought to act as a

transcription factor (Li and Baker 1998b). HER appears to be capable of activating yp

expression independently of DSXF. This activation is enhanced by DSXF and repressed

by DSXM. Aside from these sex-specific functions, HER possesses essential

functionality in both sexes (Pultz et al. 1994;Pultz and Baker 1995).

D. Biochemical studies of Doublesex

A principal target of the sex determining hierarchy is doublesex (dsx): expression of male- and female-specific transcription factors (DSXM and DSXF) in turn directs most

aspects of somatic sexual differentiation (Burtis and Baker 1989). The DSX isoforms are

encoded by mRNAs sharing the first three exons; the C-terminal segment of DSXF is encoded by exon 4 whereas that of DSXM is encoded by exons 5 and 6. Male- and

female DSX isoforms are thus identical for the first 397 residues, but differ thereafter

(Figure I-5, p. 44; Burtis and Baker 1989). The isoforms share an N-terminal DNA-binding motif, the DM domain (Raymond et al. 1998). Broad conservation of this domain in metazoan proteins related to sexual differentiation suggests that mechanisms

of sexual dimorphism are in part universal (Raymond et al. 1998). The respective C-

terminal domains of DSX (CTDF and CTDM) – containing common and sex-specific

sequences – mediate dimerization (Erdman et al. 1996;An et al. 1996) and potential

recruitment of transcriptional co-regulatory factors (Garrett-Engele et al. 2002).

Dimerization in each case enhances specific DNA binding (Cho and Wensink 1998).

Here we review the early characterization of the doublesex proteins.


Binding of Doublesex to the fat body enhancer

The dsx gene was cloned in 1989 by Burtis and Baker (Burtis and Baker 1989). The full-

length proteins were subsequently expressed in E. coli using the vector pT7-7 for use in

gel-shift and footprinting assays by Wensink and co-workers (Burtis et al. 1991). The

yolk protein genes, expressed in the female adult fat body and known to be sex-

specifically regulated in male and female individuals, were incubated with the soluble

portion of E. coli extracts containing DSXF and DSXM. Presence of the DSX proteins in

the extracts was verified by SDS-polyacrylamide gels. The full-length proteins were

noted to migrate abnormally. Gel mobility shift assays (GMSA) with fragments derived from the fat body were shifted by both DSX isoforms. To further identify the binding regions, DNase I footprinting assays were undertaken with DSX-responsive fragments from the GMSA experiments. The male and female isoforms yield identical footprints: three binding sites were observed, termed dsxA, dsxB, and dsxC. Of these three sites, dsxA was presumed to be the highest affinity site followed by dsxB and dsxC on the basis of protein concentration studies. A dependence on ionic strength was also noted, where increasing concentrations of KCl decreased markedly the dsxC footprint and to a lesser extent the dsxB footprint. In the range of salts tested, the dsxA site showed little variation. Importantly, DSXM and DSXF yielded identical footprints, suggesting the identical N-terminal region of the proteins contains the DNA-binding region. This early

work also identified the core consensus sequence for DSX recognition: ACAA. This

work further identified the 127 bp fat body enhancer (fbe) to which both DSXM and

DSXF bound, marking the first time DSX was shown to have DNA-binding properties.


Binding to the fbe by the Doublesex proteins was further elucidated by Coschigano and

Wensink (Coschigano and Wensink 1993). Again, E. coli produced Doublesex was

footprinted with the fbe. Relative affinities of the sites were roughly determined to be

dsxA > dsxB >> dsxC. Specific recognition of the binding sites was suggested by

mutational analysis of the fbe. DSX binding sites were deleted individually and in pairs

by substitution. Substitution of dsxA (ACTACAATGTTGCAATC with

TGATATCCCACCGTTCG) resulted in a complete loss of footprint. Substitution of dsxC (GGTGCTGCTAAGTCATCAGTGGGGTCAGCTATAGGTAGGCCCCG with

CAGAGATCCTCTAGACTCGAGACGGGGCCTACGAAGATCTGATC) also resulted in a loss of footprint at all concentrations tested. Substitution analysis of dsxB

(AGTGATTACAAA with TCACTAGTGTT) resulted in a loss of footprint at most concentrations of DSX. A small footprint is detectable at the highest concentration, however. The substituted dsxB may thus contain a low-affinity DSX-binding site (ACTAG or ACAA on the complementary strand) not initially recognized. The promiscuity of the

DSX binding was not known at the time, and may have allowed sufficient occupancy of

the mutated dsxB site to confound these studies. Although the authors determined that

DSX does not bind DNA cooperatively, a re-evaluation of the footprint assays calls this

determination into question. The dsxB substitution appears to have some minor effects

on binding at dsxA, though not at dsxC. Indeed, later studies by the Wensink group of

dimer-dimer binding cooperativity utilizing a dsxA-dsxA construct demonstrate that DSX

is capable of cooperative binding (discussed below).


In addition to further defining the binding of Doublesex to the fbe, Coschigano and

Wensink demonstrated regulatory activity in vivo using an hsp/lac construct placed downstream of the fbe and incorporated into the Drosophila genome via P-element-mediated germ-line transformation (Coschigano and Wensink 1993). β-galactosidase

activity in female individuals (who naturally produce DSXF) indicated that DSXF increases expression from the fbe, whereas male individuals repress activation.

Expression levels in intersex individuals producing both male and female Doublesex isoforms demonstrated an intermediate-level of activation, indicating that DSXF and

DSXM can compete in vivo for the fbe binding sites. Furthermore, constructs containing

only the dsxA binding site were able to achieve around 80% activity of the wild type promoter. Surprisingly, the triple dsx binding site knockout yielded weak staining. As some yolk protein expression is observed in intersex individuals lacking endogenous

Doublesex, it appears as though basal activity persists through the action of other regulatory proteins.

Studies of the fbe in transgenic Drosophila suggest a sex- and tissue-specific organization

regulated by multiple factors near the dsxA binding site. This region of the fbe, termed the o-r enhancer (An and Wensink 1995a), was in part implicated by the weak β-galactosidase staining of the intersex flies discussed above (Coschigano and Wensink

1993). In the An and Wensink model, the o-r enhancer is composed of dsxA flanked by adult enhancer factor 1 (aef1) (upstream) and an unidentified bzip1 binding site followed by ref1 (Figure I-7, p. 50). Activation through the bzip1 and ref1 binding sites directs expression in nongonadal cells as well as ovarian somatic cells. The addition of aef1

restricts expression outside of the fat body. Sex-specificity is directed by the DSX proteins. Although the DmC/EBP bZIP protein encoded by slow border cells (slbo) was

considered a candidate for yp regulation, its under-representation in fat body cells

suggests regulation by an as yet unidentified bZIP family member. Binding competition

assays between DSXF and AEF1 demonstrated that binding to the enhancer by the two

proteins is mutually exclusive. This supports a model where AEF1 is required for

repression outside of the fat body, but not necessary for fat body expression where

negative regulation would presumably fall under the purview of DSXM for male flies, and is out-competed by higher expression levels of DSXF in females. The An and Wensink

model suggests a positive synergistic interaction between DSXF and a bZIP protein to

facilitate fbe activation. DSXM, however, would block bZIP binding (roadblock

regulation, presumably through its larger C-terminal tail) or perhaps induce bZIP into an

inactive conformation.

Characterization of DSX DNA-binding

The prior studies of intact DSX binding to the fbe suggested the DNA-binding domain

was located in the common N-terminal sequence (Burtis et al. 1991). Successive N- and

C-terminal truncation constructs were expressed via the pT7-7 vector in E. coli to further define the DNA-binding segment through GMSA employing the dsxA binding site

(Erdman and Burtis 1993). A 66 amino acid segment (residues 39-104) was found to define the minimal DNA-binding fragment. Secondary structure prediction suggested

helical propensity of the C-terminal segment of the domain. A mutation in this region,

R91Q (dsxEFK43), results in intersex development of Drosophila. Later structural studies

of the DNA-binding domain confirmed the presence of a nascent DNA recognition helix

(Zhu et al. 2000; Narendra et al. 2002). Sequence analysis of the N-terminal half of the domain noted the presence of a large number of cysteine and histidine residues separated by hydrophobic stretches, suggesting a metal-binding domain. Mutation of Cys47 (C47A, C47H) and Cys68 (C68D, C68Y) in GST-DSX fusion proteins abolished DNA-binding in vitro. Additionally, an analysis of dsx null alleles found four contained altered sequences in the identified DNA-binding domain. Furthermore, three of these mutations altered cysteine or histidine residues (dsx100.41, H50Y; dsx128-1, H59Y; dsx122-1, C70Y). A

spectrophotometric assay based on the metallochromic dye 4-(2-pyridylazo) resorcinol

(PAR) suggested the presence of bound zinc in the domain. Erdman and Burtis further

predicted a molar ratio of ca. 1-2 moles zinc per mole protein. This prediction was

confirmed by later structural studies, where Cys47, His50, His59, Cys68, and Cys70 were

also shown to coordinate zinc atoms (Zhu et al. 2000).

Sequence specificity for the DSX proteins has been investigated by examination of DSX

footprints (An and Wensink 1995b) and by random oligonucleotide selection assays

(Erdman et al. 1996; Yi and Zarkower 1999). The DSX binding site consists of a

pseudopalindromic sequence centered around a central A/T basepair. The consensus

binding site determined by Burtis and co-workers,

(G/A)NNAC(A/T)A(T/A)GTNN(C/T), is similar to that obtained by Yi and Zarkower.

Interestingly, the dsxC binding site, the weakest DSX affinity site in the fbe, contains a

"disallowed" variant at position -3 (A to G) according to the selection data (Figure I-7, p.

50). DSX is thus capable of tolerating a large number of substitutions in its binding site. Dimer-dimer cooperativity could provide a mechanism to enhance DNA-binding specificity (Cho and Wensink 1998; Chapter VI). The pseudopalindromic nature of the sequence

led to speculation that DSX recognizes DNA as a dimer (Erdman et al. 1996).
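The consensus quoted above can be made concrete with a short sketch. The Python below is illustrative only and is not part of the cited studies: it writes the consensus (G/A)NNAC(A/T)A(T/A)GTNN(C/T) with degenerate-base codes (R, W, Y, N), converts it to a regular expression, and scans the dsxA sequence quoted earlier from the fbe substitution analysis. The function names and the exact-match criterion are assumptions made for illustration; natural sites that deviate from the consensus, as dsxC is reported to do, would not be found by an exact match.

```python
import re

# Consensus from the text, (G/A)NNAC(A/T)A(T/A)GTNN(C/T), written with
# degenerate-base codes: R = A/G, W = A/T, Y = C/T, N = any base.
CONSENSUS = "RNNACWAWGTNNY"
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "W": "[AT]", "Y": "[CT]", "N": "[ACGT]",
}

def consensus_regex(consensus):
    """Compile a degenerate consensus into a regex.

    The lookahead wrapper lets finditer report overlapping matches."""
    body = "".join(DEGENERATE[code] for code in consensus)
    return re.compile("(?=(" + body + "))")

def find_sites(seq, consensus=CONSENSUS):
    """Return (offset, matched sequence) for every exact consensus match."""
    pattern = consensus_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

# dsxA sequence as quoted in the fbe substitution analysis above.
DSX_A = "ACTACAATGTTGCAATC"

if __name__ == "__main__":
    print(find_sites(DSX_A))  # -> [(0, 'ACTACAATGTTGC')]
```

Only the top strand is scanned here; a fuller treatment would also scan the reverse complement and score partial matches rather than requiring exact agreement.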

A quantitative measure of DSX DNA-binding demonstrated that full-length DSXF and DSXM recognize the dsxA target site with equally high affinity (Kapp 0.2 nM; Cho and Wensink

1997). In these studies, dimeric DSX was shown to bind specifically to dsxA and not to

unrelated competitor DNA. At higher DSX concentrations, DNA-tetramer complexes

were observed, although they quickly dissociated into DNA-dimer complexes upon

binding of additional DNA (Cho and Wensink 1997). The identical DNA-binding

constants (Table I-1, p. 37) of the isoforms further suggest that the differential gene regulation is not a function of DNA binding, but rather of the sex-specific C-terminal sequences.

Studies of the isolated DM domain (Erdman et al. 1996; Cho and Wensink 1998; Zhu et al. 2000; Narendra et al. 2002) demonstrated an affinity of ca. 8 nM, 35-fold lower than intact DSX, suggesting a thermodynamic link between dimerization and DNA binding (Cho and Wensink 1998). The DM domain itself binds DNA as a dimer, as evidenced by discrete 2:1 and 1:1 complexes in the presence of excess DNA (Figure I-8, p. 52; Zhu et al. 2000). Cho and Wensink demonstrated that full-length DSX, but not the isolated DM domain, exhibits dimer-dimer cooperative DNA binding when presented with two high-affinity dsxA sites spaced 40 bp apart (Cho and Wensink 1998). Furthermore, this dimer-dimer cooperativity showed sex-specific differences: DSXM had a DNA binding cooperativity over twice that of DSXF (cooperativity coefficients 5.4 and 2.6,

respectively). A novel interpretation of the binding isotherms allowed for the

determination of protein dimerization constants. The male isoform exhibits greater

dimerization relative to the female isoform (0.05 nM versus 0.16 nM; Table I-1, p. 37; Cho and Wensink 1998).
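The quantities reported in this section can be related through simple equilibrium expressions. The Python sketch below is an illustration under stated assumptions, not the analysis used by Cho and Wensink: it treats DNA binding as a single-site isotherm, models two identical sites with a cooperativity factor (omega) that weights the doubly occupied state, and treats the reported dimerization constants as dissociation constants of a monomer-dimer equilibrium. All function names and the protein concentrations chosen for the printout are hypothetical.

```python
import math

def single_site_occupancy(protein_nM, kd_nM):
    """Fraction of a single DNA site occupied at a given free protein concentration."""
    return protein_nM / (kd_nM + protein_nM)

def double_occupancy(protein_nM, kd_nM, omega):
    """Probability that two identical sites are both occupied.

    Statistical weights: empty = 1, each singly occupied state = x,
    doubly occupied state = omega * x**2, where x = [protein] / Kd.
    omega = 1 corresponds to independent (non-cooperative) binding.
    """
    x = protein_nM / kd_nM
    return omega * x * x / (1.0 + 2.0 * x + omega * x * x)

def dimer_fraction(total_nM, kd_dimer_nM):
    """Fraction of protein chains found in dimers for 2 M <-> D.

    Assumes Kd = [M]**2 / [D] and total = [M] + 2 [D], which gives
    [M] = Kd * (sqrt(1 + 8 * total / Kd) - 1) / 4.
    """
    monomer = kd_dimer_nM * (math.sqrt(1.0 + 8.0 * total_nM / kd_dimer_nM) - 1.0) / 4.0
    return 1.0 - monomer / total_nM

if __name__ == "__main__":
    # Values taken from the text; the 1 nM test concentration is arbitrary.
    print("intact DSX, Kd 0.2 nM:", single_site_occupancy(1.0, 0.2))
    print("DM domain, Kd 8 nM:  ", single_site_occupancy(1.0, 8.0))
    print("DSXM, omega 5.4:", double_occupancy(0.2, 0.2, 5.4))
    print("DSXF, omega 2.6:", double_occupancy(0.2, 0.2, 2.6))
    print("DSXM dimer fraction:", dimer_fraction(1.0, 0.05))
    print("DSXF dimer fraction:", dimer_fraction(1.0, 0.16))
```

Under these assumptions a larger cooperativity coefficient raises the probability of double occupancy at a given protein concentration, and a smaller dimerization constant raises the dimer fraction, in line with the qualitative comparison of the male and female isoforms given above.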

Previous studies in our laboratory have demonstrated DSX is a minor groove-binding

protein (Zhu et al. 2000). Major groove perturbing base analogs (nebularine, uridine, and

5-methylcytosine) do not affect DM binding to the dsxA target site. Adenine substitution

by diaminopurine, however, prevents DNA binding. As expected with a minor groove

binding protein, methylphosphonate interference showed protein-DNA contacts to be

nearly symmetric about the core sequence, demonstrating half-site recognition.

Interestingly, permutation gel electrophoresis shows DSX binds DNA without inducing

the sharp bend common with minor groove binding proteins. These data are consistent with the An and Wensink model of enhancer organization where DSX and a bZIP protein

bind simultaneously (An and Wensink 1995a).

DSX oligomerization is dependent on sex-non-specific and sex-specific sequences

DSX oligomerization was reported by both Wensink and Burtis laboratories using insect

cell-derived DSX (Cho and Wensink 1996) and yeast two-hybrid approaches (An et al. 1996; Erdman et al. 1996). DSX expressed in insect cells migrated as a single species

at a molecular weight consistent with dimer formation (124 and 91 kDa for DSXM and

DSXF, respectively) by gel filtration chromatography and glycerol-gradient sedimentation

(Cho and Wensink 1996). The Stokes radii were also calculated (1.91 and 1.88 for

DSXM and DSXF, respectively), indicating an overall elliptical shape for the intact

protein. Glutaraldehyde crosslinking studies in the absence of DNA demonstrated

tetramer and higher oligomer formation. Tetramer binding to single DNA sites was

observed, but UV crosslinking assays demonstrated that only two of the subunits

contacted DNA at any one point, in agreement with the Burtis studies (Erdman et al.

1996). To further define the oligomeric sequences, separate yeast two-hybrid assays

were undertaken. Truncation constructs in each case identified a major C-terminal dimerization element and a minor N-terminal element. Structural studies of the N-terminal domain, however, did not observe dimeric species over the concentration range of 60-140 μM by analytical ultracentrifugation, or at the relatively high (ca. 1 mM) concentrations used for NMR analysis (Zhu et al. 2000). The C-terminal dimerization domain (CTD) was shown to extend from sex-non-specific into sex-specific regions

(298-427 and 298-547 for female and male isoforms, respectively (Erdman et al. 1996);

350-412 and 350-527 for female and male isoforms, respectively (An et al. 1996); Figure

I-5B, p. 46). The CTDF-p fragment (residues 350-412), expressed as a GST fusion, was capable of binding full-length DSXF (An et al. 1996).

Sequence analysis by both groups led to differing secondary structure predictions. The

CTD sequence shows helical propensity. When plotted in a helical format, a non-polar

potential dimer interface is observed across three separate helicogenic sequences. One

hypothesis suggests these helices can wrap in a coiled-coil motif to mediate dimerization

(An et al. 1996). A second proposal by Burtis and co-workers suggested that a more complicated dimerization motif must be employed, as secondary structure prediction programs (specifically PAIR-COIL) predicted the probability of coiled-coil formation to be 43% for DSXF and 72% for DSXM (Erdman et al. 1996). Furthermore, an intersex mutation in the female-specific region shown to block dimerization (G398D, previously identified as the female-specific dsx null allele dsxf; Nothiger et al. 1987) would increase rather than

decrease the coiled-coil propensity of the domain.

Structural studies of the Doublesex DM domain

The solution structure of the DM domain has been previously solved in our lab (Zhu et al.

2000). It consists of a novel zinc-binding module with CCHC and HCCC zinc coordination sites. The visible absorption spectrum of a (Co2+)2-DM complex is consistent with tetrahedral coordination. Furthermore, Co2+ can be out-competed with

two molar equivalents of Zn2+, indicating preferential zinc binding. The overall structure

consists of two helices connected by a twenty-residue loop (Figure I-9, p. 54). Zinc-binding site I is composed of Cys44, Cys47, His59, and Cys63. Cys44 and Cys47 project

from the first helix, whereas His59 and Cys63 project inward from the turn connecting

the first and second helices. Site II is intertwined with site I, being composed of His50,

Cys68, Cys70, and Cys73. His50 is provided by the C-terminal end of helix one, whereas

Cys73 projects from helix two. Cys68 and Cys70 project into the core from a loop.

The C-terminal tail of the DM domain is disordered in solution and forms a nascent helix

at low temperatures (Zhu et al. 2000; Narendra et al. 2002). As mentioned previously,

Erdman and Burtis predicted strong helical propensity for this region (Erdman and Burtis

1993). It is possible that the helix folds upon contact with DNA as a recognition helix.

Consistent with this hypothesis, intersexual mutant R91Q in the tail region impairs DNA binding, as does deletion of the disordered tail. Further evidence supporting the recognition helix hypothesis is found in CD studies of the DM domain in the presence and absence of DNA (Narendra et al. 2002). The difference spectrum between complex and free forms demonstrates an induced helix upon binding to the DNA half-site (5'-AGTACATTG-3' and complement).

E. Common α-helical motifs of dimerization

Wensink and co-workers suggested the dimerization domain of DSX would take on a coiled-coil motif, based on sequence analysis (An et al. 1996). Burtis and co-workers disagreed, believing the DSX primary sequence incompatible with this motif, and favored a more complicated mode of dimer formation (Erdman et al. 1996). In this section, we review common α-helical dimerization motifs.

Coiled-coils

The coiled-coil is perhaps the most recognized of dimerization motifs. Also referred to as leucine zippers, these domains feature a heptad repeat (abcdefg) where the first and fourth positions (a and d) are occupied by hydrophobic residues (Kohn et al. 1997).

Upon helix formation, residues a and d align to form an extensive hydrophobic surface flanked by the fifth and seventh (e and g) positions. Two helices expressing this motif can then associate in an overall left-handed super-helical twist (Figure I-11A, p. 58).

This dimerization motif was first suggested by Crick in 1953, who proposed that parallel strands could self-associate with "knobs-in-holes" packing (Crick 1953). Since then, coiled-coils

have been predicted by sequence analysis to occur in 2-3% of all proteins (Burkhard et al.

2001). Coiled-coils typically employ a crossing angle of 20°, allowing continuous

contact between helices. Although a parallel arrangement of helices is more common,

antiparallel arrangements do occur (e.g. bacterial seryl-tRNA synthetase).

Quaternary structures of dimeric, trimeric, and tetrameric coiled-coils are known. These motifs are widespread, being found in cytoskeletal proteins, transcription factors, and membrane proteins, among others. Right-handed coiled-coils are less common, and typically consist of an undecad repeat.
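The heptad register described in this section can be illustrated with a short sketch. The Python snippet below is not part of the original analysis: the set of residues treated as hydrophobic, the function names, and the toy sequence (an idealized repeat, not a DSX sequence) are all assumptions made for illustration. It labels each residue with a heptad position and reports how often the a and d positions are hydrophobic for a given starting register.

```python
# Residues treated as hydrophobic for this illustration (an assumption).
HYDROPHOBIC = set("AILMFVWY")
POSITIONS = "abcdefg"


def heptad_register(sequence, start="a"):
    """Pair each residue with its heptad position, beginning at `start`."""
    offset = POSITIONS.index(start)
    return [(res, POSITIONS[(offset + i) % 7]) for i, res in enumerate(sequence)]


def core_hydrophobic_fraction(sequence, start="a"):
    """Fraction of a and d positions occupied by hydrophobic residues."""
    core = [res for res, pos in heptad_register(sequence, start) if pos in "ad"]
    return sum(res in HYDROPHOBIC for res in core) / len(core)


def best_register(sequence):
    """Starting position that maximizes hydrophobic occupancy of a and d."""
    return max(POSITIONS, key=lambda s: core_hydrophobic_fraction(sequence, s))


# Idealized, made-up repeat with Leu at position a and Val at position d.
toy = "LKEVEKK" * 4
print(best_register(toy), core_hydrophobic_fraction(toy, "a"))
```

A real sequence rarely scores perfectly in any register, so a scan of this kind is only a first-pass screen and does not replace dedicated coiled-coil prediction programs.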

Four-helix bundles and helix-loop-helix

The four-helix bundle (4HB) is a common arrangement of helices that is highly adaptable to complex systems (Figure I-11B, p. 58). In its most basic form, four helices are packed against one another in a hydrophobic core. These helices can be arranged parallel or antiparallel. Typically the helices are ordered up-down-up-down, although variations do exist. Like coiled-coils, typical helix crossing angles are 20° to maximize helix-helix contact (Kohn et al. 1997). An example of an intertwined 4HB is the dimerization domain from HNF-1α, the crystal structure of which was previously solved in our laboratory (Figure I-11C, p. 58; Narayana et al. 2001).

A related fold, the helix-turn-helix (HTH), serves a variety of roles not limited to dimerization (Figure I-11D, p. 58). When the turn region consists of basic residues, the

structure is referred to as a basic helix-turn-helix (bHTH) or basic helix-loop-helix

(bHLH), depending on the length of the helix connector. bHLH proteins are often

involved in DNA-binding, and indeed feature prominently in the dosage compensation

scheme discussed above. bHLH proteins tend to form parallel, left-handed 4HB structures with hydrophobic cores (Kohn et al. 1997). HLH proteins have also adapted to become metal-binding motifs, with a metal (typically Ca2+) coordinated within the loop structure

(Figure I-11E, p. 58). Such structures are also known as EF-hand motifs.

STATEMENT OF PURPOSE

Summary and statement of purpose

The role of DSX as a genetic switch and its interface with two profoundly important

developmental pathways demonstrate the relevance of this system. Our long-term goal is

to understand the mode of sex-specific regulation performed by the male and female

DSX isoforms. Understanding how the differing C-terminal structures lead to opposing

gene regulation is of fundamental importance. How does DSXF mediate dimerization?

What is the structural basis for the intersex mutation G398D? Does the thermodynamics

of the dimeric system suggest a functional relationship between protein folding and gene regulation? What clues to DSX function can we find through analysis of structural

homologs?

Our first aim is to address the structural basis of dimerization through crystallographic studies. This goal was accomplished using synchrotron radiation and SAD phasing. We demonstrate that CTDF-p adopts a stereotypical ubiquitin-associated (UBA) motif to

mediate dimerization, suggesting a possible role for the ubiquitin system in sex

determination. The symmetric dimer contains an extensive hydrophobic interface with

substantial contacts between the second and third α-helices of each protomer. We show

that the CTDF-p forms a stable helix in solution that is extended by a disordered C-

terminal tail.


We further demonstrate the CTDF-p forms a stable dimer in solution. An examination of

the thermodynamic stability suggests dimer formation is consistent with the earlier

studies of the intact protein, validating the use of the truncated domain to study

dimerization. We advance our understanding of the folding pathway of the dimerization

domain through the use of hydrogen exchange studies, demonstrating a coupled folding-

dimerization pathway.

Through an interdisciplinary and collaborative approach, we address our second goal:

understanding the mechanism(s) underlying the intersexual development of a mutant

Drosophila first described nearly twenty years ago (Nothiger et al. 1987). We

demonstrate that this mutation results in decreased dimerization, decreased DNA-

binding, and leads to protein misfolding and aggregation in vitro. We further show the

site of this mutation is extremely sterically sensitive, permitting only glycine and alanine,

the female- and male-specific residues in native DSX.

Finally, we aim to provide insight into the mechanism of IX recognition by DSXF. Our work suggests dimerization is required to present IX with an ordered binding surface, and that this putative binding surface consists of a groove which crosses both the sex-specific region and the dimer interface. We further speculate on the relevance of the UBA fold in sex-determination, and initiate studies into ubiquitin binding by the novel DSX UBA dimer.

Table I-1

DNA binding and oligomerization properties of the isolated DM domain (DBD), DSXM, and DSXF (reproduced from Cho and Wensink, 1998).

                              DBD            DSXF           DSXM
apparent DNA affinity (M)     7 x 10^-9      0.2 x 10^-9    0.2 x 10^-9
intrinsic DNA affinity (M)    0.48 x 10^-9   0.17 x 10^-9   0.17 x 10^-9
dimerization constant (M)     430 x 10^-9    0.16 x 10^-9   0.05 x 10^-9
DNA binding cooperativity     0.9            2.6            5.4
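As a rough illustration of what the dimerization constants in Table I-1 imply energetically, the sketch below converts each value to a standard free energy of dissociation using ΔG = -RT ln Kd. It assumes that the tabulated constants are equilibrium dissociation constants referenced to a 1 M standard state and that the temperature is 25 °C; neither assumption is stated in the table, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def dissociation_free_energy(kd_molar, temp_k=298.15):
    """Standard free energy of dissociation (kcal/mol), 1 M reference state."""
    return -R_KCAL * temp_k * math.log(kd_molar)


# Dimerization constants (M) as listed in Table I-1.
dimerization_kd = {"DBD": 430e-9, "DSXF": 0.16e-9, "DSXM": 0.05e-9}

for name, kd in dimerization_kd.items():
    print(f"{name}: {dissociation_free_energy(kd):.1f} kcal/mol")
```

Under these assumptions, the difference between the isolated DM domain and the intact isoforms depends only on the ratio of the constants, so it is insensitive to the choice of standard state.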


Figure I-1

Sex determining hierarchy in Drosophila melanogaster. The RNA-splicing cascade is initiated by the ratio of X chromosomes to autosomes, leading to the production of male- and female-specific isoforms of DSX. Also shown here is the male-specific splicing

isoform of fruitless, inhibited by tra in females and responsible for male courtship

behavior and the muscle of Lawrence.

Figure I-1 (schematic; labels: X:A ratio 2:2 or 1:2; sxl ON or OFF; tra, tra-2, fruM, her; female and male dsx isoforms; other target genes; female, male)

Figure I-2

DSXF BLAST search reveals multiple DM-containing homologs, but limited conservation

of the C-terminus. Shown are the first 41 of 273 hits with DSXM and redundant

sequences removed. DSX features low sequence complexity in the middle domain. High

insect homology in the C-terminus begins around DSX residue 300. Aligned insect DSX

from Drosophila melanogaster, D. pseudoobscura, D. buzzatii, Bactrocera oleae, B. tryoni, Ceratitis capitata, Musca domestica, Megaselia scalaris, Anopheles gambiae, and Bombyx mori.

Figure I-2

Figure I-3

Schematic of larval segmentation and imaginal disc location. Not shown are segments

A9-A10, which along with A8 comprise the terminalia. Figure adapted from http://flybase.net/images/lk/Anatomy/Imaginal_Discs/Larva-imaginal-lbld.jpeg.

Figure I-3

Figure I-4

Segmental origins and dimorphic features of the genital disc. Segments A8-A10 give rise

to the genital disc. Segment A8 (red) contributes to the majority of female-specific sex

organs in the presence of DSXF. In XO individuals, segment A8 forms a miniature 8th

tergite. Segment A9 contributes predominantly to male genital structures. In XX

individuals, A9 forms the parovaria. By the third larval instar, the genital disc exhibits

marked sexual dimorphism (schematic at bottom). FGP: female genital primordium;

MGP: male genital primordium; AP: anal primordium; RMP: repressed male primordium; RFP: repressed female primordium. Figure adapted from (Sanchez and

Guerrero 2001).

Figure I-4 (schematic; labels: segments A8, A9, A10-11; FGP, MGP, AP; female development and male development under DSXF or DSXM; larval development; FGP, RMP, AP and RFP, MGP, AP)

Figure I-5

Dsx alternative mRNA splicing and protein organization. (A) RNA-splicing as determined by Burtis and Baker (Burtis and Baker 1989). The male and female isoforms share the first three exons (yellow boxes). The female isoform is extended by exon four

(top splicing, red box), whereas the male isoform is extended by exons five and six

(bottom splicing, green boxes). The resulting proteins are identical for the first 397 amino acids and differ thereafter. (B) Organization of DSX isoforms. Shared region spans residues 1-397 containing DM domain (35-105; gray box) and proximal portion of dimerization domain (350-397; black box). Sex-specific regions comprise residues 398-

427 (DSXF; red) and 398-549 (DSXM; green). CTDF interacts with coactivator IX.

Figure I-5

Figure I-6

Regulation of sxl transcripts from the early promoter (A) and maintenance promoter (B).

(A) Female splicing of sxl is initiated by the ratio of X chromosomes (the numerator signal) to autosomes (the denominator signal), and influenced by the maternal gene products da, her, gro, and emc. (B) Maintenance of functional sxl is necessary for continued female development. SXL, along with SNF, VIR, and FL(2)D, removes a stop codon-containing exon, allowing for continued sxl expression. Figure is after (Cline and Meyer 1996).

Figure I-6

Figure I-7

Organization of the yp fbe region and co-regulator binding sites. (A) Schematic of the

fbe with distance in base pairs between the central A/T in the DSX binding sites. DSX binding sites are separated by near-integral turns of DNA. (B) Schematic of putative co-regulator binding sites within the fbe. Note the substantial overlap of binding sites.

Minor groove binding by DSX would theoretically allow concurrent binding by a bZIP protein. Nucleotides in bold indicate deviations from the DSX consensus DNA-binding sequence.

Figure I-7

A

-308 dsxA dsxB dsxC -196

23bp 43bp

bzip1 B aef1 dsxA 5’ – GTG CACA ACTACA A TGTTGC AAT CAGCGG – 3’

dsxB bzip2 5’ – GAG CCTACAAAGTG AT TACAAATT AAAATA – 3’

dsxC bzip3 5’ – GGT GCTGCTA AGT CAT CAGTGGG GTCAGC – 3’


Figure I-8

DSX DM GMSA of the dsxA binding site showing DNA-dependent dimer cooperativity.

A 2:1 DM-DNA complex (C2) is pronounced relative to the 1:1 complex (C1), even in the presence of free DNA (F). Lanes 2-7 contained increasing concentrations of DM domain (4, 8, 12, 18, 24, and 48 nM, respectively). Figure adapted from (Zhu et al.

2000).

Figure I-8 (gel image; band labels: C2, C1, F)

Figure I-9

DM domain (residues 41-81) solution structure. (A) Stereo ribbon diagram showing zinc atom coordination (sites I and II, boxed). Cysteine and histidine residues identified by Burtis and co-workers are shown here coordinating zinc. (B) Stereo backbone ensemble of DM domain. Figure is from (Zhu et al. 2000).

Figure I-9

Figure I-10

Courtship ritual of Drosophila melanogaster. The male begins by chasing and orienting towards the female, followed by tapping with a front foreleg. The male then asymmetrically extends a wing to produce a courtship song, a component of which requires the DSXM product. The male continues courtship with licking and copulation.

Illustration is from http://www.is.wayne.edu/mnissani/PAGEPUB/FLIES.HTM.

Figure I-10

Figure I-11

Common α-helical dimerization motifs. In each case one monomer is shown in blue and

the other in magenta. (A) Coiled-coil (leucine zipper) from the c-Myc-Max heterodimer

(PDB code 1A93). (B) An engineered 4HB (PDB code 1EC5). (C) An intertwined 4HB

with a mini-zipper component from the HNF-1α (MODY) protein (PDB code 1JB6). (D)

A bHLH dimer from MyoD. Here the HLH motif is mediating dimerization while the

zipper region is involved in DNA binding (DNA not shown; PDB code 1MDY). (E) An

EF-hand dimer bound to two Ca2+ atoms (gray spheres). The domain is from Troponin C site III (PDB code 1CTA).

Figure I-11

CHAPTER II

EXPRESSION, CRYSTALLIZATION AND PRELIMINARY X-RAY AND NMR CHARACTERIZATION OF THE DROSOPHILA TRANSCRIPTION FACTOR DOUBLESEX

Portions of this chapter have been published as Bayrer et al., 2004

Introduction

Doublesex (DSX) is a downstream component of the well-described sex-determining

hierarchy of Drosophila melanogaster. In this genus sex is determined by the ratio of X

chromosomes to autosomes. When the ratio is 2:2 (XX genotype), the sex lethal gene

(encoding an RNA splicing factor) is activated, leading to the expression of Transformer

(TRA), a second RNA-splicing factor. TRA, with TRA2, regulates the female splicing

isoform of DSX (DSXF). When the ratio of X chromosomes to autosomes is 1:2 (XO

genotype), sex lethal is not expressed, and the male isoform of DSX (DSXM) is produced by default. Messenger RNAs encoding DSXM and DSXF contain the same first three

exons; the C-terminal region of DSXF is encoded by exon four whereas the C-terminal

region of DSXM is encoded by exons five and six (Burtis and Baker 1989). Thus, as a

consequence of sex-specific splicing, male and female isoforms are identical for the first

397 residues, but differ for the remainder of the proteins.

The N-terminal domain of DSX (residues 35-105) contains a DNA-binding motif,

designated the DM domain (Doublesex and MAB-3 domain; Raymond et al. 1998). As

the acronym implies, this domain is also found in Caenorhabditis elegans transcription

factor MAB-3, required for male somatic differentiation. Indeed, studies of transgenic

worms have demonstrated that the male (but not female) isoform of DSX can rescue

male features in C. elegans in the absence of MAB-3 (Raymond et al. 1998).

Recently, the DM domain's role in sexual differentiation has been shown to be conserved

among metazoans. In humans, for example, deletion of three contiguous DM-containing genes on chromosome 9 is associated with XY sex reversal and gonadoblastoma (the 9p

Syndrome) (Ottolenghi and McElreavey 2000). The structure of the DM domain, determined by NMR spectroscopy, contains a novel zinc module that binds in the minor groove of DNA (Zhu et al. 2000; Narendra et al. 2002).

The C-terminal domain of DSX (residues 350-427 in DSXF and 350-549 in DSXM)

contains both sex-specific and non-sex-specific residues. This domain is responsible for

oligomerization and presumably for the differential transcriptional regulation effected by

male and female isoforms (Erdman et al. 1996;An et al. 1996). A mutation in XX

(genetically female) fruit flies at position 398 (G398D) results in an intersex phenotype

(Nothiger et al. 1987). This mutation impairs dimerization in vitro, implying that

dimerization is required for the function of DSXF in vivo (Erdman et al. 1996).

Homologues of the dimerization domain are not recognizable outside of insects. Given the broad conservation of the DM domain, however, it is possible that the three-dimensional structure of the dimerization domain will be shared by proteins with limited sequence similarity. In this chapter we describe the expression, purification, and crystallization of the DSXF dimerization domain, its labeling with selenomethionine, and

preliminary crystallographic X-ray characterization. We further describe the isotopic

labeling and preliminary NMR analysis of CTDF.

Materials and methods

Protein expression and purification

Coding sequences corresponding to the C-terminal domain of DSXF (residues 350-427)

and truncated analogue (residues 350-412) were ligated into expression plasmid

pMW127 (Hinck et al. 1993;Ukiyama et al. 2001). The truncated analogue was designed

based on prior yeast two-hybrid studies showing that residues 413-427 are not required

for dimerization (Erdman et al. 1996;An et al. 1996); sequence analysis suggests that this

tail is unstructured. The domains were expressed in Escherichia coli strain

BL21(DE3)pLysS (Invitrogen, Carlsbad, California) as fusion proteins with an N-

terminal His6-tag-Staphylococcal Nuclease fused to the target peptide via a thrombin-cleavable linker. Expression is induced by isopropyl-β-D-thiogalactoside (IPTG) (Roche, Indianapolis, Indiana). Essentially identical protocols were employed for protein purification. In each case a 120-ml starter culture containing

2YT medium and antibiotics was grown overnight with shaking at 37 °C and subsequently inoculated into 4.5 l of 2YT containing ampicillin (AMP) (Roche,

Indianapolis, Indiana) and chloramphenicol (CA) (Boehringer Mannheim, GmbH,

Mannheim, Germany). Cultures were grown to an optical density at 600 nm (OD600) of ca. 0.6 and then induced with 0.5 mM IPTG. Cells were grown to ca. 1.0 OD600 and harvested by centrifugation at 3500 g for 15 minutes at 4 °C. The cell pellet was harvested, resuspended in extraction buffer (20 mM imidazole, 20 mM Tris-HCl (pH

7.0), and 200 mM NaCl) with 0.75 mg/ml lysozyme and 0.174 mg/ml phenylmethylsulfonyl fluoride (PMSF) (Sigma, St. Louis, Missouri), and incubated for

20 min at room temperature. The mixture was then exposed to three freeze-thaw cycles

followed by sonication on ice. The lysate was centrifuged at 17,400 g for 1 hour at 4 °C.

The supernatant was reserved, and the pellet resuspended in extraction buffer, sonicated,

and centrifuged again. The supernatants were combined, incubated with Co2+-affinity

resin (BD Biosciences, Palo Alto, California), and applied to an FPLC column at 4 °C.

The column was washed with two-to-three column volumes of extraction buffer. The

fusion protein was eluted from the column with elution buffer (250 mM imidazole, 20

mM Tris-HCl (pH 7.0), and 200 mM NaCl) and dialyzed into cleavage buffer (5 mM

imidazole, 20 mM Tris-HCl (pH 7.9), and 200 mM NaCl) overnight at 4 °C. Thrombin

cleavage was accomplished at room temperature with ca. 1 unit thrombin (Sigma, St.

Louis, Missouri) per mg fusion protein until completion as assayed by SDS-PAGE,

usually ca. 4 hours. The mixture was applied to a Co2+-affinity column, and the flow-

through containing the protein of interest was collected. Eluted fractions containing the

domain were concentrated using an Amicon ultrafiltration system with YM1000 filter

(Millipore, Billerica, Massachusetts). The protein was purified to near homogeneity by

gel filtration (Superdex 75; Amersham Pharmacia, Uppsala, Sweden) in the case of the

truncated domain and by anion exchange (DEAE-5PW; Toso Haas, GmbH, Stuttgart,

Germany) in the case of the intact domain.

Selenomethionine-labeled protein was expressed in B834(DE3)pLysS competent cells

(Novagen, Madison, Wisconsin) grown in M9 minimal medium supplemented by all

amino acids except methionine at 40 mg/l and selenomethionine (Sigma, St. Louis,

Missouri) at 50 mg/l. A 140-ml starter culture containing 135 ml of the supplemented

M9 media and 5 ml of 2YT media and antibiotics was grown overnight at 37 °C. 120 ml

of the starter culture was inoculated into 4.5 l supplemented M9 medium together with 6 ml of 2YT medium to facilitate cell growth. Induction, harvest, and purification were as

described for the native domains.

Uniformly 15N and 13C isotopically labeled domains for NMR analysis were expressed in

BL21(DE3)pLysS cells in M9 minimal media with 15N-ammonium sulfate and 13C-glucose as the sole nitrogen and carbon sources, respectively. 1 ml of culture testing positive for protein expression was inoculated into 150 ml minimal media for overnight growth. 120 ml of the starter culture was inoculated into 4.5 l M9 media and allowed to grow at 37 °C with shaking. The culture was induced with 1 mM IPTG at OD600 ca. 0.5-

0.6, typically reached after five hours of growth. Cultures were then allowed to grow

until OD600 ca. 1.0, typically 10-15 hrs post induction. Harvest and purification were as

described for the native domains.

Crystallization

The purified native fragment (residues 350-412) was concentrated in an Amicon

ultrafiltration cell using YM1000 filters to ca. 10 mg/ml in GPC buffer (20 mM Tris-HCl

(pH 7.4) and 200 mM NaCl). Crystallization trials were performed with the Hampton

Crystal Screening Kits I and II and PEG/Ion screen using the hanging-drop vapour-

diffusion method (Hampton Research, Laguna Niguel, California). Drops were formed

by mixing 1-4 μl of protein stock with 1-4 μl of reservoir solution. Crystallization drops

were allowed to equilibrate at either room temperature or 4 °C for 1-6 weeks over 1-ml

reservoir solution. Conditions initially producing crystals were optimized by varying the precipitant and protein concentrations. Optimal crystallization conditions for the dimerization domain were determined to be 1.85 M ammonium sulfate and 7% isopropanol with a protein stock concentration of 12 mg per ml. Crystallization conditions for the Se-Met containing analogue were identical to those above. Crystals were not obtained for the intact C-terminal domain (residues 350-427) under any of the conditions tested.

Crystallography data collection and processing

Native X-ray diffraction data were collected under cryoconditions on beamline 14BMC at Argonne National Laboratory. Prior to data collection, crystals were washed in a solution containing reservoir solution and 15% v/v glycerol, mounted in nylon loops, and flash-frozen in liquid nitrogen. Data collection was at 0.9000 Å at a crystal-to-detector distance of 110 mm and oscillation range of 1°. Single wavelength anomalous dispersion (SAD) data were collected under cryoconditions on beamline X9B at Brookhaven National

Laboratory. Data reduction was performed using DENZO (native) or HKL2000 (SAD).

Scaling was performed with SCALEPACK (native) or HKL2000 (SAD) (Otwinowski and Minor 1997). Substructure determination and phasing were accomplished with the

SHELX suite of programs (Sheldrick 1990).

NMR data collection and analysis

NMR data were collected on Varian and Bruker instruments at 600, 700, and 800 MHz.

Protein concentrations were typically ca. 0.5-1.5 mM (monomer equivalents) in 10 mM

2H-Tris-HCl (pH 5.6), 50 mM NaCl and 5 mM deuterated dithiothreitol; experiments

were typically performed at 30 °C. Data were analyzed with VNMR and FELIX for

Varian datasets, and XWIN-NMR, NMRPIPE, and NMRVIEW for Bruker datasets.

Mixing times for NOE spectroscopy (NOESY) were 75 and 200 ms; the mixing time for

total correlation spectroscopy (TOCSY) was 55 ms. Triple-resonance experiments were

collected as collaborative efforts with Varian, Inc. and Bruker BioSpin.

Results and discussion

Expression and purification

Protein expression was tested with 2 ml cultures grown from single colonies. Those cultures showing greatest over-expression, as determined by SDS-PAGE, were selected for large-scale protein expression. After elution from the first cobalt column, a band slightly less than the molecular weight of the fusion protein (36.5 kDa for the dimerization domain, 34.5 kDa for the analogue) was in each case observed by SDS-

PAGE. DSX has been noted to migrate abnormally on SDS-PAGE gels (Burtis et al.

1991; Cho and Wensink 1996). The purified proteins were assayed by mass spectrometry and found to correspond to the predicted molecular weights of the domain plus Gly-Ser derived from the thrombin cleavage site (9.402 kDa for the dimerization domain, 7.737 kDa for the truncated analogue). It was noted that protein purified by reverse-phase high-performance liquid chromatography (HPLC) was unusually susceptible to oxidation and aggregation, and hence was not used for these studies. Purification of the domains by FPLC gel filtration or ion exchange under non-denaturing conditions resulted in each case in a stock solution stable at room temperature and amenable to crystallization trials.

Selenomethionine- and isotopically labeled proteins were purified similarly. Protein yields

were typically 8 mg/l under native conditions and 6 mg/l of culture for incorporation of

selenomethionine.

Crystallization

Initial trials of DSXF WT 350-427 failed to yield crystals. NMR and circular dichroism

(CD) studies suggest that this domain consists of a well-ordered core dimer with a

disordered tail (see Chapter IV). Reasoning that this tail might be impairing

crystallization, we pursued crystallization of a dimeric fragment lacking this tail (residues

350-412). Trials were initially started with a protein stock solution of 8.5 mg/ml.

Crystals were observed at 3 weeks in 2.0 M ammonium sulfate and 5% isopropanol.

Conditions were optimized to 1.85 M ammonium sulfate and 7% isopropanol with a

protein stock solution of 12 mg/ml. Under these conditions, crystals began forming

within three days, with the most useful crystals (as determined by diffraction quality on

home-source radiation) forming within one week at room temperature. Crystal size

ranged from 0.1 x 0.1 x 0.1 mm3 to 0.5 x 0.5 x 0.3 mm3, with the best diffracting crystals

0.3 x 0.3 x 0.2 mm3. Selenomethionine crystals were grown in the same conditions as for

the native crystals. The morphology of the selenomethionine crystals varied slightly

from the native crystals. Whereas both were roughly square shaped, the

selenomethionine crystals were generally thinner with an "X" across the face. As indicated by the X-ray diffraction pattern, this "X" did not appear to be due to satellite crystal formation.


Crystallographic data collection and preliminary X-ray characterization

Native crystals diffracted to 1.6 Å resolution on beamline 14BMC at Argonne National

Laboratory. The crystal belongs to space group P212121, with unit cell parameters a =

39.773, b = 46.623, c = 59.771 Å. Assuming one dimer per asymmetric unit (15.4 kDa),

the Matthews coefficient is calculated to be 1.79 Å3/Da with solvent content 30%. To obtain experimental phases, a SeMet-derivative crystal was measured at the Se peak (λ = 0.9788

Å). Data to 1.8 Å resolution were collected on beamline X9B at Brookhaven National

Laboratory. Crystal parameters were similar to those of native crystals (space group

P212121 and unit cell parameters a = 39.414 b = 46.759 c = 58.596 Å). Four selenium

sites were located in the asymmetric unit. Although the intact dimerization domain

(residues 350-427) appears refractory to crystallization, collaborative NMR spectroscopy

studies have extended the crystal structure of the 350-412 domain to include the missing

tail.
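The Matthews coefficient quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration rather than the procedure actually used: it assumes four asymmetric units per unit cell (as for space group P212121), takes the dimer mass as twice the 7,737 Da measured for the truncated analogue, and uses the common approximation that the solvent fraction is 1 - 1.23/V_M. The function names are hypothetical.

```python
def matthews_coefficient(a, b, c, z, mass_da):
    """V_M in cubic angstroms per dalton for an orthorhombic cell (angles 90 deg)."""
    cell_volume = a * b * c
    return cell_volume / (z * mass_da)


def solvent_fraction(v_m, protein_term=1.23):
    """Approximate solvent fraction from V_M, using 1 - 1.23 / V_M."""
    return 1.0 - protein_term / v_m


# Native cell edges (angstroms); one dimer per asymmetric unit, Z = 4.
v_m = matthews_coefficient(39.773, 46.623, 59.771, z=4, mass_da=2 * 7737)
print(f"V_M = {v_m:.2f} A^3/Da, solvent fraction = {solvent_fraction(v_m):.2f}")
```

With these inputs the result is close to the values reported in the text; small differences arise from rounding of the dimer mass and from the constant chosen for the protein partial specific volume.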

NMR data collection and preliminary characterization

One dimensional (1D) 1H NMR was used to ascertain appropriate salt, pH, and

temperature conditions. Initial trials employed phosphate buffer with potassium chloride;

however, concerns about possible metal coordination, raised in part by Cys360

conservation and the domain's sensitivity to reverse-phase HPLC denaturation, led to the

use of 2H-Tris buffers for these and subsequent studies. Temperature ranges from 18 °C

to 35 °C were investigated, with the bulk of the data being collected at 30 °C; pH

conditions were optimized to the range pH 5.4-5.6.


1D 1H NMR studies demonstrated good signal dispersion and resolution for the truncated domain. Interestingly, the G398A substitution yielded sharper resonances, consistent with its stabilizing effects on the domain (Figure II-4, p. 79). For this reason, much of the preliminary homo- and heteronuclear studies employed the G398A domain, a strategy previously used in the study of Arc repressor (Zagorski et al. 1989).

Heteronuclear NMR is a powerful method for structure determination of medium-to-large domains previously intractable to study (Kay 2005). In a dimeric system, heteronuclear

NMR also allows differential labeling of protomers, and consequently, assignment of intermolecular NOEs in symmetric dimers. We undertook 15N and 13C isotopic labeling

for these purposes. The 1H-15N HSQC spectra of the CTDF and truncated domain show many well-resolved cross peaks (Figure II-5, p. 81). Additional cross peaks arising from the C-terminal tail fall largely into the near random coil chemical shift region, while the remaining cross peaks are generally the same, suggesting that the ordered cores of the CTDF and the truncated domain are similar, and that this core is extended

by a flexible tail. We also conducted a host of three dimensional heteronuclear

experiments (Table II-2, p. 72) to allow the complete assignment of resonances for the

CTDF and CTDF-p.

Acknowledgments

We thank Nelson Phillips for his aid in protein purification and G. Reddy (University of

Chicago) for mass spectrometry. We also thank Varian Inc. and Bruker BioSpin for triple-resonance data collection. This work is a contribution from the Cleveland Center for Structural Biology.

Table II-1

X-ray data-collection and analysis statistics

                              Native                        SeMet
Wavelength (Å)                0.9000                        0.9788
Unit cell parameters (Å)      a = 39.8 b = 46.6 c = 59.8    a = 39.4 b = 46.8 c = 58.6
Resolution range (Å)          19-1.6 (1.66-1.60)            32.7-1.8 (1.86-1.80)
No. measured reflections      164648                        145768
No. unique reflections        15159 (1467)                  19414 (1905)
Completeness (%)              99.1 (96.9)                   100 (100)
Rmerge (%)                    3.8 (0.182)                   8.7 (0.456)
Average I/σ(I)                62.3 (11)                     24.0 (3.4)

Highest resolution shell statistics are in parentheses.
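The average multiplicity (redundancy) of each data set is not listed in Table II-1 but follows directly from the ratio of measured to unique reflections. The short sketch below computes it; the assignment of the reflection counts to the native and SeMet columns is taken from the table as read here, and the function name is hypothetical.

```python
def multiplicity(n_measured, n_unique):
    """Average number of observations per unique reflection."""
    return n_measured / n_unique


# (measured, unique) reflection counts from Table II-1.
datasets = {"Native": (164648, 15159), "SeMet": (145768, 19414)}

for name, (measured, unique) in datasets.items():
    print(f"{name}: multiplicity = {multiplicity(measured, unique):.1f}")
```

High multiplicity is generally helpful for SAD phasing because it improves the precision of the small anomalous differences.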


Table II-2

Triple-resonance experiments

Heteronuclear Experiment      Heteronuclear Experiment
HNCA                          CBCACONH
HNCO                          HCCH TOCSY
HNCOCA                        HCC(CO)NH
HNCACB                        NOESY 13C/15N HSQC
HNCOCACB                      13C-Edited NOESY (a)

(a) 100 and 200 msec mixing times

Figure II-1

Crystals of DSXF WT (residues 350-412) grown in 1.9 M ammonium sulfate and 7% isopropanol.

Figure II-1

Figure II-2

X-ray diffraction pattern of native DSXF WT 350-412 collected at APS. Exposure time

2 sec, distance 110 mm, oscillation range 1.0°. An ADSC Q4 CCD detector was used to record the image. The frame edge close-up is 1.6 Å.

Figure II-2

Figure II-3

Harker section from anomalous scattering Patterson map showing Selenium positions.

Figure II-3

Figure II-4

1D 1H NMR spectra in H2O of (A) DSXF WT 350-427 and (B) DSXF 398A 350-412.

Sample concentration was 1.5 mM in 10 mM Tris-HCl, 50 mM NaCl, and 5 mM 2H-DTT at pH 5.5 and 30 °C. The vertical scale of the aliphatic spectra has been reduced by half.

The field strength was 600 MHz.

Figure II-4

Figure II-5

1H-15N HSQC spectra of (A) wild-type DSXF domain 350-427 (intact tail) and (B) domain

350-412 (truncated tail). Proteins were made 1.5 mM in 10 mM deuterated Tris-HCl (pH

5.4) and 50 mM NaCl at 30 °C. Residues 413-427 are IYDGGELRNTTRQCG. Arg side chain NεH resonances are folded at bottom right.

Figure II-5

CHAPTER III

DIMERIZATION OF DOUBLESEX IS MEDIATED BY A CRYPTIC UBA DOMAIN: IMPLICATION FOR SEX-SPECIFIC GENE REGULATION

(Portions of this chapter appear in Bayrer et al. 2005c)

Introduction

Sexual differentiation in Drosophila is regulated by the X:autosome ratio and a sex-

specific RNA-splicing pathway (Figure III-1A, p. 100; Cline and Meyer 1996). A

principal target is doublesex (dsx): expression of male- and female-specific transcription

factors (DSXM and DSXF) directs most aspects of somatic sexual differentiation (Burtis

and Baker 1989). The DSX isoforms are encoded by mRNAs sharing the first three exons; the C-terminal segment of DSXF is encoded by exon four whereas that of DSXM is encoded by exons five and six. Male and female isoforms are thus identical for the first

397 residues but differ thereafter (Figure III-1B, p. 100; Burtis and Baker 1989).

Common elements include an N-terminal DNA-binding domain, the DM motif

(Raymond et al. 1998). Mutations in the DSX DM domain (a non-classical Zn module;

Zhu et al. 2000) block DNA binding in association with intersexual phenotypes (Burtis

1993; Erdman and Burtis 1993). Broad conservation of the DM domain in metazoan proteins related to sexual differentiation suggests that mechanisms of sexual dimorphism are in part universal (Raymond et al. 1998).

84

The C-terminal domains of DSXF and DSXM (CTDF and CTDM) are highly conserved among insect homologs but not more broadly (Figure III-2, p. 102). The domains contain a common dimerization element and sex-specific extensions proposed to mediate recruitment of transcriptional co-regulatory factors (Erdman et al. 1996; Garrett-Engele et al. 2002). Dimerization enhances specific DNA binding (Cho and Wensink 1998). A mutation (G398D) in the CTDF that blocks dimerization is associated with intersexual development (Erdman et al. 1996). Position 398 is the first sex-specific residue; consequently, XY individuals are unaffected because the male-specific exon five, which encodes Ala398, is unchanged. The dimer does not contain a recognizable structural motif but has been predicted to form a coiled-coil (An et al. 1996). Secondary structure analysis by Wensink and co-workers predicts the protomer to be composed of three helical segments (Figure III-3, p. 104; An et al. 1996).

Here we describe the crystal structure of a dimeric fragment of CTDF at 1.6 Å resolution and its scanning mutagenesis in a yeast two-hybrid (Y2H) system. The polypeptide (designated CTDF-p) spans residues 350-412 and so contains both shared and sex-specific sequences. Surprisingly, the structure reveals a novel dimeric arrangement of UBA domains. To our knowledge, this ubiquitin-binding motif, although widely conserved among pathways regulating DNA repair and subcellular trafficking, has not previously been found in a transcription factor. Dimerization is mediated by an extensive non-polar interface opposite to the canonical Ub-binding surface. The structure of CTDF-p thus suggests that the ubiquitination machinery may have an unsuspected role in the regulation of sexual dimorphism.

Materials and methods

Protein Crystallization

CTDF-p (65 residues; GS followed by DSXF residues 350-412) was designed based on Y2H studies (An et al. 1996; Erdman et al. 1996), expressed in Escherichia coli (strain B834(DE3)pLysS) as a thrombin-cleavable fusion protein, and purified as described (Bayrer et al. 2004). For selenomethionine labeling, the protein was expressed in M9 minimal medium containing 50 mg/L selenomethionine and all other amino acids (except methionine) at 40 mg/L (Bayrer et al. 2004). Crystals were obtained by hanging-drop vapor diffusion in 4 μl drops containing equal volumes of protein stock (12 mg/ml) and reservoir buffer (1.8 M ammonium sulfate and 7% 2-propanol).

Data Collection and Structure Determination

Native data were collected at APS beamline 14BMC at 100 K. Single-wavelength anomalous dispersion (SAD) data were collected at NSLS beamline X9B at the selenium peak (0.9788 Å). Data-collection statistics are given in Table III-1, p. 97. Data were integrated and scaled with HKL2000 (Otwinowski and Minor 1997). Substructure determination and phasing were accomplished with the SHELX suite of programs (Sheldrick 1990). Initial model building employed ARP/wARP (Lamzin and Wilson 1993). Additional rounds of model building were performed using O (Jones et al. 1991). Initial refinement was accomplished using CNS, applying non-crystallographic symmetry, overall B-factor corrections, and bulk-solvent corrections (Brunger et al. 1998). Final rounds of refinement were performed with the CCP4 program REFMAC5 (Collaborative Computational Project 1994). Accuracy of the model was assessed with DDQ (van den Akker and Hol 1999); statistics are given in Table III-2, p. 98. Figures were generated using PyMOL (DeLano 2002).

Y2H Assays

CTDF homodimerization was probed using the MATCHMAKER Gal4 system (BD Clontech, Palo Alto, CA). Substitutions were introduced into plasmids by PCR mutagenesis (Narendra et al. 2002) and verified by DNA sequencing. pGADT7- and pGBKT7-CTDF variants (residues 350-427) were co-transformed in pairs into yeast strain AH109 by the lithium acetate method. Interactions were monitored by β-galactosidase activity, quantified with an o-nitrophenyl-β-D-galactopyranoside (ONPG) assay. Expression levels were verified by Western blot using anti-Gal4 antibodies (Upstate Group, Charlottesville, VA).

Results

We have determined the crystal structure of CTDF-p (residues 350-412) at a resolution of 1.8 Å by SAD phasing with a selenomethionine-containing derivative and extended the resolution to 1.6 Å with native data from synchrotron radiation. Representative electron density is shown in Figure III-4A, p. 106. Data collection and refinement statistics are supplied in Table III-1, p. 97. Model building statistics are supplied in Table III-2, p. 98. Although the R factor and free R appear somewhat elevated, they are within published ranges. The five N-terminal and five C-terminal residues are not well ordered (including the N-terminal Gly-Ser derived from thrombin cleavage). These residues (representing 16 percent of the sequence) are neither completely disordered nor well defined. Consequently, fitting models to the partial density (or leaving the residues out) limits the refinement statistics. We are unable to place the first three residues of one protomer, or side chains for Leu351 and the final six C-terminal residues. Within the core structure, side chain density is generally well ordered. The model exhibits good stereochemistry, with 94 percent of non-Gly, non-Pro residues occupying the most favored region of the Ramachandran plot and the remaining 6 percent occupying the additionally allowed region. No residues occupy the generously allowed or disallowed regions.

Overview of Structure

The dimer is ellipsoidal with protomers oriented head to tail (Figure III-4B, p. 106). Each protomer contains three α-helices and, unexpectedly, exhibits a UBA fold. The core helical domain is extended by a disordered tail. Residues 350-352 are not well ordered. Helical segments are 353-367 (α1), 371-384 (α2), and 388-407 (α3). The second helix exhibits a kink at P375 (Figure III-5, p. 108), separating subsegments α2A (residues 371-374, which form a 3₁₀ helical turn; green cylinder in Figure III-6A, p. 110) and α2B (red cylinder). Residues 384-387 comprise a type I' β-turn; the turn at 367-371 is non-canonical. Substantial hydrophobic contacts, characteristic of a UBA fold, occur between α1 and α2 and in part with α3 (Leu363, Met377, and Ile395; see mini-core below). The surface of α3 contains (i, i+3) and (i, i+4) salt bridges from E390 to R393 and R394 (Figure III-7A, p. 112). The female-specific sequence begins midway through α3 (398-407); residues 408-412 are disordered. Despite limited sequence homology (Figure III-6A, p. 110 and Appendix III), structural alignment of a CTDF-p protomer with 18 UBA domains yields a mean pairwise root-mean-square deviation (RMSD) of 1.7 Å (range 1.2-2.3 Å) among main-chain atoms (Figure III-6B, p. 110); the mean baseline pairwise RMSD among these 18 domains is 1.3 Å. The RMSD between CTDF-p and a related CUE domain (green in Figure III-6B, p. 110; Kang et al. 2003) is 1.8 Å.
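For readers unfamiliar with the RMSD statistic quoted above, the following minimal sketch shows how a pairwise RMSD and a mean pairwise RMSD are computed from matched atom coordinates. It assumes the structures have already been superposed and that atoms are listed in corresponding order; it does not reproduce the alignment procedure used in this work, and the function names and toy coordinates are illustrative only.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD (same units as input) between two matched, pre-superposed atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


def mean_pairwise_rmsd(structures):
    """Average RMSD over all unordered pairs of structures."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy coordinates (hypothetical, not taken from any deposited structure).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x, y + 1.0, z) for x, y, z in reference]  # every atom moved 1 Å in y

print(rmsd(reference, shifted))
print(mean_pairwise_rmsd([reference, shifted, reference]))
```

In this toy case every atom is displaced by 1 Å, so the pairwise RMSD is 1 Å; the "baseline" figure in the text corresponds to applying the mean over all pairs within the reference set of domains.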

The N-terminus of each protomer begins with an extended sequence that is not well ordered. Helix α1 begins with residue Gln353 and extends to residue Phe367. This helix does not participate in dimer formation, making only minor contacts (between Asp354 and Leu357 of α1 and Tyr405 of α3') as it passes α3' in an orthogonal manner. Helix α1 does, however, have substantial hydrophobic interactions with helix α2, forming a typical UBA mini-core. At the center of this hydrophobic patch is Cys360, which is implicated in the formation of an "inside-out" dimer that forms upon thermal and hydrophobic denaturation.

The second helix is subdivided into α2A and α2B by a proline-induced kink (P375) near the N-terminus of the helix. As mentioned above, α2A adopts a 3₁₀ helical conformation. Pro370 and Pro375 both appear to be required for dimer formation, implying that they may be necessary for delimiting the N-terminal region of α2 and locking the interface into place. Helix α2 is the shortest of the three helices (ca. 19 Å) but provides the core of the dimerization interface, exhibiting substantial contacts with α2' and α3' (discussed below). These interactions are mainly hydrophobic, although several residues are involved in salt bridge and hydrogen bond formation. Helices α2 and α2' are arranged roughly parallel with each other and perpendicular to helix α3. Tyr378 and Lys382 each form a hydrogen bond with Glu397 from α3'.

Helix α3, consisting of residues Ile388 through Arg407, forms the longest helix in the dimerization domain at nearly 29 Å and contains the transition from the non-sex-specific sequence to the sex-specific sequence. Helices α3 and α3' are aligned roughly anti-parallel, giving the structure its elongated form.
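As a rough consistency check on the quoted helix lengths, the sketch below estimates end-to-end length from the residue ranges given above. It assumes an ideal α-helical rise of 1.5 Å per residue (a textbook value, not a measurement from this structure) and ignores the 3₁₀ turn and proline kink in α2; the function name and segment table are illustrative.

```python
# Ideal alpha-helical rise per residue in Å (textbook value; an assumption here).
RISE_PER_RESIDUE = 1.5

# Helical segments as first and last residue numbers, taken from the text.
HELICES = {
    "alpha1": (353, 367),
    "alpha2": (371, 384),
    "alpha3": (388, 407),
}


def helix_length(first, last, rise=RISE_PER_RESIDUE):
    """Approximate axial distance (Å) between the first and last residue of a helix."""
    if last < first:
        raise ValueError("last residue must not precede first residue")
    return (last - first) * rise


for name, (first, last) in HELICES.items():
    print(f"{name}: residues {first}-{last}, ~{helix_length(first, last):.1f} Å")
```

Under these assumptions α2 comes out at about 19.5 Å and α3 at about 28.5 Å, close to the values quoted in the text, with α2 remaining the shortest of the three.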

Protomer mini-core

The protomer mini-core is centered on Met377 and Cys360 (Figure III-8, p. 114) and includes contributions from all three helices, but is dominated by non-polar contributions from α2. Met374, Met377, and Leu381 project into the core opposite Cys360, Leu363, and Leu364 from α1. Met377 and Ile380, along with Leu381, also interact with Ala391 and Ile395 from α3. Interactions between α1 and α3 also contribute to the mini-core. Leu363 packs against α3 between Ile388, Ala391, and Ile395; Ser392, Tyr359, and Lys366 seal this side of the core. The other two sides of the mini-core are also involved in the dimer interface.

Novel Dimer Interface

Dimerization is mediated by helices α2 and α3 (Figure III-4B, p. 106). Whereas α1 and α3 define the surface of the dimer, α2 and α2' pack at the interface: they are parallel to each other and nearly perpendicular to α3 and α3'. Packing between α2 and α2' exhibits "knobs in holes" complementarity (Figure III-5B, p. 108), in which hydrophobic side chains project across the dimer interface into corresponding hydrophobic pockets. Twenty-four residues in each protomer define an extensive interface; the core is non-polar (L357, Y369, W371, L373, M374, P375, L376, Y378, V379, I395, G398, V401, and V402; Figure III-4B, p. 106). The overall interface juxtaposes concave and convex surfaces. The contact area of a protomer (1077 Å²) spans 29% of its total surface (green regions in Figure III-10A,B, p. 118); dimerization thus buries a combined surface of 2153 Å². Dimerization of the subunits results in a structure more globular than the individual subunits, based on the percent change in the ratio of surface area to molecular weight. This finding is consistent with hydrophobic forces driving subunit association (Jones and Thornton 1995).

Hydrophobic packing is extended by intermolecular salt bridges, hydrogen bonds, and a bridging network of water molecules (D354, K382, D383, R394, E397, Q399, and Y405; Figure III-7A, p. 112). R394 participates in both intramolecular and intermolecular salt bridges (E390 and D383'); E397 interacts with Y378' and K382' (Figure III-7B, p. 112). Such interactions (and neighboring solvation) apparently offset the proximity of D383 and D383'; furthermore, D383 may be protonated in the dimer. Minor α1-α3' interactions are also observed (involving D354, L357, and Y405'). The junction between sex-non-specific and sex-specific portions of CTDF-p (G398) packs against α2' across the dimer interface. This junction is a site of mutation (G398D) associated with intersexual development (Erdman et al. 1996). The overall structure is composed predominantly of non-sex-specific residues and so is likely to be similar in CTDM.
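The surface-area figures quoted for the interface are related by simple arithmetic, sketched below. The inputs are the per-protomer contact area and interface fraction given in the text; the implied total protomer surface is derived here and is not a value reported in this chapter. Variable and function names are illustrative.

```python
# Values quoted in the text.
CONTACT_AREA_PER_PROTOMER = 1077.0  # Å² buried on one protomer
INTERFACE_FRACTION = 0.29           # fraction of protomer surface in the interface


def implied_total_surface(contact_area, fraction):
    """Total protomer surface (Å²) implied by a contact area and its fractional share."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    return contact_area / fraction


def combined_buried_surface(contact_area, n_protomers=2):
    """Surface (Å²) buried on dimerization, summed over both protomers."""
    return contact_area * n_protomers


total_surface = implied_total_surface(CONTACT_AREA_PER_PROTOMER, INTERFACE_FRACTION)
buried_surface = combined_buried_surface(CONTACT_AREA_PER_PROTOMER)

print(f"implied protomer surface: ~{total_surface:.0f} Å²")
print(f"combined buried surface:  ~{buried_surface:.0f} Å²")
```

Doubling the rounded per-protomer value gives 2154 Å²; the 1 Å² difference from the quoted 2153 Å² presumably reflects rounding of the per-protomer area.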


Alanine Scanning Mutagenesis

Y2H studies of variant domains were undertaken as a collaborative effort with W. Zhang to test the contribution of interfacial side chains to dimerization (Table III-3, p. 99). A negative control was provided by G398D (Erdman et al. 1996). Seventeen Ala substitutions were tested: four substitutions in the core (Y378, V379, I380, I395) markedly impair β-galactosidase activity whereas three substitutions at the surface (W371A, Y400A, and N403A) yield no perturbations. Ala substitutions at three interfacial valines in α2 and α3 exhibit partial (V379 and V402) or full (V401) activity; the resulting packing defects may be offset by the higher helical propensity of Ala. R394A (expected to disrupt inter- and intramolecular salt bridges) and K382A (expected to disrupt a dimer-specific salt bridge) markedly impair activity. E396A and E397A are partially tolerated; D383A has no effect. Interestingly, the two prolines, although not features of canonical UBA or CUE domains, appear essential (P370A and P375A have negligible dimerization in our assay). G398A enhances β-galactosidase activity, presumably by stabilizing α3.
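The outcomes summarized in this paragraph can be tabulated directly from Table III-3. The sketch below transcribes the 17 alanine substitutions (variant, solvent accessibility, Y2H score) and groups them by score; the data layout and function name are illustrative and are not part of the original analysis.

```python
from collections import defaultdict

# (variant, solvent accessibility, Y2H score), transcribed from Table III-3.
Y2H_RESULTS = [
    ("P370A", 0.50, "-"), ("W371A", 0.42, "+"), ("P375A", 0.06, "-"),
    ("Y378A", 0.06, "-"), ("V379A", 0.00, "+/-"), ("I380A", 0.00, "-"),
    ("K382A", 0.51, "-"), ("D383A", 0.04, "+"), ("R394A", 0.12, "-"),
    ("I395A", 0.00, "-"), ("E396A", 0.59, "+/-"), ("E397A", 0.26, "+/-"),
    ("G398A", 0.00, "++"), ("Y400A", 0.68, "+"), ("V401A", 0.09, "+"),
    ("V402A", 0.05, "+/-"), ("N403A", 0.79, "+"),
]


def group_by_score(results):
    """Map each Y2H score to the list of variants that received it."""
    groups = defaultdict(list)
    for variant, _accessibility, score in results:
        groups[score].append(variant)
    return dict(groups)


groups = group_by_score(Y2H_RESULTS)
for score in ("++", "+", "+/-", "-"):
    print(f"{score:>3}: {', '.join(groups.get(score, []))}")
```

Grouping in this way makes it easy to compare the scores against solvent accessibility or helix membership when revisiting the mutagenesis data.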

Putative Ub-Binding Surface

The canonical Ub-binding surface of known monomeric UBA and CUE domains (orange surface in Figure III-10A, p. 118) does not overlap with the dimer interface of CTDF-p (green surface). It is thus possible that the CTDF-p dimer presents two Ub-binding sites, one on each side. Alignment of one protomer with CUE2 as bound to Ub (Kang et al. 2003) yields a model of a complex between Ub and the CTDF-p dimer (Figure III-6C, p. 110). This model suggests that potential UBA-Ub salt bridges are conserved; D18 and D40 of CUE2 align with E365 and E389 of CTDF-p. The model also permits a second Ub molecule to bind to the other protomer. We thus envisage that such a UBA dimer could enable bidentate recognition of a poly-Ub chain.

Discussion

CTDF sequence conservation

The CTDF-p shows remarkable sequence conservation, extending from the dimeric core

to the surface (Figure III-11, p. 120). High levels of conservation in the core are not

surprising, as disruption of the core can lead to impaired dimerization and folding (Bayrer

et al. 2005b). The level of surface conservation, however, is striking and suggestive of

conserved functionality (Nooren and Thornton 2003).

Relationship to previous Y2H studies

Previous Y2H studies by Burtis and co-workers identified critical regions in the CTDF-p and surrounding sequence using error-prone PCR (Erdman et al. 1996). One such site is the previously mentioned Gly398, which remarkably re-surfaced during mutagenesis. The other substitutions noted by Burtis and co-workers all appeared as double or triple mutations, which makes the effect of any single mutation uncertain. It is nevertheless instructive to examine the native environment of those mutations occurring within the CTDF-p. Leu373 and Met377 were identified as a pair of mutations leading to loss of dimerization. Leu373 and Leu373' pack against each other at the N-terminus of α2, sealing one edge of the dimer interface (Figure III-12A, p. 122). Met377, along with Ile395 (identified as one member of a triple mutation), contributes to the mini-core of the CTDF-p protomer and also to the edge of the dimer interface (Figure III-12B, p. 122). Leu381 and Lys382 were also implicated, along with Q313, in impaired dimerization. We demonstrate that Lys382 is involved in an interprotomeric salt bridge with E397' (Figure III-7B, p. 112); mutation of Lys382 to alanine in our Y2H assay also results in a loss of dimerization. Leu381 is involved in the mini-core of the protomer, as described above (Figure III-8, p. 114). In each case above the wild-type side chain appears to play a key role within the structure, pointing to the importance of the UBA mini-core in stabilizing the dimer.

The CTDs of DSX enhance DNA recognition by providing a strong dimer contact (An et al. 1996; Erdman et al. 1996; Cho and Wensink 1997). Although the monomeric DM domain itself can bind DNA sites as a cooperative dimer (Zhu et al. 2000; Narayana et al. 2001), the CTDs enhance specific DNA binding by 35-fold (Cho and Wensink 1998). The biological importance of CTD dimerization is suggested by a mutation in CTDF (G398D) that blocks dimerization in association with intersexual development (Erdman et al. 1996). Unexpectedly, the crystal structure of CTDF-p demonstrates that dimerization is mediated by a UBA domain, previously unrecognized due to the absence of detectable sequence homology. The stability of the UBA dimer (Kd 10⁻¹⁰ M; Cho and Wensink 1998) reflects formation of an extensive hydrophobic interface flanked by inter-subunit salt bridges and hydrogen bonds. Intersexual mutation G398D would insert an uncompensated charge into this interface. The CTD dimer thus extends to the realm of transcription the repertoire of UBA dimerization, previously implicated in cell-cycle control (Bertolaet et al. 2001) and receptor trafficking (Liu et al. 2003). To our knowledge, the CTDF-p represents the first structure of a UBA dimer. It is not known, however, whether our structure represents the mode of dimerization employed by other UBA domains or a novel class of its own.

The monomeric and dimeric structures of the CUE domain, related to the UBA fold, have recently been solved (Kang et al. 2003; Prag et al. 2003). While the monomer is similar to the UBA (and CTDF protomer) fold, the dimer unit is formed by domain swapping (Figure III-13, p. 124). The Kd of the domain-swapped dimer is 1 mM, while that of the CTDF-p is 0.01 nM (Bayrer et al. 2005b). The mode of CUE dimerization previously demonstrated is thus unrelated to that of CTDF-p.
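To place the two dissociation constants on a common energetic scale, the sketch below converts each Kd to a standard free energy of dissociation using ΔG° = -RT ln Kd. The temperature (298.15 K) and the 1 M standard state are assumptions made here for illustration and are not conditions stated in the text; the function name is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal mol^-1 K^-1


def dissociation_free_energy(kd_molar, temperature_k=298.15):
    """Standard free energy of dissociation (kcal/mol): -RT ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(kd_molar)


KD_CUE_DIMER = 1e-3    # 1 mM, domain-swapped CUE dimer (value quoted in the text)
KD_CTDF_P = 1e-11      # 0.01 nM, CTDF-p dimer (value quoted in the text)

dg_cue = dissociation_free_energy(KD_CUE_DIMER)
dg_ctdf = dissociation_free_energy(KD_CTDF_P)
ddg = dg_ctdf - dg_cue

print(f"CUE dimer:    {dg_cue:.1f} kcal/mol")
print(f"CTDF-p dimer: {dg_ctdf:.1f} kcal/mol")
print(f"difference:   {ddg:.1f} kcal/mol")
```

Under these assumptions the eight-order-of-magnitude difference in Kd corresponds to roughly 11 kcal/mol of additional stabilization for the CTDF-p dimer.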

The presence of a UBA domain in DSX may represent the incidental recruitment of a common structural motif or indicate a functional role for ubiquitin (e.g., binding of mono- or polyubiquitinated proteins, engagement of the enzymatic machinery of ubiquitination and/or the proteasome) in DSX-mediated transcriptional regulation. To our knowledge, the involvement of such processes has not previously been described (or even suspected) in the Drosophila sex-determining hierarchy. Their involvement is nevertheless plausible in light of growing evidence that ubiquitination and ubiquitinated proteins play central roles in the regulation of eukaryotic gene expression. (i) Ub-triggered proteolysis can control transcription through "suicidal" regulation wherein each cycle of transcription is coupled to destruction of a specific transcription factor (Salghetti et al. 2001). (ii) Ubiquitination can regulate subcellular localization (Rape et al. 2001) and protein-protein interactions (Kaiser et al. 2000). (iii) Transcriptional initiation and functional mRNA processing can require degrons, proteolytic signaling elements within transcriptional activation domains that recruit Ub ligases (Muratani et al. 2005). (iv) Non-destructive ubiquitination of activation domains can enable transcriptional activation (Salghetti et al. 2001). It is possible that one or more of these general processes operate in transcriptional regulation by the DSX isoforms but were missed in classical genetic screens for impaired sexual differentiation due to the pleiotropic functions of the ubiquitination machinery.

We propose that the DSX UBA domain may contribute to the assembly of a specific preinitiation complex through non-covalent interactions with mono- or polyubiquitinated proteins. Such interacting protein(s) are presently unidentified. An intriguing possibility is ubiquitination of the DSXF-associated transcriptional coactivator Intersex (IX; Garrett-Engele et al. 2002), the homolog of a mammalian Mediator subunit (Sato et al. 2003). Sex-specific binding of IX to DSXF (and so presumably to CTDF) is required for female differentiation (Waterbury et al. 1999). The avidity of such binding could be enhanced by dual recognition of IX and a tethered Ub chain. In the future this and other hypotheses can be addressed by structure-based molecular genetics. A critical test of whether DSXF functions in vivo as a Ub-binding protein, for example, could be provided by targeted mutations in its putative Ub-binding surface that do not impair dimerization or binding of unmodified IX. Should such mutations be obtained in vitro, we would predict that the variant dsx alleles would be associated with an intersexual phenotype. This class of mutations would be of broad interest as a model for UBA-associated genetic diseases such as Paget's disease of bone, commonly caused by surface mutations in a monomeric UBA domain affecting UBA-Ub complex formation (Cavey et al. 2005). Deciphering the role of the transcription-associated ubiquitination machinery in developmental decisions represents an important future challenge.

Structural Information

Structural information has been deposited in the Protein Data Bank (accession code 1ZV1).

Acknowledgements

We thank B. Baker for discussion, B. Li for plasmid construction and helpful discussion, N. B. Phillips for helpful discussion, and the staff of NSLS beamline X9B and APS beamline 14BMC for assistance.

Table III-1

Data collection and refinement statistics

                              native                          SeMet
wavelength (Å)                0.9000                          0.9788
unit cell parameters (Å)      a = 39.8, b = 46.6, c = 59.8    a = 39.4, b = 46.8, c = 58.6
resolution range (Å)          19-1.6 (1.66-1.60)              32.7-1.8 (1.86-1.80)
no. measured reflections      164648                          145768
no. unique reflections        15159 (1467)                    19414 (1905)
redundancy                    10.9                            7.5
completeness (%)              99.1 (96.9)                     100 (100)
Rmerge (%)                    3.8 (0.182)                     8.7 (0.456)
average I/σ(I)                62.3 (11)                       24.0 (3.4)

correlation coefficient: 0.59

a Values in parentheses are for the highest resolution shell.
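The redundancy entries in Table III-1 are the ratio of measured to unique reflections. The short sketch below recomputes them from the reflection counts in the table as a consistency check; the dictionary layout and function name are illustrative.

```python
# Reflection counts transcribed from Table III-1.
REFLECTIONS = {
    "native": {"measured": 164648, "unique": 15159},
    "SeMet": {"measured": 145768, "unique": 19414},
}


def redundancy(measured, unique):
    """Average number of times each unique reflection was measured."""
    if unique <= 0:
        raise ValueError("number of unique reflections must be positive")
    return measured / unique


for dataset, counts in REFLECTIONS.items():
    value = redundancy(counts["measured"], counts["unique"])
    print(f"{dataset}: redundancy = {value:.1f}")
```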

Table III-2

Model-building and refinement statistics.

resolution range (Å)                        10.86-1.60
number reflections (percent)                14294 (99.2)
number reflections (Rfree)                  736
Rcryst (Rfree) %                            20.8 (25.7)
DDQ score
  DDQ-W                                     31.71a
  DDQ-R                                     169.17b
RMSD
  bond lengths (Å)                          0.013
  bond angles (°)                           1.4
Non-hydrogen atoms per asymmetric unit
  protein                                   994
  solvent                                   126
Ramachandran plot (%)
  most favored                              94
  additionally allowed                      6

aAbove average. bTop 25% (van den Akker and Hol 1999).

Table III-3

Yeast Two-Hybrid Analysesa (collaborative effort from W. Zhang)

variant      SAb     Y2H        variant      SA      Y2H
Wt           ---     +          D383A        0.04    +
Δ413-427     ---     +          R394A        0.12    -
G398Dc       0.00    -          I395A        0.00    -
P370A        0.50    -          E396A        0.59    +/-
W371A        0.42    +          E397A        0.26    +/-
P375A        0.06    -          G398A        0.00    ++
Y378A        0.06    -          Y400A        0.68    +
V379A        0.00    +/-        V401A        0.09    +
I380A        0.00    -          V402A        0.05    +/-
K382A        0.51    -          N403A        0.79    +

aSubstitutions were tested in the context of fragment 350-427; deletion of residues 413-427 does not affect activity (line 2). bSA, solvent accessibility based on average of protomers. cControl mutation associated in vivo with intersexual development of XX flies and previously shown to block Y2H dimerization (Erdman et al. 1996).

Figure III-1

Sexual differentiation cascade in D. melanogaster and domain organization of DSX. (A) Sex is determined by the X:autosome ratio, leading to an RNA-splicing cascade. Not shown: IX-DSXF interaction and fruitless branch of pathway. (B) Organization of DSX isoforms. Shared region spans residues 1-397, containing DM domain (35-105; gray box) and proximal portion of dimerization domain (350-397; black box). Sex-specific regions comprise residues 398-427 (DSXF; red) and 398-549 (DSXM; green). CTDF interacts with coactivator IX.

Figure III-1

Figure III-2

Sequence alignment of DSX homologs in insect alleles. (A) Sex-non-specific region (residues 350-397). (B) Proximal female-specific region (residues 398-412); DSXF ends at residue 427 (see Figure III-1). (C) Corresponding male-specific segment 397-412; DSXM extends to residue 549.

Figure III-2

Figure III-3

Comparison of CTDF-p secondary structure with that predicted by Wensink and co-workers (An et al. 1996). (A) Cylinders above the sequence represent actual secondary structure (α2A and α2B are shown as a single helix). Blue cylinders below the sequence represent helix predictions. Although the predicted breaks in the helix are approximately correct, the CTDF-p forms a more complicated dimerization motif than the predicted coiled-coil of Wensink and co-workers, consistent with the sequence analysis performed by Burtis and co-workers. (B) COILS output analysis of the CTDF-p showing coiled-coil propensity for a segment of the domain.

Figure III-3

Figure III-4

Structure of CTDF-p. (A) Representative electron density near Y378. 2Fo-Fc map contoured at 1σ. (B) Ribbon model (stereo pair; residues 351-409). One protomer is blue, the other teal. Key residues in dimer are shown.

Figure III-4

Figure III-5

Unusual structure and packing of bent helix α2. (A) Helix α2 contains a 3₁₀ N-terminal portion (green), proline-associated kink (red), and α-helical C-terminal portion (blue). Substitution P375A markedly impairs dimerization (see Table III-3, p. 99). One protomer is shown for clarity. (B) Parallel packing of α2 in interior of dimer. P375 is shown in red. The α2-α2' interface contains residues Y369, P370, L373, P375, L376, V379, I380, D383 and symmetry-related mates. Also shown is M377; α1 and α3 have been removed for clarity.

Figure III-5

Figure III-6

Alignment of CTDF-p and classical UBA domains. (A) Sequence alignment. CUE-Ub contacts are shown in green; UBA-Ub contacts (inferred from NMR chemical-shift perturbations; Kang et al. 2003; Prag et al. 2003; Mueller et al. 2004) are shown in red. Helical segments are depicted as cylinders; α2A (3₁₀ turn; green) and proline kink (black) are specific to DSX. (B) Structural alignment of CTDF-p (blue), UBA domains (black), and CUE2 (green). (C) Model of CTDF-Ub complex. PDB codes for aligned domains: 1DV0, 1F4I, 1IFY, 1OAI, 1OQY, 1PGY, 1Q02, 1TR8, 1V92, 1VDL, 1VEG, 1VEK, 1VG5, 1WGN, 1WHC, 1WIV, 1WJI, 1GO5, and 1P3Q.

Figure III-6

Figure III-7

Stereo diagrams of salt bridge interactions in CTDF-p. (A) Glu390, Arg393, and Arg394 form i+3 and i+4 intrahelical salt bridges. Asp383 and Asp383' are forced into close proximity by dimer packing. This unfavorable interaction is mitigated at least in part by a minor alternate conformation of Arg394, as well as solvent molecules (not shown). (B) Interprotomeric salt bridge and hydrogen bond interactions occur between Tyr378, Lys382, and Glu397'. Substitution of Lys382 by alanine results in undetectable dimerization by Y2H. Tyr378 resides opposite to Gly398', site of an intersex mutation.

Figure III-7

Figure III-8

Stereo view of protomeric UBA mini-core. The densely packed mini-core of the protomer consists of contributions from all three helices, with α2 providing the bulk of the interactions. Several residues important for folding, and hence dimerization, are located within the core, including Pro370, Pro375, Ile380, and Ile395 (see Table III-3, p. 99). The surface residue Trp371 appears to seal off one edge of the protomer; mutation to alanine, however, does not affect dimerization. Panels A and B are related by a 180° rotation about the vertical axis. The opposite protomer and portions of α1 and α3 have been removed for clarity.

Figure III-8

Figure III-9

Enlargement of hydrophobic core and key dimer interface residues. The "knobs-in-holes" complementarity is evident looking down the α2-α2' axis (parallel with z-axis). R394 is shown in both conformations. Note the interprotomeric salt bridge between K382 and E397' (see also Figure III-7B). One protomer is shown in blue, the other teal.

Figure III-9

Figure III-10

Dimerization interface and potential Ub-binding surface. (A) Space-filling model of dimer (stereo pair). Front and back protomers are shown in blue and gray (a and b). Canonical Ub-binding surface of front protomer is shown in orange, and the hidden dimer interface of back protomer in green. (B) Dimer interface of back protomer on removal of front protomer. (C) Inside surface of front protomer (rotated 180°) indicates that Ub-binding surface (orange) does not extend within the interface.

Figure III-10

Figure III-11

Surface representation of CTDF-p dimer color-coded according to extent of sequence conservation among insect dsx alleles (see Appendix III). (A) "Front view" corresponding to Fig. 2B in main text: front edge of dimer interface is centrally located with N-terminus of helix α2 facing viewer. (B) "Back view" related to panel A by 180° rotation about vertical axis. C-terminus of helix α2 faces viewer. (C) Surface of a single protomer rotated 90° from panel A. (D) Reverse side of protomer showing high levels of conservation within the dimer interface. Color code: cool colors (blue to green) represent conserved surface whereas warm colors (yellow to red) represent non-conserved portions of surface. 50% conservation among insect alleles is yellow.

Figure III-11

Figure III-12

Ribbon model depicting the environments of (A) L373 and (B) M377 and I395. (A)

L373 packs against L373’ at the edge of the dimer interface to seal the core. One edge of the side chain is exposed to solvent. (B) In mini-core of individual protomer the side

chain of I395 packs against L363, F367, and M377; they are shielded from solvent in part

by Q399. This portion of the protomeric mini-core (See Figure III-8) also contributes to

the dimer interface. One protomer is blue, the other teal.

Figure III-12

Figure III-13

Ribbon stereo comparison of CUE (A) and DSX CTDF-p (B) dimerization motifs. Note the much greater dimer interface in the CTDF-p than the CUE dimer. This difference is reflected in the Kd values of 1 mM and 0.01 nM, respectively (Prag et al. 2003). PDB code for the CUE domain is 1MN3.

Figure III-13

CHAPTER IV

SEX-SPECIFIC GENE REGULATION: INTERSEXUAL DROSOPHILA

DEVELOPMENT DUE TO MISFOLDING OF A NOVEL UBA DOMAIN

(Portions of this chapter are from Bayrer et al. 2005b.)

Introduction

Sexual differentiation in Drosophila is regulated by the X:autosome ratio and a sex-specific RNA-splicing pathway (Figure IV-1A, p. 139; Cline and Meyer 1996). A principal target is doublesex (dsx): expression of male- and female-specific transcription factors (DSXM and DSXF) in turn directs most aspects of somatic sexual differentiation (Burtis and Baker 1989). The DSX isoforms are encoded by mRNAs sharing the first three exons; the C-terminal segment of DSXF is encoded by exon four whereas that of DSXM is encoded by exons five and six. Male and female isoforms are thus identical for the first 397 residues but differ thereafter (Figure IV-1B, p. 139; Burtis and Baker 1989).

DSXM and DSXF share two recognized domains, an N-terminal DNA-binding domain (Erdman and Burtis 1993) and a C-terminal dimerization domain (An et al. 1996; Erdman et al. 1996). The DNA-binding domain (the DM motif; Raymond et al. 1998) contains a non-classical Zn module with a flexible tail (Zhu et al. 2000). Mutations in this domain block DNA binding in association with intersexual phenotypes (Burtis 1993; Erdman and Burtis 1993; Narendra et al. 2002). C-terminal dimerization enhances DNA binding (Cho and Wensink 1998) and is mediated by a novel UBA dimer (Figure IV-2A, p. 141; Bayrer et al. 2005c). Broad conservation of the DM motif in metazoan proteins related to sexual differentiation suggests that mechanisms of sexual dimorphism are in part universal (Raymond et al. 1998). The importance of the DSX dimerization domain is likewise indicated by genetics: G398D in CTDF causes intersexual development of karyotypic females (Erdman et al. 1996).

Here we investigate a mutation in the C-terminal domain of DSXF (CTDF; Figure IV-1B, p. 139) causing intersexual development (Erdman et al. 1996). The mutation (G398D) perturbs a well-packed dimerization interface (Figure IV-2B, p. 141; Bayrer et al. 2005c) and blocks dimerization in a yeast two-hybrid (Y2H) system (Erdman et al. 1996). This mutation is encoded in the female-specific exon four and thus only affects XX individuals (karyotypic females); the male-specific exon five is unchanged. Y2H analysis of diverse substitutions at position 398 suggests that only alanine (found in the male isoform) is tolerated. G398D is associated with misfolding and aggregation in vitro, impaired expression in Drosophila Schneider 2 (S2) cells, and attenuated DSX-dependent transcriptional regulation in a yeast one-hybrid (Y1H) system. Our results suggest a coupling between dimerization and folding of CTDF, define a new class of transcription-associated UBA domains, and suggest a structural mechanism for impaired sexual differentiation in Drosophila melanogaster. Yeast hybrid and insect cell work was a collaborative effort with W. Zhang.

Experimental Procedures

Protein Expression and Purification

Design of CTDF-p was based on Y2H studies (Erdman et al. 1996; Cho and Wensink

1996). Domains were expressed in E. coli as thrombin-cleavable fusion proteins with an N-terminal

His6-tag and purified as described (Bayrer et al. 2004). Briefly, CTDF and

variants were expressed using the plasmid pMW127 in E. coli strain BL21(DE3)pLysS

(Invitrogen, Carlsbad, CA). Cells were grown at 37 °C or 20 °C (see text) with shaking

to a density of ca. 0.6 at OD600 at which point cultures were induced with 0.5 mM

isopropyl-β-D-thiogalactoside (IPTG) (Roche, Indianapolis, IA). Cells were allowed to

grow until ca. OD600 1.0 (typically 6-10 hours), after which cultures were centrifuged and

stored at -80 °C until purification by metal-affinity and anion exchange (CTDF) or gel filtration (CTDF-p and variants). Purified domains contain an N-terminal Gly-Ser

derived from thrombin cleavage of the fusion protein. 15N-Labeled proteins were

likewise purified following expression in M9 minimal medium containing 15N-

ammonium sulfate. Samples were routinely assayed by mass spectrometry to verify protein identity and to guard against inadvertent mutation; experimentally determined mass values provided exact matches to predicted values. Protein yields were typically ca.

8 mg per liter of culture.

Site-directed mutagenesis

Mutations were introduced into CTDF using PCR-based two-stage overlap extension mutagenesis as described previously (Narendra et al. 2002). The insert was sequenced in every case to confirm the desired mutation and to exclude unwanted changes acquired during PCR amplification and cloning.

Size-Exclusion Chromatography

Analytical SEC was performed in 20 mM Tris-HCl (pH 7.4) and 200 mM NaCl at room temperature using a Tricorn Superdex 75 10/300 GL column (Amersham Biosciences,

Piscataway, NJ); the flow rate was 0.75 ml/min. The column was calibrated with insulin

(5.8 kDa), myoglobin (17.6 kDa), trypsin inhibitor (20 kDa), ovalbumin (44 kDa), and thyroglobulin (669 kDa, column void).

Cell Culture and Transfection

Drosophila Schneider 2 (S2) cells were maintained in Schneider's medium supplemented with 10% fetal bovine serum and 100 units/ml penicillin-streptomycin at 25 °C without

CO2. Transfections were performed at a density of 8 × 10^5 cells/ml as described by the vendor (Qiagen, Valencia, CA).

Y2H and Y1H Assays

CTDF homodimerization was probed using the MATCHMAKER Gal4 Y2H system (BD

Clontech, Palo Alto, CA). pGADT7- and pGBKT7-CTDF variants (residues 350-427) were co-transformed in pairs into strain AH109 by lithium acetate. Interactions were monitored by β-galactosidase activity (below). Specific DSXF-DNA binding was probed using a MATCHMAKER Y1H system in which lacZ is regulated by a 48-bp fragment of the DSX-responsive fat-body enhancer (fbe) containing binding sites dsxA and dsxB

(Burtis et al. 1991). Control experiments are described in Appendix VIII. In each case

expression levels of fusion proteins were verified by Western blot using anti-Gal4

antibodies (Upstate Group, Charlottesville, VA).

Enzyme Assays

The o-nitrophenyl-β-D-galactopyranoside (ONPG) assay was used to quantify β-galactosidase activity. Results (given in Miller units) represent mean ± SD of triplicate experiments (Oh et al. 2002).
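As a rough illustration of how such values are commonly computed, the sketch below applies the standard Miller formula, 1000 × (A420 − 1.75 × A550) / (t × v × OD600), to triplicate readings and reports mean ± SD. The function name, the absorbance readings, the reaction time, and the assay volume are hypothetical placeholders rather than data or parameters from this study; the protocol actually followed is that of the cited reference.

```python
from statistics import mean, stdev

def miller_units(a420, od600, minutes, volume_ml, a550=0.0):
    """Standard Miller formula; a550 optionally corrects for light scattering."""
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * od600)

# Hypothetical triplicate readings as (A420, OD600); not data from this study.
replicates = [(0.62, 0.81), (0.58, 0.79), (0.65, 0.83)]
units = [miller_units(a420, od600, minutes=30.0, volume_ml=0.1)
         for a420, od600 in replicates]
print(f"{mean(units):.1f} ± {stdev(units):.1f} Miller units")
```

Normalizing to OD600 and reaction time is what allows reporter activities from different transformants to be compared on one scale.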

Optical and NMR Spectroscopy

CD wavelength and denaturation studies were performed as described (Hua and Weiss

2004). Spectral deconvolution employed PROSEC (Sreerama and Woody 1993).

Unfolding transitions were interpreted by a two-state model (folded dimer and unfolded monomer; De Francesco et al. 1991). The assumption is supported by the close

correspondence between the inferred stability of the CTDF dimer (ΔGU) and an

independent estimate of the DSXF dimerization constant derived from analysis of

cooperative DNA binding (Cho and Wensink 1998). 1H-15N-HSQC spectra were

acquired at 600 and 700 MHz as described (Hua and Weiss 2004). Proteins were made

1.5 mM in 10 mM 2H-Tris-HCl (pH 5.6), 50 mM NaCl and 5 mM deuterated

dithiothreitol.


Molecular Modeling

Rigid-body modeling of variants was performed using program O with the best

rotamer(s) selected for analysis. Visualization was accomplished with PyMOL (DeLano

Scientific).

Results

CTDF (residues 350-427) and a truncated domain (CTDF-p; residues 350-412) were

expressed in E. coli as soluble fusion proteins. Following metal-ion affinity

chromatography and thrombin cleavage, final purification was achieved by anion

exchange (CTDF) or SEC under non-denaturing conditions (CTDF-p; Bayrer et al. 2004).

These protocols each yield a single species of predicted molecular mass. Analytical SEC

demonstrated that CTDF is dimeric in the concentration range 1 μM – 1.5 mM without

shift in retention time (Figure IV-3A, p. 143 and Figure IV-4, p. 145). Higher-order

oligomers were not observed (Figure IV-3B, p. 143).

Mutations Block Dimerization in Yeast Model

Mutations at Gly398 were tested in a CTDF Y2H system (Erdman et al. 1996). Gly398

and G398A are associated with robust expression of β-galactosidase whereas substitution

by 8 other amino acids (C, D, F, K, L, N, T, and V) impairs expression by at least 10-fold

(Figure IV-3C, p. 143). The inactivity of G398D (relative β-galactosidase activity < 2%)

is consistent with prior studies (Erdman et al. 1996). Western blots established that such

decrements are not a result of impaired expression of bait or prey. Because of the linkage

between dimerization and specific DNA binding (Cho and Wensink 1998), a Y1H system

was constructed wherein the Gal4 activation domain (AD) is fused to the N-terminus of

DSXF variants. Specific DNA binding by the DM domain in these constructs to target

dsxA and dsxB binding sites activates transcription of the lacZ reporter gene (Figure IV-5, p.

147; Oh et al. 2002). Subsequent β-galactosidase activity is an indicator of relative

DNA-binding affinities (see Appendix VIII for control assays). This system was used to

probe effects of mutations on transcription activation and hence presumed DNA-binding

activity (Figure IV-3D, p. 143). The wild-type DSXF-AD fusion confers a robust signal, which is blocked by R91Q (an intersexual mutation in the DM domain that reduces specific DNA binding by at least 100-fold; Erdman and Burtis 1993; Narendra et al.

2002). G398D yields an intermediate phenotype (two-fold lower than wild type) and so impairs but does not block specific DNA binding. Deletion of CTDF and intervening

sequences to form an AD-DM fusion results in similarly attenuated activity. The

similarity of Y1H perturbations on mutation or deletion of CTDF is consistent with a mutational block to dimerization.

Biochemical Characterization of G398D Variant

CTDF-p variant G398D was analogously expressed in E. coli. Whereas on growth at 37

°C the native fusion protein remains soluble after cell lysis, the variant largely formed

inclusion bodies. Efficiency of expression was nonetheless similar. Inclusion bodies

were dissolved in 6 M guanidine in the presence of β-mercaptoethanol. Following slow

dialysis, the soluble portion was subjected to thrombin cleavage. During the cleavage

reaction, a fine white-colored aggregate formed. Expression of the variant at 20 °C

avoided inclusion body formation and enabled efficient purification by the wild-type

protocol. During thrombin cleavage, however, the variant CTDF-p again underwent

aberrant aggregation, leading to its elution predominantly in the SEC void. Analytical

SEC studies indicated that the variant migrates as a mixture of oligomeric species

(asterisk in Figure IV-3A, p. 143); a monomer peak is not detectable. The eluted peaks

are broad relative to that of the wild-type dimer, suggesting exchange among

conformations or oligomeric states. Analogous differences were observed on transient

expression of an epitope-tagged full-length DSXF in S2 cells (Figure IV-3E, p. 143).

Expression of the G398D variant, as probed by Western blot, was decreased by at least

ten-fold relative to wild type (Figure IV-3E, p. 143, lanes 3 and 4). By contrast,

expression of the R91Q variant was as robust as wild-type (lane 5). These results suggest

that G398D impairs protein stability in vivo and that such instability is unrelated to

decreased DNA binding.

CD spectra of CTDF and CTDF-p exhibit α-helical signatures consistent with the crystal

structure (spectra a and b in Figure IV-6A, p. 149); inferred helix contents are 34% and

44%, respectively. The higher percentage in CTDF-p reflects truncation of the tail. That

the tail is disordered is suggested by the shape of the difference spectrum (Δ in Figure IV-

6A, p. 149). A comparison of 1H-15N HSQC spectra from CTDF and CTDF-p

demonstrates that while the resonances arising from the ordered core domain are

essentially identical, the additional cross-peaks arising from the C-terminal tail (residues

413-427) largely exhibit near-random coil shifts (Figure IV-7, p. 151). Furthermore, motional narrowing of tail-specific NMR resonances is apparent in the heteronuclear 3D

HNCACB spectrum when compared with cross-peaks from the ordered core domain

(Figure IV-8, p. 153). The helical structure is maintained at temperatures 4-50 °C.

Guanidine denaturation studies yield a free energy of unfolding (ΔGu) of 14.1 kcal/mol at

4 °C (transition b in Figure IV-6B, p. 149). Assuming that the monomer is unfolded (De

Francesco et al. 1991), this implies a dimerization constant of 0.01 nM in accord with

studies of the intact protein (Cho and Wensink 1998). By contrast, the CD signature of

the G398D CTDF-p (as an aggregate) exhibits a substantial reduction in helix content

(from 44% to 11%) and aberrant presence of β-sheet (61% as compared to 9% in the wild

type; spectrum d in Figure IV-6A, p. 149). Unlike G398D, an Ala substitution in CTDF-p

is associated with native expression, solubility, and dimerization (Figure IV-3B, p. 143).

CD and 1H-15N-HSQC spectra of the analogue are essentially identical to those of wild-

type (Figure IV-9, p. 155). Denaturation studies indicate enhanced stability consistent

with the substitution of alanine for glycine in a helix (transition c in Figure IV-6B, p. 149;

ΔGu 14.8 kcal/mol at 4 °C). The increased ΔGu implies a 5-fold stronger dimerization.

Residue 398 in DSXM is Ala.
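The conversion from a free energy of unfolding to a dimerization constant used above can be sketched as follows, assuming the two-state model (folded dimer to unfolded monomers) so that Kd = exp(−ΔGu/RT) relative to a 1 M standard state. The function and constant names are illustrative only. Evaluated for the native core domain (14.1 kcal/mol at 4 °C), the result is on the order of 0.01 nM, in line with the value quoted in the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def dimer_kd(delta_g_kcal, temp_c):
    """Dissociation constant (M) for folded dimer <-> 2 unfolded monomers."""
    rt = R_KCAL * (temp_c + 273.15)
    return math.exp(-delta_g_kcal / rt)

# Free energy of unfolding quoted in the text for native CTDF-p at 4 °C.
kd_native = dimer_kd(14.1, 4.0)
print(f"Kd ~ {kd_native * 1e9:.3f} nM")
```

Because the dependence is exponential, a difference of a few tenths of a kcal/mol in ΔGu changes the inferred constant several-fold, so such estimates are best read as order-of-magnitude values.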

Discussion

The DSX CTDs have multiple biochemical roles: to enhance specific DNA-binding

affinity through dimerization (Cho and Wensink 1998), enable recruitment of

transcriptional co-regulatory proteins to sex-specific preinitiation complexes (Garrett-

Engele et al. 2002), and augment protein stability. Although sequences of eukaryotic

UBA domains vary widely (Buchberger 2002), the strict conservation of CTDF and

CTDM among insect homologs – extending from core to surface (Figure IV-10, p. 157

and Figure IV-11, p. 159) – is likely to reflect multiple interlocking functional

constraints. Its conserved dimerization interface defines a novel class of such domains.

Dimerization of UBA domains has previously been described biochemically (but not

structurally) in proteins regulating protein trafficking (Liu et al. 2003) and the cell cycle

(Bertolaet et al. 2001). The canonical UBA domain, however, is monomeric (Buchberger

2002). Since structures of other dimeric UBA domains have not been determined, it is

not known whether the mode of dimerization seen in the crystal structure of CTDF-p

(Bayrer et al. 2005c) is general or specific to this family. DSX provides the first example

of a UBA domain in a transcription factor as well as the first UBA dimer.

The intersexual phenotype of G398D XX flies is the consequence of blocked CTDF dimerization: impaired dimer-coupled specific DNA binding and protein misfolding. (i)

Dimerization. The DM domain binds cooperatively to DNA half-sites (Kd 8 nM; Zhu et

al. 2000). Such binding is enhanced by strong CTD dimerization (in both isoforms) and

made more stringent by excluding half-site occupancy (monomer binding is not observed

for intact DSX; Cho and Wensink 1998). The monomeric DM domain binds target

DNA sites as a dimer due to weak DNA-dependent half-site cooperativity (abundance of

2:1 protein:DNA complex in the presence of excess free DNA; Zhu et al. 2000;

Narendra et al. 2002). The DNA-binding affinity of DSXF is 35-fold greater than that of

the DM domain (Cho and Wensink 1998), evidence of the dimeric contribution to DNA-

binding. The Y1H assay is thus not a linear read-out of relative affinities, but may reflect

relative site occupancies. Assuming similar non-specific DNA binding, the specific

affinity of the intact protein (Kd 0.2 nM; Cho and Wensink 1997) is thus more consistent

with thermodynamic requirements of DNA recognition in the nucleus. Our results

demonstrate that G398D impairs dimerization by at least 10^6-fold. (ii) Protein Stability.

Whereas CTDF exhibits marked stability, G398D causes aberrant aggregation in vitro and

decreased expression in transfected S2 cells. We speculate that in vivo the mutation

would accelerate the degradation of DSXF and thereby further impair DSXF-regulated gene expression.

Gly398 packs at a critical helix-helix junction at the dimer interface. Modeling suggests that the Asp side chain would project into the hydrophobic core, both introducing an uncompensated charge and colliding with the other protomer. Modeling of the male-specific Ala side chain by contrast suggests efficient packing in a small gap (Figure IV-

2C, p. 141). Although such modeling does not consider possible structural reorganization, the otherwise dense packing near position Gly398 predicts that side chains larger than Ala would not be tolerated. This view is supported by Y2H studies: dimerization is impaired by diverse non-polar, polar, charged, and aromatic side chains.

It is not known whether such substitutions would also (like G398D) lead to aberrant aggregation or merely impair dimer stability. Conservation of residue 398 as Gly in

CTDF (Figure IV-10A, p. 157) and Ala in CTDM (Figure IV-10B, p. 157) suggests that

other residues are incompatible with the genetic function of dsx. An exception to this conservation is given by the recently released sequence for the male isoform in

Anopheles gambiae, which employs a glycine at 398 in the female isoform but an aspartic acid in the male. While the female tail is highly conserved, the male Anopheles tail is not.

This lack of conservation may indicate a non-homologous structure or function, or the sequence may encode an otherwise non-functional protein. It is not known whether Anopheles produces

alternate dsxm transcripts. It would be of future interest to test by random cassette mutagenesis (Lim and Sauer 1989) whether alternative packing arrangements in the dimer interface may permit other substitutions at 398. Analysis of allowed core sequences may enable identification of DSX-related UBA domains not apparent from sequence homology.

Classical UBA domains are stably folded as monomers. In a hypothetical CTDF-p protomer the G398D mutation would appear to be well tolerated. In such a structure the mutant side chain would project into solvent rather than into the hydrophobic core of the dimer (Figure IV-12, p. 161). That such a variant monomer seems not to exist – the mutant polypeptide instead misfolds and aggregates – suggests an obligatory coupling between folding and dimerization, likely due to exposure of the otherwise hydrophobic dimer core. Such coupling, broadly observed among leucine zippers and other small α-

helical dimerization elements (Weiss et al. 1990;Baxevanis and Vinson 1993), would

rationalize the similarity between estimates of the dimerization constant based on

guanidine unfolding studies of CTDF and those inferred from analysis of DNA-binding

isotherms in full-length DSXF and DSXM (Cho and Wensink 1998). Dimer-dependent

folding of CTDF could also provide a mechanism to restrict the female-specific

recruitment of a transcriptional activator (Intersex; Waterbury et al. 1999; Garrett-

Engele et al. 2002) to functional DSXF dimers or DSXF-DNA complexes.

Mutations in DSX associated with intersexual Drosophila development provide a model

for genetic diseases due to mutations in a UBA domain. Similarly, Paget's disease of bone is commonly caused by mutations in the human SQSTM1 gene, which encodes a monomeric UBA domain (Cavey et al. 2005). Such clinical mutations allow stable folding but impair binding of mono- and polyubiquitin in vitro; potential ubiquitinated binding partners are unknown. We imagine that ubiquitination can regulate the assembly of multi-protein complexes (Muratani et al. 2005), such as those engaged in transcriptional programs of sexual dimorphism. Deciphering such programs represents an important future challenge at the intersection of biochemistry and developmental biology.

Acknowledgements

We thank B. Baker for advice and reagents, B. Li for plasmid constructions and helpful discussion, and Y. Yang for assistance with NMR studies. This is a contribution from the Cleveland Center for Structural Biology.

Figure IV-1

Sexual differentiation cascade in D. melanogaster and domain organization of DSX. (A)

Sex is determined by X:autosome ratio, leading to an RNA-splicing cascade. Not shown:

IX-DSXF interaction and fruitless branch of pathway. (B) Organization of DSX isoforms.

Shared region spans residues 1-397 containing DM domain (35-105; gray box) and

proximal portion of dimerization domain (350-397; black box). Sex-specific regions

comprise residues 398-427 (DSXF; red) and 398-549 (DSXM; green). CTDF interacts with coactivator IX.

Figure IV-2

Structure of CTDF-p. (A) Ribbon model of wild-type CTDF-p dimer (stereo pair; Protein

Databank identifier 1ZV1); boxed regions enlarged below. (B) G398 packs in dimer

interface. G398D would place acidic side chain in core and clash with Y378 and V379;

also shown is P375. (C) Model of G398A suggests -CβH3 group (CPK model) fits in small gap.

Figure IV-3

Biochemical and Cell-Based Studies of CTDF. (A and B) SEC studies. (A) Dimer-specific elution of CTDF-p (residues 350-412) and G398D variant (*). (B) Comparison of

wild-type CTDF-p (z; residues 350-412) and G398A variant (S). Relative to standards

(‹), elution times correspond to mass 14-15 kDa (15.5 kDa predicted). (C) Y2H analysis

of CTDF dimerization. G398D blocks lacZ expression in accord with previous studies

(Erdman et al. 1996). Of other substitutions tested, only G398A permits dimerization; β-galactosidase

activity is two-fold higher than wild type. (D) Y1H analysis of DSXF-regulated gene expression. Activity is blocked by R91Q, which impairs specific DNA binding (Erdman and Burtis 1993). Mutation G398D attenuates activity to a level similar to that of the isolated DM domain. (E) Western blot of V5-tagged DSXF in S2 extracts

following transient transfection. Lane 1: control lysate from non-transfected cells; lanes

2-5: respective lysates following transfection with DSXM, DSXF, and DSXF variants

G398D and R91Q. G398D but not R91Q impairs expression.

(C, D, E) Collaborative effort by W. Zhang.

Figure IV-4

Serial Dilution of CTDF-p. Serial dilution of CTDF-p does not affect retention time down to the lowest detectable concentration. Column conditions were as above.

Figure IV-5

Schematic of yeast one-hybrid system. DSX variants (including the lone DM domain) are fused to the Gal4 activation domain. Binding to sites derived from the fbe activates lacZ expression.

Figure IV-6

CD Studies. (A and B) CD studies. (A) Far-UV spectra of intact CTDF (a, „), core

domain (b, solid line), and difference spectrum (Δ, dashed line, residues 413-427). Also

shown are spectra of variant core domains 398A (c, {) and G398D (d, ‹). (B) CD-

detected GuHCl titrations: G398A variant (c, {; ΔGu 14.8 kcal/mol, Cmid 2.83 ± 0.05 M,

and m 2.7 ± 0.05 kcal/mol/M) is more stable than native core domain (b, z; ΔGu 14.1

kcal/mol, Cmid 2.02 ± 0.1 M, and m 3.5 ± 0.17 kcal/mol/M). Unfolding of intact CTDF (a,

„) and G398D variant (d, ‹) do not fit two-state model. Smooth lines indicate two-state

model fit. Dashed lines are for visualization only.

Figure IV-7

Comparison of 1H-15N HSQC spectra of CTDF-p (A, residues 350-412) and CTDF (B, residues 350-427). Whereas resonances from the ordered UBA domain are essentially identical, the additional cross peaks arising from residues 413-427 in CTDF exhibit near random-coil chemical shifts. Truncation of the tail does not perturb folding or dimerization of the UBA domain, consistent with pioneering Y2H studies by the Burtis and Wensink laboratories (Erdman et al. 1996; An et al. 1996).

Figure IV-8

The distal portion of the female-specific tail of DSXF is flexible in solution. NMR

resonances assigned to distal female-specific residues 408-427 exhibit motional

narrowing in the spectrum of CTDF (residues 350-427). Compared here are strip plots of

HNCACB cross peaks from (A) ordered residues I376-A384 and (B) disordered residues

I413-T422. The distal tail also exhibits near random-coil chemical shifts and a paucity of

inter-residue nuclear Overhauser enhancements (NOEs).

Strip plot and assignments are a collaborative effort by Y. Yang.

Figure IV-9

NMR Studies of native CTDF-p and G398A substitution. 1H-15N HSQC fingerprint

spectra of wild-type CTDF-p (black) and 398A variant (red) are similar. Spectra are superimposed to clarify similarities and differences. G398 and A398 are labeled.

Figure IV-10

Sequence Conservation. Female (A) and male (B) CTD sequences. (A) Female sequences

are aligned to DSXF residues 385-412 in D. melanogaster. Gray and red cylinders

indicate non-sex-specific and female-specific helices; conservation of G398 is highlighted (red box). Female annotation of D. buzzatii and Apis mellifera is presumptive. (B) Male sequences are aligned to DSXM residues 385-416. Sequence

analysis predicts helix in CTDM (green and dashed boxes); C-terminal portion is not

present in CTDF. Most male sequences contain A398 (red box); D398 is observed in A.

gambiae (see text).

Figure IV-11

Surface representation of CTDF-p dimer color-coded according to extent of sequence

conservation among insect dsx alleles (see Appendix III). (A) “Front view” corresponding to Fig. 2B in main text: front edge of dimer interface is centrally located with C-terminus of helix α2 facing viewer. (B) “Back view” related to panel A by 180° rotation about vertical axis. N-terminus of helix α2 faces viewer. Color code: cool

colors (blue to green) represent conserved surface whereas warm colors (yellow to red)

represent non-conserved portions of surface. 50% conservation among insect alleles is

yellow.


Figure IV-12

Environment of (A) G398 and (B) G398D mutant side chain in hypothetical isolated

CTDF-p protomer. In each panel a ribbon model is shown at left and space-filling model

at right (stereo pair). G398 and D398 are highlighted in red. (C) GRASP surface of wild-type protomer showing electrostatic potential near G398. (D) Corresponding

GRASP representation of putative G398D protomer. Additional red patch is due to mutation.

CHAPTER V

HYDROGEN EXCHANGE REVEALS A STABILE DIMERIC CORE OF

DOUBLESEX CTDF

Introduction

The Doublesex (DSX) transcription factor is a major regulator of somatic sexual

differentiation in Drosophila melanogaster (Cline and Meyer 1996). Male and female

isoforms are produced as a consequence of sex-specific RNA splicing (Burtis and Baker

1989). Disruption of DSX function results in intersexual development of Drosophila.

The male and female isoforms of DSX share an N-terminal DNA-binding region termed the DM motif, named for the two transcription factors in which it was first identified

(DSX and the C. elegans MAB-3) (Raymond et al. 1998). This domain has been recognized bioinformatically in nearly 150 different proteins, and is highly conserved among metazoans. The DM-containing genes have been implicated in human sex reversal (9p Syndrome) and male-specific development in mouse models (DMRT1;

Raymond et al. 2000).

The C-terminal domain of DSX (CTD) mediates dimerization and is presumably responsible for the sex-specific functionality of the isoforms. Unlike the broad conservation of the DM domain, C-terminal conservation is more limited. The function of DSX as a genetic switch between male and female development, however, highlights the biological importance of this domain. Position G398, residing in the dimer interface,

is the site of an intersex mutation (G398D) (Bayrer et al. 2005b;Erdman et al. 1996).

Disruption of the dimer interface by G398D results in misfolding, aggregation, and decreased expression in cell culture, suggesting proper dimer formation is required for function in vivo. Dimerization is thermodynamically linked to DNA-binding, increasing binding affinity 35-fold (Cho and Wensink 1998). The sex-specific binding of the DSXF co-activator Intersex (IX) suggests CTDF may function in protein-protein interactions.

We have previously characterized the native C-terminal dimerization domain (CTDF-p,

residues 350-412) and two variant domains (G398D and G398A) by CD-detected

guanidine hydrochloride titration (Bayrer et al. 2005b). Here we extend these studies

through hydrogen exchange, CD-detected guanidine hydrochloride titration in D2O, and thermal denaturation. We investigate the native domain and an alanine substitution

(G398A) that fills a potential minor packing defect found within the native dimer interface. The variant results in increased dimer stability without major alterations in structure. Hydrogen/Deuterium exchange rates were monitored by 2D 1H-15N

HSQC. We show the long-lived resonances belong predominantly to residues found within the hydrophobic dimer interface. ΔGAPP values are lower than ΔGU (H2O) values by ca. 1.2 kcal/mol, but consistent with ΔGU (D2O), indicating this difference is due most

likely to solvent isotope effects. ΔGAPP values for the native domain were 12.9 kcal/mol at 30 °C, implying a dimerization constant of 0.5 nM, compared to ΔGu 12.6 kcal/mol in

D2O at 30 °C; our values for the dimerization domain are consistent with the published

value for the intact protein (Cho and Wensink 1998) and our prior analysis at 4 °C

(Bayrer et al. 2005b). The ΔGAPP of the alanine substitution (13.3 kcal/mol) implies a 2-fold

stronger dimerization relative to the native dimer. The ΔΔGAPP is smaller than the previous ΔΔGU noted for native and 398A substitution, but within experimental error.


Materials and Methods

Protein expression and purification.

Native and 398A domains were expressed in BL21(DE3)pLysS E. coli (Invitrogen, Carlsbad,

California, USA) with the vector pMW127 as thrombin-cleavable fusion proteins

containing Staphylococcal Nuclease as the fusion partner. Protein expression and

purification were essentially as described (Bayrer et al. 2004). Briefly, cells were grown

at 37 °C to a density of ca. 0.6 OD600 and induced with 0.5 mM isopropyl-β-D-

thiogalactoside (IPTG; Roche, Indianapolis, Indiana, USA) for rich media preparations

and 1.0 mM IPTG for minimal media preparations. Cells were allowed to grow to a

density of ca. 1.0 OD600 and harvested by centrifugation at 3500 g for 15 minutes at 4 °C.

Cells were suspended in extraction buffer (20 mM tris (pH 7.0) 20 mM imidazole and

500 mM NaCl) and lysed by three freeze-thaw cycles followed by three sonication cycles

on ice. Lysate was centrifuged at 17,400 g for 45 minutes at 4 °C, the supernatant

reserved, and the pellet resuspended in extraction buffer, sonicated, and centrifuged

again. The supernatants were combined and incubated with Co2+ affinity resin (BD

Biosciences, Palo Alto, California, USA) and washed with extraction buffer. Fusion

protein was eluted (20 mM tris (pH 7.0) 300 mM imidazole and 500 mM NaCl) and

dialyzed into cleavage buffer (20 mM tris (pH 7.9) 5 mM imidazole and 500 mM NaCl).

Thrombin cleavage was accomplished with ca. 1 U thrombin (Sigma, St. Louis, Missouri,

USA) per 2 mg protein. The cleavage product was applied to Co2+ resin; the flow-through contained the protein of interest. Final purification was achieved through gel filtration chromatography (Superdex 75; Amersham Pharmacia, Uppsala, Sweden). Labeled domains were prepared as above, except minimal media (M9 salts) containing 15N-ammonium

sulfate as the sole nitrogen source were used in place of rich media.

Purification was as for unlabeled material.

Circular Dichroism Spectroscopy.

CD spectra, temperature scans, and guanidine denaturation studies were performed as

described using an Aviv model 202 spectrophotometer equipped with automated

titration and temperature control units (Hua and Weiss 2004). Titrations were performed

at 30 °C in 10 mM tris (pH 7.0; pD 7.0 as indicated in figure caption) 50 mM NaCl and at

4 °C and 25 °C in 10 mM tris (pH 7.6) 0.1 mM NaCl. Spectra were deconvoluted using

PROSEC (Sreerama and Woody 1993). Unfolding transitions were interpreted assuming

a two-state equilibrium between folded dimer and unfolded monomers (De Francesco et

al. 1991).
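The two-state dimer model can be made concrete with a short sketch. It assumes a linear dependence of the unfolding free energy on denaturant concentration, ΔG = ΔG(H2O) − m[GuHCl], which is an assumption of this illustration, and solves the mass balance for F2 ⇌ 2U for the fraction of unfolded monomer. Function names and the total monomer concentration CT are hypothetical; the ΔGu and m values are those reported for the native core domain at 4 °C.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_unfolded(denaturant_m, dg_water, m_value, ct_molar, temp_c):
    """Fraction of monomer unfolded for F2 <-> 2U with dG = dG_water - m*[D]."""
    rt = R_KCAL * (temp_c + 273.15)
    k = math.exp(-(dg_water - m_value * denaturant_m) / rt)  # K = [U]^2/[F2]
    # Mass balance gives 2*Ct*fu^2 + K*fu - K = 0; this is the positive root
    # written in a form that avoids cancellation when K is large.
    return 2.0 * k / (k + math.sqrt(k * k + 8.0 * ct_molar * k))

def midpoint(dg_water, m_value, ct_molar, temp_c):
    """Denaturant concentration at which half the monomer is unfolded (K = Ct)."""
    rt = R_KCAL * (temp_c + 273.15)
    return (dg_water + rt * math.log(ct_molar)) / m_value

CT = 5.0e-6  # hypothetical total monomer concentration (M); not from the text
c_mid = midpoint(14.1, 3.5, CT, 4.0)
f_mid = fraction_unfolded(c_mid, 14.1, 3.5, CT, 4.0)
print(f"midpoint = {c_mid:.2f} M GuHCl, fraction unfolded there = {f_mid:.2f}")
```

Unlike a monomeric two-state transition, the midpoint here depends on protein concentration, which is why the concentration term must be carried through the analysis.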

Differential Scanning Calorimetry.

DSC was performed with a Microcal VP-DSC calorimeter. Samples were 40, 80, and

100 μM CTDF-p prepared in 10 mM KPi (pH 7.4) 50 mM KCl. Samples were

extensively dialyzed (ca. 3-4 days) against buffer to allow samples to come to

equilibrium. Melting curves were examined with the Origin software package.

NMR Spectroscopy.

1H 1D and 1H-15N-HSQC spectra were acquired at 700 MHz as described (Hua and

Weiss 2004). The proteins were made 0.2-1.5 mM concentration in 10 mM 2H-Tris-HCl

(pD 7-8), 50 mM NaCl and 5 mM deuterated dithiothreitol (DTT). Sample pH was

adjusted in water by direct meter reading according to the relationship pD = pH+0.4 and

lyophilized until use in exchange experiments. Prior to each exchange experiment the

NMR was shimmed with a similar sample after equilibration to 30 °C.

Lyophilized exchange samples were reconstituted in 300 μl 99.96% D2O (Aldrich) and

immediately transferred to a Shigemi NMR tube. The sample was equilibrated at 30 °C

and shims were adjusted. Acquisition of the first data point was typically started seven

minutes after the addition of D2O. Exchange was monitored for up to four weeks,

depending on rate of exchange, with increasingly delayed time points. Lack of chemical

shift perturbations over the time course of the experiment demonstrates structural stability.

For 2D HSQC experiments, spectra were processed using NMRPipe and analyzed by nmrView. Bruker XWINNMR was used to process 1D data and to calculate peak intensities. Detailed assignment for the 15N and 1H resonances will be described

elsewhere.

Hydrogen exchange data analysis.

Intensities from the hydrogen exchange experiments versus time were fitted to the

equation:

$$I = A \exp(-k_{ex}\, t) + D \tag{1}$$

where I represents the intensity at a given time point, A is the amplitude of the decay

curve, kex is the observed decay rate, t is the time (seconds), and D is a constant. Data were fitted using Origin 7.5.
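For readers without access to Origin, a fit of Equation 1 can be sketched in plain Python. This is not the procedure used here; it is a minimal illustration that exploits the fact that, for any fixed kex, the amplitude A and offset D follow from linear least squares, so only kex needs to be searched. The function names, search bounds, and the synthetic decay used to exercise the code are all assumptions of this sketch.

```python
import math

def fit_linear_terms(times, intensities, k):
    """For a fixed rate k, solve least squares for A and D; return (A, D, SSE)."""
    e = [math.exp(-k * t) for t in times]
    n = len(times)
    see, se = sum(x * x for x in e), sum(e)
    sey, sy = sum(x * y for x, y in zip(e, intensities)), sum(intensities)
    det = see * n - se * se
    if det <= 0.0:  # decay term indistinguishable from the constant term
        return 0.0, sy / n, float("inf")
    a = (sey * n - se * sy) / det
    d = (see * sy - se * sey) / det
    sse = sum((a * x + d - y) ** 2 for x, y in zip(e, intensities))
    return a, d, sse

def fit_decay(times, intensities, k_min=1e-8, k_max=1e-1, steps=400, rounds=4):
    """Search k on a log-spaced grid, narrowing around the best value each round."""
    lo, hi = math.log(k_min), math.log(k_max)
    best = None
    for _ in range(rounds):
        grid = [math.exp(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
        best = min(((k,) + fit_linear_terms(times, intensities, k) for k in grid),
                   key=lambda r: r[3])
        width = (hi - lo) / steps
        lo, hi = math.log(best[0]) - width, math.log(best[0]) + width
    k, a, d, _ = best
    return a, k, d

# Synthetic, noise-free decay (A = 100, kex = 2e-5 s^-1, D = 5); not measured data.
times = [420.0 + 3600.0 * i for i in range(40)]
data = [100.0 * math.exp(-2e-5 * t) + 5.0 for t in times]
a_fit, k_fit, d_fit = fit_decay(times, data)
print(f"A = {a_fit:.2f}, kex = {k_fit:.3e} s^-1, D = {d_fit:.2f}")
```

With real peak intensities the same routine would be applied per residue; a proper nonlinear fit would additionally provide parameter uncertainties, which this sketch does not.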

The hydrogen exchange process can be written as

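$$(\mathrm{N{-}H \cdots O{=}C})_{cl} \;\underset{k_{cl}}{\overset{k_{op}}{\rightleftharpoons}}\; (\mathrm{N{-}H})_{op} \;\xrightarrow{k_{rc}}\; \mathrm{N{-}D} \tag{2}$$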

where kop and kcl are the rate constants for structural opening and closing, respectively.

Exchange occurs only from the open state at an intrinsic rate (krc) and is irreversible due

to the overwhelming solvent D2O. krc is calculated for each residue based on model

peptides using the program Sphere (available online at http://www.fccc.edu/research/labs/roder/sphere/). Exchange can take place under one of two limiting conditions, termed EX1 and EX2. Under EX1 conditions, the exchange process is monomolecular, with kcl << krc, so that kex = kop. Under EX2 conditions (bimolecular exchange), kcl >> krc and the protein exists predominantly in the

native (folded) state. EX2 conditions are highly pH dependent, with a 10-fold difference

in exchange rate expected for each unit pH change. Our experiments indicate the domain

exchanges via EX2 (see Results and Discussion). The observed exchange rate, kex, can

be written:

$k_{ex} = \dfrac{k_{op}\,k_{rc}}{k_{op} + k_{cl} + k_{rc}} \approx \dfrac{k_{op}\,k_{rc}}{k_{cl}} = K_{op}\,k_{rc}$ (3)

where Kop is the equilibrium constant for structural opening. Under EX2 conditions, the

free energy of structural opening can be given by:

ΔGHX = - RT ln (kex/krc) (4)

where R is the gas constant and T the temperature in Kelvin. For globally exchanging

residues this value corresponds to the apparent free energy of the system. For oligomers

at equilibrium, however, the residual free energy of the system must be taken into

account (Neira and Mateu 2001; Backmann et al. 1998). For the two-state unfolding

system,

$F_2 \rightleftharpoons 2U$ (5)

Here, F2 is the folded dimer, which transitions directly to two unfolded monomers (U).

This yields the dissociation equation,

$K_d = [U]^2/[F_2]$ (6)

At equilibrium where the fraction of unfolded monomers is 50 percent, Keq can be

written:

Keq = Ct (7)

where Ct is the protein concentration expressed in monomer equivalents (see Appendix

IX for derivation). Thus, the free energy in the dimeric system at equilibrium can be

expressed as

ΔGresidual = - RT ln (Ct) (8)
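The relationship between equations 6 to 8 can be checked numerically. The sketch below is an illustration only: it assumes the two-state scheme of equation 5, writes [U] = f Ct and [F2] = (1 - f) Ct / 2 for an unfolded fraction f, and solves equation 6 for f. The function names and the example concentration are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def fraction_unfolded(k_eq, c_t):
    """Unfolded fraction f of monomer equivalents for F2 <=> 2U.

    Substituting [U] = f*Ct and [F2] = (1 - f)*Ct/2 into K = [U]^2/[F2]
    gives 2*Ct*f^2 + K*f - K = 0; the positive root is returned.
    """
    disc = k_eq * k_eq + 8.0 * k_eq * c_t
    return (-k_eq + math.sqrt(disc)) / (4.0 * c_t)


def dg_residual(c_t, temp_k):
    """Residual free energy of the dimeric system, equation 8 (kcal/mol)."""
    return -R_KCAL * temp_k * math.log(c_t)


if __name__ == "__main__":
    c_t = 1.0e-4  # hypothetical monomer concentration, mol/L
    # When K equals Ct, exactly half of the monomers are unfolded (equation 7).
    print(fraction_unfolded(c_t, c_t))
    print(dg_residual(c_t, 303.15))
```

Because the midpoint condition depends on Ct, the residual term grows as the protein concentration falls, which is why it cannot be neglected for a dimer.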

This contribution to the free energy in the system must be applied to the ΔGHX determined experimentally,

ΔGAPP = - RT ln (Kop) - RT ln (Ct) (9)

where ΔGAPP is the apparent ΔG for the dimeric system. Assuming the two-state model, the dimerization constant (Kdimer) can be determined from global residue ΔG values

according to

Kdimer = exp(-ΔGAPP/RT). (10)
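Equations 4, 9, and 10 chain together as in the following sketch. It is a worked illustration under stated assumptions rather than the analysis pipeline used here: the function names are hypothetical, the temperature is the 30 °C of the exchange experiments, and the protein concentration in the example (170 μM) is an illustrative value that is not taken from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def dg_hx(k_ex, k_rc, temp_k):
    """Free energy of structural opening under EX2, equation 4 (kcal/mol)."""
    return -R_KCAL * temp_k * math.log(k_ex / k_rc)


def dg_app(k_op_eq, c_t, temp_k):
    """Apparent free energy of the dimeric system, equation 9 (kcal/mol).

    k_op_eq is the opening equilibrium constant Kop = kex/krc and
    c_t is the protein concentration in monomer equivalents (mol/L).
    """
    rt = R_KCAL * temp_k
    return -rt * math.log(k_op_eq) - rt * math.log(c_t)


def k_dimer(dg, temp_k):
    """Dimerization constant implied by an apparent free energy, equation 10."""
    return math.exp(-dg / (R_KCAL * temp_k))


if __name__ == "__main__":
    temp_k = 303.15            # 30 degrees C
    protection = 3.5e5         # example protection factor krc/kex
    k_op_eq = 1.0 / protection
    c_t = 1.7e-4               # assumed concentration, mol/L (illustrative)
    dg = dg_app(k_op_eq, c_t, temp_k)
    print(dg, k_dimer(dg, temp_k))
```

Note that equations 9 and 10 together reduce to Kdimer = Kop x Ct, so the concentration term enters the dissociation constant linearly.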


Molecular Modeling. The wild-type crystal structure was used as a template for rigid-body

modeling of G398A using the program O. Visualization was accomplished with PyMOL (DeLano

Scientific).

Results and Discussion

CTDF-p undergoes irreversible thermal denaturation

In order to extend our previous thermodynamic studies and independently confirm the

results of the GuHCl titrations, we examined the thermal stability of the CTDF-p by CD and DSC.

CD analysis of 25, 40, and 80 μM samples monitored at θ222 from 4 °C to 90 °C showed

a standard melting transition at 74 °C (Figure V-1, p. 180). Upon the reverse scan,

however, the expected refolding transition was not observed. Furthermore, a repeated

heating scan of the sample retraced the reverse scan, indicating the CTDF-p does not refold

after thermal denaturation. Likewise, DSC analysis of the domain demonstrated a wide

melting curve upon thermal denaturation centered around 70 °C. Subsequent cooling failed to show a refolding transition, and further heating and cooling cycles were indistinguishable from baseline. In both cases, irreversible denaturation was associated with the formation of small, visible aggregates. Reasoning that denaturation might become irreversible only beyond a threshold of domain unfolding, we repeated the thermal unfolding studies by CD. Heating to the beginning of the transition (50-55 °C) and subsequent cooling had a minimal effect on the secondary structure of the domain as assayed by λ scan at 4 °C. Heating to the midpoint of the transition, however, once again

showed an irreversible process: the CD cooling trace was not superimposable on the

heating trace, and changes in the λ scan were apparent.

The irreversible nature of the thermal denaturation prevents the use of

equilibrium-dependent analyses such as those necessary for the calculation of ΔG and the dimerization

constant. Consequently, we have sought to verify and further characterize the

thermodynamics of the CTDF-p system by hydrogen exchange.

CTDF-p exchanges by the EX2 mechanism

For a typical protein in its native state, core labile hydrogens are protected from solvent

exchange by hydrogen bonds and/or a hydrophobic environment. Rates are influenced by

the identity of neighboring residues, denaturant concentration (if used), temperature, and

pH (concentration of catalytic OH- or H+). Occasionally, either local fluctuations in

structure or protein unfolding expose these hydrogens to solvent (the open state, (N-H)op in eq. 2), at which point they can exchange by one of two mechanisms (EX1 or EX2).

The mechanism of hydrogen exchange dictates the analysis of the data and the subsequent utility of the results. The monomolecular EX1 mechanism results when kcl

<< krc, which simplifies the rate equation to kex = kop. This implies the exchange occurs

from the open state before the protein has an opportunity to refold (close), and thus is generally independent of pH (lowering the pH dramatically can, however, switch the

mechanism of exchange to EX2). Under EX1 conditions, nearly all hydrogens can be

expected to exchange after a single unfolding event (Ferraro et al. 2004). As such, EX1

is of limited utility in this study. The EX2 mechanism of exchange (kcl >> krc) is

dependent on catalyst concentration (OH- or H+), and simplifies to kex = (kop/kcl)krc (eq. 3 above). Under this exchange regime, the equilibrium constant of opening (Kop = kop/kcl) can be used to calculate the ΔGHX for the system.
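The two limiting mechanisms can be compared against the general steady-state expression of equation 3 with a short numerical sketch. The rate constants below are arbitrary illustrative values, not measurements from this work, and the function names are hypothetical.

```python
def k_ex_general(k_op, k_cl, k_rc):
    """Observed exchange rate from the full expression in equation 3."""
    return k_op * k_rc / (k_op + k_cl + k_rc)


def k_ex_ex1(k_op):
    """EX1 limit (kcl << krc): exchange is limited by the opening rate."""
    return k_op


def k_ex_ex2(k_op, k_cl, k_rc):
    """EX2 limit (kcl >> krc): exchange scales with Kop = kop/kcl."""
    return (k_op / k_cl) * k_rc


if __name__ == "__main__":
    # Illustrative EX2 regime: closing is much faster than intrinsic exchange.
    print(k_ex_general(1e-4, 1e3, 1.0), k_ex_ex2(1e-4, 1e3, 1.0))
    # Illustrative EX1 regime: intrinsic exchange is much faster than closing.
    print(k_ex_general(1e-4, 1e-3, 10.0), k_ex_ex1(1e-4))
```

In each regime the general expression and the corresponding limit agree closely, while the other limit does not; only in the EX2 regime does kex carry thermodynamic information through Kop.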

Two experimental measures indicate the CTDF-p exchanges via the EX2 mechanism.

First, exchange rates vary considerably between individual resonances. As discussed

above, we would expect all resonances to exchange at a similar rate under EX1

conditions. Secondly, we measured the exchange rate of several residues in the CTDF-p

at pD 7.0 and pD 8.0 by 1D 1H methods. For all residues examined, the expected ten-

fold change per pD unit was observed (Huyghues-Despointes et al. 2001).
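The pD test described above amounts to checking the slope of log10(kex) against pD. The sketch below shows one way to express that check; it assumes base-catalyzed intrinsic exchange (so that krc rises ten-fold per pD unit), and the function names, tolerance, and example rates are hypothetical rather than taken from the measurements.

```python
import math


def log_rate_slope(pd_values, k_ex_values):
    """Least-squares slope of log10(k_ex) versus pD."""
    logs = [math.log10(k) for k in k_ex_values]
    n = len(pd_values)
    mean_x = sum(pd_values) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pd_values, logs))
    sxx = sum((x - mean_x) ** 2 for x in pd_values)
    return sxy / sxx


def classify_mechanism(slope, tol=0.3):
    """Label the regime from the slope; tol is an arbitrary tolerance.

    A slope near 1 (ten-fold per pD unit) is the EX2 signature; a slope
    near 0 (pD independent) is the EX1 signature.
    """
    if abs(slope - 1.0) <= tol:
        return "EX2"
    if abs(slope) <= tol:
        return "EX1"
    return "mixed or indeterminate"


if __name__ == "__main__":
    # Hypothetical rates (per second) at pD 7.0 and pD 8.0.
    slope = log_rate_slope([7.0, 8.0], [2.0e-5, 2.0e-4])
    print(slope, classify_mechanism(slope))
```

With only two pD values the slope is simply the log ratio of the two rates; additional pD points would make the least-squares estimate more meaningful.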

Exchange results indicate the native CTDF-p contains a stable dimeric core

An analysis of protection factors (krc/kex) in the native CTDF-p derived from

1H-15N-HSQC experiments shows three distinct clusters of protected resonances, the

overwhelming majority of which reside within helices α2 and α3 (Figure V-2A,C, p.

182). All residues with the exception of Phe367 reside in helices, and all are part of the

optimal hydrogen-bonding network (supplement). Furthermore, a significant percentage

of these resonances are localized to the dimer interface (Leu376, Tyr378, Val379, Ile380,

Lys382, Arg394, Ile395, Glu397, and Gly398).

The first cluster of slowly exchanging amides (Group I) resides at the C-terminal end of

α1 (Leu364 and Glu365) and within the first loop (Phe367) (Figure V-3, p. 184). Leu364

projects into the mini-core of the protomer, near Met377. Phe367 in part seals the apex

of the α1-α2 region. This cap appears to be further stabilized by the presence of two

prolines, one at the terminus of the connecting loop, the other within α2. As the N-terminal-most residues are disordered in solution and the helix makes only minor dimer contacts, it is not surprising that the major portion of this helix is subject to fast exchange.

The second cluster of slow-exchanging amides (Group II) resides in α2, the heart of the dimer

(Figure V-4A,B, p. 186). Remarkably, seven out of the ten residues composing segment

α2B are slowly-exchanging (Leu376, Met377, Tyr378, Val379, Ile380, Leu381, and

Lys382; the first residue of this helical segment is Pro375, which does not contribute to

the 1H-15N-HSQC). As mentioned previously, Met377 projects into the mini-core near

Leu364. The protection factors of Group I are considerably lower than that of Met377,

suggesting the dimeric core persists longer than the mini-core of the protomer. Leu381,

the other slow-exchanging residue in Group II not in the dimer interface, packs between

α1 and α3 and is completely buried in the native structure. This residue however has the

second lowest protection factor of Group II, suggesting its inclusion in this stretch may

be due to the strong α2-α2' interface locking the helix in place.

Group III is localized to α3, spanning much of the N-terminus, where it interacts with α2 and α2' (Figure V-4A, p. 186). The clustering of Group III predominantly within the N-terminal portion of the helix reflects the disorder of the helical C-terminus, analogous to Group I in α1, and the role of this segment in dimerization, providing intra- and intermolecular

contacts within the hydrophobic dimer core. A surprisingly long-lived resonance is

found C-terminal to the α2'-α3 interface at Asn403. Although not itself a dimer contact,

it does neighbor Val401 and Val402, both dimer interface residues showing a small

degree of protection. The Asn403 amide is hydrogen bonded to Glu396, a member of

Group III.

Hydrogen Exchange of the 398A substitution shows a similar protection pattern

The 398A substitution shows increased protection factors relative to the native domain

consistent with its greater stability (Bayrer et al. 2005b). The overall distribution of

slow-exchanging amides generally assumes the same grouping as the native domain.

Interestingly, the 398A variant also shows Asn403 to be a slow-exchanging residue.

Arg394 has a decreased protection factor relative to other residues in the 398A domain;

other exchange patterns are similar in the native and 398A domains.

A summary of protection factors for native and 398A substitution CTDF-p can be found

in Tables V-1 and V-2.

Exchange rates give a measure of protein stability consistent with previous studies

Native data were examined according to the equations stated above (see Materials and

Methods). An analysis of the five largest ΔGAPP values (Met377, Tyr378, Ile380,

Arg394, and Asn403) yields an average of 12.9 ± 0.2 kcal/mol, implying a dimerization constant of 0.5 ± 0.2 nM. Three of these residues are localized to the dimer interface

(Tyr378, Ile380, and Arg394), suggesting that exchange occurs upon dimer unfolding.

Previous studies note a ΔGu for the native CTDF-p of 14.1 kcal/mol (obtained at 4 °C versus 30 °C), a difference of 1.2 kcal/mol. This value is within the range attributable to

solvent isotope effects (1-2 kcal/mol; Makhatadze et al. 1995). The 398A substitution

yields an average ΔGAPP of 13.3 ± 0.1 kcal/mol, reflecting the greater stability of this mutant. A

difference of 0.6 kcal/mol was noted previously for these domains in H2O at 4 °C, which

is within the margin of error of these results. The five largest ΔGAPP values are

contributed by Met377, Tyr378, Val379, Ile380, and Leu381. Interestingly, while the

native CTDF-p set includes two residues from α3, the largest ΔGAPP values of the 398A substitution are

derived exclusively from contiguous α2 residues, of which three are involved in the

dimer interface.
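The arithmetic behind the averaged native value can be retraced with the short sketch below, which uses the ΔGAPP entries listed in Table V-1 for Met377, Tyr378, Ile380, Arg394, and Asn403. The first-order error propagation and the function names are assumptions made for illustration; the ± 0.2 kcal/mol uncertainty is the value quoted in the text, not one computed here.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol K)
TEMP_K = 303.15     # 30 degrees C, the temperature of the exchange experiments

# Apparent free energies (kcal/mol) as listed in Table V-1.
DG_APP_NATIVE = {
    "Met377": 12.9,
    "Tyr378": 12.8,
    "Ile380": 12.9,
    "Arg394": 13.0,
    "Asn403": 12.7,
}


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


def kd_from_dg(dg, temp_k=TEMP_K):
    """Dimerization constant (mol/L) implied by equation 10."""
    return math.exp(-dg / (R_KCAL * temp_k))


def kd_uncertainty(dg, sigma_dg, temp_k=TEMP_K):
    """First-order propagation: sigma_K = K * sigma_dG / (R*T)."""
    return kd_from_dg(dg, temp_k) * sigma_dg / (R_KCAL * temp_k)


if __name__ == "__main__":
    dg_mean = mean(DG_APP_NATIVE.values())
    print(round(dg_mean, 2))                          # average in kcal/mol
    print(kd_from_dg(dg_mean) * 1e9)                  # Kd in nM
    print(kd_uncertainty(dg_mean, 0.2) * 1e9)         # uncertainty in nM
```

Because Kdimer depends exponentially on ΔGAPP, an uncertainty of 0.2 kcal/mol already corresponds to roughly a third of the dissociation constant at this temperature.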

To further investigate the potential role of solvent isotope effects (correlated to hydration

of hydrophobic core residues upon protein unfolding), we performed CD-detected

guanidine titrations in D2O at 4 and 30 °C (Figure V-5, p. 188). At 4 °C, the ΔGu in D2O was 12.8 kcal/mol, a difference of 1.3 kcal/mol relative to ΔGu in H2O, indicating a

significant solvent isotope contribution. Additional titration analysis performed at 30 °C

yields a ΔGu of 12.6 ± 1 kcal/mol, consistent with that obtained by HX above.

The close correspondence of the hydrogen exchange derived ΔG values and that of the

ΔGu in D2O, and importantly the location of the long-lived resonance in the dimer core,

support a two-state unfolding model where exchange is limited to unfolded states (either

locally or globally), rather than through a native or native-like state observed for the p53

tetramer (Neira and Mateu 2001). Moreover, exchange under these conditions does not appear to be influenced by proline isomerization events (expected to contribute up to 0.6 kcal/mol to the system), typically used to account for discrepancies between exchange data under native conditions and denaturant-induced unfolding experiments.

ACKNOWLEDGEMENTS

We thank Y. Yang for assistance with experimental set-up of hydrogen exchange experiments.

Table V-1

Hydrogen exchange parameters for native CTDF-p

Residue  H-bonding acceptor  kex (x 10^-5 s^-1)  P (x 10^5)       Keq (x 10^-6)  ΔGAPP (kcal/mol)  Kd (nM)
Leu364   Cys360              0.65 ± 0.1          1.1 ± 0.2        9.2 ± 1        12.2 ± 0.1        1.6 ± 0.2
Glu365   Gln361              10 ± 2              0.083 ± 0.02     120 ± 24       10.7 ± 0.1        20 ± 4
Phe367   Leu363              4 ± 2               0.83 ± 0.4       12 ± 6         12.0 ± 0.3        2.1 ± 1
Leu376   Leu373              0.87 ± 0.2          0.75 ± 0.2       13 ± 3         12.0 ± 0.2        2.3 ± 0.6
Met377   Leu373              0.74 ± 0.2          3.5 ± 0.9        2.8 ± 0.7      12.9 ± 0.1        0.48 ± 0.1
Tyr378   Met374              1 ± 0.3             3.0 ± 1          3.3 ± 1        12.8 ± 0.2        0.56 ± 0.2
Val379   Pro375              0.48 ± 0.1          2.0 ± 0.6        4.9 ± 1        12.6 ± 0.2        0.84 ± 0.2
Ile380   Leu376              0.16 ± 0.06         3.6 ± 1          2.8 ± 1        12.9 ± 0.2        0.47 ± 0.2
Leu381   Met377              0.83 ± 0.1          0.81 ± 0.1       12 ± 2         12.0 ± 0.1        2.1 ± 0.4
Lys382   Tyr378              2.0 ± 0.2           1.2 ± 0.1        8.2 ± 0.7      12.3 ± 0.05       1.4 ± 0.1
Glu390   Asn387              2.0 ± 0.2           0.48 ± 0.05      21 ± 2         11.7 ± 0.06       3.5 ± 0.04
Ser392   Ile388              5.0 ± 2             2.0 ± 0.8        4.9 ± 2        12.6 ± 0.2        0.83 ± 0.3
Arg393   Glu389              5.0 ± 1             2.1 ± 0.4        4.8 ± 1        12.6 ± 0.1        0.82 ± 0.2
Arg394   Glu390              2.0 ± 1             4.3 ± 2          2.3 ± 1        13.0 ± 0.3        0.39 ± 0.2
Ile395   Ala391              2.0 ± 0.5           0.67 ± 0.2       15 ± 4         11.9 ± 0.1        2.5 ± 0.6
Glu396   Ser392              2.0 ± 0.6           0.40 ± 0.1       25 ± 7         11.6 ± 0.2        4.3 ± 1
Glu397   Arg393              4.0 ± 0.4           0.24 ± 0.3       42 ± 5         11.3 ± 0.07       7.1 ± 1
Gly398   Arg394              8.0 ± 3             0.72 ± 0.3       14 ± 5         12.0 ± 0.2        2.4 ± 1
Val402   Gly398              15 ± 1              0.042 ± 0.003    238 ± 16       10.3 ± 0.04       40 ± 3
Asn403   Gln399              4.0 ± 0.8           2.4 ± 0.5        4.1 ± 0.8      12.7 ± 0.1        0.70 ± 0.1

Table V-2

Hydrogen exchange parameters for G398A CTDF-p

Residue  H-bonding acceptor  kex (x 10^-5 s^-1)  P (x 10^5)       Keq (x 10^-6)  ΔGAPP (kcal/mol)  Kd (nM)
Leu364   Cys360              0.33 ± 0.01         2.1 ± 0.07       4.7 ± 0.2      11.5 ± 0.02       4.7 ± 0.2
Glu365   Gln361              7.0 ± 0.2           0.12 ± 0.004     84 ± 3         9.8 ± 0.02        84 ± 3
Phe367   Leu363              7.0 ± 0.2           0.47 ± 0.01      21 ± 0.6       10.6 ± 0.02       21 ± 0.6
Leu373   Pro370              20 ± 0.8            0.041 ± 0.002    246 ± 9        9.2 ± 0.02        246 ± 9
Leu376   Leu373              0.30 ± 0.02         2.2 ± 0.2        4.5 ± 0.3      11.6 ± 0.04       4.5 ± 0.3
Met377   Leu373              0.047 ± 0.006       56 ± 7           0.18 ± 0.02    13.5 ± 0.07       0.18 ± 0.02
Tyr378   Met374              0.058 ± 0.005       52 ± 5           0.19 ± 0.02    13.5 ± 0.06       0.19 ± 0.02
Val379   Pro375              0.029 ± 0.004       33 ± 5           0.30 ± 0.05    13.2 ± 0.09       0.30 ± 0.05
Ile380   Leu376              0.015 ± 0.003       39 ± 9           0.26 ± 0.06    13.3 ± 0.1        0.26 ± 0.06
Leu381   Met377              0.034 ± 0.005       20 ± 3           0.51 ± 0.08    12.9 ± 0.1        0.51 ± 0.08
Lys382   Lys362              0.35 ± 0.02         7.0 ± 0.4        1.4 ± 0.08     12.3 ± 0.03       1.4 ± 0.08
Glu390   Asn387              1.0 ± 0.03          0.96 ± 0.03      10 ± 0.3       11.1 ± 0.02       10 ± 0.3
Ser392   Ile388              6.0 ± 0.07          1.7 ± 0.02       5.9 ± 0.07     11.4 ± 0.007      5.9 ± 0.07
Arg393   Glu389              2.0 ± 0.04          5.2 ± 0.1        1.9 ± 0.04     12.1 ± 0.01       1.9 ± 0.03
Arg394   Glu390              1.0 ± 0.04          8.7 ± 0.3        1.2 ± 0.04     12.4 ± 0.02       1.2 ± 0.04
Ile395   Ala391              0.11 ± 0.009        13 ± 1           0.79 ± 0.06    12.6 ± 0.05       0.79 ± 0.06
Glu396   Ser392              0.10 ± 0.008        7.7 ± 0.6        1.3 ± 0.1      12.3 ± 0.05       1.3 ± 0.1
Glu397   Arg393              0.45 ± 0.03         2.1 ± 0.2        4.7 ± 0.3      11.6 ± 0.04       4.7 ± 0.3
Ala398   Arg394              0.26 ± 0.009        12 ± 0.4         0.83 ± 0.03    12.6 ± 0.02       0.83 ± 0.03
Gln399   Ile395              8.0 ± 0.07          0.62 ± 0.006     16 ± 0.1       10.8 ± 0.006      16 ± 0.1
Val402   Ala398              15 ± 0.7            0.042 ± 0.002    238 ± 11       9.2 ± 0.03        238 ± 11
Asn403   Gln399              3 ± 0.07            3.2 ± 0.08       3.1 ± 0.08     11.8 ± 0.02       3.1 ± 0.08

Figure V-1

Thermal denaturation of native CTDF-p monitored by CD. CTDF-p undergoes a melting transition at 74 °C (transition curve). Cooling of the sample does not reveal a refolding transition, indicating the thermal unfolding is irreversible.

[Figure V-1 plot: [θ]222 x 10^3 (y-axis, -19 to -7) versus Temperature (°C) (x-axis, 4 to 84).]

Figure V-2

Protection factors for slowly exchanging amides for (A) native and (B) 398A substitution. Error bars indicate fit to standard exponential decay model. Residues cluster into groups I (purple), II (red), or III (magenta). (C) Stereo ribbon diagram showing clustering of amides. Gln399 is included, although protection was only seen with 398A.


Figure V-3

Stereo diagram of Group I residues. Group I is the fastest exchanging group and resides in the protomer mini-core. This group is the smallest cluster and lacks dimer contacts.


Figure V-4

Group II and III residues. (A) Stereo ribbon of the interaction between the two groups of slowly exchanging amides across the dimer interface. Group III residues are clustered near the N-terminus of α3 where this region intersects with α2'. Asn403 is separated by five residues from the cluster, but is a slowly exchanging amide in both native and 398A

CTDF-p. Hydrogen-bonding by the side chain and amide may contribute to enhanced

local stability versus the surrounding helical regions. (B) Stereo ribbon of isolated α2-

α2' showing group II residues residing in the core of the dimer.


Figure V-5

CD-detected guanidine hydrochloride titration of native CTDF-p in D2O at 30 °C.

Titration in D2O reveals significant isotope effects relative to prior studies in H2O.

Fitting parameters are ΔGu 12.6 ± 0.1 kcal/mol, Cmid 2.7 ± 0.06, and m 1.8 ± 0.04.


CHAPTER VI

SUMMARY AND FUTURE DIRECTIONS

The objective of this chapter is to review and summarize the major results of this work

and their implications for future study. The findings from this study enable new paths of

discovery in diverse fields of developmental biology, transcriptional regulation, and

protein-protein interaction. We discuss the possible use of structure-informed design of

novel alleles for gene targeting that will allow for directed examination of specific

hypotheses regarding the role of DSX in development (see Cooperativity, Behavior, and

Ubiquitin subsection below). Such future aims are made possible by the structure

reported in this dissertation. Work towards the elucidation of the DSXF-IX complex is

given, along with exciting mutational data suggesting the location of the IX binding

surface. Structural homology with the UBA domain is discussed, and intriguing

preliminary results suggesting a role for Ubiquitin in sex-determination are presented.

Summary of work

Chapter II describes the expression, purification, and preliminary X-ray and NMR characterization of the DSX dimerization domain. Unlike other small, autonomous

domains the DSX CTDF does not properly refold after denaturation by reverse phase

HPLC, forming instead an "inside-out" non-native dimer characterized by an aberrant

disulphide bond. Refolding of the domain was not successful; however, an alternate

purification protocol involving gel filtration and anion exchange was developed. This

latter protocol enabled crystallographic studies of CTDF. Although crystals of the full

CTDF were not obtained, crystallization of the CTDF-p occurred readily in buffers

containing ammonium sulfate and isopropanol. Likewise, the 1H NMR spectrum of

CTDF demonstrates poorer resolution as compared with CTDF-p, due presumably to overlapping tail resonances and increased aggregation. The C-terminal tail resonances fall in the near-random coil region of the 1H-15N HSQC, suggesting the majority of the sex-specific female tail is disordered. This result was reinforced by CD λ scans shown in

Chapter IV, and by collaborative studies demonstrating motional narrowing and a paucity

of NOEs. Thus we conclude the CTDF consists of an ordered core (CTDF-p) extended by

a disordered tail.

The crystal structure of the domain, solved by SAD phasing and presented in Chapter III, demonstrates a helical dimerization motif consistent with prior CD analysis and

secondary structure predictions (An et al. 1996;Erdman et al. 1996;Bayrer et al. 2005c).

The CTDF-p surprisingly assumes a UBA fold to mediate dimerization. This head-to-tail

arrangement of helices is unique among known helical dimerization motifs. The

canonical Ub-binding surface is opposite to the extensive hydrophobic dimer interface,

suggesting a potential role for Ub in sex determination (discussed below).

The importance of dimerization for the function of DSX is demonstrated by the variant G398D;

the mutation lies within the female-specific region and results in intersexual development

in vivo and abolished dimerization in vitro (Nothiger et al. 1987;Erdman et al. 1996). In

Chapter IV we examine the effects of G398D on protein folding and stability in the

context of CTDF-p. We find this mutation perturbs the well-packed dimerization

interface, leading to protein misfolding and aggregation in vitro. In collaborative studies,

we show this mutation is associated with decreased expression in Drosophila S2 cells and reduces DSX DNA-binding in a Y1H system. Molecular modeling further suggests that only glycine and alanine (found in the male isoform) are permitted at this site, suggesting an “Achilles' heel” within the dimerization domain. An analysis of diverse polar, nonpolar, aromatic, and charged residues by Y2H confirms the validity of our structural analysis. Modeling of the Asp398 variant suggests that the substitution should be well tolerated by a DSX monomer. That such a monomer does not exist suggests folding of the protomer is predicated upon dimer formation; i.e. monomer folding and dimerization are linked in a two-state manner.

The thermodynamics of the DSX dimerization domain were examined by multiple techniques, including chemical and thermal denaturation as well as hydrogen exchange under native-state conditions (discussed in Chapters IV and V). Size-exclusion chromatography performed with serial dilutions of CTDF-p demonstrated the

domain migrates as a dimer at concentrations within the UV/Vis detection range. While

thermal denaturation studies demonstrated that the domain undergoes irreversible thermal

denaturation, guanidine hydrochloride denaturation was reversible, allowing a ΔGu of

14.1 and 14.8 kcal/mol to be calculated for the native and 398A substitution CTDF-p,

respectively. The implied dimerization constant of 0.01 nM for the native domain is

consistent with that derived by Wensink and co-workers for the intact proteins,

suggesting our model successfully recapitulates the dimerization found in the full protein.

Furthermore, our CD titration results are independently validated through hydrogen exchange analysis occurring under native conditions. Our HX studies allow the study of individual residues undergoing both local and global exchange. The most stable residues belong predominantly to the dimer core, indicating coupled folding and dimerization.

Future Directions

The future directions for the characterization of the Doublesex proteins encompass

diverse fields of study, but can be divided generally into three categories: (i) future

structural studies including analysis of interesting structural features of CTDF-p, (ii) structural and biochemical characterization of protein complexes involving DSX, and (iii) biological roles and significance of DSX and DSX-containing complexes. These directions are discussed below, with a special emphasis on the potential role of ubiquitin in DSX regulation.

Structural features of CTDF-p

Role of Pro370 and Pro375 in dimer formation

Our Y2H studies indicate crucial roles for Pro370 and Pro375 in dimer formation. As

discussed previously, Pro375 induces a kink in α2, with the N-terminal portion forming a

3₁₀ helix. This N-terminal region is also set off from the first turn by a proline, Pro370.

Interestingly, the UBA2 domain from Rad23 also employs a proline at this position (the

loop region where Pro333 resides in UBA2, however, does not superimpose over Pro370 when helices α1 and α2 are aligned). While mutation of Pro333 in UBA2 to glutamic acid has minimal effects on domain structure, this mutation does impair binding to the

HIV-1 Vpr protein (Withers-Ward et al. 2000). Pro370 in the DSX CTDF may likewise function as part of a protein-protein binding surface (discussed below). It would, however, be of further interest to clarify the structural role of these prolines in the context

of dimer formation. It is possible that the restraints imposed by the proline ring serve to

"lock in" the parallel arrangement of helices, allowing Leu373 and Leu373' to seal off the

dimer core. Pro375 may help induce the 3₁₀ helix, which may in turn be required for

proper helix-helix orientation in the dimer. Although dimerization of P370A and P375A

substitutions is impaired, it is possible that such variants may be expressed and purified

recombinantly in our E. coli system, forming low-to-moderate affinity dimers. Structural

studies (either NMR or crystallographic) of these substitutions will address the ability of

the N-terminal segment of α2 to form the 3₁₀ helix, demonstrating the structural significance of this element. Furthermore, hydrogen exchange studies examining local fluctuations in such mutants may prove to be especially enlightening. Such studies would

address the role of helical rigidity and packing in dimer formation, particularly given the

location of Pro375 in the heart of the dimer core (see Chapter V).

Role of Asp383-Asp383’ and R394 in dimer formation

Asp383 and its dimer-related mate are positioned unusually close in the dimer, residing at

the C-terminus of α2. Although several examples of Asp-Asp interactions exist in the

literature (Singh and Thornton 1992), it is intriguing to find such a potentially negative

interaction in a system with subnanomolar dimerization. One possible explanation is

charge distribution – the negative side chains are involved in intra- and interprotomeric

salt bridges with Arg394 and Arg394’. Interestingly, mutation of Arg394 to alanine, but

not lysine, results in a loss of Y2H activity at 30 °C, suggesting a key role for positive

charge (R394A at 20 °C restores activity). The solution structure further suggests an

alternate side chain conformation for this residue, whereas the crystal structure employs a

well-ordered water molecule; this region of the protein is highly solvated. A further

possibility that may be addressed by 13C-HSQC analysis is protonation of the carboxylic side chain (Kawase et al. 2000). Although such protonation would disrupt the R394 salt bridge, the neutralization of the negative side chain interaction may prove more energetically favorable towards dimer formation. Mutational analysis may also provide valuable insight: Bombyx mori and Apis mellifera both employ tyrosine at this position, while Megaselia scalaris and Anopheles gambiae utilize serine. Furthermore, M. scalaris contains a leucine at position 394, suggesting the positive charge is dispensable in the absence of Asp383. Structural studies employing substitutions at one or both of these positions may provide valuable information about this highly charged region.

Position 398, the DSX “Achilles’ Heel”

Position 398 was shown to reside at a critical dimer interface, interacting with Tyr378’ and Val379’. Extensive Y2H studies demonstrated that only glycine and alanine are tolerated at 398, most likely due to steric restrictions imposed by the well-packed hydrophobic interface. While nearly all DSX homologs share conservation of the hydrophobic core and 398, alternate residue configurations may also allow dimer formation. Identification of allowed substitutions can be made through PCR-based mutagenesis where amino acid identity is randomized at multiple points within the dimer core and screened by our Y2H system. Such allowed substitutions would enable a greater diversity in searches for DSX homologs, perhaps expanding the scope of known

CTD homologs outside of arthropods.


DSX-containing complexes: Structure and function

Role of cooperativity in gene regulation

Intact DSX was previously shown to bind adjacent DNA target sites cooperatively, with the male isoform showing greater cooperative binding than the female (Cho and Wensink

1998). Sex-specific cooperativity implies the C-terminus of DSX plays at least some role in mediating this interaction. Whether cooperative binding is required for proper gene regulation, as in λ repressor (Johnson et al. 1981), or is an artifact of the system employed by Cho and Wensink is unknown. We are limited by a lack of known DSX regulatory targets and of rigorous biochemical characterization of DSX-DNA complexes. Nonetheless, an analysis of the promoter site from the yp genes (inclusive of the fbe, see Chapter I) suggests a role for cooperative gene regulation by DSX. Three DSX binding sites (dsxA, dsxB, and dsxC) are separated by near-integral turns of DNA (Figure VI-1, top, p. 212).

Mutational analysis of individual and paired binding site substitutions demonstrated some impact on DSX recognition of the unchanged site(s) (Coschigano and Wensink 1993).

Additional circumstantial evidence for cooperative gene regulation is the existence of multiple potential DSX binding sites contained within the promoter regions of putative

DSX targets (Figure VI-1, p. 212).

The most direct method of observing cooperative interfaces would be a crystal structure of intact DSX complexed with looped DNA containing adjacent binding sites, complemented by electron microscopy (EM) studies demonstrating looped DNA structures (Friedman et al. 1995; Griffith et al. 1986). Work towards expression and purification of full-length DSX is ongoing (see Appendix V).

Although the CTDF does not form tetramers, crystal contacts employed by the CTDF-p

may be part of a tetramer interface that is extended by more N-terminal sequences. Three

unique crystal contacts are observed (each dimer displays six contacts due to symmetry).

The N-terminal helix (residues 354-362) crosses its symmetry mate in an antiparallel

fashion (Figure VI-2A, p. 214). Hydrophobic interactions are provided by Tyr359 and

Val355 from each dimer. Charged interactions between Lys362 and Asp354' and

Asp358' bridge the dimers, while an intrachain bridge with Asp358 further stabilizes the

region. A second crystal contact is mediated by the first loop in one dimer and the

second loop in the other (Figure VI-2B, p. 214). This contact may make use of the highly

conserved Trp371, noted previously to be solvent exposed, interacting with Asn387'.

Pro370 and Leu364 contribute a potential hydrophobic bridge to Ala386'. The third

dimer-dimer contact is mediated by α3 from each dimer arranged antiparallel, similar to

the first contact discussed (Figure VI-2C, p. 214). This contact is highly charged, with

Arg393 aligned across from Glu396'; the interaction is mirrored by the symmetric pair.

The exposed surface of α3 contains additional negative charges provided by Glu389,

Glu390, Glu397, and Glu404. This charge interface is bridged by several waters

clustered between Glu397 and Glu404.

These crystal contacts provide an opportunity for structure-based mutagenesis.

Mutations which impair specific interactions, such as the salt bridge between Arg393 and

Glu396', can be expressed in full-length DSX and assayed by GMSA for effects on cooperative DNA binding. Additionally, such salt bridge interactions can be restored by

a compensatory mutation, allowing for rescue of cooperativity. Such mutations, once

verified via in vitro assays to allow proper protein folding, dimerization, and co-factor

interaction, can be incorporated into Drosophila by the gene targeting methods of Golic

and co-workers (Gong and Golic 2003). This methodology has two advantages over

standard P element-mediated transgenic studies: (i) the target mutation is made in its native chromosomal location and as such is subject to normal expression from its native promoter and, importantly, is not incorporated into a random genomic location where insertion could interfere with other Drosophila genes; (ii) the mutated alleles are present in normal copy number, avoiding potentially confounding issues of over- or underexpression. Using this technique, custom alleles can directly address the importance of cooperative DNA binding in vivo. The extreme limits of the role of cooperativity are (i) no effect, in which the individual is phenotypically wild type, or (ii) cooperativity is required for all

DSX function and mutant flies are typical intersexes. We anticipate a mixed phenotype where cooperativity is required for some, but not all, aspects of DSX function. For example, it is possible that DSX utilizes cooperativity to affect local DNA structure to recruit a transcriptional activation complex, but uses single binding-site occupation to perform "road-block" inhibition of other genes. Furthermore, the sex-specific strength of cooperativity implies that the male and female isoforms may use cooperative assembly in differential regulation of target genes. It is possible that cooperativity-deficient mutants are already available, but due to lack of information regarding the tetramer interface, target gene regulation, and detailed sequence information of dsx alleles, they remain unknown.

Structure-based design of temperature sensitive mutations

Our understanding of the role of DSX throughout development and the adult life of

Drosophila is in part limited by a lack of conditional mutants such as those provided by temperature-sensitive (ts) alleles. Proteins can be rendered temperature sensitive through cavity-forming mutations in the hydrophobic core as well as hydrogen bond and salt bridge disruption (Wertman et al. 1992; Ohya and Botstein 1994). In a collaborative effort by W. Zhang, ts mutants were

sought in the CTDF in order to avoid disturbing the DNA binding of the intact protein.

The alanine mutants described in Chapter IV were

examined for dimerization at both 20 °C and 30 °C. One of these mutants, R394A,

showed wild type dimerization at the permissive temperature, but markedly decreased

dimerization at the elevated temperature. Incorporation of this mutant in the Y1H system

(described in Chapter IV and Appendix VIII) demonstrates a decrease in DNA-binding at

30 °C, suggesting that ts mutations in the CTDF may prove useful for the study of DSXF function in vivo.

Additional ts substitutions may be discovered through the use of PCR mutagenesis. In this system, CTDF cDNA is subjected to random mutagenesis and subsequently inserted

into the Y2H system, where heterodimerization against native CTDF is assayed at

restrictive and permissive temperatures through the use of replica-plating. Colonies

showing Y2H activity at the permissive, but not restrictive, temperature are sequenced;

the isolated substitution(s) are then incorporated back into the Y2H system for

examination of homodimerization at permissive and non-permissive temperatures. Such

variant domains can be expressed in our SN-fusion system for structural studies,

confirming proper domain folding. It is possible that a combination of mutations may be

necessary to achieve temperature sensitivity; the Y2H system is well-suited for these

screens.

Candidate ts mutations can be inserted into the Drosophila genome as described above

for cooperativity deficient mutants. Successful ts alleles will enable the study of DSX

function at all stages of life in Drosophila. Previous studies are ambiguous with regard to

the plasticity of female Drosophila behavior, suggesting that sex-specific behavior is

either (i) plastic throughout development (Belote and Baker 1987) or (ii) irreversibly

programmed at an early point in development (Arthur et al. 1998). Such studies were

complicated by the use of ts alleles acting upstream of DSX in the sex determining

hierarchy. Ts-DSX could resolve this conflict, as well as open new avenues of study in

Drosophila behavior and development.

Intersex, a sex-specific co-activator of DSXF

Intersex (IX) was recently cloned by the Baker lab and shown to be a 188 residue protein

that interacts sex-specifically with DSXF (Garrett-Engele et al. 2002). This sex-specific

interaction is in accordance with prior genetic studies demonstrating ix is required for

female development but dispensable for male development (Waterbury et al. 1999).

Furthermore, the ix null XX individual is a phenocopy of the dsx null XX (Baker and

Ridge 1980; Waterbury et al. 1999). More recently, a human homolog of IX was discovered as a component of the Mediator complex, suggesting a role as a transcriptional

activator for DSXF, wherein IX may act as a bridge between DSXF and Mediator (Sato et

al. 2003).

The organization of a sex-specific transcription complex containing DSXF and IX would be of interest to structural and developmental biologists alike. With the future goal of a

DNA-DSXF-IX co-crystal in mind, we have endeavored to produce recombinantly

expressed IX in E. coli and yeast systems with varying degrees of success (see Appendix

VI).

A method for determining the IX binding site on DSXF that avoids the traditional

bottlenecks of crystallography (crystallization, phasing, and refinement) employs 1H-15N

HSQC NMR footprinting. In HSQC footprinting assays, excess unlabeled protein is

titrated into a sample of an isotopically labeled interacting partner. The perturbations

caused by the interaction of the two proteins on the HSQC spectrum of the labeled

protein are observed; the unlabeled protein is silent in this assay. Consequently, protein-

protein interactions and binding site information on the target protein can be determined

without spectroscopic interference of the unlabeled protein. In these studies, isotopically

labeled CTDF is mixed with unlabeled IX. This experiment does not require intact IX

(although intact IX can be used), but rather the minimal interacting region of IX. The recent installation of cryoprobe technology on our spectrometers allows the use of relatively dilute (ca. 25 μM) sample concentrations of labeled protein, greatly reducing the amount

of material previously required for such studies. Typical footprinting studies require

three times the amount of unlabeled protein relative to labeled protein.

Structural studies of IX may be hampered by low protein stability. If the Drosophila

melanogaster IX proves intractable, the recent analysis of IX orthologs by Siegal and

Baker may provide a better target for structural biology (Siegal and Baker 2005). In their

study, Siegal and Baker identify several broadly conserved ix genes. Transgenic rescue

studies, where selected orthologs are incorporated into ix null D. melanogaster animals,

demonstrate D. virilis and Megaselia scalaris ix are able to fully rescue the wild-type

phenotype. Bombyx mori partially rescues, whereas Mus musculus not only fails to rescue, but also alters the phenotype of hetero- or homozygous ix- female D.

melanogaster. We therefore expect D. virilis and M. scalaris IX will successfully

interact with D. melanogaster DSXF.

Preliminary results involving mutagenesis of DSXF demonstrate the presence of a

putative IX-binding groove on the surface of CTDF-p (see Appendix VII). Intriguingly,

these results also show the majority of the sex-specific tail is dispensable for IX binding.

The groove runs along the surface of the protomer starting in the sex-non-specific

sequence and continuing across the dimer interface into the sex-specific region of the

other protomer. The 1H-15N HSQC of this region is well-resolved, suggesting footprinting assays with an IX fragment may be used to confirm our mutational studies.

Because the DSXF-IX interaction is weak in our Y2H system, standard screening

techniques have not been successful in determining a discrete interacting element in IX.

However, work by Siegal and Baker suggested functional conservation of a region within

the middle of IX previously noted to share homology with human and mouse ESTs of

unknown function. While the N-terminal region has low complexity and is generally conserved

in amino acid type and quantity (rather than sequence), this central domain shows marked

conservation. The region is predicted by the PsiPred protein structure prediction program

to be helical (http://bioinf.cs.ucl.ac.uk/psipred/); the conservation of cysteine and

histidine residues is suggestive of a zinc-responsive element. We synthesized peptides

corresponding to the native sequence (residues 98-134) as well as a peptide spanning

residues 100-134 and containing several conserved substitutions to enhance peptide

stability (C109L, Q125S, and C-terminal Lys-Glu-Lys). CD λ analysis by N. Phillips

demonstrates the peptides are structured; furthermore, the peptides undergo a

conformational change upon addition of zinc. Future structural studies are in place to

examine whether this conserved region of IX is involved in DSXF recognition, including

HSQC footprinting. Additional interacting regions from IX can be sought via overlapping IX peptide fragments (Shuker et al. 1996).
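
The overlapping-fragment approach mentioned above amounts to tiling the IX sequence with peptides that share residues with their neighbors. The following is a minimal sketch only: the fragment length (20 residues) and step (10 residues) are hypothetical choices, not values taken from this work, and the placeholder string merely stands in for the 188-residue IX sequence, which is not reproduced here.

```python
def overlapping_peptides(sequence, length=20, step=10):
    """Tile a sequence into overlapping fragments.

    Returns (first_residue, last_residue, fragment) tuples with 1-based,
    inclusive numbering. `length` and `step` are illustrative defaults.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    fragments = []
    start = 0
    while start < len(sequence):
        end = min(start + length, len(sequence))
        fragments.append((start + 1, end, sequence[start:end]))
        if end == len(sequence):
            break  # the final fragment reaches the C-terminus
        start += step
    return fragments


# Placeholder for the 188-residue IX sequence (not the real sequence).
placeholder_ix = "X" * 188
for first, last, _ in overlapping_peptides(placeholder_ix):
    print(f"residues {first}-{last}")
```

With these assumed settings every residue falls in at least one fragment, and adjacent fragments overlap by ten residues, so a binding element that straddles one fragment boundary is still contained in a neighboring fragment.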

Role of Ubiquitin in sex determination

The role of Ub in protein degradation has been well characterized. Typically, a combination of enzymes (E1 activating, E2 conjugating, E3 ligating) targets Ub to a given protein. Once a single Ub moiety is attached, several additional Ub proteins may be

covalently linked in a poly-Ub chain; when polyubiquitination occurs through Lys29 or

Lys48 in chains of four or more, the target protein is scheduled for degradation by the

26S proteasome. Not all ubiquitination is destructive. Monoubiquitination is associated

with transcriptional activation and elongation, signal transduction, and endocytosis (Rape

et al. 2001; Kaiser et al. 2000). In DNA repair, polyubiquitination through Lys63-linked chains leads to the destruction of stalled pol II transcription complexes (Salghetti et al.

2001).

Understanding of the role of Ub in transcriptional regulation has undergone several fundamental changes in recent years. Traditionally, proteolytic functions of Ub have been thought to tightly control transcription through "suicide" regulation where each cycle of transcription is coupled to destruction of the active transcription factor (Salghetti et al. 2001). Ub can control transcriptional activators by factor localization (Spt23; Rape et al. 2001), co-factor interaction (Met4; Kaiser et al. 2000), and constitutive turnover

(β-catenin; Barker et al. 2000). A novel role for Ub in transcriptional activation has been recently reported (Muratani et al. 2005). An analysis of multiple transcriptional activators demonstrated the presence of degrons (proteolytic signaling elements that recruit Ub ligases; Muratani et al. 2005; Salghetti et al. 2001) co-localized with transcriptional activation domains (TADs). Ubiquitination of TADs by specific F-Box

Ub-ligases allows for TAD turnover, which is coupled to transcriptional activation: Ub is required to remove the TAD, which is necessary for initiation but otherwise hinders elongation and proper mRNA processing. Non-destructive ubiquitination has also been shown to be necessary and sufficient for transcriptional activation (Salghetti et al. 2001).

Not all Ub-association is covalent. The ubiquitin-interacting motif (UIM) consists of a short stretch (ca. 20) of residues first described in a subunit of the 26S proteasome

(Young et al. 1998). The ubiquitin-associated (UBA) protein motif was first identified

bioinformatically in proteins involved in the ubiquitination-degradation pathway

(Hofmann and Bucher 1996). It has since been identified by sequence analysis in nearly

500 eukaryotic proteins (SMART database as of 7/05). Cellular roles of UBA-containing

proteins are diverse, taking part in nucleotide excision repair, nascent mRNA nuclear

export, as well as cell cycle checkpoint control (Withers-Ward et al. 2000; Suyama et al.

2000; Clarke et al. 2001). UBA domains have been shown to associate with both mono-

and poly-ubiquitin. Binding of Ub by Rad23 has been shown to suppress Ub chain elongation in vitro, perhaps extending the lifetime of the assembled nucleotide excision repair machinery (Ortolan et al. 2000). UBA domains have also been shown to associate with proteins other than Ub, such as the HIV Vpr protein and (in the case of TAP) the nuclear pore complex. Interestingly, UBA domains have also been implicated in both homo- and heterodimerization for Rad23, Ddi1, and c-Cbl. Of the nineteen structures currently in the PDB, none are known to employ the UBA fold to mediate dimerization. The overwhelming majority of UBA structures have been solved by NMR.

It is possible that dimer formation was not appreciated at the time of structure solution.

The CUE domain (coupling of Ub conjugation to endoplasmic reticulum degradation) is a

Ub interacting motif structurally similar to the UBA fold. The CUE domain of Vps9p

forms a domain-swapped dimer structurally dissimilar to the DSX CTDF, and with a

considerably weaker dimerization constant of ca. 1 mM (Prag et al. 2003). The

"unswapped" monomer fold employs the stereotypic compact three helix UBA fold.

Both forms of CUE are capable of binding Ub; the dimer utilizes the monomer surface and additional contacts within helix 2. The solution structure of a CUE domain from

CUE2 also employs the UBA fold and binds Ub similarly to the monomer of Vps9p

(Kang et al. 2003). These monomer footprints substantially overlap the Ub-binding

footprints of the UBA domains from Rad23 (Mueller et al. 2004). A structure-informed

alignment of the CTDF and CUE2 reveals conservation of potential salt bridges thought

to be important for Ub binding (Asp18 and Asp40 of CUE2, Glu365 and Glu389 of

CTDF). Alignment of the CTDF with CUE2 in complex with Ub demonstrates the potential Ub interface is distinct from the dimer interface (see Chapter III). Due to dimer symmetry, it is possible for the CTDF to bind two distinct Ub proteins, or perhaps multiple Ub units in poly-Ub chains.

Biochemical studies of a putative DSX-Ub complex

To determine if the CTDF-p can function as a UBA domain, we have performed gel shift,

fluorescence quenching, and NMR footprinting experiments. 100 μM CTDF-p was

incubated with 200-300 μM Ub (Boston Biochem Inc., Cambridge, MA) for 30 minutes

and then applied to an analytical gel filtration column. Separate CTDF-p and Ub peaks

were detected, but no complex was noted. The predicted Ub binding site falls near

Trp371 in the CTDF-p. As Ub lacks tryptophan, we sought to examine possible binding

through quenching of tryptophan fluorescence by both mono-Ub and K48-linked tri-Ub

(Boston Biochem Inc., Cambridge, MA). As can be seen in Figure VI-3A&B, p. 216,

quenching was not observed. Trp371 is solvent exposed in the native dimer; fluorescence

is not increased upon chemical denaturation (Figure VI-3C, p. 216) and so is unlikely to

show enhanced fluorescence upon perturbation by protein-protein interactions. As

Trp371 is at the edge of the predicted Ub binding site, these negative results do not rule

out Ub binding by CTDF-p. Consequently, we pursued HSQC footprinting to

characterize the potential interaction.

We monitored the 1H-15N and 1H-13C HSQC of the CTDF-p at 0.5-1 mM sample

concentrations, titrating against unlabeled Ub to a final concentration of 3 mM. Under

these conditions, modest shifts were noted in the 1H-15N HSQC spectrum for Arg368,

Ile380, Lys382, Ser392, and Gln408 (Figure VI-4A, red cross peaks, p. 218). Of these

residues, Arg368, Lys382, and Ser392 align with the canonical Ub binding surface. The

1H-13C HSQC spectrum did not reveal similar shifts, indicating that mono-Ub binding to

the CTDF-p may be weak or non-existent. Interestingly, a reverse footprinting

experiment, where 15N-labeled Ub (VLI Research, Inc., Malvern, PA) is mixed with

unlabeled CTDF-p, demonstrated modest shifts in the Ub spectrum (Figure VI-5, p. 220).

Whether these shifts represent true binding perturbations or experimental artifacts is not clear (discussed below).
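
The size of such shifts is often summarized per residue as a combined 1H/15N chemical shift perturbation. The sketch below assumes one common convention, Δδ = sqrt(ΔδH² + (w·ΔδN)²), with a 15N weighting factor w = 0.2. The weighting factor, the threshold, and the peak positions in the usage example are illustrative assumptions, not values from this study, and the residue labels are placeholders.

```python
import math

N15_WEIGHT = 0.2  # assumed scaling of 15N shifts relative to 1H shifts


def combined_shift(delta_h, delta_n, weight=N15_WEIGHT):
    """Combined 1H/15N perturbation (ppm) from the individual shift changes."""
    return math.sqrt(delta_h ** 2 + (weight * delta_n) ** 2)


def perturbed_residues(free, bound, threshold):
    """Return {residue: perturbation} for residues at or above the threshold.

    `free` and `bound` map residue labels to (1H ppm, 15N ppm) peak positions.
    Residues missing from `bound` are skipped rather than scored.
    """
    hits = {}
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            continue
        h_bound, n_bound = bound[residue]
        shift = combined_shift(h_bound - h_free, n_bound - n_free)
        if shift >= threshold:
            hits[residue] = shift
    return hits


# Invented peak positions for two placeholder residues.
free_peaks = {"res_A": (8.10, 120.0), "res_B": (7.90, 118.0)}
bound_peaks = {"res_A": (8.13, 120.2), "res_B": (7.90, 118.0)}
print(perturbed_residues(free_peaks, bound_peaks, threshold=0.04))
```

A per-residue score of this kind makes it easier to compare footprints across titrations, but it does not by itself distinguish genuine binding from small changes in buffer or pH between samples.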

Some UBA domains have been noted to bind mono-Ub poorly or not at all, yet have rather high affinity for poly-Ub (Wilkinson et al. 2001; Davies et al. 2004). A structural basis for di-Ub binding by Rad23 was recently reported (Figure VI-6, p. 222;

Varadan et al. 2005). It is unlikely that the CTDF-p is capable of this manner of binding

as it would compromise the dimer interface. As mentioned above, the CTDF-p does present two potential binding surfaces due to its symmetric nature. To examine whether the CTDF-p can bind poly-Ub, we performed a 1H-15N HSQC titration of 15N-labeled

CTDF-p and unlabeled tri-Ub. Taking advantage of the recently installed cryoprobe on

the Bruker 700 MHz spectrometer, we mixed 100 μM 15N-CTDF-p and 100 μM tri-Ub.

As CTDF-p concentration is measured in monomer equivalents, the actual molecular ratio

is one dimer per two tri-Ub chains. Modest shifts of the same residues found with mono-Ub

were observed (Figure VI-4A, green cross peaks, p. 218). Although the 1H-15N HSQC perturbations are still modest, the inclusion of these canonical residues in the footprint suggests that the CTDF-p may indeed bind poly-Ub. It is possible that binding may be

optimized by tetra-Ub or Ub chains linked through Lys29. Furthermore, four-fold

dilution of the CTDF-p::tri-Ub sample results in attenuation of the shift changes (Figure

VI-4B, p. 218). Highlighting the difficulty of predicting Ub-binding by sequence alone is

work by Lipkowitz and co-workers studying the Cbl family of proteins (Davies et al.

2004). Despite an overall sequence similarity of 85%, Cbl-b, but not c-Cbl, is capable of

Ub-binding. Furthermore, the UBA domains from Rad23 both bind Ub, yet share only

45% similarity. Met173 and Leu199 in UBA1 from Rad23, part of the hydrophobic

surface proposed to bind Ub, are polar in the Ub-binding UBA domain of Cbl-b.
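
The attenuation of the shifts on dilution is what a weak, reversible interaction would predict. As a rough illustration, the sketch below computes the fraction of labeled sites occupied under a simple 1:1 binding model. The dissociation constant of 500 μM is a hypothetical value chosen only for illustration (no Kd was measured here), and treating each CTDF-p protomer and each tri-Ub chain as a single independent site is likewise an assumption.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of P in complex for P + L <-> PL.

    All three arguments must share the same concentration units. The complex
    concentration is the physically meaningful root of the binding quadratic.
    """
    if p_total <= 0:
        raise ValueError("p_total must be positive")
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total


ASSUMED_KD_UM = 500.0  # hypothetical dissociation constant, in micromolar

# 100 uM of each component, then the same sample diluted four-fold.
for dilution in (1, 4):
    conc = 100.0 / dilution
    print(dilution, round(fraction_bound(conc, conc, ASSUMED_KD_UM), 3))
```

Under these assumed values the occupied fraction falls from roughly 15% to roughly 5% on four-fold dilution, so fast-exchange chemical shifts would be expected to move back toward the free state, whereas a tight complex would remain largely populated.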

Role of Ubiquitin in sex determination

One potential role for DSX and Ub in transcription involves histone modification.

Ubiquitinated H2A or H2B may serve as signals of transcriptional activity, recruiting

DSX via its UBA domain (Moore et al. 2002). Such interactions could potentially be

observed by co-immunoprecipitation or NMR footprinting. Another potential role for Ub

in sex-specific regulation is as an enhancer of DSX-IX complex formation. The

association of DSXF-IX is predicted to be relatively weak, based on our Y2H results.

Ubiquitination of IX may provide an additional "handle" to which DSX can bind,

allowing for bidentate recognition (Figure VI-7, p. 224).

The biological relevance of the CTDF-p::Ub interaction is currently unknown. It would be of future interest to design mutations in the CTDF impairing interaction with Ub, but leaving intact dimerization and IX-binding. Such mutations could be introduced into

Drosophila via gene targeting. We anticipate a novel phenotype of intermediate intersexual development where Ub is required for some, but not all DSXF functions. The

limits of the study are either (i) no phenotype, Ub is not required for DSXF functionality,

or (ii) classical intersex phenotype indicating Ub is required for all aspects of DSXF function. Additionally, CTDF may interact more strongly with Ub-like proteins or

protein domains (termed UBX domains; Buchberger 2002).

Does DSX CTD represent a common UBA dimerization motif?

Dimerization of several UBA motifs has been demonstrated by multiple techniques

including gel filtration, co-immunoprecipitation, and Y2H (Bartkiewicz et al.

1999; Davies et al. 2004; Bertolaet et al. 2001). The UBA domains in the Cbl proteins, E3

ubiquitin ligases involved in signaling and insulin-responsive glucose transport, require

dimerization for tyrosine phosphorylation and membrane translocation (Davies et al.

2004; Liu et al. 2003). As discussed above, the UBA domain from Cbl-b (UBAb), but not

c-Cbl (UBAa), also interacts with mono- and poly-Ub and ubiquitinated proteins. The

UBA domains from the DNA damage-inducible proteins Rad23 and Ddi1 have been

shown to homo- and heterodimerize (Bertolaet et al. 2001). Mutation of Leu355 in the

UBA2 domain of Rad23 to alanine disrupts heterodimerization with Ddi1. Leu355 aligns

with Ile395 in the DSX CTDF, mutation of which to alanine also blocks dimerization in

our Y2H system. Ile395 contributes to both the protomeric mini-core and the dimer

interface; hence the perturbation shown in both systems cannot necessarily be attributed

to dimer disruption. However, this work does suggest the mode of dimerization employed by DSX may be shared among UBA dimers. A potential difficulty with this model is that only the UBA1 domain aligns a glycine with Gly398, previously identified to be a restricted position in the DSX dimer (see Chapter IV). While some UBA domains employ hydrophobic residues in this interface region, others (e.g. UBA2 and c-Cbl) utilize polar or charged residues (see Appendix IV for electrostatic potential of putative dimer interfaces of UBA domains). It is possible that these UBA domains form low-affinity dimers and stable monomers, unlike the unfolded protomers of the DSX CTDF.

To identify dimer-capable UBA domains, and discover potential homologs with diverse sequences, it would be of future interest to randomly mutagenize core dimer residues to identify novel combinations of residues that allow for native-like dimerization. This work could be accomplished with a Y2H screen using error-prone PCR mutagenesis, selecting for positively interacting colonies, as discussed above. Structural characterization of dimeric UBA domains by NMR or crystallography is of future interest, and may allow the development of “rules of dimerization” enabling the bioinformatic search for dimeric UBA domains and distant DSX homologs.

Figure VI-1

Schematic of putative DSX binding sites in promoter regions of suspected DSX-regulated genes. Sites are color-coded by agreement with the DSX consensus binding site: excellent match (green, no mismatches), plausible (yellow, one or two mismatches to the consensus), and possible (red, containing more than one unfavorable base pair).

Figure VI-2

Crystal contacts suggest potential tetramer surfaces. (A) N-terminal symmetric contact arranged anti-parallel. (B) Loop-loop interaction, the only non-symmetrical crystal contact. This contact involves the highly conserved Trp371 and the structurally important Pro370. (C) α3 contact, also arranged roughly anti-parallel. This interaction is primarily mediated by charged interactions. One dimer is blue, the other magenta.

Figure VI-3

Fluorescence studies of the CTDF-p and ubiquitin. Quenching of tryptophan fluorescence

was not observed for (A, left) mono-Ub or (A, right) tri-Ub (K48-linked). Free CTDF-p is blue, CTDF-p and Ub or tri-Ub is purple. (B) Guanidine hydrochloride induced denaturation of the CTDF monitoring both tryptophan (purple) and tyrosine (blue)

fluorescence shows W371 is already fully exposed. Tyrosine shows little change upon denaturation.

Figure VI-4

NMR footprinting studies of the CTDF-p and Ub. (A) 1H-15N HSQC showing weak interaction between CTDF-p (free, blue) and Ub (red) and tri-Ub (green). (B) Dilution of

CTDF-p::tri-Ub samples four-fold returns chemical shift perturbations to near-free state, supporting the possibility of weak binding.

Figure VI-5

Reverse footprint of 15N-Ub with unlabeled CTDF-p. Blue spectrum is Ub alone, red is

2x molar equivalent CTDF-p.

Figure VI-6

Model of di-Ub binding a UBA monomer from Varadan et al. 2005. In this model, the second Ub moiety would interfere with the CTDF-p dimer interface. Binding of DSX to poly-Ub would more likely involve bridging between the two monomer surfaces.

Figure adapted from (Varadan et al. 2005).

Figure VI-7

Organization of a sex-specific transcription complex. (A) Model of sex-specific

regulation of yolk protein expression in female (top) and male (bottom; An and Wensink

1995a; An and Wensink 1995b). Female- and male-specific C-terminal extensions of

DSX are shown in red and purple, respectively. DSX binds to DNA as a dimer (DM

domain, green spheres; CTD-p, blue and teal). Top, Female-specific complex of DSXF occupies dsxA as an adjoining bZIP factor binds to bzip1 (light blue ribbon). Recruitment

of IX (arrow) enables synergistic activation of transcription. Binding of DSXF or DSXM displaces AEF1 from its target site aef1. In females, fat-body expression of DSXF is higher than that of AEF1, and therefore in the presence of bZIP1, yolk proteins are expressed. Expression is by contrast repressed in the ovary due to higher levels of AEF1, which displaces DSXF. Bottom, Male-specific repression of yolk proteins occurs as

binding of DSXM, which is 122 residues longer than DSXF (purple tail; Burtis and Baker

1989), occludes bzip1 or inactivates bound bZIP1 (Coschigano and Wensink 1993). Also pictured is the non-tissue-specific activator REF1. (B) Schematic models of sex-specific recruitment of IX (orange rectangles) by DSXF. Cylinder model of CTDF-p dimer is

shown in blue and teal. Weak DSXF-IX interaction (top) would be strengthened by

bidentate recognition of tethered Ub moieties (pink triangles; bottom), providing a

mechanism of ubiquitination-coupled assembly of a preinitiation complex.

APPENDIX I

RESIDUE ENVIRONMENTS

This appendix presents the environments and interactions of the well-ordered residues in the CTDF-p as stereo line drawings. Nearly all residues are equivalent in the two protomers. Exceptions are noted in the figures below, and are typically the consequence of alternative side chain conformations.
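
The hydrogen-bond records listed under each figure in this appendix follow a regular pattern (donor residue and atom, acceptor residue and atom, then Sym, Val, DA, and DHA values), so they can be tabulated programmatically. The sketch below is a minimal, hypothetical parser. The field names are copied from the records themselves; reading DA as a donor-acceptor distance and DHA as an angle is an interpretation assumed here rather than stated in the text. Salt-bridge lines, which use a different layout, are intentionally ignored.

```python
import re

# Matches records such as
#   "L357 N -> Q353 O Sym = 1 Val = 0.640 DA = 3.08 DHA = 22.55".
# A trailing apostrophe on a residue label marks the other protomer.
HBOND_RECORD = re.compile(
    r"(?P<donor>[A-Z]\d+'?)\s+(?P<donor_atom>\w+)\s*->\s*"
    r"(?P<acceptor>[A-Z]\d+'?)\s+(?P<acceptor_atom>\w+)\s+"
    r"Sym\s*=\s*(?P<sym>\d+)\s+Val\s*=\s*(?P<val>\d+\.\d+)\s+"
    r"DA\s*=\s*(?P<da>\d+\.\d+)\s+DHA\s*=\s*(?P<dha>\d+\.\d+)"
)


def parse_hbond_records(text):
    """Return one dict per hydrogen-bond record found in `text`."""
    records = []
    for match in HBOND_RECORD.finditer(text):
        record = match.groupdict()
        record["sym"] = int(record["sym"])
        for key in ("val", "da", "dha"):
            record[key] = float(record[key])
        records.append(record)
    return records


# Two records taken from the entry for Y378 in this appendix.
example = (
    "Y378 N -> M374 O Sym = 1 Val = 0.687 DA = 2.87 DHA = 17.12 "
    "Y378 OH -> E397' OE2 Sym = 1 Val = 0.870 DA = 2.74 DHA = 9.75"
)
for rec in parse_hbond_records(example):
    print(rec["donor"], rec["donor_atom"], "->",
          rec["acceptor"], rec["acceptor_atom"], rec["da"])
```

Because the pattern is applied with `finditer`, several records run together on one line are still separated correctly, which is convenient when collecting all interprotomeric bonds (acceptor or donor labels ending in an apostrophe) into a single table.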

Figure AI-1 353 glutamine

Q353 begins helix α1 in both chains and accepts a hydrogen bond at its carbonyl O from the L357 amide. Q353 makes an interprotomeric contact with Val401' in α3.

L357 N -> Q353 O Sym = 1 Val = 0.640 DA = 3.08 DHA = 22.55

Figure AI-2 354 aspartic acid

D354 hydrogen bonds with D358 and several waters. An interprotomeric contact is made with Y405' from α3.

D358 N -> D354 O Sym = 1 Val = 0.645 DA = 3.11 DHA = 24.29

Figure AI-3 355 valine

V355 hydrogen bonds with Y359.

Y359 N -> V355 O Sym = 1 Val = 0.639 DA = 2.98 DHA = 23.35

Figure AI-4 356 phenylalanine

F356 resides in the mini-core of the protomer and has secondary structure contacts with Y378, L381, and K382 from α2 and A386 from loop II. F356 hydrogen bonds with Cys360.

C360 N -> F356 O Sym = 1 Val = 0.671 DA = 2.94 DHA = 24.70

Figure AI-5 357 leucine

L357 also resides in the mini-core, having secondary structure contacts with M374. An interprotomeric contact is also formed with Y405' from α3. L357 hydrogen bonds with Q353 and Q361.

L357 N -> Q353 O Sym = 1 Val = 0.640 DA = 3.08 DHA = 22.55
Q361 N -> L357 O Sym = 1 Val = 0.664 DA = 2.93 DHA = 23.38

Figure AI-6 358 aspartic acid

D358 resides on the solvent-exposed side of α1, lacking secondary structure contacts with the other helices. D358 is engaged in an i+4 salt bridge with K362. D358 hydrogen bonds with D354 and K362.

D358 N -> D354 O Sym = 1 Val = 0.645 DA = 3.11 DHA = 24.29
K362 N -> D358 O Sym = 1 Val = 0.619 DA = 3.01 DHA = 26.16
D358 OD1 - K362 NZ 5.93

Figure AI-7 359 tyrosine

Y359 helps to seal one end of the mini-protomer and is found in two conformations in the A chain. Y359 has secondary structure contacts with L381 (α2), I388 (α3), and A386 (L2). Y359 hydrogen bonds with V355 and L363.

Y359 N -> V355 O Sym = 1 Val = 0.639 DA = 2.98 DHA = 23.35
L363 N -> Y359 O Sym = 1 Val = 0.617 DA = 2.98 DHA = 25.77

Figure AI-8 360 cysteine

C360 is found in the mini-core where it contacts M374, M377, Y378, and L381. C360 hydrogen bonds to F356 and L364.

C360 N -> F356 O Sym = 1 Val = 0.671 DA = 2.94 DHA = 24.70
L364 N -> C360 O Sym = 1 Val = 0.673 DA = 2.91 DHA = 21.92

Figure AI-9 361 glutamine

Q361 functions to seal the outside edge of the mini-core, contacting M374. Gln361 hydrogen bonds with L357 and E365.

Q361 N -> L357 O Sym = 1 Val = 0.664 DA = 2.93 DHA = 23.38
Q361 NE2 -> E365 OE2 Sym = 1 Val = 0.242 DA = 3.39 DHA = 51.40
E365 N -> Q361 O Sym = 1 Val = 0.591 DA = 2.95 DHA = 25.71

Figure AI-10 362 lysine

K362 is found on the outside edge of α1, and assumes two conformations in the A chain. K362 forms salt bridges with D358 and E365 and hydrogen bonds with D358 and K366.

K362 N -> D358 O Sym = 1 Val = 0.619 DA = 3.01 DHA = 26.16
K366 N -> K362 O Sym = 1 Val = 0.591 DA = 2.98 DHA = 27.99
D358 OD1 - K362 NZ 5.93
E365 OE1 - K362 NZ 3.93

Figure AI-11 363 leucine

L363 is found in the mini-core of the protomer near the C-terminal end of α1, forming part of the apex between α1 and α2. L363 has secondary structure contacts with M377 of α2, F367 of loop I, and I388 and A391 of α3. L363 has hydrogen bond interactions with Y359 and F367.

L363 N -> Y359 O Sym = 1 Val = 0.617 DA = 2.98 DHA = 25.77
F367 N -> L363 O Sym = 1 Val = 0.489 DA = 3.31 DHA = 26.36

Figure AI-12 364 leucine

L364 makes substantial secondary structure contacts within the protomer mini-core, interacting with W371, L373, M374, M377, F367, R368, Y369, and P370. L364 also forms hydrogen bonds with C360, R368, and Y369.

L364 N -> C360 O Sym = 1 Val = 0.673 DA = 2.91 DHA = 21.92
R368 N -> L364 O Sym = 1 Val = 0.250 DA = 3.10 DHA = 77.51
Y369 N -> L364 O Sym = 1 Val = 0.673 DA = 2.96 DHA = 17.19

Figure AI-13 365 glutamic acid

E365 resides near the C-terminus of α1 facing into the solvent, and contacts F367 and R368. E365 forms a salt bridge with K362 and hydrogen bonds with Q361 and R368.

E365 N -> Q361 O Sym = 1 Val = 0.591 DA = 2.95 DHA = 25.71
R368 N -> E365 O Sym = 1 Val = 0.298 DA = 3.06 DHA = 28.87
R368 NH1 -> E365 O Sym = 1 Val = 0.489 DA = 2.62 DHA = 33.43

Figure AI-14 366 lysine

K366 terminates α1, projecting into solvent. K366 contacts neighboring F367 and R368 within the turn region. K366 forms a salt bridge with E389, a charge-stabilized hydrogen bond with E396, and hydrogen bonds with K362 and S392.

K366 N -> K362 O Sym = 1 Val = 0.591 DA = 2.98 DHA = 27.99
K366 NZ -> S392 OG Sym = 1 Val = 0.334 DA = 3.47 DHA = 37.71
K366 NZ -> E396 OE2 Sym = 1 Val = 0.433 DA = 3.52 DHA = 25.30
E389 OE2 - K366 NZ 6.96

Figure AI-15 367 phenylalanine

F367 resides at the beginning of loop I with its side chain projecting into the mini-core, helping to seal the apex. F367 makes extensive contacts, interacting with L363, L364, E365, and K366 from α1. F367 hydrogen bonds with L363.

F367 N -> L363 O Sym = 1 Val = 0.489 DA = 3.31 DHA = 26.36

Figure AI-16 368 arginine

R368 projects into solvent from loop I, and has contact with L364, E365, and K366. R368 hydrogen bonds with L364 and E365.

R368 N -> E365 O Sym = 1 Val = 0.298 DA = 3.06 DHA = 28.87
R368 N -> L364 O Sym = 1 Val = 0.250 DA = 3.10 DHA = 77.51
R368 NH1 -> E365 O Sym = 1 Val = 0.489 DA = 2.62 DHA = 33.43

Figure AI-17 369 tyrosine

Y369 projects from loop I into the edge of the mini-core abutting the dimer interface, where it contacts residues L364, L373', and L376'. Y369 hydrogen bonds with L364.

Y369 N -> L364 O Sym = 1 Val = 0.673 DA = 2.96 DHA = 17.19

Figure AI-18 370 proline

P370 terminates loop I, and precedes the 3₁₀ component of α2 (α2A). Secondary structure contacts are formed with L364, and a hydrogen bond with L373.

L373 N -> P370 O Sym = 1 Val = 0.725 DA = 2.91 DHA = 15.38

Figure AI-19 371 tryptophan

W371 begins α2A, a 3₁₀ helix. W371 has few secondary structure contacts (limited to L364 and D383) as it is poised at the N-terminus of α2A and projects predominantly into solvent. W371 hydrogen bonds with M374.

M374 N -> W371 O Sym = 1 Val = 0.712 DA = 2.85 DHA = 14.95

Figure AI-20 372 glutamic acid

E372 exists with two side chain conformations in both chains with minimal secondary structure contacts. E372 does not have hydrogen bond interactions.

Figure AI-21 373 leucine

L373 seals the top of the dimer interface between α2-α2' along with its symmetry-related L373'. L373 interacts with L364 from α1 and L373' from α2' and has hydrogen bond interactions with P370, L376, and M377.

L373 N -> P370 O Sym = 1 Val = 0.725 DA = 2.91 DHA = 15.38 L376 N -> L373 O Sym = 1 Val = 0.375 DA = 2.86 DHA = 52.16 M377 N -> L363 O Sym = 1 Val = 0.715 DA = 3.12 DHA = 18.61


Figure AI-22 374 methionine

M374 is part of the protomeric mini-core and the final residue in α2A. M374 contacts L357, C360, Q361, L364 from α1 and V402' from α3'. M374 hydrogen bonds with W371, M377, and Y378.

M374 N -> W371 O Sym = 1 Val = 0.712 DA = 2.85 DHA = 14.95
M377 N -> M374 O Sym = 1 Val = 0.002 DA = 3.07 DHA = 71.13
Y378 N -> M374 O Sym = 1 Val = 0.687 DA = 2.87 DHA = 17.12

Figure AI-23 375 proline

P375 terminates α2A by introducing a kink in the helix. P375 also has dimer contacts, interacting with G398', Q399', and V402'. P375 is also involved in hydrogen bond formation with V379.

V379 N -> P375 O Sym = 1 Val = 0.597 DA = 3.11 DHA = 25.73


Figure AI-24 376 leucine

L376 begins helix segment α2B and is found in the dimer interface opposite to its symmetry mate. L376 interacts with L376', M377', and I380' and hydrogen bonds with L373 and I380.

L376 N -> L373 O Sym = 1 Val = 0.375 DA = 2.86 DHA = 52.16
I380 N -> L376 O Sym = 1 Val = 0.780 DA = 2.99 DHA = 9.43

Figure AI-25 377 methionine

M377 resides in the mini-core near the dimer interface, contacting C360, L363, and L364 from α1, and L376' from α2'. M377 hydrogen bonds with L373, M374, and L381.

M377 N -> L373 O Sym = 1 Val = 0.715 DA = 3.12 DHA = 18.61
M377 N -> M374 O Sym = 1 Val = 0.002 DA = 3.07 DHA = 71.13
L381 N -> M377 O Sym = 1 Val = 0.702 DA = 2.88 DHA = 10.85


Figure AI-26 378 tyrosine

Y378 resides at the dimer interface opposite to G398, the site of an intersex mutation in Drosophila. Y378 contacts F356 and C360 in α1, and E397', G398', and V401' in α3'. Y378 also forms an interprotomeric hydrogen bond with E397', along with backbone hydrogen bonds with M374 and K382.

Y378 N -> M374 O Sym = 1 Val = 0.687 DA = 2.87 DHA = 17.12
Y378 OH -> E397' OE2 Sym = 1 Val = 0.870 DA = 2.74 DHA = 9.75
K382 N -> Y378 O Sym = 1 Val = 0.633 DA = 2.85 DHA = 24.36

Figure AI-27 379 valine

V379 resides in the dimer interface between α2 and α3', contacting I380' in α2' and R394' and I395' in α3'. V379 forms hydrogen bonds with P375 and D383.

V379 N -> P375 O Sym = 1 Val = 0.597 DA = 3.11 DHA = 25.73
D383 N -> V379 O Sym = 1 Val = 0.711 DA = 2.90 DHA = 14.73


Figure AI-28 380 isoleucine

I380 is a dimer interface residue between α2 and α2' and has some contact with the protomeric mini-core. I380 contacts I395 from α3, L376', V379', and I380' from α2', and A384 from loop II. I380 hydrogen bonds with L376 and A384.

I380 N -> L376 O Sym = 1 Val = 0.780 DA = 2.99 DHA = 9.43
A384 N -> I380 O Sym = 1 Val = 0.430 DA = 3.14 DHA = 45.08

Figure AI-29 381 leucine

L381 is found within the protomer mini-core and adopts two conformations in the A chain. L381 contacts F356, Y359, and C360 from α1, and A384, D385, A386, and N387 from loop II. L381 forms hydrogen bonds with M377 and A386.

L381 N -> M377 O Sym = 1 Val = 0.702 DA = 2.88 DHA = 10.85
A386 N -> L381 O Sym = 1 Val = 0.696 DA = 2.81 DHA = 13.86


Figure AI-30 382 lysine

K382 resides near the C-terminus of α2 and interacts with F356 from α1 and E397' from α3'. K382 forms an interprotomeric charge-stabilized hydrogen bond with E397' and backbone hydrogen bonds with Y378 and D385.

K382 N -> Y378 O Sym = 1 Val = 0.633 DA = 2.85 DHA = 24.36
K382 NZ -> E397' OE2 Sym = 1 Val = 0.361 DA = 2.65 DHA = 77.21
D385 N -> K382 O Sym = 1 Val = 0.534 DA = 2.96 DHA = 26.76

Figure AI-31 383 aspartic acid

D383 helps to seal one end of the α2-α2' dimer core, and contacts R394, R394', D385, and D383'. D383 forms both intra- and intermolecular salt bridges and a backbone hydrogen bond with V379.

D383 N -> V379 O Sym = 1 Val = 0.711 DA = 2.90 DHA = 14.73
D383 OD1 - R394 NH1 6.48
D383 OD2 - R394 NH1 5.64
D383 OD1 - R394' NH1 3.42
D383 OD1 - R394' NH2 4.95
D383 OD2 - R394' NH1 3.88
D383 OD2 - R394' NH2 6.02

Figure AI-32 384 alanine

A384 terminates α2B and begins loop II, a type I' β-turn. A384 interacts with N387 and has hydrogen bonding interactions with I380 and N387.

A384 N -> I380 O Sym = 1 Val = 0.430 DA = 3.14 DHA = 45.08
N387 N -> A384 O Sym = 1 Val = 0.520 DA = 2.97 DHA = 31.79

Figure AI-33 385 aspartic acid

D385 continues the β-turn, with the side chain projecting into the solvent. D385 has secondary structure interactions with L381, K382, and D383. Additionally, D385 forms an intraprotomeric salt bridge with R394. D385 hydrogen bonds with K382.

D385 N -> K382 O Sym = 1 Val = 0.534 DA = 2.96 DHA = 26.76
D385 OD2 - R394 NH2 6.83


Figure AI-34 386 alanine

A386 is also found on the connecting loop between α2 and α3, and interacts with F356, Y359, and L381. A386 hydrogen bonds with L381.

A386 N -> L381 O Sym = 1 Val = 0.696 DA = 2.81 DHA = 13.86

Figure AI-35 387 asparagine

N387 is the last residue comprising the β-turn between α2 and α3. It interacts with L381. The N387 side chain hydrogen bonds to E389 and E390; backbone hydrogen bonds are formed with A384 and A391.

N387 N -> A384 O Sym = 1 Val = 0.520 DA = 2.97 DHA = 31.79
E389 N -> N387 OD1 Sym = 1 Val = 0.220 DA = 3.21 DHA = 71.77
E390 N -> N387 OD1 Sym = 1 Val = 0.627 DA = 3.03 DHA = 11.17
A391 N -> N387 O Sym = 1 Val = 0.747 DA = 2.88 DHA = 13.41


Figure AI-36 388 isoleucine

I388 is the first residue of α3 and is a component of the mini-core. I388 interacts with Y359, K362, L363 from α1 and with L381 from α2. I388 carbonyl oxygen hydrogen bonds with S392 side chain while a backbone hydrogen bond is formed with the amide hydrogen of S392.

S392 N -> I388 O Sym = 1 Val = 0.687 DA = 2.90 DHA = 16.79
S392 OG -> I388 O Sym = 1 Val = 0.241 DA = 3.08 DHA = 38.03

Figure AI-37 389 glutamic acid

E389 projects into the solvent, sealing the N-terminus of α3. E389 forms a salt bridge with K366, while a hydrogen bond is formed between the N387 side chain and the E389 amide hydrogen. A backbone hydrogen bond is formed with R393.

E389 N -> N387 OD1 Sym = 1 Val = 0.220 DA = 3.21 DHA = 71.77
R393 N -> E389 O Sym = 1 Val = 0.719 DA = 2.98 DHA = 17.09
E389 OE2 - K366 NZ 6.96


Figure AI-38 390 glutamic acid

E390 is found at the solvent-exposed surface of α3, and is one of several charged residues in this region. E390 has a salt bridge interaction with R394, and forms a charge-stabilized hydrogen bond with R393. E390 also forms hydrogen bonds with N387 and R394.

E390 N -> N387 OD1 Sym = 1 Val = 0.627 DA = 3.03 DHA = 11.17
R393 NH2 -> E390 OE2 Sym = 1 Val = 0.461 DA = 2.58 DHA = 47.62
R394 N -> E390 O Sym = 1 Val = 0.664 DA = 2.94 DHA = 23.71
R394 NE -> E390 OE2 Sym = 1 Val = 0.751 DA = 2.89 DHA = 31.15
E390 OE1 - R394 NH1 6.66
E390 OE1 - R394 NH2 5.50
E390 OE2 - R394 NH1 4.96
E390 OE2 - R394 NH2 3.74

Figure AI-38 391 alanine

A391 projects into the protomer mini-core, interacting with L363 from α1, and with I380 and L381 from α2. A391 hydrogen bonds with N387 and I395.

A391 N -> N387 O Sym = 1 Val = 0.747 DA = 2.88 DHA = 13.41
I395 N -> A391 O Sym = 1 Val = 0.665 DA = 3.05 DHA = 21.24


Figure AI-39 392 serine

S392 helps seal the outside edge of the mini-core, contacting K366 from α1. S392 forms hydrogen bonds with I388 and E396.

S392 N -> I388 O Sym = 1 Val = 0.687 DA = 2.90 DHA = 16.79
S392 OG -> I388 O Sym = 1 Val = 0.241 DA = 3.08 DHA = 38.03
E396 N -> S392 O Sym = 1 Val = 0.696 DA = 2.89 DHA = 16.24

Figure AI-40 393 arginine

R393 projects from α3 into the solvent and has multiple salt bridge and hydrogen bond contacts. R393 forms a charge-stabilized hydrogen bond with E390, a salt bridge with E397, and backbone hydrogen bonds with E389 and E397.

R393 N -> E389 O Sym = 1 Val = 0.719 DA = 2.98 DHA = 17.09
R393 NH2 -> E390 OE2 Sym = 1 Val = 0.461 DA = 2.58 DHA = 47.62
E397 N -> R393 O Sym = 1 Val = 0.671 DA = 2.89 DHA = 23.57
E397 OE1 - R393 NH1 4.43
E397 OE1 - R393 NH2 4.51
E397 OE2 - R393 NH1 6.47
E397 OE2 - R393 NH2 6.12


Figure AI-41 394 arginine

R394 is involved in both intra- and interprotomeric salt bridge interactions, interacting with D383, E390, and D383'. R394 also has interprotomeric secondary structure interactions with V379' from α2'. R394 forms a side chain hydrogen bond with E390 and backbone hydrogen bonds with E390 and G398.

R394 N -> E390 O Sym = 1 Val = 0.664 DA = 2.94 DHA = 23.71
R394 NE -> E390 OE2 Sym = 1 Val = 0.751 DA = 2.89 DHA = 31.15
G398 N -> R394 O Sym = 1 Val = 0.765 DA = 3.00 DHA = 13.86
D383 OD1 - R394 NH1 6.48
D383 OD2 - R394 NH1 5.64
D385 OD2 - R394 NH2 6.83
E390 OE1 - R394 NH1 6.66
E390 OE1 - R394 NH2 5.50
E390 OE2 - R394 NH1 4.96
E390 OE2 - R394 NH2 3.74
D383' OD1 - R394 NH1 3.51
D383' OD1 - R394 NH2 4.95
D383' OD2 - R394 NH1 3.85
D383' OD2 - R394 NH2 5.94

Figure AI-42 395 isoleucine

I395 projects from α3 into both the dimer interface and the protomer mini-core. I395 contacts I380 from α2 along with L376' and V379' from α2'. I395 hydrogen bonds with A391 and Q399.

I395 N -> A391 O Sym = 1 Val = 0.665 DA = 3.05 DHA = 21.24
Q399 N -> I395 O Sym = 1 Val = 0.577 DA = 2.93 DHA = 33.44

Figure AI-43 396 glutamic acid

E396 projects from α3 towards α1, creating a salt bridge with K366 that seals one side of the mini-core. E396 also has backbone hydrogen bonds with S392 and Y400.

E396 N -> S392 O Sym = 1 Val = 0.696 DA = 2.89 DHA = 16.24
Y400 N -> E396 O Sym = 1 Val = 0.742 DA = 2.92 DHA = 14.08
E396 OE1 - K366 NZ 5.36
E396 OE2 - K366 NZ 3.52


Figure AI-44 397 glutamic acid

E397 projects from α3 over the dimer interface to α2', forming a charge-stabilized hydrogen bond with K382' and a side chain hydrogen bond with Y378'. E397 also forms an intraprotomeric salt bridge with R393 and has backbone hydrogen bonds with R393 and V401.

E397 N -> R393 O Sym = 1 Val = 0.671 DA = 2.89 DHA = 23.57
V401 N -> E397 O Sym = 1 Val = 0.773 DA = 2.92 DHA = 2.28
K382' NZ -> E397 OE2 Sym = 1 Val = 0.575 DA = 2.62 DHA = 48.73
E397 OE1 - R393 NH1 4.43
E397 OE1 - R393 NH2 4.51
E397 OE2 - R393 NH1 6.47
E397 OE2 - R393 NH2 6.12

Figure AI-45 398 glycine

G398 is the site of an intersex mutation in Drosophila, and resides at the dimer interface between α2' and α3. G398 has secondary structure contacts with P375' and Y378', and hydrogen bonds with R394 and V402.

G398 N -> R394 O Sym = 1 Val = 0.765 DA = 3.00 DHA = 13.86
V402 N -> G398 O Sym = 1 Val = 0.675 DA = 2.99 DHA = 16.21


Figure AI-46 399 glutamine

Q399 projects across the dimer interface and helps to seal one end of the protomer mini-core. Q399 interacts with P375' from α2', and hydrogen bonds with I395 and N403.

Q399 N -> I395 O Sym = 1 Val = 0.577 DA = 2.93 DHA = 33.44
N403 N -> Q399 O Sym = 1 Val = 0.652 DA = 3.01 DHA = 19.56

Figure AI-47 400 tyrosine

Y400 projects from α3 into solvent, having only minor secondary structure contacts. Y400 hydrogen bonds with E396 and E404.

Y400 N -> E396 O Sym = 1 Val = 0.742 DA = 2.92 DHA = 14.08
E404 N -> Y400 O Sym = 1 Val = 0.629 DA = 2.98 DHA = 22.63


Figure AI-48 401 valine

V401 is positioned at the dimer interface, interacting with Y378' from α2' and Q353' from α1'. V401 forms hydrogen bonds with E397 and Y405.

V401 N -> E397 O Sym = 1 Val = 0.773 DA = 2.92 DHA = 2.28
Y405 N -> V401 O Sym = 1 Val = 0.688 DA = 3.01 DHA = 17.54

Figure AI-49 402 valine

V402 also resides at the α3-α2' dimer interface, interacting with M374' and P375'. V402 hydrogen bonds with G398 and S406 (side chain and amide hydrogen).

V402 N -> G398 O Sym = 1 Val = 0.675 DA = 2.99 DHA = 16.21
S406 N -> V402 O Sym = 1 Val = 0.658 DA = 3.04 DHA = 22.73
S406 OG -> V402 O Sym = 1 Val = 0.404 DA = 2.95 DHA = 33.03


Figure AI-50 403 asparagine

N403 projects into solvent and has few secondary structure contacts. N403 hydrogen bonds with Q399 and R407.

N403 N -> Q399 O Sym = 1 Val = 0.652 DA = 3.01 DHA = 19.56
R407 N -> N403 O Sym = 1 Val = 0.811 DA = 2.99 DHA = 8.38

Figure AI-51 404 glutamic acid

E404 projects into solvent and has few secondary structure contacts. E404 hydrogen bonds with Y400 and Q408.

E404 N -> Y400 O Sym = 1 Val = 0.629 DA = 2.98 DHA = 22.63
Q408 N -> E404 O Sym = 1 Val = 0.556 DA = 2.87 DHA = 25.47


Figure AI-52 405 tyrosine

Y405 has minor interprotomeric contacts with L357' from α1'. Y405 hydrogen bonds with V401.

Y405 N -> V401 O Sym = 1 Val = 0.688 DA = 3.01 DHA = 17.54

Figure AI-53 406 serine

S406 and R407 terminate α3 in the A chain and B chain, respectively. The S406 side chain hydrogen bonds with V402, and S406 has backbone hydrogen bonds with V402 and L411.

S406 N -> V402 O Sym = 1 Val = 0.658 DA = 3.04 DHA = 22.73
S406 OG -> V402 O Sym = 1 Val = 0.404 DA = 2.95 DHA = 33.03
L411 N -> S406 O Sym = 1 Val = 0.193 DA = 3.19 DHA = 66.57

Residues 407-412 are not well ordered and are modeled without side chains, and hence are not included here.
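The hydrogen-bond records listed throughout this appendix share a fixed layout (donor atom, acceptor atom, then the Sym, Val, DA, and DHA fields), so they can be read programmatically. The following Python sketch is illustrative only: the names `HBond`, `parse_hbonds`, and `is_interprotomeric` are hypothetical, and the field interpretations given in the comments (DA as a donor-acceptor distance in Å, DHA as an angle in degrees) are assumptions rather than definitions taken from this appendix.

```python
import re
from typing import List, NamedTuple


class HBond(NamedTuple):
    donor_res: str      # e.g. "Y378"; a trailing prime marks the partner protomer
    donor_atom: str     # e.g. "OH"
    acceptor_res: str   # e.g. "E397'"
    acceptor_atom: str  # e.g. "OE2"
    sym: int            # Sym field, copied as listed
    val: float          # Val field, copied as listed
    da: float           # DA field (assumed: donor-acceptor distance in Å)
    dha: float          # DHA field (assumed: angle in degrees)


# One record looks like:
#   Y378 OH -> E397' OE2 Sym = 1 Val = 0.870 DA = 2.74 DHA = 9.75
_RECORD = re.compile(
    r"(?P<dres>[A-Z]\d+'?)\s+(?P<datom>\w+)\s*->\s*"
    r"(?P<ares>[A-Z]\d+'?)\s+(?P<aatom>\w+)\s+"
    r"Sym\s*=\s*(?P<sym>\d+)\s+Val\s*=\s*(?P<val>\d+\.\d+)\s+"
    r"DA\s*=\s*(?P<da>\d+\.\d+)\s+DHA\s*=\s*(?P<dha>\d+\.\d+)"
)


def parse_hbonds(text: str) -> List[HBond]:
    """Return every hydrogen-bond record found in a block of listing text."""
    return [
        HBond(m["dres"], m["datom"], m["ares"], m["aatom"],
              int(m["sym"]), float(m["val"]), float(m["da"]), float(m["dha"]))
        for m in _RECORD.finditer(text)
    ]


def is_interprotomeric(bond: HBond) -> bool:
    """True when exactly one partner carries the prime (other-protomer) mark."""
    return bond.donor_res.endswith("'") != bond.acceptor_res.endswith("'")


# The three records listed for Y378 in this appendix.
Y378_RECORDS = (
    "Y378 N -> M374 O Sym = 1 Val = 0.687 DA = 2.87 DHA = 17.12 "
    "Y378 OH -> E397' OE2 Sym = 1 Val = 0.870 DA = 2.74 DHA = 9.75 "
    "K382 N -> Y378 O Sym = 1 Val = 0.633 DA = 2.85 DHA = 24.36"
)

if __name__ == "__main__":
    for bond in parse_hbonds(Y378_RECORDS):
        kind = "inter" if is_interprotomeric(bond) else "intra"
        print(bond.donor_res, bond.acceptor_res, bond.da, kind)
```

Applied to the Y378 entry, the sketch flags only the Y378 OH to E397' OE2 contact as crossing the dimer interface, in agreement with the description in the text.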

APPENDIX II

THREE HELIX DOMAINS

This appendix presents a survey of small three-helix domains based on CATH protein fold classifications from database release 2.6.0, illustrating the uniqueness of the UBA fold.


Figure AII-1. Helix Hairpins represented by 1AIL. N-Terminal Fragment Of NS1 Protein From Influenza A Virus. CATH code 1.10.287.

Figure AII-2. Lyase 2-enoyl-CoA Hydratase; Chain A, domain 2, PDB code 1DCI. CATH code 1.10.12.

Figure AII-3. Chorismate Mutase Domain, subunit A. PDB code 1ECM. CATH code 1.20.59.

Figure AII-4. Chaperone, represented by Hsc20 (HscB), A J-Type Co-Chaperone From E. coli. PDB code 1FPO. CATH code 1.10.287.110.

Figure AII-5. Ligase chain A. PDB code 1FS1. CATH code 1.10.8.70.

Figure AII-6. Albumin-binding domain. PDB code 1GAB. CATH code 1.10.8.40.1.

Figure AII-7. Homeodomain, represented by POU-specific homeodomain. PDB code 1HDP. CATH code 1.10.10.60.

Figure AII-8. Arc Repressor-related fold. Represented by the C-Terminal Domain Of The Rap74 Subunit Of Human Transcription Factor IIF (TFIIF). PDB code 1I27. CATH code 1.10.10.

Figure AII-9. HLA-DR Antigens Associated Invariant Chain, Chain A. PDB code 1IIE. CATH code 1.10.870.

Figure AII-10. Insulin. PDB code 1J73. CATH code 1.10.100.

Figure AII-11. Nitrogenase Molybdenum-iron Protein, subunit B domain 4. PDB code 1M1N. CATH code 1.20.89.

Figure AII-12. Syk Kinase Chain A, domain 2. PDB code 1M61. CATH code 1.10.930.

Figure AII-13. Monooxygenase. Represented by Methane Monooxygenase Hydroxylase From Methylococcus capsulatus (Bath). PDB code 1MTY. CATH code 1.20.1280.

Figure AII-14. RuvA helicase, superfamily of UBA domains. PDB code 1OAI. CATH code 1.10.8.

Figure AII-15. Pheromone ER-1. PDB code 2ERL. CATH code 1.20.50.

Figure AII-16. Phosducin domain 2. PDB code 2TRC. CATH code 1.10.168.
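Because CATH codes are hierarchical (Class.Architecture.Topology.Homologous superfamily), the domains surveyed in this appendix can be grouped by shared prefix. The sketch below is a minimal illustration: the code-to-structure pairs are transcribed from the figure legends above (1AIL is assumed to be the PDB code for Figure AII-1), the helper name `group_by_level` is hypothetical, and the architecture labels follow the standard CATH names for 1.10 and 1.20 rather than anything stated in this text.

```python
from collections import defaultdict
from typing import Dict, List

# PDB code -> CATH code, transcribed from Figures AII-1 to AII-16.
ENTRIES: Dict[str, str] = {
    "1AIL": "1.10.287",
    "1DCI": "1.10.12",
    "1ECM": "1.20.59",
    "1FPO": "1.10.287.110",
    "1FS1": "1.10.8.70",
    "1GAB": "1.10.8.40.1",
    "1HDP": "1.10.10.60",
    "1I27": "1.10.10",
    "1IIE": "1.10.870",
    "1J73": "1.10.100",
    "1M1N": "1.20.89",
    "1M61": "1.10.930",
    "1MTY": "1.20.1280",
    "1OAI": "1.10.8",
    "2ERL": "1.20.50",
    "2TRC": "1.10.168",
}

# Standard CATH architecture names (an addition, not from the legends).
ARCHITECTURES = {"1.10": "orthogonal bundle", "1.20": "up-down bundle"}


def group_by_level(entries: Dict[str, str], depth: int) -> Dict[str, List[str]]:
    """Group PDB codes by the first `depth` levels of their CATH code.

    depth=2 groups by architecture, depth=3 by topology.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for pdb_code, cath_code in entries.items():
        prefix = ".".join(cath_code.split(".")[:depth])
        groups[prefix].append(pdb_code)
    return dict(groups)


if __name__ == "__main__":
    for arch, members in group_by_level(ENTRIES, 2).items():
        print(arch, ARCHITECTURES.get(arch, "?"), len(members))
    # Structures sharing the 1.10.8 topology that contains the UBA superfamily.
    print(group_by_level(ENTRIES, 3)["1.10.8"])
```

Grouping at the topology level shows which surveyed structures share the 1.10.8 prefix assigned to the RuvA/UBA entry.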

APPENDIX III

This appendix contains the sequence homology of the C-terminal domain of insect DSX male and female isoforms.


APPENDIX IV

UBA DOMAINS AND ELECTROSTATICS

This appendix presents ribbon representations of known UBA domains and the corresponding electrostatic surfaces calculated by APBS. In each case, the N-terminus is oriented to the top left. Structures are presented with the surface homologous to the CTDF-p dimer interface facing the viewer.


APPENDIX V

EXPRESSION AND PURIFICATION OF INTACT DSX

Initial work in our laboratory on DSX involved a "divide and conquer" approach, focusing on the isolated DM and dimerization domains. These studies can be fruitfully expanded by examination of the full DSX proteins. For example, DSX tetramers were previously noted in both the presence and absence of DNA (Cho and Wensink 1996); however, such tetramer formation was not noted for our isolated domains. It would be of interest to identify the tetramer surfaces for study of cooperativity in gene regulation (see Chapter VI). Structural studies of intact protein alone, in complex with DNA, and in complex with the DSXF-specific coactivator IX will prove extremely informative in understanding the organization of a sex-specific transcription complex.

To these ends, several attempts at producing full-length DSX have been undertaken, including bacterial and insect cell expression, as well as an in vitro coupled transcription-translation system. Bacterial expression of a staphylococcal nuclease-DSXF fusion protein resulted primarily in inclusion body formation; recovery of protein by guanidine hydrochloride or urea suspension and dialysis was not successful. Expression in the in vitro system using E. coli-derived extract was likewise unsuccessful. Proteins that are not expressed, or are poorly expressed, in E. coli and in E. coli extract can often be expressed successfully using a eukaryotic expression system, such as wheat germ (personal communication, Roche Applied Science). Such studies are ongoing. Insect cell-based expression is another possible method for protein production. Currently, stably transfected cell lines expressing DSXF have been produced in our laboratory by collaborative efforts of B. Li and W. Zhang. We have not yet produced amounts sufficient for structural analysis, but have demonstrated expression by western blot.

APPENDIX VI

EXPRESSION AND PURIFICATION OF INTACT INTERSEX

To enable structural studies of a DSXF-IX complex, we have endeavored to express intact IX. Taking advantage of our E. coli staphylococcal nuclease system, we expressed IX fusion protein in BL21pLysS cells. The fusion protein was largely expressed in inclusion bodies. We rescued the protein from the inclusion bodies by solubilization in 6 M guanidine hydrochloride. Once the protein was solubilized, we employed slow dialysis, stepping down the guanidine concentration in the dialysis buffer over time in the constant presence of DTT. The dialyzed protein was soluble and was subsequently subjected to thrombin cleavage. Purification by preparative gel filtration demonstrated a single multimeric species, as expected given that IX has been shown to dimerize by Y2H (Garrett-Engele et al. 2002). During concentration of the purified material, however, precipitate began forming. Following storage at -20 °C, the protein formed a large, insoluble aggregate. It is possible that the intact protein is generally unstable and susceptible to oxidation, owing to its large number of methionine and cysteine residues.

To avoid inclusion body formation and promote native folding, we chose to express IX in yeast using the pPICZα system in Pichia pastoris under control of the methanol-inducible AOX1 promoter. The vector expresses IX plus an N-terminal secretion tag (the Saccharomyces cerevisiae α-factor) and a C-terminal myc epitope tag. We demonstrated incorporation of the IX construct in three yeast transformants (Figure AVI-1). Pressure testing by increasing levels of antibiotic suggested multiple gene copies in two of the transformants. Test expression did not show protein in the yeast media by western blot, despite the presence of the secretion tag. Low levels of expression were noted in cellular lysate, however. The vector thus constructed does appear capable of transforming target yeast, but additional transformants will most likely be necessary to find a higher-copy-number colony that expresses protein in large enough quantity for structural work.

Figure AVI-1

IX expression in Pichia. (A) Pressure-testing of zeocin-resistant colonies. (B) PCR analysis of IX cDNA in colonies A3 and C1. (C) Anti-Myc Western blot of cell extracts taken over a five-day time course demonstrates faint staining consistent with the anticipated molecular weight of the IX-fusion.


APPENDIX VII

CHARACTERIZATION OF THE PUTATIVE INTERSEX-BINDING GROOVE

Mutational and co-immunoprecipitation data presented in this appendix are collaborative efforts from W. Zhang.

We have employed co-IP and yeast two-hybrid assays to investigate the role of the DSXF female-specific tail in IX binding. S2 nuclear extracts were first analyzed by Western blot to demonstrate equal expression of transfected IX (Figure AVII-1A, bottom box) and transfected DSXF/M (middle box). Nuclear extracts were immunoprecipitated and analyzed by Western blot (top box in Figure AVII-1A). As expected, transfected DSXF but not DSXM co-precipitated with IX (lanes 9 and 3 in Figure AVII-1A). A series of DSX deletion constructs and co-IP results are summarized in Figure AVII-1B. Surprisingly, our results suggest that the major portion of the sex-specific tail does not contribute to IX binding. It is therefore probable that IX binds to the small portion of the ordered surface that is sex-specific (residues 398-407) and/or to the non-sex-specific surface of the domain (residues 350-397).

Monomeric DSX CTDF is unfolded (see Chapter V), implying that if IX indeed binds to a pre-ordered surface of DSXF CTDF, then dimerization is required for IX binding. To determine whether the deletion fragments used in the co-IP assay retain the ability to dimerize (and therefore present a structured binding surface), we generated deletion constructs to test the DSXF monomer-monomer interaction in the yeast two-hybrid system. Truncation of the C-terminal region at residue 412 (Δ413-427) does not affect dimer formation, whereas the additional deletion of the tail to residue 406 (Δ407-427) leads to a greater than 10-fold reduction in β-galactosidase activity, implying markedly reduced dimerization. These results indicate that the lack of DSX-IX interaction with these constructs may be due to a disordered binding surface rather than the truncation of the IX binding surface.

Cell growth was used as a selection assay to identify mutations impairing DSXF-IX interaction. The yeast two-hybrid system employed here utilizes the HIS3 gene under the control of GAL4 upstream activating sequences, conferring histidine sensitivity. Binding of the DSXF-GAL4 BD to the IX-GAL4 AD is expected to restore growth on media lacking histidine. Structure-based mutagenesis of the C-terminal domain of DSXF (including the female-specific C-terminal tail) was undertaken to map the IX-binding surface. The crystal structure was examined to identify 25 surface residues for site-directed mutagenesis to scan the protein surface via alanine “shaving” and charge-reversal mutagenesis. Results are summarized in Table AVII-1.

Doublesex binds Intersex in a surface groove using sex- and non-sex-specific sequences

Whereas most substitutions do not perturb the DSXF-IX interaction as probed by our Y2H assay, three substitutions are able to block this interaction. One substitution lies in the non-sex-specific region (D385K) while the other two (Y400A and N403A) lie in the proximal (and well-ordered) portion of the female-specific region. These results are in accordance with our findings of the sex-specificity of the DSX-IX interaction: (i) DSXM can interact with IX when the selective medium is supplemented with a lower concentration of 3-aminotriazole (3-AT), a competitive inhibitor of the HIS3 gene product, and (ii) Y400V, the corresponding male residue, is indistinguishable from Y400A in our Y2H system. We cannot exclude the possibility that other residues are involved in this interaction, owing to the limitation of the Y2H system in the detection of weak interactions. To validate our interaction results, we confirmed that these point mutants retained the ability to dimerize via Y2H and further demonstrated impaired protein-protein interaction by co-IP assays with Y400A.

One of the mutations interfering with IX binding is found in the non-sex-specific region of the protein whereas the remaining two are found in the sex-specific region; the Cα atoms are separated by nearly 19 Å. Connecting these two binding hotspots is a groove that runs along the surface of the protein and crosses both the dimer interface and the sex-specific transition. This result explains two separate observations: (i) mutations impairing dimerization block IX binding and (ii) the majority of the female-specific tail is dispensable for IX binding. The binding groove begins at Asp385 in the non-sex-specific region and ends at Asn403 in the sex-specific region. Phe356 and Leu357 from α1 and Glu397' and Val401' from α3' contribute to the floor of the groove. Gly352, Asp354, Tyr400', Glu404', and Tyr405' form one side of the groove while Tyr378, Lys382, and Glu397' form the opposing side (Figure AVII-2C). The binding groove bifurcates at residue Val401', suggesting two alternate binding surfaces. To investigate which groove is important for IX binding, we made "roadblock" mutations (V401I and Y405W) predicted to selectively block one or the other pathway for examination in our yeast two-hybrid system. V401I is predicted to block the upper groove, while Y405W blocks the lower groove. Growth with the V401I mutant is considerably slower than with Y405W, the growth rate of which is comparable to wild-type, indicating that perturbation of the upper binding groove, but not the lower groove, impairs DSX-IX binding.

Sex-specific differences in the binding groove

To investigate the differences between the male and female isoforms that may contribute to selective IX binding, we modeled male residues into the female tail in the crystal structure. Secondary structure prediction programs suggest the male tail extends α3 by less than one turn, implying the secondary structure in the putative IX binding region is relatively unchanged between the isoforms. Modeling the male residues into the female tail structure reveals little change in overall topology, i.e., the binding groove is unobstructed by male residues. The predominant difference between the isoforms is electrostatic potential; the male isoform contains one charge reversal (E404R), one non-polar to charge (V401E), one polar to charge (Q399R), and one charge to non-polar (R407A) change. The charge reversal of E404R is made more prominent by the substitution of Tyr400 by valine, allowing the positive charge of the arginine side chain to project more fully into the binding groove (Figure AVII-3). While our Y2H results indicate the negatively charged Glu404 is dispensable for IX binding, it is possible that the positively charged arginine contributes to unfavorable interactions with IX.
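The substitution classes named in this paragraph can be checked mechanically from the one-letter codes. The sketch below is a simplified illustration, not part of the original analysis: the grouping of residues into negative, positive, polar, and non-polar sets is a common textbook convention assumed here (Tyr, His, and Cys are treated as polar and uncharged), and the function names are hypothetical.

```python
# Simplified side-chain classes (an assumed convention, not taken from the text).
CLASSES = {
    "negative": set("DE"),
    "positive": set("KR"),
    "polar": set("STNQYHC"),
    "non-polar": set("AVLIMFWPG"),
}
CHARGED = {"negative", "positive"}


def residue_class(aa: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue code: {aa!r}")


def describe(substitution: str) -> str:
    """Describe a female-to-male substitution such as 'E404R'."""
    before = residue_class(substitution[0])
    after = residue_class(substitution[-1])
    if before in CHARGED and after in CHARGED:
        return "charge reversal" if before != after else "charge conserved"
    if after in CHARGED:
        return f"{before} to charge"
    if before in CHARGED:
        return f"charge to {after}"
    return f"{before} to {after}"


if __name__ == "__main__":
    # Female-to-male differences named in the text.
    for sub in ("E404R", "V401E", "Q399R", "R407A", "Y400V"):
        print(sub, "->", describe(sub))
```

Under these assumed classes, the four electrostatic changes listed in the text are reproduced, and Y400V is reported as a polar to non-polar change.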

Table AVII-1

Analyses of surface mutations in DSXF C-terminal domain.

Surface mutations that block detectable IX binding:
D385K, Y400A, and N403A

Surface mutations that do not perturb IX binding:
Q353A, D354K, V355A, D358K, Y359A, Q361A, K362D, E365R, K366D, R368E, W371A, E372A, D383A, E389R, E390R, R393A, E396R, E397R, Q399A, E404A, Y405A, and R407A

Sex-specific residues are highlighted in red; non-sex-specific residues are shown in black.
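The table can be cross-checked against the residue ranges quoted in the text (non-sex-specific surface, residues 350-397; ordered sex-specific surface, residues 398-407). The sketch below simply tallies the listed substitutions by region; the mutation lists are transcribed from Table AVII-1, while the function names and the label for positions outside both ranges are hypothetical.

```python
# Residue ranges quoted in the text of this appendix.
NON_SEX_SPECIFIC = range(350, 398)  # residues 350-397
SEX_SPECIFIC = range(398, 408)      # residues 398-407

# Substitutions transcribed from Table AVII-1.
BLOCKING = ["D385K", "Y400A", "N403A"]
NEUTRAL = [
    "Q353A", "D354K", "V355A", "D358K", "Y359A", "Q361A", "K362D", "E365R",
    "K366D", "R368E", "W371A", "E372A", "D383A", "E389R", "E390R", "R393A",
    "E396R", "E397R", "Q399A", "E404A", "Y405A", "R407A",
]


def position(substitution: str) -> int:
    """Extract the residue number from a substitution such as 'D385K'."""
    return int(substitution[1:-1])


def region(substitution: str) -> str:
    """Assign a substitution to the sex-specific or non-sex-specific surface."""
    pos = position(substitution)
    if pos in SEX_SPECIFIC:
        return "sex-specific"
    if pos in NON_SEX_SPECIFIC:
        return "non-sex-specific"
    return "outside quoted ranges"


if __name__ == "__main__":
    print("residues tested:", len(BLOCKING) + len(NEUTRAL))
    for sub in BLOCKING:
        print(sub, region(sub))
    print([s for s in NEUTRAL if region(s) == "sex-specific"])
```

The tally recovers the 25 surface residues mentioned in the text, with one blocking substitution in the non-sex-specific region and two in the sex-specific region.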

Figure AVII-1

Co-IP deletion analysis of DSXF-IX binding. (A) Western blots. Control lanes 1-3: (1) transfection with HA-IX and pAc5.1/V5 empty vector, (2) transfection with V5-DSXF and pAc5.1/HA empty vector, (3) transfection with HA-IX and wild-type V5-DSXM. Lanes 4-8, deletion analysis. Lane 9, co-transfection with HA-IX and full-length V5-DSXF. (B) Deletion constructs and summary of co-IP results (++, + or – indicating intensity of co-IP band in top box in panel A). Abbreviations: wt, wild-type; IB, immunoblot; V5 and HA, epitope tags. Deletion to 397 removes part of an ordered α-helix and so might disrupt folding and dimerization.


Figure AVII-2

Yeast two-hybrid results suggest the presence of an IX-binding groove that spans the dimer interface and both sex-specific and non-specific surfaces. (A) Surface representation colored by protomer (mutations affecting IX binding in green, residues Asp385, Tyr400, Asn403). (B) Surface representation colored by results of Y2H. Mutations not affecting IX binding are in gray, mutations abolishing binding are in red, and untested residues are dark green. (C & D) Close-up of the putative IX-binding groove on DSXF (C) and a model of DSXM (D). Sex-non-specific surfaces are gray, female-specific surfaces in red, and male in blue. Green residues as in (A).


Figure AVII-3

Electrostatic potential of IX binding groove. Although the overall topology of the binding groove between female (A) and male (B) is similar, several electrostatic differences are apparent. Surfaces were calculated with APBS.


APPENDIX VIII

YEAST ONE-HYBRID CONTROL STUDIES

This appendix contains control experiments related to the yeast one-hybrid studies presented in Chapter IV. This work was a collaborative effort with Wei Zhang.

Design and Control Studies of Y1H System

DSX isoforms regulate the sex- and tissue-specific expression of yolk protein genes via the well-characterized fat-body enhancer (fbe; An and Wensink 1995a; An and Wensink 1995b). A Y1H reporter plasmid was therefore constructed using a 48-bp enhancer consisting of two DSX binding sites (designated dsxA and dsxB), placed upstream of a lacZ reporter gene. An overview of the design is shown in schematic form in Figure S1-A. To effect transcriptional regulation in S. cerevisiae, DSXF or fragments containing the DM domain (residues 1-118) were subcloned in frame into yeast expression vector pACT2. The resulting constructs produced fusion proteins consisting of an N-terminal Gal4 activation domain (AD) and C-terminal DSX sequences. The Y1H system yields DSX-dependent expression of β-galactosidase (as monitored by X-gal indicator plates and as measured in extracts) relative to empty vector controls: heterologous gene regulation requires the presence of the DSX DM domain. The fidelity of the Y1H system was validated by the following control studies.

Figure AVIII-1. Design of Y1H system and control studies.

(i) Base-pair changes in the enhancer element that block specific DSX binding in vitro also block expression of β-galactosidase. Three variant enhancer elements were constructed by site-directed mutagenesis to disrupt the DSX binding sites. Paired G→A transitions were introduced at the centers of dsxA and/or dsxB based on prior analysis of the sequence specificity of the DM domain (Erdman and Burtis 1993; Zhu et al. 2000). The wild-type and variant DNA target sites are given in Figure S1-B. The variant reporter plasmids were co-transformed into yeast with a plasmid expressing the Gal4-AD-DM fusion protein. Following growth on selective medium (-Leu/-Ura), lacZ expression was evaluated in extracts using a colorimetric β-galactosidase assay (see Materials and Methods). A photomicrograph of a representative selective plate and a summary of β-galactosidase assays are provided in Figure S1-C and S1-D, respectively. Mutations in either DNA site (dsxA or dsxB) impair DSX-dependent expression of lacZ by three- to five-fold; simultaneous mutations in both sites essentially eliminate expression. The results of β-galactosidase assays thus suggest that the two binding sites function in an additive manner.

(ii) Mutations in the Zn-binding site of the DM domain that block metal-dependent protein folding and specific DNA binding in vitro also block expression of β-galactosidase. Previous studies have demonstrated that specific DNA binding activity of the DSX DM domain in vitro requires zinc binding and is blocked by mutations in the invariant cysteines and histidines (Erdman and Burtis 1993; Zhu et al. 2000). Three such mutations (H50Y, H59Y, and C70Y) are associated with an intersexual phenotype, further suggesting that folding of the Zn module is required in vivo for sex-specific gene regulation (Erdman and Burtis 1993). To test whether the Y1H system also requires native zinc binding, we constructed the substitutions in the AD-DM fusion protein; no detectable β-galactosidase activity was observed.

(iii) Mutations in the tail of the DM domain that block specific DNA binding in vitro also

block expression of β-galactosidase. As described in the main text, an intersexual

mutation has also been identified in the C-terminal tail of the DM domain (R91Q).

Previous studies have demonstrated that this mutation impairs specific DNA binding by

at least 100-fold (Erdman and Burtis 1993; Zhu et al. 2000). The corresponding

substitution in the AD-DM fusion protein leads to loss of detectable β-galactosidase

activity in the Y1H system. Similarly, a broad correlation between extent of impaired

DNA binding and Y1H transcriptional activation is obtained by analysis of a set of

alanine scanning substitutions previously characterized in the tail (Narendra et al. 2002).

Whereas substitutions R74A and R99A affect neither specific DNA binding nor Y1H activity, substitutions R79A and R90A exhibit significant impairment in both assays.

Together, control studies of mutations in the Zn-binding site, at the surface of the Zn module, and in the tail suggest the Y1H colorimetric assay correlates with the relative specific DNA-binding activities of substitutions throughout the DM motif.


APPENDIX IX

This appendix derives the formalism for residual energy in dimeric systems under 2-state unfolding.

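\[ \mathrm{F_2} \rightleftharpoons 2\,\mathrm{U} \tag{1} \]

\[ K_d = \frac{[\mathrm{U}]^2}{[\mathrm{F_2}]} \tag{2} \]

Let $F_d$ equal the fraction of unfolded monomer.

Concentration of dimer:

\[ [\mathrm{F_2}] = 0.5\,C_t\,(1 - F_d) \tag{3} \]

where $C_t$ is the total concentration in monomer equivalents.

Concentration of monomer:

\[ [\mathrm{U}] = F_d\,C_t \tag{4} \]

Substituting eqs. 3 and 4 into eq. 2:

\[ K_d = \frac{(F_d\,C_t)^2}{0.5\,C_t\,(1 - F_d)} \tag{5} \]

At the midpoint of the unfolding transition, $F_d = 0.5$:

\[ K_d = \frac{0.25\,C_t^2}{0.25\,C_t} \tag{6} \]

\[ K_d = C_t \tag{7} \]

Therefore the residual free energy at the midpoint is given by:

\[ \Delta G_{res} = -RT \ln(C_t) \tag{8} \]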
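The relations in eqs. 5, 7, and 8 can be checked numerically. The following is a minimal Python sketch and not part of the original analysis; the function names, the example concentration (100 µM), and the temperature (30 °C) are illustrative assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def dimer_kd(fd, ct):
    """Eq. 5: dissociation constant for F2 <-> 2U.

    fd is the fraction of unfolded monomer; ct is the total
    concentration in monomer equivalents (molar).
    """
    if not 0.0 <= fd < 1.0:
        raise ValueError("fd must satisfy 0 <= fd < 1")
    return (fd * ct) ** 2 / (0.5 * ct * (1.0 - fd))


def residual_free_energy(ct, temp_k):
    """Eq. 8: residual free energy (kcal/mol), -RT ln(Ct)."""
    return -R_KCAL * temp_k * math.log(ct)


ct = 100e-6      # hypothetical total concentration (M), for illustration only
temp_k = 303.15  # 30 degrees C, assumed for illustration

# At fd = 0.5 the expression in eq. 5 reduces to Kd = Ct (eq. 7).
print(dimer_kd(0.5, ct), ct)
print(residual_free_energy(ct, temp_k))
```

Because the residual term depends on the logarithm of Ct, a tenfold change in total concentration shifts it by RT ln(10), roughly 1.4 kcal/mol at 30 °C.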

References: Backmann et al. 1998. J. Mol. Biol. 284, 817-833. Neira et al. 2001. Eur. J. Biochem. 268, 4868-4877


APPENDIX X

This appendix contains tables of salt bridge and hydrogen bonding interactions. Residues

are listed by order in the structure file, with the residue number in parentheses. Residue

number is based on the construct used for crystallization, and is related to numbering in the intact protein by i+347.

Salt bridges in CTDF-p

Columns: entry; order, residue (number), chain, and atom of the acidic group; order, residue (number), chain, and atom of the basic group; distance (Å).

1 6 ASP ( 11 ) A OD1 - 10 LYS ( 15 ) A A NZ 5.93
2 6 ASP ( 11 ) A OD2 - 10 LYS ( 15 ) A A NZ 6.43
3 13 GLU ( 18 ) A OE1 - 10 LYS ( 15 ) A A NZ 3.93
4 13 GLU ( 18 ) A OE2 - 10 LYS ( 15 ) A A NZ 5.74
5 31 ASP ( 36 ) A OD1 - 42 ARG ( 47 ) A NH1 6.48
6 31 ASP ( 36 ) A OD2 - 42 ARG ( 47 ) A NH1 5.64
7 31 ASP ( 36 ) A OD1 - 101 ARG ( 47 ) B NH1 3.42
8 31 ASP ( 36 ) A OD1 - 101 ARG ( 47 ) B NH2 4.95
9 31 ASP ( 36 ) A OD2 - 101 ARG ( 47 ) B NH1 3.88
10 31 ASP ( 36 ) A OD2 - 101 ARG ( 47 ) B NH2 6.02
11 33 ASP ( 38 ) A OD2 - 42 ARG ( 47 ) A NH2 6.83
12 38 GLU ( 43 ) A OE1 - 41 ARG ( 46 ) A NH1 4.71
13 38 GLU ( 43 ) A OE1 - 41 ARG ( 46 ) A NH2 3.84
14 38 GLU ( 43 ) A OE2 - 41 ARG ( 46 ) A NH1 4.42
15 38 GLU ( 43 ) A OE2 - 41 ARG ( 46 ) A NH2 2.58
16 38 GLU ( 43 ) A OE1 - 42 ARG ( 47 ) A NH1 6.66
17 38 GLU ( 43 ) A OE1 - 42 ARG ( 47 ) A NH2 5.50
18 38 GLU ( 43 ) A OE2 - 42 ARG ( 47 ) A NH1 4.96
19 38 GLU ( 43 ) A OE2 - 42 ARG ( 47 ) A NH2 3.74
20 44 GLU ( 49 ) A OE1 - 14 LYS ( 19 ) A NZ 5.36
21 44 GLU ( 49 ) A OE2 - 14 LYS ( 19 ) A NZ 3.52
22 45 GLU ( 50 ) A OE1 - 41 ARG ( 46 ) A NH1 4.43
23 45 GLU ( 50 ) A OE1 - 41 ARG ( 46 ) A NH2 4.51
24 45 GLU ( 50 ) A OE2 - 41 ARG ( 46 ) A NH1 6.47
25 45 GLU ( 50 ) A OE2 - 41 ARG ( 46 ) A NH2 6.12
26 45 GLU ( 50 ) A OE1 - 89 LYS ( 35 ) B NZ 3.81
27 45 GLU ( 50 ) A OE2 - 89 LYS ( 35 ) B NZ 2.62
28 90 ASP ( 36 ) B OD1 - 42 ARG ( 47 ) A NH1 3.51
29 90 ASP ( 36 ) B OD1 - 42 ARG ( 47 ) A NH2 4.95
30 90 ASP ( 36 ) B OD2 - 42 ARG ( 47 ) A NH1 3.85
31 90 ASP ( 36 ) B OD2 - 42 ARG ( 47 ) A NH2 5.94
32 90 ASP ( 36 ) B OD1 - 101 ARG ( 47 ) B NH1 6.51
33 90 ASP ( 36 ) B OD2 - 101 ARG ( 47 ) B NH1 5.60
34 92 ASP ( 38 ) B OD2 - 101 ARG ( 47 ) B NH2 6.72
35 96 GLU ( 42 ) B OE2 - 73 LYS ( 19 ) B NZ 6.96
36 97 GLU ( 43 ) B OE1 - 100 ARG ( 46 ) B NH1 5.24
37 97 GLU ( 43 ) B OE1 - 100 ARG ( 46 ) B NH2 4.25
38 97 GLU ( 43 ) B OE2 - 100 ARG ( 46 ) B NH1 4.64
39 97 GLU ( 43 ) B OE2 - 100 ARG ( 46 ) B NH2 2.84
40 97 GLU ( 43 ) B OE1 - 101 ARG ( 47 ) B NH1 6.67
41 97 GLU ( 43 ) B OE1 - 101 ARG ( 47 ) B NH2 5.44
42 97 GLU ( 43 ) B OE2 - 101 ARG ( 47 ) B NH1 4.91
43 97 GLU ( 43 ) B OE2 - 101 ARG ( 47 ) B NH2 3.62
44 103 GLU ( 49 ) B OE1 - 73 LYS ( 19 ) B NZ 6.75
45 103 GLU ( 49 ) B OE2 - 73 LYS ( 19 ) B NZ 5.28
46 104 GLU ( 50 ) B OE1 - 30 LYS ( 35 ) A NZ 3.52
47 104 GLU ( 50 ) B OE2 - 30 LYS ( 35 ) A NZ 2.65
48 104 GLU ( 50 ) B OE1 - 100 ARG ( 46 ) B NH1 3.52
49 104 GLU ( 50 ) B OE1 - 100 ARG ( 46 ) B NH2 4.21
50 104 GLU ( 50 ) B OE2 - 100 ARG ( 46 ) B NH1 5.54
51 104 GLU ( 50 ) B OE2 - 100 ARG ( 46 ) B NH2 5.67
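As an illustration of how the salt-bridge list above might be screened for contacts across the dimer interface, the following Python sketch filters a handful of entries copied from the table (entries 1, 7, 15, 27, 28, and 47). The 4.0 Å cutoff and the tuple layout are assumptions made for this example; the source does not state which distance criterion, if any, was applied.

```python
# Each tuple: (acid chain, acid residue number, base chain,
#              base residue number, distance in angstroms).
# A small subset of rows copied from the salt-bridge table.
PAIRS = [
    ("A", 11, "A", 15, 5.93),  # ASP 11 OD1 - LYS 15 NZ
    ("A", 36, "B", 47, 3.42),  # ASP 36 OD1 - ARG 47 NH1
    ("A", 43, "A", 46, 2.58),  # GLU 43 OE2 - ARG 46 NH2
    ("A", 50, "B", 35, 2.62),  # GLU 50 OE2 - LYS 35 NZ
    ("B", 36, "A", 47, 3.51),  # ASP 36 OD1 - ARG 47 NH1
    ("B", 50, "A", 35, 2.65),  # GLU 50 OE2 - LYS 35 NZ
]

CUTOFF = 4.0  # assumed distance criterion (angstroms); not from the source


def interchain_contacts(pairs, cutoff=CUTOFF):
    """Return pairs that bridge different chains within the cutoff."""
    return [p for p in pairs if p[0] != p[2] and p[4] <= cutoff]


for acid_chain, acid_res, base_chain, base_res, dist in interchain_contacts(PAIRS):
    print(f"{acid_chain}{acid_res} - {base_chain}{base_res}: {dist:.2f}")
```

In this subset the interchain contacts involve D36 with R47' and E50 with K35', and each appears in both directions, as expected for a symmetric dimer.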

Hydrogen bonding network

LEU ( 10 ) A N -> GLN ( 6 ) A O Sym= 1 Val= 0.640 DA= 3.08 DHA= 22.55
ASP ( 11 ) A N -> ASP ( 7 ) A O Sym= 1 Val= 0.645 DA= 3.11 DHA= 24.29
TYR ( 12 ) A N -> VAL ( 8 ) A O Sym= 1 Val= 0.639 DA= 2.98 DHA= 23.35
CYS ( 13 ) A N -> PHE ( 9 ) A O Sym= 1 Val= 0.671 DA= 2.94 DHA= 24.70
GLN ( 14 ) A N -> LEU ( 10 ) A O Sym= 1 Val= 0.664 DA= 2.93 DHA= 23.38
GLN ( 14 ) A NE2 -> GLU ( 18 ) A OE2 Sym= 1 Val= 0.242 DA= 3.39 DHA= 51.40
LYS ( 15 ) A N -> ASP ( 11 ) A O Sym= 1 Val= 0.619 DA= 3.01 DHA= 26.16
LEU ( 16 ) A N -> TYR ( 12 ) A O Sym= 1 Val= 0.617 DA= 2.98 DHA= 25.77
LEU ( 17 ) A N -> CYS ( 13 ) A O Sym= 1 Val= 0.673 DA= 2.91 DHA= 21.92
GLU ( 18 ) A N -> GLN ( 14 ) A O Sym= 1 Val= 0.591 DA= 2.95 DHA= 25.71
LYS ( 19 ) A N -> LYS ( 15 ) A O Sym= 1 Val= 0.591 DA= 2.98 DHA= 27.99
LYS ( 19 ) A NZ -> SER ( 45 ) A OG Sym= 1 Val= 0.334 DA= 3.47 DHA= 37.71
LYS ( 19 ) A NZ -> GLU ( 49 ) A OE2 Sym= 1 Val= 0.433 DA= 3.52 DHA= 25.30
PHE ( 20 ) A N -> LEU ( 16 ) A O Sym= 1 Val= 0.489 DA= 3.31 DHA= 26.36
ARG ( 21 ) A N -> GLU ( 18 ) A O Sym= 1 Val= 0.298 DA= 3.06 DHA= 28.87
ARG ( 21 ) A N -> LEU ( 17 ) A O Sym= 1 Val= 0.250 DA= 3.10 DHA= 77.51
ARG ( 21 ) A NH1 -> GLU ( 18 ) A O Sym= 1 Val= 0.489 DA= 2.62 DHA= 33.43
TYR ( 22 ) A N -> LEU ( 17 ) A O Sym= 1 Val= 0.673 DA= 2.96 DHA= 17.19
LEU ( 26 ) A N -> PRO ( 23 ) A O Sym= 1 Val= 0.725 DA= 2.91 DHA= 15.38
MET ( 27 ) A N -> TRP ( 24 ) A O Sym= 1 Val= 0.712 DA= 2.85 DHA= 14.95
LEU ( 29 ) A N -> LEU ( 26 ) A O Sym= 1 Val= 0.375 DA= 2.86 DHA= 52.16
MET ( 30 ) A N -> LEU ( 26 ) A O Sym= 1 Val= 0.715 DA= 3.12 DHA= 18.61
MET ( 30 ) A N -> MET ( 27 ) A O Sym= 1 Val= 0.002 DA= 3.07 DHA= 71.13
TYR ( 31 ) A N -> MET ( 27 ) A O Sym= 1 Val= 0.687 DA= 2.87 DHA= 17.12
TYR ( 31 ) A OH -> GLU ( 50 ) B OE2 Sym= 1 Val= 0.870 DA= 2.74 DHA= 9.75
VAL ( 32 ) A N -> PRO ( 28 ) A O Sym= 1 Val= 0.597 DA= 3.11 DHA= 25.73
ILE ( 33 ) A N -> LEU ( 29 ) A O Sym= 1 Val= 0.780 DA= 2.99 DHA= 9.43
LEU ( 34 ) A N -> MET ( 30 ) A O Sym= 1 Val= 0.702 DA= 2.88 DHA= 10.85
LYS ( 35 ) A N -> TYR ( 31 ) A O Sym= 1 Val= 0.633 DA= 2.85 DHA= 24.36
LYS ( 35 ) A NZ -> GLU ( 50 ) B OE2 Sym= 1 Val= 0.361 DA= 2.65 DHA= 77.21
LYS ( 35 ) A NZ -> GLU ( 50 ) B OE2 Sym= 1 Val= 0.550 DA= 2.65 DHA= 66.85
ASP ( 36 ) A N -> VAL ( 32 ) A O Sym= 1 Val= 0.711 DA= 2.90 DHA= 14.73
ALA ( 37 ) A N -> ILE ( 33 ) A O Sym= 1 Val= 0.430 DA= 3.14 DHA= 45.08
ASP ( 38 ) A N -> LYS ( 35 ) A O Sym= 1 Val= 0.534 DA= 2.96 DHA= 26.76
ALA ( 39 ) A N -> LEU ( 34 ) A O Sym= 1 Val= 0.696 DA= 2.81 DHA= 13.86
ASN ( 40 ) A N -> ALA ( 37 ) A O Sym= 1 Val= 0.520 DA= 2.97 DHA= 31.79
GLU ( 42 ) A N -> ASN ( 40 ) A OD1 Sym= 1 Val= 0.220 DA= 3.21 DHA= 71.77
GLU ( 42 ) A N -> GLU ( 42 ) A OE1 Sym= 1 Val= 0.184 DA= 2.79 DHA= 41.27
GLU ( 43 ) A N -> ASN ( 40 ) A OD1 Sym= 1 Val= 0.627 DA= 3.03 DHA= 11.17
ALA ( 44 ) A N -> ASN ( 40 ) A O Sym= 1 Val= 0.747 DA= 2.88 DHA= 13.41
SER ( 45 ) A N -> ILE ( 41 ) A O Sym= 1 Val= 0.687 DA= 2.90 DHA= 16.79
SER ( 45 ) A OG -> ILE ( 41 ) A O Sym= 1 Val= 0.241 DA= 3.08 DHA= 38.03
ARG ( 46 ) A N -> GLU ( 42 ) A O Sym= 1 Val= 0.719 DA= 2.98 DHA= 17.09
ARG ( 46 ) A NH2 -> GLU ( 43 ) A OE2 Sym= 1 Val= 0.461 DA= 2.58 DHA= 47.62
ARG ( 47 ) A N -> GLU ( 43 ) A O Sym= 1 Val= 0.664 DA= 2.94 DHA= 23.71
ARG ( 47 ) A NE -> GLU ( 43 ) A OE2 Sym= 1 Val= 0.751 DA= 2.89 DHA= 31.15
ILE ( 48 ) A N -> ALA ( 44 ) A O Sym= 1 Val= 0.665 DA= 3.05 DHA= 21.24
GLU ( 49 ) A N -> SER ( 45 ) A O Sym= 1 Val= 0.696 DA= 2.89 DHA= 16.24
GLU ( 50 ) A N -> ARG ( 46 ) A O Sym= 1 Val= 0.671 DA= 2.89 DHA= 23.57
GLY ( 51 ) A N -> ARG ( 47 ) A O Sym= 1 Val= 0.765 DA= 3.00 DHA= 13.86
GLN ( 52 ) A N -> ILE ( 48 ) A O Sym= 1 Val= 0.577 DA= 2.93 DHA= 33.44
TYR ( 53 ) A N -> GLU ( 49 ) A O Sym= 1 Val= 0.742 DA= 2.92 DHA= 14.08
VAL ( 54 ) A N -> GLU ( 50 ) A O Sym= 1 Val= 0.773 DA= 2.92 DHA= 2.28
VAL ( 55 ) A N -> GLY ( 51 ) A O Sym= 1 Val= 0.675 DA= 2.99 DHA= 16.21
ASN ( 56 ) A N -> GLN ( 52 ) A O Sym= 1 Val= 0.652 DA= 3.01 DHA= 19.56
GLU ( 57 ) A N -> TYR ( 53 ) A O Sym= 1 Val= 0.629 DA= 2.98 DHA= 22.63
TYR ( 58 ) A N -> VAL ( 54 ) A O Sym= 1 Val= 0.688 DA= 3.01 DHA= 17.54
SER ( 59 ) A N -> VAL ( 55 ) A O Sym= 1 Val= 0.658 DA= 3.04 DHA= 22.73
SER ( 59 ) A OG -> VAL ( 55 ) A O Sym= 1 Val= 0.404 DA= 2.95 DHA= 33.03
ALA ( 60 ) A N -> ASN ( 56 ) A O Sym= 1 Val= 0.811 DA= 2.99 DHA= 8.38
ALA ( 61 ) A N -> GLU ( 57 ) A O Sym= 1 Val= 0.556 DA= 2.87 DHA= 25.47
ALA ( 62 ) A N -> SER ( 59 ) A O Sym= 1 Val= 0.534 DA= 2.99 DHA= 27.50
ALA ( 63 ) A N -> ALA ( 60 ) A O Sym= 1 Val= 0.326 DA= 3.67 DHA= 15.59
ALA ( 64 ) A N -> SER ( 59 ) A O Sym= 1 Val= 0.193 DA= 3.19 DHA= 66.57
LEU ( 10 ) B N -> GLN ( 6 ) B O Sym= 1 Val= 0.665 DA= 2.99 DHA= 22.46
ASP ( 11 ) B N -> ASP ( 7 ) B O Sym= 1 Val= 0.518 DA= 3.16 DHA= 32.34
TYR ( 12 ) B N -> VAL ( 8 ) B O Sym= 1 Val= 0.601 DA= 2.97 DHA= 27.96
TYR ( 12 ) B N -> PHE ( 9 ) B O Sym= 1 Val= 0.019 DA= 3.05 DHA= 65.02
CYS ( 13 ) B N -> PHE ( 9 ) B O Sym= 1 Val= 0.601 DA= 2.92 DHA= 29.59
GLN ( 14 ) B N -> LEU ( 10 ) B O Sym= 1 Val= 0.638 DA= 2.97 DHA= 22.11
GLN ( 14 ) B NE2 -> GLU ( 18 ) B OE2 Sym= 1 Val= 0.047 DA= 3.31 DHA= 71.21
LYS ( 15 ) B N -> ASP ( 11 ) B O Sym= 1 Val= 0.618 DA= 2.93 DHA= 27.91
LEU ( 16 ) B N -> TYR ( 12 ) B O Sym= 1 Val= 0.671 DA= 2.91 DHA= 22.44
LEU ( 17 ) B N -> CYS ( 13 ) B O Sym= 1 Val= 0.723 DA= 2.89 DHA= 17.25
GLU ( 18 ) B N -> GLN ( 14 ) B O Sym= 1 Val= 0.575 DA= 3.01 DHA= 26.05
LYS ( 19 ) B N -> LYS ( 15 ) B O Sym= 1 Val= 0.591 DA= 3.01 DHA= 27.02
LYS ( 19 ) B NZ -> SER ( 45 ) B OG Sym= 1 Val= 0.640 DA= 2.95 DHA= 39.71
PHE ( 20 ) B N -> LEU ( 16 ) B O Sym= 1 Val= 0.517 DA= 3.27 DHA= 28.99
PHE ( 20 ) B N -> LEU ( 17 ) B O Sym= 1 Val= 0.011 DA= 3.36 DHA= 52.49
ARG ( 21 ) B N -> LEU ( 17 ) B O Sym= 1 Val= 0.306 DA= 3.06 DHA= 74.09
ARG ( 21 ) B N -> GLU ( 18 ) B O Sym= 1 Val= 0.219 DA= 3.13 DHA= 32.35
ARG ( 21 ) B NH1 -> GLU ( 18 ) B O Sym= 1 Val= 0.584 DA= 3.05 DHA= 32.15
TYR ( 22 ) B N -> LEU ( 17 ) B O Sym= 1 Val= 0.736 DA= 3.01 DHA= 10.52
TRP ( 24 ) B NE1 -> SER ( 59 ) A OG Sym= 1 Val= 0.187 DA= 3.74 DHA= 34.11
LEU ( 26 ) B N -> PRO ( 23 ) B O Sym= 1 Val= 0.785 DA= 2.92 DHA= 14.70
MET ( 27 ) B N -> TRP ( 24 ) B O Sym= 1 Val= 0.653 DA= 2.87 DHA= 17.62
LEU ( 29 ) B N -> LEU ( 26 ) B O Sym= 1 Val= 0.399 DA= 2.84 DHA= 50.86
MET ( 30 ) B N -> LEU ( 26 ) B O Sym= 1 Val= 0.732 DA= 3.12 DHA= 18.36
TYR ( 31 ) B N -> MET ( 27 ) B O Sym= 1 Val= 0.658 DA= 2.86 DHA= 19.83
TYR ( 31 ) B OH -> GLU ( 50 ) A OE2 Sym= 1 Val= 0.872 DA= 2.72 DHA= 9.18
VAL ( 32 ) B N -> PRO ( 28 ) B O Sym= 1 Val= 0.592 DA= 3.15 DHA= 21.26
ILE ( 33 ) B N -> LEU ( 29 ) B O Sym= 1 Val= 0.736 DA= 3.01 DHA= 11.58
LEU ( 34 ) B N -> MET ( 30 ) B O Sym= 1 Val= 0.648 DA= 2.85 DHA= 16.98
LYS ( 35 ) B N -> TYR ( 31 ) B O Sym= 1 Val= 0.618 DA= 2.81 DHA= 24.85
LYS ( 35 ) B NZ -> GLU ( 50 ) A OE2 Sym= 1 Val= 0.575 DA= 2.62 DHA= 48.73
LYS ( 35 ) B NZ -> GLU ( 50 ) A OE2 Sym= 1 Val= 0.395 DA= 2.62 DHA= 89.07
ASP ( 36 ) B N -> VAL ( 32 ) B O Sym= 1 Val= 0.717 DA= 2.87 DHA= 18.26
ALA ( 37 ) B N -> ILE ( 33 ) B O Sym= 1 Val= 0.467 DA= 3.06 DHA= 45.20
ASP ( 38 ) B N -> LYS ( 35 ) B O Sym= 1 Val= 0.568 DA= 2.85 DHA= 26.54
ALA ( 39 ) B N -> LEU ( 34 ) B O Sym= 1 Val= 0.654 DA= 2.81 DHA= 16.78
ASN ( 40 ) B N -> ALA ( 37 ) B O Sym= 1 Val= 0.562 DA= 2.98 DHA= 26.08
ILE ( 41 ) B N -> ALA ( 39 ) B O Sym= 1 Val= 0.008 DA= 3.35 DHA= 68.97
GLU ( 42 ) B N -> GLU ( 42 ) B OE1 Sym= 1 Val= 0.474 DA= 2.68 DHA= 38.14
GLU ( 42 ) B N -> ASN ( 40 ) B OD1 Sym= 1 Val= 0.207 DA= 3.24 DHA= 70.07
GLU ( 43 ) B N -> ASN ( 40 ) B OD1 Sym= 1 Val= 0.585 DA= 3.09 DHA= 21.44
GLU ( 43 ) B N -> ASN ( 40 ) B O Sym= 1 Val= 0.090 DA= 3.16 DHA= 61.32
ALA ( 44 ) B N -> ASN ( 40 ) B O Sym= 1 Val= 0.712 DA= 2.92 DHA= 17.94
SER ( 45 ) B N -> ILE ( 41 ) B O Sym= 1 Val= 0.633 DA= 2.95 DHA= 20.52
SER ( 45 ) B OG -> ILE ( 41 ) B O Sym= 1 Val= 0.316 DA= 2.98 DHA= 39.03
ARG ( 46 ) B N -> GLU ( 42 ) B O Sym= 1 Val= 0.716 DA= 3.01 DHA= 15.29
ARG ( 46 ) B NH2 -> GLU ( 43 ) B OE2 Sym= 1 Val= 0.500 DA= 2.84 DHA= 45.88
ARG ( 47 ) B N -> GLU ( 43 ) B O Sym= 1 Val= 0.575 DA= 2.90 DHA= 32.37
ARG ( 47 ) B NE -> GLU ( 43 ) B OE2 Sym= 1 Val= 0.688 DA= 2.88 DHA= 34.81
ILE ( 48 ) B N -> ALA ( 44 ) B O Sym= 1 Val= 0.690 DA= 3.00 DHA= 19.72
GLU ( 49 ) B N -> SER ( 45 ) B O Sym= 1 Val= 0.761 DA= 2.87 DHA= 8.22
GLU ( 50 ) B N -> ARG ( 46 ) B O Sym= 1 Val= 0.643 DA= 2.91 DHA= 22.42
GLY ( 51 ) B N -> ARG ( 47 ) B O Sym= 1 Val= 0.667 DA= 3.01 DHA= 20.54
GLN ( 52 ) B N -> ILE ( 48 ) B O Sym= 1 Val= 0.635 DA= 2.90 DHA= 23.00
TYR ( 53 ) B N -> GLU ( 49 ) B O Sym= 1 Val= 0.734 DA= 2.89 DHA= 7.01
VAL ( 54 ) B N -> GLU ( 50 ) B O Sym= 1 Val= 0.717 DA= 2.90 DHA= 10.69
VAL ( 55 ) B N -> GLY ( 51 ) B O Sym= 1 Val= 0.734 DA= 2.97 DHA= 13.53
ASN ( 56 ) B N -> GLN ( 52 ) B O Sym= 1 Val= 0.652 DA= 3.00 DHA= 19.98
GLU ( 57 ) B N -> TYR ( 53 ) B O Sym= 1 Val= 0.669 DA= 2.90 DHA= 19.98
TYR ( 58 ) B N -> VAL ( 54 ) B O Sym= 1 Val= 0.694 DA= 2.96 DHA= 18.40
TYR ( 58 ) B OH -> ASP ( 7 ) A OD1 Sym= 1 Val= 0.464 DA= 3.13 DHA= 31.25
SER ( 59 ) B N -> VAL ( 55 ) B O Sym= 1 Val= 0.665 DA= 3.00 DHA= 24.94
SER ( 59 ) B OG -> VAL ( 55 ) B O Sym= 1 Val= 0.384 DA= 2.83 DHA= 39.20
ALA ( 60 ) B N -> ASN ( 56 ) B O Sym= 1 Val= 0.782 DA= 2.93 DHA= 7.13
ALA ( 61 ) B N -> GLU ( 57 ) B O Sym= 1 Val= 0.581 DA= 2.92 DHA= 26.03
ALA ( 62 ) B N -> SER ( 59 ) B O Sym= 1 Val= 0.534 DA= 2.99 DHA= 27.40
ALA ( 63 ) B N -> ALA ( 60 ) B O Sym= 1 Val= 0.326 DA= 3.70 DHA= 17.23
ALA ( 64 ) B N -> SER ( 59 ) B O Sym= 1 Val= 0.207 DA= 3.16 DHA= 66.22


APPENDIX XI

Dimerization-Coupled Folding of a Sex-Specific UBA Domain in a Transcription Factor

This appendix presents the manuscript in preparation describing our hydrogen-exchange

experiments (Bayrer et al. 2005a).

INTRODUCTION

The Doublesex (DSX) transcription factor is a major regulator of somatic sexual

differentiation in Drosophila melanogaster (Cline and Meyer 1996). Male- and female-specific

isoforms (DSXM and DSXF) are produced as a consequence of sex-specific splicing

(Burtis and Baker 1989). The isoforms share N-terminal DNA-binding domains (the DM motif; Raymond et al. 1998) and C-terminal α-helical dimerization elements (An et al.

1996; Erdman et al. 1996). Distinct C-terminal tails confer sex-specific gene-regulatory

functions (An and Wensink 1995a). The crystal structure of a C-terminal fragment of

DSXF (designated CTDF-p) has recently been determined and, surprisingly, revealed a

dimeric ubiquitin-associated (UBA) domain fold (Fig. 1; Bayrer et al. 2005c). These

studies thus extend to transcription a structural motif widely observed among pathways of

DNA repair and protein trafficking (for review, see Buchberger 2002). Although

dimerization of UBA domains has been implicated in regulation of these processes

(Davies et al. 2004), the crystal structure of the DSXF domain provided the first example

of a dimeric UBA fold (Bayrer et al. 2005c).2 The presence of a classical UBA domain in DSX had not been predicted due to its divergent sequence (Bayrer et al. 2005c).3

UBA-mediated dimerization of a transcription factor suggests that this motif – like the

leucine zipper and helix-loop-helix (Pabo and Sauer 1992) – may provide a structural

mechanism of combinatorial gene regulation.

The leucine zipper (LZ) and helix-loop-helix motif (HLH) exhibit dimerization-coupled

folding (Weiss et al. 1990; Ferre-D'Amare et al. 1994). Whereas the monomeric

polypeptide is unfolded, a coil-to-helix transition is stabilized on formation of an extensive non-polar interface. For the LZ this consists of classical “knobs-in-holes” packing of a coiled coil (O'Shea et al. 1991); the HLH domain forms a globular four-helix bundle (Ferre-D'Amare et al. 1993; Hollis et al. 2000). Although the dimeric UBA-like domain in CTDF-p represents a distinct architecture – a non-classical four-helix

bundle extended by flanking α-helical “handles” (Fig. 1) – its analogous function as a

modular dimerization element in a transcription factor raised the possibility of analogous

dimerization-dependent folding. Such a finding would be novel among UBA domains --

previously characterized as autonomous monomeric structures (Dieckmann et al. 1998) -- and could broaden the definition of potential UBA-like sequences.

Experimental approaches ordinarily employed to demonstrate dimerization-dependent protein folding either (i) require explicit observation of the physical state of the monomeric polypeptide (Weiss et al. 1990) or, if this is not possible due to the strength of dimerization, (ii) exploit reversible equilibrium thermal unfolding as an indirect probe of the unfolded monomer (De Francesco et al. 1991). Unfortunately, neither of these approaches is feasible here as the dimerization constant of CTDF-p is subnanomolar and

its thermal unfolding leads to irreversible aggregation (Supplemental Material). To

circumvent these limitations, we have therefore adapted the hydrogen-exchange (H-X)

method to investigate the possible relationship between global unfolding and amide

proton exchange at sites in the dimer interface. The essential idea is to distinguish

between two models of H-X exchange (Fig. 2A). Dimerization of prefolded UBA

monomers (model 1) predicts that global exchange, if present, would be limited to the

core of the monomer; sites exposed on its surface would be expected to exhibit subglobal

exchange irrespective of environment in the dimer. Conversely, dimerization-coupled folding of a monomeric polypeptide (model 2) predicts that such an “H-X exchange core”

(Li and Woodward 1999), if present, might span both the interior of the protomers and

their interface.4 This approach has previously been employed in studies of a peptide

model (Clore et al. 1995) of the p53 tetramer (Neira and Mateu 2001).

Thermodynamic interpretation of H-X exchange requires independent assessment of protein stability and solvent isotope effects (Makhatadze et al. 1995). Accordingly, the

baseline stability of CTDF-p was investigated by guanidine unfolding studies as

monitored by circular dichroism (CD). Such isothermal folding and unfolding are

reversible; the final state is the unfolded monomer. Thermodynamic parameters inferred

from application of a two-state model are given in Table 1A. Values of ΔGu are extrapolated to zero guanidine concentration under conditions (10 mM Tris-HCl (pH or pD 7.0) and 50 mM NaCl at 30 °C) otherwise compatible with NMR studies. In H2O ΔGu is estimated to be 14.1 ± 0.1 kcal/mole whereas in D2O ΔGu is estimated to be 12.6 ± 0.1

kcal/mole. In the limit of dimerization-coupled folding, these values would correspond to

dissociation constants of 0.1 nM and 1.0 nM, respectively, similar to previous estimates of the dimerization constant of the intact protein isoforms (Cho and Wensink 1997). The

magnitude of the solvent isotope effect is consistent with previous studies of unrelated

proteins (Makhatadze et al. 1995). The appropriateness of the two-state formalism is

validated by its consistency with the H-X studies below.
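The correspondence between ΔGu and a dissociation constant quoted above follows from Kd = exp(-ΔGu/RT) with a 1 M standard state. The Python sketch below illustrates this conversion at 30 °C; the function name is hypothetical, and the calculation is offered as an arithmetic check rather than as the procedure actually used in the study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def kd_from_dg(dg_kcal, temp_c=30.0):
    """Dissociation constant (M) implied by an unfolding free energy.

    Assumes dimerization-coupled folding, so that the free energy of
    unfolding equals the free energy of dissociation (1 M standard state).
    """
    temp_k = temp_c + 273.15
    return math.exp(-dg_kcal / (R_KCAL * temp_k))


for solvent, dg in (("H2O", 14.1), ("D2O", 12.6)):
    print(f"{solvent}: Kd of about {kd_from_dg(dg) * 1e9:.2f} nM")
```

With these inputs the calculation gives values slightly below 0.1 nM and 1 nM, in keeping with the rounded figures quoted in the text; each 0.1 kcal/mole of uncertainty in ΔGu changes Kd by roughly 18%.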

H-X exchange rates in CTDF-p were monitored by 1H-15N heteronuclear single-quantum

coherence (HSQC) NMR spectroscopy in a sample uniformly labeled with 15N. The spectrum of the dimeric domain exhibits marked dispersion (Fig. 2B); complete resonance assignment was obtained by 1H-15N-13C triple-resonance methods (Gronenborn and Clore 1995). As expected, significant protection is observed only in ordered secondary structure moieties. Eight amide resonances are observed (Fig. 2C) whose

protection factors correspond to site-specific ΔGapp values > 12.5 kcal/mole (Table 1B).5

Consistency between estimates of stability obtained by guanidine-induced unfolding

(above) and native-state H-X exchange – remarkable in light of their different underlying assumptions (Huyghues-Despointes et al. 2001) -- strongly suggests that such exchange is mediated by global unfolding. Significantly, the eight long-lived amide resonances map predominantly to the core of the protomer and the dimer interface (red and blue in Fig.

3A).6 (i) Protomeric mini-core: M377 projects from α2 into the protomer mini-core

whereas S392 forms a side chain-to-side chain hydrogen bond with K366, sealing the

mini-core. (ii) Dimer interface. Y378, V379, and R394' participate in contacts between

α2 and α3'. I380 (α2; purple in Fig. 3A) contributes both to the protomeric mini-core and

the dimer interface (Bayrer et al. 2005c). The existence of an H-X core spanning the

dimer interface supports model 2 and is inconsistent with model 1. Additional evidence

is provided by comparative studies of a variant domain of enhanced stability.

Substitution G398A (α3) is associated with more efficient apparent dimerization in a

yeast two-hybrid system (Bayrer et al. 2005b).7 Corresponding guanidine-induced

unfolding studies yield an estimate of ΔΔGu of 0.6 ± 0.1 kcal/mole relative to the wild-

type domain. Such increased stability predicts a commensurate increase in H-X

protection at sites of global exchange. This is indeed observed: ΔΔGHX is 0.4 ± 0.2

kcal/mole. Native-state H-X analysis thus supports the appropriateness of the two-state

model and highlights the central role of the dimer interface in overall folding.
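To make the comparison between ΔΔGu and ΔΔGHX concrete, the sketch below converts a stability difference into the expected fold change in protection factor, exp(ΔΔG/RT), and checks whether the two estimates agree within their combined uncertainties. The function names and the one-sigma agreement criterion are assumptions introduced for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def protection_fold_change(ddg_kcal, temp_c=30.0):
    """Fold change in protection factor expected for a stability change."""
    return math.exp(ddg_kcal / (R_KCAL * (temp_c + 273.15)))


def agree_within(a, sigma_a, b, sigma_b, n_sigma=1.0):
    """True if two estimates differ by no more than n_sigma combined errors."""
    return abs(a - b) <= n_sigma * math.hypot(sigma_a, sigma_b)


ddg_unfolding, err_unfolding = 0.6, 0.1  # guanidine unfolding (kcal/mole)
ddg_hx, err_hx = 0.4, 0.2                # native-state H-X (kcal/mole)

print(protection_fold_change(ddg_unfolding))
print(protection_fold_change(ddg_hx))
print(agree_within(ddg_unfolding, err_unfolding, ddg_hx, err_hx))
```

A stabilization of 0.4 to 0.6 kcal/mole corresponds to only a two- to three-fold change in protection factor at 30 °C, which is why the two estimates carry comparable relative uncertainty.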

The DSX C-terminal domains enhance the stringency and affinity of DSX-DNA

recognition in vitro (Cho and Wensink 1998). The biological importance of CTDF is

demonstrated in vivo by the identification of a mutation in this domain (G398D)

associated with intersexual Drosophila development (Nothiger et al. 1987; Erdman et

al. 1996). The mutation is located at the dimer interface and would be expected to insert

an uncompensated negative charge into its hydrophobic interior. Recent studies of the

variant polypeptide have demonstrated misfolding and aggregation (Bayrer et al. 2005b).

Monomeric and dimeric species are not observed at polypeptide concentrations of 10-200

μM; the variant instead forms a heterogeneous ensemble of larger oligomers with

reduced α-helix content and non-native β-sheet. An obligatory coupling between native folding and dimerization rationalizes such perturbations. Should a stably folded UBA-like monomer be populated, one would expect a predominance of native-like G398D monomers as the mutant side chain would be predicted to project into solvent (Fig. 3B).

Protein structures are ordinarily robust to such hydrophilic surface substitutions. Because in this case a monomeric UBA-like fold is unstable, a mutational block to dimerization leads to non-native aggregation. The developmental consequences of mutation associated with unfolding or misfolding would be more severe than those associated with an uncomplicated decrement in the strength of dimerization, especially at protein concentrations above the wild-type or variant dissociation constants (Cho and Wensink

1997). The intersexual phenotype of the 398D XX fly (karyotypic female) is essentially identical to those conferred by mutations in the DM domain that block its specific DNA binding (Erdman and Burtis 1993) and by deletion or frame shift within the dsx gene

(Erdman and Burtis 1993); i.e., complete loss of genetic function.

Assembly-dependent folding of proteins is a general feature of transcription factors (Pabo and Sauer 1992). Such coupling is proposed to provide a mechanism to regulate the assembly of multiprotein-DNA complexes to specific sites of transcriptional activation or repression. Although the subnanomolar stability of DSX dimers (Cho and Wensink

1997) suggests that the isoforms would be dimeric and hence folded within the nucleus

(10-100 protein molecules per eukaryotic nucleus corresponds to an approximate concentration of 10-100 nM), we envisage that this principle may extend to dimerization-coupled folding of UBA-like domains. We speculate that such domains may serve to couple assembly of a transcriptional preinitiation complex to recruitment of mono- or polyubiquitinated factors. Deciphering the regulatory role of the ubiquitination machinery in gene regulation poses a problem of broad importance in molecular biology

(Muratani et al. 2005; Arndt and Winston 2005).
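The equivalence between molecules per nucleus and nanomolar concentration quoted above depends on the nuclear volume, which the text does not state. The Python sketch below shows the conversion; the volume of 1.66 fL is an assumption chosen because it makes one molecule per nucleus equal to about 1 nM, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def nuclear_concentration_nm(n_molecules, volume_fl):
    """Concentration (nM) of n_molecules confined to a volume in femtoliters."""
    volume_l = volume_fl * 1e-15
    return n_molecules / (AVOGADRO * volume_l) * 1e9


ASSUMED_VOLUME_FL = 1.66  # hypothetical nuclear volume; not given in the source

for copies in (10, 100):
    print(copies, nuclear_concentration_nm(copies, ASSUMED_VOLUME_FL))
```

Because concentration scales inversely with volume, a nucleus ten times larger would give concentrations ten times lower for the same copy number.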

Acknowledgements. We thank B. Baker for advice and encouragement, T. Sosnick and

P. von Hippel for helpful discussion, B. Li for plasmid construction; and W. Jia, Q. X.

Hua, Z.-l. Wan, W. Zhang, and other Weiss laboratory members for helpful discussion.

This is a contribution from the Cleveland Center for Structural Biology.

Table AXI-1

Thermodynamic and H-X Parameters

(a) CD Characterization^a

ΔGu (kcal/mole)    Cmid (M)      m (kcal/mole/M)
12.6 ± 0.1         2.7 ± 0.06    1.8 ± 0.04

(b) H-X Core^b

Site        Acceptor   kex (x 10^-5 s^-1)   P (x 10^5)   Keq (x 10^-6)   ΔGapp (kcal/mol)   ASA protomer   ASA monomer
M377^c      L373       0.74 ± 0.2           3.5 ± 0.9    2.8 ± 0.7       12.9 ± 0.1         0.02           0
Y378^d      M374       1 ± 0.3              3.0 ± 1      3.3 ± 1         12.8 ± 0.2         0.31           0.06
V379^d      P375       0.48 ± 0.1           2.0 ± 0.6    4.9 ± 1         12.6 ± 0.2         0.55           0
I380^c,d    L376       0.16 ± 0.06          3.6 ± 1      2.8 ± 1         12.9 ± 0.2         0.28           0
S392^c      I388       5.0 ± 2              2.0 ± 0.8    4.9 ± 2         12.6 ± 0.2         0.03           0.03
R393        E389       5.0 ± 1              2.1 ± 0.4    4.8 ± 1         12.6 ± 0.1         0.63           0.6
R394^d      E390       2.0 ± 1              4.3 ± 2      2.3 ± 1         13.0 ± 0.3         0.35           0.12
N403        E399       4.0 ± 0.8            2.4 ± 0.5    4.1 ± 0.8       12.7 ± 0.1         0.83           0.8

^a Data were fit to a two-state unfolding model as described (Bayrer et al. 2005b).

^b H-X studies were performed as described in caption to Figure 2. “Acceptor” indicates hydrogen-bonding partner of amide proton in crystal structure. Accessible surface area (ASA) of side chains was calculated relative to XX-A-XX peptide; values represent average of protomers. Exchange data were calculated by fitting peak intensities to a standard decay curve (Huyghues-Despointes et al. 2001). Exchange rates (kex) were determined according to EX2 regime based on pH dependence and wide range of observed exchange times (Huyghues-Despointes et al. 2001; Neira and Mateu 2001). Protection factors (P) were calculated based on standard random-coil values (http://www.fccc.edu/research/labs/roder/sphere/). ΔGapp includes correction for oligomeric state (see supplement and Neira and Mateu 2001); values correspond to dimerization constants in the range 0.4-0.8 nM.

^c,d Side chain packs in protomeric mini-core (c) or at dimer interface (d).
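The relationships among the columns of Table AXI-1(b) can be sketched as follows. Under the EX2 regime the protection factor is the ratio of the random-coil to the observed exchange rate, and the opening equilibrium constant is its reciprocal. The oligomeric-state correction is written here in the form of the residual free energy derived in Appendix IX, -RT ln(Ct); the source states only that a correction following Neira and Mateu (2001) was applied, so this form, the function names, and the example concentration are assumptions.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def protection_factor(k_rc, k_ex):
    """Protection factor: random-coil rate divided by observed rate."""
    return k_rc / k_ex


def opening_equilibrium(p):
    """Equilibrium constant for the opening reaction under EX2."""
    return 1.0 / p


def dg_app(p, ct, temp_c=30.0):
    """Apparent free energy (kcal/mol) with an assumed dimer correction.

    ct is the total concentration in monomer equivalents (M); setting
    ct = 1.0 removes the correction term.
    """
    rt = R_KCAL * (temp_c + 273.15)
    return rt * math.log(p) - rt * math.log(ct)


p_m377 = 3.5e5          # protection factor listed for M377 in the table
ct_hypothetical = 1e-4  # illustrative sample concentration (M); not from the source

print(opening_equilibrium(p_m377))      # compare with the Keq column
print(dg_app(p_m377, 1.0))              # uncorrected RT ln(P)
print(dg_app(p_m377, ct_hypothetical))  # with the assumed correction
```

The first printed value can be compared directly with the Keq column; the corrected free energy depends on the concentration chosen and is shown only to indicate the size of the term.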


Figure AXI-1

Crystal structures of classical UBA monomer and DSXF UBA-like dimer. (a) X-ray structure of UBA domain in Tip Associating Protein (TAP; PDB accession code 1OAI; Grant et al. 2003). The three canonical helices are labeled α1, α2, and α3. (b) CTDF-p fragment (residues 353 to 409; PDB accession code 1ZV1). One protomer is shown in blue and the other in teal.


Figure AXI-2

H-X exchange schemes and 1H-15N HSQC spectra. (a) Models 1 and 2 (left and right) posit potential protein folding and assembly pathways. In model 1 preformed monomers associate to form dimer. Model 2 illustrates dimerization-coupled folding. (b) 1H-15N HSQC spectrum of CTDF-p in H2O. (c) Corresponding spectrum after 45 minutes (blue) and 13 hours (red) in D2O. Selected resonance assignments are given in (c). Spectra were obtained at 30 °C at 700 MHz.


Figure AXI-3

Sites of global H-X protection and model of putative G398D variant monomer. (a)

Global protection occurs predominantly at dimer interface (Y378, V379, R394; blue) and in protomeric mini-core (M377, S392; red); (I380; purple) contributes to both. Protected residues elsewhere are in pink (R393 and N403; see footnote 6). (b) Model of G398D monomer based on wild-type structure. If such a structure should exist, the variant side chain (red) would be solvent exposed and otherwise compatible with a native-like UBA fold. Despite the plausibility of this model, coupling between wild-type folding and dimerization suggests that a mutational block to dimerization would lead to either an unfolded variant polypeptide or non-native structures.


BIBLIOGRAPHY

Ahmad, S.M. and B.S. Baker. 2002. Sex-specific deployment of FGF signaling in Drosophila recruits mesodermal cells into the male genital imaginal disc. Cell 109: 651-661.

An, W., S. Cho, H. Ishii, and P.C. Wensink. 1996. Sex-specific and non-sex-specific oligomerization domains in both of the doublesex transcription factors from Drosophila melanogaster. Mol. Cell. Biol. 16: 3106-3111.

An, W. and P.C. Wensink. 1995a. Integrating sex- and tissue-specific regulation within a single Drosophila enhancer. Genes Dev. 9: 256-266.

An, W. and P.C. Wensink. 1995b. Three protein binding sites form an enhancer that regulates sex- and fat body-specific transcription of Drosophila yolk protein genes. EMBO J. 14: 1221-1230.

Arndt, K. and F. Winston. 2005. An unexpected role for ubiquitylation of a transcriptional activator. Cell 120: 733-734.

Arthur, B.I., Jr., J.M. Jallon, B.C. Caflisch, Y., and R. Nothiger. 1998. Sexual behaviour in Drosophila is irreversibly programmed during a critical period. Curr. Biol. 8: 1187-1190.

Backmann, J., G. Schafer, L. Wyns, and H. Bonisch. 1998. Thermodynamics and kinetics of unfolding of the thermostable trimeric adenylate kinase from the archaeon Sulfolobus acidocaldarius. J Mol Biol 284: 817-833.

Baker, B.S. and K.A. Ridge. 1980. Sex and the single cell. I. On the action of major loci affecting sex determination in Drosophila melanogaster. Genetics 94: 383-423.

Barker, N., P.J. Morin, and H. Clevers. 2000. The Yin-Yang of TCF/beta-catenin signaling. Adv Cancer Res. 77: 1-24.

Bartkiewicz, M., A. Houghton, and R. Baron. 1999. Leucine zipper-mediated homodimerization of the adaptor protein c-Cbl. A role in c-Cbl's tyrosine phosphorylation and its association with epidermal growth factor receptor. J Biol Chem 274: 30887-30895.

Baxevanis, A.D. and C.R. Vinson. 1993. Interactions of coiled coils in transcription factors: where is the specificity? Curr Opin Genet Dev 3: 278-285.


Bayrer, J., Z. Wan, B. Li, and M.A. Weiss. 2004. Expression, crystallization and preliminary X-ray characterization of the Drosophila transcription factor Doublesex. Acta Crystallogr. D Biol. Crystallogr. 60: 1328-1330.

Bayrer, J.R., Y. Yang, N.B. Phillips, and M.A. Weiss. 2005a. Dimerization-Coupled Folding of a Sex-Specific UBA Domain in a Transcription Factor. In Preparation.

Bayrer, J.R., W. Zhang, N.B. Phillips, and M.A. Weiss. 2005b. Sex-Specific Gene Regulation: Intersexual Drosophila Development due to Misfolding of a Novel UBA Domain. Submitted.

Bayrer, J.R., W. Zhang, and M.A. Weiss. 2005c. Dimerization of doublesex is mediated by a cryptic UBA-domain fold. Implications for sex-specific gene regulation. J. Biol. Chem in press.

Belote, J.M. and B.S. Baker. 1987. Sexual behavior: its genetic control during development and adulthood in Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 84: 8026-8030.

Bertolaet, B.L., D.J. Clarke, M. Wolff, M.H. Watson, M. Henze, G. Divita, and S.I. Reed. 2001. UBA domains mediate protein-protein interactions between two DNA damage-inducible proteins. J Mol Biol 313: 955-963.

Brunger, A.T., P.D. Adams, G.M. Clore, W.L. DeLano, P. Gros, and R.W. Grosse-Kunstleve. 1998. Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. sect. D 54.

Bryant, P.J., P.N. Adler, C. Duranceau, M.J. Fain, S. Glenn, B. Hsei, A.A. James, C.L. Littlefield, C.A. Reinhardt, S. Strub, and H.A. Schneiderman. 1978. Regulative interactions between cells from different imaginal disks of Drosophila melanogaster. Science 201: 928-930.

Buchberger, A. 2002. From UBA to UBX: new words in the ubiquitin vocabulary. Trends Cell Biol. 12: 216-221.

Burkhard, P., J. Stetefeld, and S.V. Strelkov. 2001. Coiled coils: a highly versatile protein folding motif. Trends Cell Biol. 11: 82-88.

Burtis, K.C. 1993. The regulation of sex determination and sexually dimorphic differentiation in Drosophila. Curr. Opin. Cell Biol. 5: 1006-1014.

Burtis, K.C. and B.S. Baker. 1989. Drosophila doublesex gene controls somatic sexual differentiation by producing alternatively spliced mRNAs encoding related sex-specific polypeptides. Cell 56: 997-1010.

Burtis, K.C., K.T. Coschigano, B.S. Baker, and P.C. Wensink. 1991. The doublesex proteins of Drosophila melanogaster bind directly to a sex-specific yolk protein gene enhancer. EMBO J. 10: 2577-2582.

Casares, F., W. Bender, J. Merriam, and E. Sanchez-Herrero. 1997. Interactions of Drosophila Ultrabithorax regulatory regions with native and foreign promoters. Genetics 145: 123-137.

Cavey, J.R., S.H. Ralston, L.J. Hocking, P.W. Sheppard, B. Ciani, M.S. Searle, and R. Layfield. 2005. J. Bone Miner. Res. 20: 619-624.

Cho, S. and P.C. Wensink. 1996. Purification and physical properties of the male and female double sex proteins of Drosophila. Proc. Natl. Acad. Sci. USA 93: 2043-2047.

Cho, S. and P.C. Wensink. 1997. DNA binding by the male and female doublesex proteins of Drosophila melanogaster. J. Biol. Chem. 272: 3185-3189.

Cho, S. and P.C. Wensink. 1998. Linkage between oligomerization and DNA binding in Drosophila doublesex proteins. Biochemistry 37: 11301-11308.

Clarke, D.J., G. Mondesert, M. Segal, B.L. Bertolaet, S. Jensen, M. Wolff, M. Henze, and S.I. Reed. 2001. Dosage suppressors of pds1 implicate ubiquitin-associated domains in checkpoint control. Mol Cell Biol. 21: 1997-2007.

Cline, T.W. and B.J. Meyer. 1996. Vive la difference: males vs females in flies vs worms. Annu. Rev. Genet. 30: 637-702.

Clore, G.M., J. Ernst, R. Clubb, J.G. Omichinski, W.M. Kennedy, K. Sakaguchi, E. Appella, and A.M. Gronenborn. 1995. Refined solution structure of the oligomerization domain of the tumour suppressor p53. Nat Struct Biol 2: 321-333.

Collaborative Computational Project, N. 1994. The CCP4 suite: programs for protein crystallography. Acta Cryst. D50: 760-763.

Coschigano, K.T. and P.C. Wensink. 1993. Sex-specific transcriptional regulation by the male and female doublesex proteins of Drosophila. Genes Dev. 7: 42-54.

Crick, F.H.C. 1953. The packing of alpha-helices: simple coiled-coils. Acta Cryst. 6: 689-697.

Davies, G.C., S.A. Ettenberg, A.O. Coats, M. Mussante, S. Ravichandran, J. Collins, M.M. Nau, and S. Lipkowitz. 2004. Cbl-b interacts with ubiquitinated proteins; differential functions of the UBA domains of c-Cbl and Cbl-b. Oncogene 23: 7104-7115.

De Francesco, R., A. Pastore, G. Vecchio, and R. Cortese. 1991. Circular dichroism study on the conformational stability of the dimerization domain of transcription factor LFB1. Biochemistry 30: 143-147.

DeLano, W.L. 2002. The PyMOL User's Manual. DeLano Scientific, San Carlos, CA, USA.

Dieckmann, T., E.S. Withers-Ward, M.A. Jarosinski, C.F. Liu, I.S. Chen, and J. Feigon. 1998. Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Nat Struct Biol. 5: 1042-1047.

Epper, F. and R. Nothiger. 1982. Genetic and developmental evidence for a repressed genital primordium in Drosophila melanogaster. Dev Biol. 94: 163-175.

Erdman, S.E. and K.C. Burtis. 1993. The Drosophila doublesex proteins share a novel related DNA binding domain. EMBO J. 12: 527-535.

Erdman, S.E., H.J. Chen, and K.C. Burtis. 1996. Functional and genetic characterization of the oligomerization and DNA binding properties of the Drosophila doublesex proteins. Genetics 144: 1639-1652.

Estrada, B., F. Casares, and E. Sanchez-Herrero. 2003. Development of the genitalia in Drosophila melanogaster. Differentiation 71: 299-310.

Estrada, B. and E. Sanchez-Herrero. 2001. The Hox gene Abdominal-B antagonizes appendage development in the genital disc of Drosophila. Development 128: 331-339.

Ferraro, D.M., N.D. Lazo, and A.D. Robertson. 2004. EX1 hydrogen exchange and protein folding. Biochemistry 43: 587-594.

Ferre-D'Amare, A.R., P. Pognonec, R.G. Roeder, and S.K. Burley. 1994. Structure and function of the b/HLH/Z domain of USF. EMBO J. 13: 180-189.

Ferre-D'Amare, A.R., G.C. Prendergast, E.B. Ziff, and S.K. Burley. 1993. Recognition by Max of its cognate DNA through a dimeric b/HLH/Z domain. Nature 363: 38-45.

Finley, K.D., P.T. Edeen, M. Foss, E. Gross, N. Ghbeish, R.H. Palmer, B.J. Taylor, and M. McKeown. 1998. Dissatisfaction encodes a tailless-like nuclear receptor expressed in a subset of CNS neurons controlling Drosophila sexual behavior. Neuron 21: 1363-1374.

Finley, K.D., B.J. Taylor, M. Milstein, and M. McKeown. 1997. dissatisfaction, a gene involved in sex-specific behaviour and neural development of Drosophila by the fruitless gene. Proc. Natl. Acad. Sci. USA 94: 913-918.

Fortier, E. and J.M. Belote. 2000. Temperature-dependent gene silencing by an expressed inverted repeat in Drosophila. Genesis 26: 240-244.

Friedman, A.M., T.O. Fischmann, and T.A. Steitz. 1995. Crystal structure of lac repressor core tetramer and its implications for DNA looping. Science 268: 1721-1727.

Garrett-Engele, C.M., M.L. Siegal, D.S. Manoli, B.C. Williams, H. Li, and B.S. Baker. 2002. intersex, a gene required for female sexual development in Drosophila, is expressed in both sexes and functions together with doublesex to regulate terminal differentiation. Development 129: 4661-4675.

Gong, W.J. and K.G. Golic. 2003. Ends-out, or replacement, gene targeting in Drosophila. Proc. Natl. Acad. Sci. USA 100: 2556-2561.

Grant, R.P., D. Neuhaus, and M. Stewart. 2003. Structural basis for the interaction between the Tap/NXF1 UBA domain and FG nucleoporins at 1A resolution. J Mol Biol. 326: 849-858.

Greenspan, R.J. and J.F. Ferveur. 2000. Courtship in Drosophila. Annu Rev Genet. 34: 205-232.

Griffith, J., A. Hochschild, and M. Ptashne. 1986. DNA loops induced by cooperative binding of lambda repressor. Nature 322: 750-752.

Gronenborn, A.M. and G.M. Clore. 1995. Structures of protein complexes by multidimensional heteronuclear magnetic resonance spectroscopy. Crit. Rev. Biochem. Mol. Biol. 30: 351-385.

Held, L.I. 2002. Imaginal Discs: the Genetic and Cellular Logic of Pattern Formation. Cambridge University Press, Cambridge, UK.

Hinck, A.P., W.F. Walkenhorst, W.M. Westler, S. Choe, and J.L. Markley. 1993. Overexpression and purification of avian ovomucoid third domains in Escherichia coli. Protein Eng. 6: 221-227.

Hofmann, K. and P. Bucher. 1996. The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Trends Biochem Sci. 21: 172-173.

Hollis, T., Y. Ichikawa, and T. Ellenberger. 2000. DNA bending and a flip-out mechanism for base excision by the helix-hairpin-helix DNA glycosylase, Escherichia coli AlkA. Embo J 19: 758-766.

Hua, Q.X. and M.A. Weiss. 2004. Mechanism of insulin fibrillation: the structure of insulin under amyloidogenic conditions resembles a protein-folding intermediate. J Biol Chem 279: 21449-21460.

Huyghues-Despointes, B.M., C.N. Pace, S.W. Englander, and J.M. Scholtz. 2001. Measuring the conformational stability of a protein by hydrogen exchange. Methods Mol Biol 168: 69-92.

Johnson, A.D., A.R. Poteete, G. Lauer, R.T. Sauer, G.K. Ackers, and M. Ptashne. 1981. lambda repressor and cro-components of an efficient molecular switch. Nature 294: 217-223.

Jones, S. and J.M. Thornton. 1995. Protein-protein interactions: a review of protein dimer structures. Prog Biophys Mol Biol 63: 31-65.

Jones, T.A., J.Y. Zou, S.W. Cowan, and M. Kjeldgaard. 1991. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A47: 110-119.

Kaiser, P., K. Flick, C. Wittenberg, and S.I. Reed. 2000. Regulation of transcription by ubiquitination without proteolysis: Cdc34/SCF(Met30)-mediated inactivation of the transcription factor Met4. Cell 102: 303-314.

Kang, R.S., C.M. Daniels, S.A. Francis, S.C. Shih, W.J. Salerno, L. Hicke, and I. Radhakrishnan. 2003. Solution structure of a CUE-ubiquitin complex reveals a conserved mode of ubiquitin binding. Cell 113: 621-630.

Kawase, Y., M. Tanio, A. Kira, S. Yamaguchi, S. Tuzi, A. Naito, M. Kataoka, J.K. Lanyi, R. Needleman, and H. Saito. 2000. Alteration of conformation and dynamics of bacteriorhodopsin induced by protonation of Asp 85 and deprotonation of Schiff base as studied by 13C NMR. Biochemistry 39: 14472-14480.

Kay, L.E. 2005. NMR studies of protein structure and dynamics. J. Magn. Reson. 173: 193-207.

Keisman, E.L. and B.S. Baker. 2001. The Drosophila sex determination hierarchy modulates wingless and decapentaplegic signaling to deploy dachshund sex-specifically in the genital imaginal disc. Development 128: 1643-1656.

Keisman, E.L., A.E. Christiansen, and B.S. Baker. 2001. The sex determination gene doublesex regulates the A/P organizer to direct sex-specific patterns of growth in the Drosophila genital imaginal disc. Dev. Cell. 1: 215-225.

Kohn, W.D., C.T. Mant, and R.S. Hodges. 1997. Alpha-helical protein assembly motifs. J Biol Chem 272: 2583-2586.

Kopp, A., I. Duncan, and S.B. Carroll. 2000. Genetic control and evolution of sexually dimorphic characters in Drosophila. Nature 408: 553-559.

Lamzin, V.S. and K.S. Wilson. 1993. Automated refinement of protein models. Acta Crystallogr. 49: 129-147.

Li, H. and B.S. Baker. 1998a. hermaphrodite and doublesex function both dependently and independently to control various aspects of sexual differentiation in Drosophila. Development 125: 2641-2651.

Li, H. and B.S. Baker. 1998b. her, a gene required for sexual differentiation in Drosophila, encodes a zinc finger protein with characteristics of ZFY-like proteins and is expressed independently of the sex determination hierarchy. Development 125: 2641-2651.

Li, R. and C. Woodward. 1999. The hydrogen exchange core and protein folding. Protein Sci 8: 1571-1590.

Lim, W.A. and R.T. Sauer. 1989. Alternative packing arrangements in the hydrophobic core of lambda repressor. Nature 339: 31-36.

Liu, J., S.M. DeYoung, J.B. Hwang, E.E. O'Leary, and A.R. Saltiel. 2003. The roles of Cbl-b and c-Cbl in insulin-stimulated glucose transport. J. Biol. Chem. 278: 36754-36762.

Makhatadze, G.I., G.M. Clore, and A.M. Gronenborn. 1995. Solvent isotope effect and protein stability. Nat Struct Biol. 2: 852-855.

Mann, R.S. and S.B. Carroll. 2002. Molecular mechanisms of selector gene function and evolution. Curr Opin Genet Dev. 12: 592-600.

Manoli, D.S., M. Foss, A. Villella, B.J. Taylor, J.C. Hall, and B.S. Baker. 2005. Male-specific fruitless specifies the neural substrates of Drosophila courtship behaviour. Nature 436: 395-400.

Marin, I. and B.S. Baker. 1998. The evolutionary dynamics of sex determination. Science 281: 1990-1994.

McKeown, M., J.M. Belote, and R.T. Boggs. 1988. Ectopic expression of the female transformer gene product leads to female differentiation of chromosomally male Drosophila. Cell 53: 887-895.

Moore, S.C., L. Jason, and J. Ausio. 2002. The elusive structural role of ubiquitinated histones. Biochem Cell Biol. 80: 311-319.

Mueller, T.D., M. Kamionka, and J. Feigon. 2004. Specificity of the interaction between ubiquitin-associated domains and ubiquitin. J. Biol. Chem. 279: 11926-11936.

Muratani, M., C. Kung, K.M. Shokat, and W.P. Tansey. 2005. The F box protein Dsg1/Mdm30 is a transcriptional coactivator that stimulates Gal4 turnover and cotranscriptional mRNA processing. Cell 120: 887-899.

Narayana, N., Q.-X. Hua, and M.A. Weiss. 2001. The dimerization domain of HNF-1α: structure and plasticity of an intertwined four-helix bundle with application to diabetes mellitus. J. Mol. Biol. 310: 635-658.

Narendra, U., L. Zhu, B. Li, J. Wilken, and M.A. Weiss. 2002. Sex-specific gene regulation: the doublesex DM motif is a bipartite DNA-binding domain. J. Biol. Chem. 277: 43463-43473.

Neira, J.L. and M.G. Mateu. 2001. Hydrogen exchange of the tetramerization domain of the human tumour suppressor p53 probed by denaturants and temperature. Eur J Biochem 268: 4868-4877.

Nooren, I.M. and J.M. Thornton. 2003. Diversity of protein-protein interactions. EMBO J. 22: 3486-3492.

Nothiger, R., M. Leuthold, N. Andersen, P. Gerschwiler, A. Grutter, W. Keller, C. Leist, M. Roost, and H. Schmid. 1987. Genetic and developmental analysis of the sex-determining gene "double sex" (dsx) of Drosophila melanogaster. Genetical Res. 50: 113-124.

O'Dell, K.M. 2003. The voyeurs' guide to Drosophila melanogaster courtship. Behav Processes. 64: 211-223.

Oh, D.B., Y.G. Kim, and A. Rich. 2002. Z-DNA-binding proteins can act as potent effectors of gene expression in vivo. Proc Natl Acad Sci U S A 99: 16666-16671.

Ohya, Y. and D. Botstein. 1994. Structure-based systematic isolation of conditional-lethal mutations in the single yeast calmodulin gene. Genetics 138: 1041-1054.

Ortolan, T.G., P. Tongaonkar, D. Lambertson, L. Chen, C. Schauber, and K. Madura. 2000. The DNA repair protein rad23 is a negative regulator of multi-ubiquitin chain assembly. Nat Cell Biol. 2: 601-608.

O'Shea, E.K., J.D. Klemm, P.S. Kim, and T. Alber. 1991. X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 254: 539-544.

Ottolenghi, C. and K. McElreavey. 2000. Deletions of 9p and the quest for a conserved mechanism of sex determination. Mol. Genetics Metab. 71: 397-404.

Otwinowski, Z. and W. Minor. 1997. Processing of x-ray diffraction data collected in oscillation mode. Meth. Enzymol. 276: 307-326.

Ounap, K., O. Uibo, R. Zordania, L. Kiho, T. Ilus, E. Oiglane-Shlik, and O. Bartsch. 2004. Three patients with 9P deletions including DMRT1 and DMRT2: a girl with XY complement, bilateral ovotestes, and extreme growth retardation, and two females with normal pubertal development. Am J Med Genet 130A: 415-423.

Pabo, C.O. and R.T. Sauer. 1992. Transcription factors: structural families and principles of DNA recognition. Ann. Rev. Biochem. 61: 1053-1095.

Postlethwait, J.H., P.J. Bryant, and G. Schubiger. 1972. The homoeotic effect of "tumorous head" in Drosophila melanogaster. Dev Biol. 29: 337-342.

Prag, G., S. Misra, E.A. Jones, R. Ghirlando, B.A. Davies, B.F. Horazdovsky, and J.H. Hurley. 2003. Mechanism of ubiquitin recognition by the CUE domain of Vps9p. Cell 113: 609-620.

Pultz, M.A. and B.S. Baker. 1995. The dual role of hermaphrodite in the Drosophila sex determination regulatory hierarchy. Development 121: 99-111.

Pultz, M.A., G.S. Carson, and B.S. Baker. 1994. A genetic analysis of hermaphrodite, a pleiotropic sex determination gene in Drosophila melanogaster. Genetics 136: 195-207.

Rape, M., T. Hoppe, I. Gorr, M. Kalocay, H. Richly, and S. Jentsch. 2001. Mobilization of processed, membrane-tethered SPT23 transcription factor by CDC48(UFD1/NPL4), a ubiquitin-selective chaperone. Cell 107: 667-677.

Raymond, C.S., M.W. Murphy, M.G. O'Sullivan, V.J. Bardwell, and D. Zarkower. 2000. Dmrt1, a gene related to worm and fly sexual regulators, is required for mammalian testis differentiation. Genes Dev. 14: 2587-2595.

Raymond, C.S., E.D. Parker, J.R. Kettlewell, L.G. Brown, D.C. Page, K. Kusz, J. Jaruzelska, Y. Reinberg, W.L. Flejter, V.J. Bardwell, B. Hirsch, and D. Zarkower. 1999. A region of human chromosome 9p required for testis development contains two genes related to known sexual regulators. Hum. Mol. Genet. 8: 989-996.

Raymond, C.S., C.E. Shamu, M.M. Shen, K.J. Seifert, B. Hirsch, J. Hodgkin, and D. Zarkower. 1998. Evidence for evolutionary conservation of sex-determining genes. Nature 391: 691-695.

Saccone, G., A. Pane, and L.C. Polito. 2002. Sex determination in flies, fruitflies and butterflies. Genetica 116: 15-23.

Salghetti, S.E., A.A. Caudy, J.G. Chenoweth, and W.P. Tansey. 2001. Regulation of transcriptional activation domain function by ubiquitin. Science 293: 1651-1653.

Sanchez, L., N. Gorfinkiel, and I. Guerrero. 2001. Sex determination genes control the development of the Drosophila genital disc, modulating the response to Hedgehog, Wingless and Decapentaplegic signals. Development 128: 1033-1043.

Sanchez, L. and I. Guerrero. 2001. The development of the Drosophila genital disc. Bioessays 23: 698-707.

Sato, S., C. Tomomori-Sato, C.A.S. Banks, T.J. Parmely, I. Sorokina, C.S. Brower, R.C. Conaway, and J.W. Conaway. 2003. A mammalian homolog of Drosophila melanogaster transcriptional coactivator intersex is a subunit of the mammalian Mediator complex. J. Biol. Chem. 278: 49671-49674.

Sheldrick, G.M. 1990. Phase annealing in SHELX-90: direct methods for larger structures. Acta Crystallogr. A46: 467-473.

Shuker, S.B., P.J. Hajduk, R.P. Meadows, and S.W. Fesik. 1996. Discovering high-affinity ligands for proteins: SAR by NMR. Science 274: 1531-1534.

Siegal, M.L. and B.S. Baker. 2005. Functional conservation and divergence of intersex, a gene required for female differentiation in Drosophila melanogaster. Dev. Genes Evol. 215: 1-12.

Singh, J. and J.M. Thornton. 1992. Atlas of Protein Side-Chain Interactions. IRL Press, Oxford.

Smith, C.A. and A.H. Sinclair. 2004. Sex determination: insights from the chicken. Bioessays 26: 120-132.

Sreerama, N. and R.W. Woody. 1993. A self-consistent method for the analysis of protein secondary structure from circular dichroism. Anal. Biochem. 209: 32-44.

Suyama, M., T. Doerks, I.C. Braun, M. Sattler, E. Izaurralde, and P. Bork. 2000. Prediction of structural domains of TAP reveals details of its interaction with p15 and nucleoporins. Embo Rep. 1: 53-58.

Ukiyama, E., A. Jancso-Radek, B. Li, L. Milos, W. Zhang, N.B. Phillips, N. Morikawa, C.Y. King, G. Chan, C.M. Haqq, J.T. Radek, F. Poulat, P.K. Donahoe, and M.A. Weiss. 2001. SRY and architectural gene regulation: the kinetic stability of a bent protein-DNA complex can regulate its transcriptional potency. Mol. Endocrinol. 15: 363-377.

Usui-Aoki, K., H. Ito, K. Ui-Tei, K. Takahashi, T. Lukacsovich, W. Awano, H. Nakata, Z.F. Piao, E.E. Nilsson, J. Tomida, and D. Yamamoto. 2000. Formation of the male-specific muscle in female Drosophila by ectopic fruitless expression. Nat. Cell Biol. 2: 500-506.

van den Akker, F. and W.G. Hol. 1999. Difference density quality (DDQ): a method to assess the global and local correctness of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr. 55: 206-218.

Varadan, R., M. Assfalg, S. Raasi, C. Pickart, and D. Fushman. 2005. Structural determinants for selective recognition of a Lys48-linked polyubiquitin chain by a UBA domain. Mol Cell. 18: 687-698.

Villella, A. and J.C. Hall. 1996. Courtship anomalies caused by doublesex mutations in Drosophila melanogaster. Genetics 143: 331-344.

Volff, J.N. and M. Schartl. 2002. Sex determination and sex chromosome evolution in the medaka, Oryzias latipes, and the platyfish, Xiphophorus maculatus. Cytogenet Genome Res 99: 170-177.

Waterbury, J.A., L.L. Jackson, and P. Schedl. 1999. Analysis of the doublesex female protein in Drosophila melanogaster: role on sexual differentiation and behavior and dependence on intersex. Genetics 152: 1653-1667.

Weiss, M.A., T. Ellenberger, C.R. Wobbe, J.P. Lee, S.C. Harrison, and K. Struhl. 1990. Folding transition in the DNA-binding domain of GCN4 on specific binding to DNA. Nature 347: 575-578.

Wertman, K.F., D.G. Drubin, and D. Botstein. 1992. Systematic mutational analysis of the yeast ACT1 gene. Genetics 132: 337-350.

Wilkins, A.S. 1993. Genetic Analysis of Animal Development. Wiley-Liss, Inc., New York, NY.

Wilkinson, C.R., M. Seeger, R. Hartmann-Petersen, M. Stone, M. Wallace, C. Semple, and C. Gordon. 2001. Proteins containing the UBA domain are able to bind to multi-ubiquitin chains. Nat Cell Biol. 3: 939-943.

Withers-Ward, E.S., T.D. Mueller, I.S. Chen, and J. Feigon. 2000. Biochemical and structural analysis of the interaction between the UBA(2) domain of the DNA repair protein HHR23A and HIV-1 Vpr. Biochemistry 39: 14103-14112.

Wrischnik, L.A., J.R. Timmer, L.A. Megna, and T.W. Cline. 2003. Recruitment of the proneural gene scute to the Drosophila sex-determination pathway. Genetics 165: 2007-2027.

Yi, W. and D. Zarkower. 1999. Similarity of DNA binding and transcriptional regulation by Caenorhabditis elegans MAB-3 and Drosophila melanogaster DSX suggests conservation of sex determining mechanisms. Development 126: 873-881.

Young, P., Q. Deveraux, R.E. Beal, C. Pickart, and M. Rechsteiner. 1998. Characterization of two polyubiquitin binding sites in the 26 S protease subunit 5a. J. Biol. Chem 273: 5461-5467.

Zagorski, M.G., J.U. Bowie, A.K. Vershon, R.T. Sauer, and D.J. Patel. 1989. NMR studies of Arc repressor mutants: proton assignments, secondary structure, and long-range contacts for the thermostable proline-to-leucine variant of Arc. Biochemistry 28: 9813-9825.

Zhu, L., J. Wilken, N.B. Phillips, U. Narendra, G. Chan, S.M. Stratton, S.B. Kent, and M.A. Weiss. 2000. Sexual dimorphism in diverse metazoans is regulated by a novel class of intertwined zinc fingers. Genes Dev. 14: 1750-1764.