Proc. Natl. Acad. Sci. USA Vol. 89, pp. 8731-8735, September 1992
Genetics

The Enhancer of split [E(spl)] locus of Drosophila encodes seven independent helix-loop-helix proteins

CHRISTOS DELIDAKIS AND SPYROS ARTAVANIS-TSAKONAS*

Departments of Biology and Biology, Howard Hughes Medical Institute, Boyer Center for Molecular Medicine, Yale University, New Haven, CT 06536-0812

Communicated by Gerald M. Rubin, May 13, 1992

ABSTRACT The E(spl) locus is thought to participate in a cell interaction mechanism that controls the choice of many cell fates during Drosophila development, including the segregation of neural precursors. Previous studies have demonstrated that E(spl) is defined by two groups of closely related transcripts: (i) a cluster of three transcripts encoding proteins bearing a helix-loop-helix (HLH) motif and (ii) a single-copy gene encoding a nuclear protein containing repeated motifs first identified in the β subunit of guanine nucleotide-binding proteins. Both groups interact genetically with the Notch locus, which codes for a transmembrane protein. We report the structure of four additional HLH-encoding genes that reside in the E(spl) complex and provide evidence that we have now identified all the remaining members of the E(spl) HLH cluster.

The differentiation of neural and epidermal precursors from the neurogenic regions of the early Drosophila embryo involves a process of intercellular communication. Genetic disruption of this signaling mechanism results in neural hyperplasia accompanied by epidermal hypoplasia, in accordance with the model that a developing neuroblast sends an inhibitory signal to its neighbors to prevent them from acquiring the neural fate (1). The genes implicated in this communication mechanism have highly pleiotropic effects on cell fate choices throughout development. A central element of this mechanism is Notch, which codes for a transmembrane protein. Several studies have identified genes that can interact with Notch and are therefore thought to be part of the same interaction mechanism. These components include transmembrane, cytoplasmic, and nuclear proteins encoded by the neurogenic genes Delta, mastermind, and E(spl), as well as by Serrate and deltex (2, 3). This group of genes has been collectively termed the Notch group (3, 4).

The ability of a cell in the neurogenic ectoderm to acquire the neural fate depends on the expression of one of the "proneural" genes (5), such as those of the achaete-scute (ac-sc) complex (ASC) (6). Expression of these genes in the precursors of adult sensory organs is initially diffuse over a cluster of cells. Eventually, only one cell, the future neuroblast, continues to express high levels of ASC products (7-9). The situation is probably similar in the embryonic central nervous system (10, 11). The neurogenic genes may therefore function in refining the ASC expression pattern. ASC genes encode nuclear DNA-binding proteins bearing a basic-helix-loop-helix (bHLH) motif (12-14) and have therefore been implicated as transcriptional activators of neuroblast-specific genes.

Previous studies have implicated several independent transcripts in the function of the E(spl) locus, which is therefore referred to as the Enhancer of split complex (15). Our molecular genetic analysis of the E(spl) locus revealed that its function is mediated by two classes of closely linked genes (16). One class corresponds to a cluster of three bHLH genes, m5, m7, and m8, first described by Klambt et al. (17); the other is a single-copy gene, m9/10, encoding a nuclear protein containing repeated motifs first identified in the β subunit of guanine nucleotide-binding proteins (G proteins) (18, 19). We found that the overall copy number of the E(spl) HLH genes is central in establishing the neural-epidermal dichotomy, as gradual reduction in their copy number leads to progressively greater neural hypertrophy (see also ref. 15). Modification of m9/10 copy number, on the other hand, has less pronounced phenotypic effects. The presence of the HLH motif in E(spl) products and the fact that HLH proteins can form heterooligomers suggest that they serve as transcriptional regulators of the ASC genes, possibly even through direct interactions with ASC proteins.

Prior genetic analysis has shown that at least m7 and m8 are dispensable, suggesting functional redundancy among E(spl) HLH genes (16). We have further shown that proximal to the characterized cluster there must be at least one more E(spl) HLH gene with properties similar to those of m5, m7, and m8. Here we present the localization and characterization of four additional E(spl) HLH units, which correspond to this genetically identified proximal component and appear to represent all the remaining members of the E(spl) HLH cluster.†

*To whom reprint requests should be addressed.
†The sequences reported in this paper have been deposited in the GenBank data base (accession nos. M96165-M96168 for m3, mA, mB, and mC, respectively).
Abbreviations: ASC, achaete-scute complex; bHLH, basic-helix-loop-helix.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

MATERIALS AND METHODS

DNA Hybridization and Sequence Analysis. DNA restriction digests were size-separated by horizontal agarose gel electrophoresis and blotted to Zeta-Probe nylon membranes (Bio-Rad) in 0.4 M NaOH (20). The hybridization buffer was 0.5 M sodium phosphate, pH 7.2/1 mM Na2EDTA/5% SDS (contributing 170 mM Na+)/1% (wt/vol) bovine serum albumin (fraction V; Sigma). The wash buffer consisted of 40 mM sodium phosphate, pH 7.2/1 mM Na2EDTA/1% SDS (34 mM Na+). High-stringency washes were at 68-70°C. To determine the optimal low-stringency conditions, we hybridized replicate filters containing E(spl)-region subclones with m5 or m7 probes at 50°C (three filters with each probe) and then washed the three filters for each probe at 70°C, 50°C, and room temperature, respectively. Genomic or cDNA fragments were subcloned into M13 mp18 and/or mp19 for dideoxy sequencing using the Sequenase kit (United States Biochemical) and a combination of synthetic oligonucleotide primers. Sequence analysis was performed using the MacVector software (IBI) for the Macintosh computer.
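The Na+ values given in parentheses for the hybridization buffer (170 mM) and the wash buffer (34 mM) track the SDS content of each buffer. As a quick check, here is a minimal sketch of that arithmetic. It assumes (these values are not stated in the text) that SDS is sodium dodecyl sulfate with a molar mass of about 288.38 g/mol and one Na+ per molecule, and that the SDS percentages are wt/vol; the function name is illustrative only.

```python
# Na+ supplied by SDS in the blotting buffers described above.
# Assumptions (not stated in the text): sodium dodecyl sulfate, NaC12H25SO4,
# molar mass ~288.38 g/mol, one Na+ per molecule; "% SDS" read as % wt/vol.
SDS_MOLAR_MASS_G_PER_MOL = 288.38


def sds_sodium_mM(percent_wt_vol: float) -> float:
    """Return the Na+ concentration (mM) contributed by SDS at a given % (wt/vol)."""
    grams_per_litre = percent_wt_vol * 10.0  # 1% (wt/vol) = 10 g per litre
    mol_per_litre = grams_per_litre / SDS_MOLAR_MASS_G_PER_MOL
    return mol_per_litre * 1000.0


hyb = sds_sodium_mM(5.0)   # hybridization buffer: 5% SDS
wash = sds_sodium_mM(1.0)  # wash buffer: 1% SDS
print(f"hybridization buffer: {hyb:.0f} mM Na+")  # hybridization buffer: 173 mM Na+
print(f"wash buffer: {wash:.0f} mM Na+")           # wash buffer: 35 mM Na+
```

Under these assumptions the results, about 173 mM and 35 mM, agree with the rounded figures in the text, which suggests that the quoted values do not include the sodium supplied by the phosphate buffer itself.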


PCR Detection of E(spl) HLH Genes. To obtain primers recognizing all known E(spl) HLH genes, we used the regions of highest sequence conservation. The ARIN primer corresponds to the sequence spanning the basic/helix I junction of m8, m7, m5, m3, mA, mB, and mC (see also Fig. 4).

Consensus: GCC-AGG-AT_-AAC-AAG-TGC-CTG-GAC
Translation: A R M/I N K/L C/Y L D

This consensus sequence was synthesized. The corresponding sequence from the related HLH gene hairy (see Results and Discussion) contains six mismatches: GCC-CGT-ATT-AAC-AAC-TGT-CTC-AAT.

The EKAD primer corresponds to the helix II region of m8, m7, m5, m3, mA, mB, and mC (see also Fig. 4).

Consensus: [not legible in the source]
Translation: M/F/L E/D K A D/E I/M L E

The reverse complement of the above consensus was synthesized (EKAD primer). The corresponding hairy sequence contains only two mismatches: (T)TG-GAA-AAG-GCC-GAC-AT_-CTG-GAG.

The PCR cycle was 1 min at 94°C, 1 min at 45°C, and 1 min at 72°C; 30 cycles were performed. The same reactions were repeated with annealing at 37°C. Primers were used at 1 μM final concentration and dNTPs at 0.2 mM each. Templates were plasmid or cosmid clones from the E(spl) region, ~1 ng per 50-μl reaction mixture. Total genomic DNA (40 ng) was also used for the reactions at 45°C annealing temperature.

RESULTS AND DISCUSSION

The neurogenic function of the locus is provided to a large extent by a group of homologous genes (m5, m7, and m8) with apparently equivalent functions (16). Moreover, the region proximal to m5 must contain at least one more member of this family. We therefore searched for other transcription units homologous to the three already described.

DNA sequence comparisons among the initial three members of the E(spl) HLH family, m5, m7, and m8 (17), revealed considerable similarity among the 5' coding regions, which correspond to the bHLH domain. Because this region spans ~200 base pairs (bp), it was reasonable to assume that DNA fragments containing E(spl) HLH genes would cross-hybridize to one another at low stringency. After determining the optimal stringency conditions, we could scan the cloned E(spl) genomic region by cross-hybridization to detect the additional predicted members of the family.

Under these conditions, both m5- and m7-containing probes were able to detect all m5, m7, and m8 fragments above background (data not shown). We used the same two probes on blots containing restriction enzyme digests of plasmid or cosmid clones spanning a total of 81 kilobases (kb) of contiguous genomic DNA. In addition to the expected restriction fragments corresponding to m5, m7, and m8, we detected strong cross-hybridization signals with four more restriction fragments. Fig. 1 shows a restriction map of the region analyzed and the fragments exhibiting cross-hybridization to m5 and m7. When compared with the restriction map of Knust et al. (15), one of these intervals corresponds to gene m3, whereas the remaining three correspond to as yet uncharacterized transcription units, which we tentatively named mA, mB, and mC.

The above hybridization conditions were inappropriate for use on a genomic DNA blot, as the nonspecific signal was sufficient to produce a smear obscuring any specifically cross-hybridizing fragments. When we increased the wash temperature to 50°C (conditions otherwise the same as before), an m7 or mA probe detected restriction fragments corresponding to m7, m3, mA, and mB, whereas an m8 probe detected only m5 in addition to itself. No distinct cross-hybridizing species were seen with an mC probe. We could thus group the E(spl) HLH genes into three similarity subgroups, m8/m5, m7/m3/mA/mB, and mC, which were later confirmed by sequencing. However, we could not be certain that we had isolated all E(spl) HLH genes in the genome, as our genomic hybridization conditions did not allow detection of such genes across different similarity subgroups.

[Fig. 1 map: restriction map of the E(spl) region; scale bar, 2 kb]
FIG. 1. Restriction map of 81 kb of the E(spl) region, including the position of characterized transcription units. EcoRI, HindIII, and Xho I sites are marked with separate symbols. Location of transcription units distal to m1 is taken from refs. 15 and 18; location of transcription units proximal to mC is taken from ref. 21. HLH genes are shown as filled arrowheads. Striped bars above the map mark the fragments showing cross-hybridization with m5 and m7 probes. Filled bars below the map mark fragments that produce a PCR product (of ~100 bp) with the primers ARIN and EKAD, whereas open bars indicate fragments producing no such PCR product.

A second method was devised to detect E(spl) HLH genes. After obtaining the sequences of m3, mA, mB, and mC, we used the regions of highest similarity to design degenerate oligonucleotides that would hybridize to all seven E(spl) HLH genes. The 24-mer ARIN contains sense-strand sequence corresponding to the basic/helix I junction, and the 23-mer EKAD contains antisense sequence for the amino-terminal half of helix II (see Materials and Methods). These two primers are predicted to generate a 99- to 105-bp product when used in a PCR with template DNA bearing any of the seven E(spl) HLH genes. We used these primers in PCRs with plasmid or cosmid clones from across the E(spl) genomic region as templates. With clones containing one of the already characterized E(spl) HLH genes, we detected only a single PCR product of the expected size, ~100 bp; in all other cases, no amplified product was detected (data not shown).

[Fig. 2 image: Southern blot, lanes E and H, with detected fragment sizes (kb) marked]
FIG. 2. The 100-bp products derived from a PCR with total Drosophila genomic DNA as template and 45°C as reannealing temperature were used to probe a Southern blot of genomic DNA digested with EcoRI (lane E) or HindIII (lane H). The 100-bp species is diagnostic of the presence of an E(spl) HLH gene (see text). Sizes of the fragments (kb) detected are shown. By comparison to the E(spl) restriction map (Fig. 1), they all correspond to E(spl) HLH genes. Specifically, for the EcoRI digest, 8.5 kb corresponds to m5 and mC (identically sized fragments), 7.5 kb to m3, 5.2 kb to m8, 4.1 kb to mB, 3.4 kb to mA, and 1.55 kb to m7. For the HindIII digest, 12 kb corresponds to m7 plus m8, 5.8 kb to mA, 5.2 kb to m5, 3.6 kb to m3, 2.1 kb to mB, and 1.5 kb to mC.
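The primer comparisons in Materials and Methods amount to three operations: translating codons, taking a reverse complement (the EKAD primer is the reverse complement of the helix II consensus), and counting positions at which a target departs from a degenerate primer. A minimal Python sketch follows, assuming the standard genetic code and IUPAC degeneracy codes. The helper names are illustrative, and the degenerate primer in the last line is hypothetical, because the exact degenerate positions of the ARIN and EKAD primers are not given here.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def _clean(seq: str) -> str:
    """Upper-case and drop the codon separators used in the text (e.g. GCC-CGT)."""
    return seq.upper().replace("-", "")


def translate(dna: str) -> str:
    """Translate an in-frame, unambiguous DNA string into one-letter amino acids."""
    dna = _clean(dna)
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_complement(dna: str) -> str:
    """Reverse complement; degenerate codes map to their complements (R<->Y, K<->M, ...)."""
    return _clean(dna).translate(_COMPLEMENT)[::-1]


def count_mismatches(primer: str, target: str) -> int:
    """Count aligned positions where the target base is not allowed by the primer code."""
    primer, target = _clean(primer), _clean(target)
    if len(primer) != len(target):
        raise ValueError("primer and target must be aligned and of equal length")
    return sum(t not in IUPAC[p] for p, t in zip(primer, target))


# hairy sequences quoted in Materials and Methods:
print(translate("GCC-CGT-ATT-AAC-AAC-TGT-CTC-AAT"))  # ARINNCLN
print(translate("GAA-AAG-GCC-GAC"))                  # EKAD
print(reverse_complement("GAA-AAG-GCC-GAC"))         # GTCGGCCTTTTC
# Hypothetical degenerate primer, for illustration only (H = A, C, or T):
print(count_mismatches("GCC-AGG-ATH-AAC", "GCC-CGT-ATT-AAC"))  # 2
```

Translating the quoted hairy segments recovers the residues shown for hairy at the corresponding positions in Fig. 4 (ARINnCLn at the basic/helix I junction, EKAD at the start of helix II), which is a convenient sanity check when comparing a primer with a candidate gene.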

Fig. 1 shows the fragments tested. When these primers were used to amplify genomic DNA, again only a single 100-bp product was obtained. After purification by gel electrophoresis, this amplified product was used to probe a genomic DNA blot. All the genomic fragments detected with this probe corresponded to the seven known E(spl) HLH genes, and no additional bands were seen in any of five different restriction digests analyzed (Fig. 2).

To allow for the possibility that there might be mismatches in the regions corresponding to the primers, we performed the same PCR tests at two different reannealing temperatures, 45°C and 37°C (the latter being low enough to allow a few mismatches in our 23-mer and 24-mer primers). The results were identical at both reannealing temperatures, which makes the existence of an additional E(spl) HLH gene unlikely. It should be noted that hairy, a closely related HLH gene that is not a member of the E(spl) HLH family, was not detected at either temperature, owing to the large number of mismatches with the ARIN primer (see Materials and Methods).

We sequenced the two contiguous genomic HindIII fragments containing genes mB and mC, a total of 3.6 kb (Fig. 3). The m3 and mA genes were sequenced from cDNA clones (Fig. 3). cDNA clones for both of these genes were isolated by probing a 4- to 8-hr embryonic cDNA library (22) with the 2.6-kb HindIII-EcoRI fragment containing m3. Despite the high stringency used in the screen (hybridization and washes at 68°C), the homology between these two genes was sufficient to allow detection of mA by the m3 probe. The translated products are shown under the respective DNA sequences. The bHLH domain and the conserved carboxyl terminus are underlined. mB and mC, as well as m5, m7, and m8, have no introns, and this probably also holds true for m3 and mA: partial genomic sequencing and restriction mapping of cDNA versus genomic subclones with Rsa I (whose recognition sequence is 4 bp) (data not shown) support this claim.

The sequences of the seven E(spl) HLH proteins bear a high degree of similarity over a stretch of ~120 amino acids, a region encompassing over half of the protein, beginning at the amino terminus. Fig. 4 presents the alignment of the seven protein sequences. For the assignment of similarity we have grouped amino acids into the following categories: K, R, H (basic, +); D, E (acidic, −); S, T (hydroxyl in the side chain); A, V, L, I, M, F (hydrophobic, U); Y, F (aromatic); Q, N (amide); G, A, S (small side chain). We have also included the sequence of the Drosophila hairy protein (h), which shows the highest similarity to the E(spl) proteins among the HLH superfamily members. Finally, the bHLH domain of a more distantly related HLH protein, l'sc of the ASC, is also shown for comparison. It is clear that the seven E(spl) proteins are colinear in the bHLH region and, after a divergent segment of 13-27 amino acids, again for another stretch of 37 amino acids (the F+XGY stretch, which is also partially present in the hairy protein). Within these regions there is a precise one-to-one correspondence of residues, without the need for introducing gaps in the sequences. There are only two exceptions: (i) in the loop region colinearity breaks down, but there is still a good degree of similarity, most notably the presence of a G− dipeptide (glycine followed by an acidic residue); (ii) the F+XGY region of m3 diverges after the first 11 residues.
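The similarity categories above, together with the highlighting rule stated in the Fig. 4 legend (uppercase for residues identical in all seven E(spl) proteins or similar in at least three), can be expressed in a few lines. A minimal Python sketch follows. It assumes that two residues count as similar when they are identical or fall in a shared category, and that "similar in at least three" is counted per residue with the residue itself included; these readings and the function names are interpretations, not the authors' stated procedure.

```python
# Similarity groups as listed in the text; a residue may belong to several groups
# (A and S are both "small", F is both hydrophobic and aromatic).
SIMILARITY_GROUPS = [
    frozenset("KRH"),     # basic (+)
    frozenset("DE"),      # acidic (-)
    frozenset("ST"),      # hydroxyl in the side chain
    frozenset("AVLIMF"),  # hydrophobic
    frozenset("YF"),      # aromatic
    frozenset("QN"),      # amide
    frozenset("GAS"),     # small side chain
]


def similar(a: str, b: str) -> bool:
    """True if two residues are identical or share at least one similarity group."""
    a, b = a.upper(), b.upper()
    return a == b or any(a in g and b in g for g in SIMILARITY_GROUPS)


def classify_column(column: list[str], min_similar: int = 3) -> list[str]:
    """Label each residue of one gap-free alignment column.

    'identical' -> every residue in the column is the same;
    'similar'   -> the residue is similar to at least `min_similar` residues
                   in the column, itself included;
    'other'     -> neither.
    """
    residues = [r.upper() for r in column]
    if len(set(residues)) == 1:
        return ["identical"] * len(residues)
    return ["similar" if sum(similar(r, o) for o in residues) >= min_similar else "other"
            for r in residues]


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two colinear (gap-free) segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


# Carboxyl-terminal pentapeptides of the seven E(spl) proteins, from Fig. 4.
c_termini = {"m5": "MWRPW", "m8": "LWRPW", "m7": "VWRPW", "mA": "MWRPW",
             "mB": "VWRPW", "m3": "VWRPW", "mC": "VWRPW"}
for position, column in enumerate(zip(*c_termini.values()), start=1):
    print(position, "".join(column), classify_column(list(column))[0])
# 1 MLVMVVV similar
# 2 WWWWWWW identical
# 3 RRRRRRR identical
# 4 PPPPPPP identical
# 5 WWWWWWW identical
print(percent_identity(c_termini["m5"], c_termini["m8"]))  # 80.0
```

Because the conserved segments are colinear without gaps, identity can be scored position by position; the loop region, where colinearity breaks down, would need a gapped alignment instead.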

[Fig. 3 panels: nucleotide sequences of mC, mB, mA, and m3, with the translated products shown beneath each]

FIG. 3. Sequences of the four HLH genes mC, mB, mA, and m3 (see also Fig. 1). The genomic sequence of two contiguous HindIII fragments containing mB and mC was determined; flanking sequences are not shown. Sequences of mA and m3 were determined from cDNA clones.

---BASIC---
m5   .............mapqsnnsttFVSKTQhYl-KVkKPLLERqRR-  (31)
m8   .....................MeYtTKTQiYq-KVkKPMLERqRR-  (23)
m7   ..................matkyEMSKTYQYR-KVMKPLLERKRR-  (26)
mA   ..................mvlemEMSKTYQYR-KVMKPMLERKRR-  (26)
mB   ................msslqMsEMSKTYQYR-KVMKPMLERKRR-  (28)
m3   ....................mVmEMSKTYQYR-KVMKPLLERKRR-  (24)
mC   ................mavqgqrFMTKTQhYR-KVtKPLLERKRR-  (28)
h    mvtgvtaanmtnvlgtavvpaqlketplksdR-RsnKPIMEKRRR-  (44)
l'sc ............................svaR-R...nArERnRv-

---HELIX I------LOOP---
m5   ARMNKCLDtLKtLVAE-fq...GDdAIlRMD-  (58)
m8   ARMNKCLDnLKtLVAE-lr...GDdGIlRMD-  (50)
m7   ARINKCLDELKDLMAE-CVaQtGD.A..KFE-  (53)
mA   ARINKCLDELKDIMVE-CLtQEGEH.ITRLE-  (55)
mB   ARINKCLDELKDLMVa-tLesEGEH.VTRLE-  (57)
m3   ARINKCLDDLKDLMVE-CLqQEGEH.VTRLE-  (53)
mC   ARMNlyLDELKDLIVD-tMdaqGEq.VSKLE-  (57)
h    ARINnCLnELKtLILD-atkkDparh.SKLE-  (73)
l'sc kqVNngFvnLRqhLpq-tvnslsngGrgssk..KLs-

--HELIX II--
m5   KAEMLEAaLvFMRKQ-vvkqQAp......VSPLPm  (86)
m8   KAEMLEsaViFMRqQ-KtpKkVaqeeq.....S.LPl  (80)
m7   KADILEVTVqHLRKL-KesKkhvpan......PE  (80)
mA   KADILELTVeHMKKL-RaQKQLrlssvtgg.VSPsaDpklsia  (96)
mB   KADILELTVtHLqKM-KQQRQhkrasgdes.LTPA  (90)
m3   KADILELTVdHMRKL-KQrggLslqgvvagvgSPptststahv  (95)
mC   KADILELTVnYLKaQ-qQQRvAnpq......SPpPDqvnl  (90)
h    KADILEkTVkHLqeL-qrQqaAmqqa......aDpkiv  (104)
l'sc KVDtLrIaVeYIRgL-qamld...

m5   DSFKnGYMNAVsEISRVMAcTPAMSVDVGKTVMTHLG  (123)
m8   DSFKnGYMNAVNEVSRVMASTPGMSVDLGKSVMTHLG  (117)
m7   qSFRAGYIRAANEVSRALASLPrVdVaFGTTLMTHLG  (117)
mA   ESFRAGYVHAANEVSKtLAaVPGVSVDLGTqLMSHLG  (133)
mB   EGFRSGYIHAVNEVSRsLSqLPGMnVsLGTqLMTHLG  (127)
m3   ESFRSGYVHAAdqITqVLLqTqq.TdEIGRkI_kfLS  (131)
mC   DkFRAGYtQAAyEVSHIFSTVPGLdLkFGThLMkqLG  (127)
h    nkFKAGFAdcVNEVSRF....PGIepaqrrrLLqHLs  (137)

m5   veFQrMlQadqvqtsvttstpr......pLSPASSGYndsgaapkpveet
m8   ....pLSPASSGYhSDs[...]
m7   mRLSaNOLeOrtDlaavanttsivcasssssstvssasscssISPVSSGYaSDnEsllaisswoa....
mA   hRLNyLQvvvpslpigvplqapvedqamvtpppsecdtlesgLcspaseassts......
mB   qRLNQIQpaekevlpvtaplsvhianrdavsvDispissvaasDnsntsstshsllttidvtkmeddsedeen
m3   tRLieLQtqllqqqqqqqqhqqqqipqssgrlafpllggygpaaaaaaisyssfltskdlidvtsydgna
mC   hqLkdMkQeeeiidmaeepvnladqkrskspreedihhgee......
h    ncINgVktelhqqqrqqqqqsihaqmlpsppsspeqdsqqgaaapylfgiqqtasgyflpngmqviptklpngsialvlpqslpqqqqqqllqhqqqqqqlavaaaaaaaaaaqqqpmlvsmpqrtastgsasshsvag[...]evmdikpsviqrvpmeqqplssvikkqikeeeq

m5   MWRPW  (178)
m8   LWRPW  (179)
m7   VWRPW  (186)
mA   MWRPW  (195)
mB   VWRPW  (206)
m3   VWRPW  (224)
mC   VWRPW  (173)
h    pWRPW  (337)

FIG. 4. Alignment of sequences of the seven E(spl) proteins and the hairy protein (h) (23). To highlight similarities, bold uppercase letters indicate residues that are identical in all E(spl) proteins, whereas lightface uppercase letters indicate residues that are similar in at least three of the E(spl) proteins (hairy is shown only for comparison and is disregarded for the above highlighting). Also for comparison we have included the bHLH region of l'sc (of the ASC complex; ref. 24); there is no homology with other regions of l'sc. The sequences underlined in the carboxyl-terminal parts of m5, m8, m3, and hairy have high PEST scores (25): m5, 9.3; m8, 10.2; m3, 6.8; h, 7.3.

The even closer similarity between m5 and m8, or among m7, m3, mA, and mB, is particularly evident in the sequences of helices I and II. After the end of the 37-amino acid F+XGY stretch, the sequences of the E(spl) proteins diverge until the common WRPW carboxyl terminus, which is also present in hairy. Within this divergent region there are still some short similar motifs: e.g., the RLN tripeptide in m7, mA, and mB, or the P(L/I)SPUSS stretch in m5, m8, m7, and mB. The corresponding region of hairy contains a variety of homopolymeric stretches; among the E(spl) proteins, only m7 carries such a stretch (of serines), and m3 carries two (of glutamines and of alanines). The size variation among these proteins [173-224 amino acids for the E(spl) proteins and 337 amino acids for hairy] is also largely accounted for by the size variation of this divergent region. Within this divergent region, m5, m8, m3, and hairy contain sequence stretches that have relatively high PEST scores according to the algorithm developed by Rogers et al. (25). These sequences are underlined in Fig. 4, and the scores are given in the legend. Such sequences are characterized by a high content of proline (P), glutamic acid (E), serine (S), and threonine (T), and their presence has been associated with rapid intracellular degradation of the proteins carrying them.

The predicted protein products of the seven E(spl) HLH genes exhibit a high degree of sequence similarity beginning at the amino terminus and extending over more than half of the molecule. The carboxyl-terminal half is unique for each member, except for the last four amino acids, which are identical in all proteins. The extensive similarity is in agreement with the functional redundancy exhibited by these genes in genetic tests. This redundancy may be only partial, since we have noted greater similarities between m5 and m8, or among m7, mA, and mB, suggesting that only members within each of these two subgroups may be functionally homologous. According to this model, each subgroup and the more divergent m3 and mC may perform a different function. Indeed, we have evidence for the redundancy of only m7 and m8, although the absence of point mutations in any of the seven genes suggests that any one of them is dispensable. Alternatively, the minor structural differences among the E(spl) HLH proteins may be of no functional consequence. If this is the case, functional diversity of the genes is still possible if each gene is expressed in a different subdomain of the neurogenic region, as preliminary data have suggested (D. Hartley and A. Preiss, personal communication).

Although the ability of the E(spl) HLH proteins to bind DNA has not been documented and we do not know their subcellular localization, the presence of a bHLH domain in these proteins makes them good candidates for DNA-binding factors. Biochemical studies of other HLH proteins have shown that the basic domain contacts DNA, contingent upon homo- or heterodimerization, which is achieved via the HLH domain (13, 26). However, additional functions, such as transcriptional activation, have also been attributed to the bHLH domain (27). When the bHLH domains of the E(spl) proteins are compared with other bHLH domains, it is evident that some of the residues invariant among the seven E(spl) proteins are unique to them, rather than being common characteristics of the bHLH superfamily. As such, they might be expected to play a crucial role in the function of the E(spl) proteins; they are listed below (see Fig. 4). The data base of HLH protein sequences used for the following comparisons was compiled from those listed in refs. 28 and 29. It includes proteins from Drosophila, Caenorhabditis elegans, [...], amphibians, birds, fish, plants, and yeast. Fig. 4 includes the sequence of the Drosophila l'sc bHLH domain as an illustration. The motifs unique to E(spl) proteins are as follows.

(i) The YXK peptide at the amino end of the basic region. No other basic region contains the Y residue, while almost all other bHLH domains have an R in place of the K. Mutation of this R to K in the immunoglobulin gene E47 abolished DNA-binding activity (26). Hairy does not contain this characteristic peptide.

(ii) The KP in the basic region and the RAR at the basic/helix I junction, two features also shared by the hairy protein. All other bHLH families have different conserved sequences at these positions; e.g., for the myogenic factors MyoD, Myf5, myogenin, and herculin, AT is found in place of KP and L(R/K/S)K in place of RAR. Davis et al. (27) showed that replacing the MyoD alanine with a proline abolishes

DNA binding, which might suggest that the presence of the KP motif in the E(spl) proteins makes them incapable of binding DNA. However, both the extremely well-conserved basic region of these proteins and two further experimental findings with MyoD strongly argue against this conclusion. First, substitution of the Myc basic region into MyoD also abolishes DNA binding, despite the fact that Myc has been shown to be capable of binding DNA (27). Second, MyoD mutants with other substitutions in the AT motif maintain their DNA-binding capacity but fail to induce myogenesis, as they are inactive in transcriptional activation (30). Weintraub et al. (30) have suggested that this region, in combination with the DNA target site, generates a site for the binding of some accessory "recognition factor" needed for transcriptional activation. It then appears that different basic regions can bind different DNA target sites, which would explain why the proline substitution or the Myc basic region destroys the ability of MyoD to bind to its target. In addition to determining binding specificity, some residues within the basic region are also crucial for interactions with accessory factors; by analogy with MyoD, KP is the candidate for such a region in the E(spl) proteins.

(iii) The aspartate residue in the middle of helix I. All other HLH proteins, including hairy, have uncharged or basic residues in this position, with the exception of the myogenic factors, which carry glutamate.

(iv) The G− dipeptide in the loop, which is absent from all other HLH proteins, including hairy, with the exception of human AP-4.

(v) The glutamate in the middle of helix II, also present in hairy. The only other HLH protein with an acidic residue (aspartate) in this position is human Max. Most other HLH proteins carry a basic residue (e.g., see l'sc in Fig. 4).

As noted earlier, hairy exhibits a high degree of sequence similarity with the E(spl) HLH proteins, and antibodies to the hairy protein have shown that hairy is indeed nuclear (31). Although hairy is involved in early embryonic segmentation and does not seem to participate in embryonic neurogenesis, its larval/pupal function is genetically similar to that of E(spl), as it appears to limit the number of imaginal disk cells that will become peripheral nervous system precursors (32). Dosage-dependent interactions with genes of the ASC have suggested that hairy achieves this by suppressing the activity of ASC genes, primarily ac, in cells neighboring a neural precursor (33). Indeed, the hairy protein pattern in imaginal disks appears to be complementary to the ac/sc pattern in regions where sensory organ precursors arise (9, 34), as if these proteins suppress each other's gene activity. We expect the action of E(spl) proteins in the embryonic central nervous system to be similar, since expression of ASC genes appears to be a prerequisite for the commitment of a subset of embryonic neuroblasts (10, 11, 35). However, the mechanism by which the suppression of ASC genes is achieved is still uncertain. Another class of HLH proteins, which lack a basic region [e.g., Id or extramacrochaetae (28, 36-38)], is known to antagonize the action of DNA-binding HLH factors by forming heterodimers unable to bind DNA (Id with MyoD and extramacrochaetae with ASC proteins). This titration mechanism sets a threshold for the activity of the bHLH factors. Van Doren et al. (38) showed that neither hairy nor E(spl) m8 can block binding of ASC-da complexes to their targets in an in vitro assay, suggesting that their in vivo mechanism of action might differ from that of Id and extramacrochaetae.

A possible mechanism for the antagonistic action of E(spl) and hairy is that their products act as transcriptional repressors of the ASC genes. Analysis of the distribution of the different E(spl) HLH proteins in the neurogenic ectoderm, together with in vitro studies of binding to regulatory regions of the ASC DNA, should shed light on the mechanism of action of these proteins.

We thank our colleagues in the Artavanis lab for valuable suggestions and discussions during the course of this work. Special thanks go to L. Caron and R. S. Mann for their help and to Annette Preiss for her comments. C.D. was supported by postdoctoral fellowships from the Muscular Dystrophy Association and the Howard Hughes Medical Institute. This research was supported by the Howard Hughes Medical Institute and the National Institutes of Health (GM29093 and NS26084).

1. Doe, C. Q. & Goodman, C. S. (1985) Dev. Biol. 111, 206-219.
2. Artavanis-Tsakonas, S., Delidakis, C. & Fehon, R. G. (1991) Annu. Rev. Cell Biol. 7, 427-452.
3. Artavanis-Tsakonas, S. & Simpson, P. (1991) Trends Genet. 7, 403-408.
4. Fehon, R. G., Kooh, P. J., Rebay, I., Regan, C. L., Xu, T., Muskavitch, M. A. T. & Artavanis-Tsakonas, S. (1990) Cell 61, 523-534.
5. Jan, Y. N. & Jan, L. Y. (1990) Trends Neurosci. 13, 493-498.
6. Ghysen, A. & Dambly-Chaudière, C. (1988) Genes Dev. 2, 495-501.
7. Simpson, P. (1990) Development 109, 509-519.
8. Cubas, P., de Celis, J.-F., Campuzano, S. & Modolell, J. (1991) Genes Dev. 5, 996-1008.
9. Skeath, J. B. & Carroll, S. B. (1991) Genes Dev. 5, 984-995.
10. Cabrera, C. V. (1990) Development 110, 733-742.
11. Martin-Bermudo, M. D., Martinez, C., Rodriguez, A. & Jimenez, F. (1991) Development 113, 445-454.
12. Murre, C., Schonleber-McCaw, P. & Baltimore, D. (1989) Cell 56, 777-783.
13. Murre, C., Schonleber-McCaw, P., Vaessin, H., Caudy, M., Jan, L. Y., Jan, Y. N., Cabrera, C. V., Buskin, J. N., Hauschka, S. D., Lassar, A. B., Weintraub, H. & Baltimore, D. (1989) Cell 58, 537-544.
14. Cabrera, C. V. & Alonso, M. C. (1991) EMBO J. 10, 2965-2973.
15. Knust, E., Tietze, K. & Campos-Ortega, J. A. (1987) EMBO J. 6, 4113-4123.
16. Delidakis, C., Preiss, A. & Artavanis-Tsakonas, S. (1991) Genetics 129, 803-823.
17. Klambt, C., Knust, E., Tietze, K. & Campos-Ortega, J. A. (1989) EMBO J. 8, 203-210.
18. Preiss, A., Hartley, D. A. & Artavanis-Tsakonas, S. (1988) EMBO J. 7, 3917-3927.
19. Hartley, D. A., Preiss, A. & Artavanis-Tsakonas, S. (1988) Cell 55, 785-795.
20. Reed, K. C. & Mann, D. A. (1985) Nucleic Acids Res. 13, 7207-7210.
21. Hart, A. C., Kramer, H., Van Vactor, D. L., Jr., Paidhungat, M. & Zipursky, S. L. (1990) Genes Dev. 4, 1835-1847.
22. Brown, N. H. & Kafatos, F. C. (1988) J. Mol. Biol. 203, 425-437.
23. Rushlow, C. A., Hogan, A., Pinchin, S. A., Howe, K. M., Lardelli, M. & Ish-Horowicz, D. (1989) EMBO J. 8, 3095-3103.
24. Alonso, M. C. & Cabrera, C. V. (1988) EMBO J. 7, 2585-2591.
25. Rogers, S., Wells, R. & Rechsteiner, M. (1986) Science 234, 364-368.
26. Voronova, A. & Baltimore, D. (1990) Proc. Natl. Acad. Sci. USA 87, 4722-4726.
27. Davis, R. L., Cheng, P.-F., Lassar, A. B. & Weintraub, H. (1990) Cell 60, 733-746.
28. Benezra, R., Davis, R. L., Lockshon, D., Turner, D. L. & Weintraub, H. (1990) Cell 61, 49-59.
29. Blackwood, E. M. & Eisenman, R. N. (1991) Science 251, 1211-1217.
30. Weintraub, H., Dwarki, V. J., Verma, I., Davis, R., Hollenberg, S., Snider, L., Lassar, A. & Tapscott, S. J. (1991) Genes Dev. 5, 1377-1386.
31. Carroll, S. B., Laughon, A. & Thalley, B. S. (1988) Genes Dev. 2, 883-890.
32. Ingham, P., Pinchin, S. M., Howard, K. R. & Ish-Horowicz, D. (1985) Genetics 111, 463-486.
33. Moscoso del Prado, J. & Garcia-Bellido, A. (1984) Roux's Arch. Dev. Biol. 193, 242-245.
34. Carroll, S. B. & Whyte, J. S. (1989) Genes Dev. 3, 905-916.
35. Jimenez, F. & Campos-Ortega, J. A. (1990) Neuron 5, 81-89.
36. Ellis, H. M., Spann, D. R. & Posakony, J. W. (1990) Cell 61, 27-38.
37. Garrell, J. & Modolell, J. (1990) Cell 61, 39-48.
38. Van Doren, M., Ellis, H. M. & Posakony, J. W. (1991) Development 113, 245-255.