The CW Domain, a Structural Module Shared Amongst Vertebrates, Vertebrate-Infecting Parasites and Higher Plants
Total Page:16
File Type:pdf, Size:1020Kb
576 Update TRENDS in Biochemical Sciences Vol.28 No.11 November 2003 yeast, causes growth arrest with a terminal phenotype similar to that GAP activity and regulates mTOR signaling. Genes Dev. 17, caused by nitrogen starvation. Genetics 155, 611–622 1829–1834 17 Matsumoto, S. et al. (2002) Role of the Tsc1–Tsc2 complex in signaling 22 Zhang, Y. et al. (2003) Rheb is a direct target of the tuberous sclerosis and transport across the cell membrane in the fission yeast tumour suppressor proteins. Nat. Cell Biol. 5, 578–581 Schizosaccharomyces pombe. Genetics 161, 1053–1063 23 Im, E. et al. (2002) Rheb is in a high activation state and inhibits B-Raf 18 Garami, A. et al. (2003) Insulin activation of Rheb, a mediator of kinase in mammalian cells. Oncogene 21, 6356–6365 mTOR/S6K/4E-BP signaling, is inhibited by TSC1 and 2. Mol. Cell 11, 24 Onda, H. et al. (2002) Tsc2 null murine neuroepithelial cells are a 1457–1466 model for human tuber giant cells, and show activation of an mTOR 19 Castro, A.F. et al. (2003) Rheb binds TSC2 and promotes S6 kinase pathway. Mol. Cell. Neurosci. 21, 561–574 activation in a rapamycin- and farnesylation-dependent manner. 25 Kenerson, H.L. et al. (2002) Activated mammalian target of rapamycin J. Biol. Chem. 278, 32493–32496 pathway in the pathogenesis of tuberous sclerosis complex renal 20 Tee, A.R. et al. (2003) Tuberous sclerosis complex gene products, tumors. Cancer Res. 62, 5645–5650 tuberin and hamartin, control mTOR signaling by acting as a GTPase- activating protein complex toward Rheb. Curr. Biol. 13, 1259–1268 0968-0004/$ - see front matter q 2003 Elsevier Ltd. All rights reserved. 21 Inoki, K. et al. (2003) Rheb GTPase is a direct target of TSC2 doi:10.1016/j.tibs.2003.09.003 |Protein Sequence Motif The CW domain, a structural module shared amongst vertebrates, vertebrate-infecting parasites and higher plants Jason Perry1,2 and Yunde Zhao2 1Department of Molecular and Cellular Biology, Harvard University, 7 Divinity Avenue, Cambridge, MA 02138, USA 2Section of Cell and Developmental Biology, Division of Biological Sciences, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA A previously undetected domain, named CW for its con- zinc-finger proteins participate in a host of fundamental served cysteine and tryptophan residues, appears to be biological processes ranging from DNA-based activities, a four-cysteine zinc-finger motif found exclusively in such as transcription, replication and recombination to vertebrates, vertebrate-infecting parasites and higher ubiquitination and the assembly of large protein com- plants. Of the twelve distinct nuclear protein families plexes [3]. Here, we define the CW domain, which is that comprise the CW domain-containing superfamily, composed of at least four cysteine and two tryptophan only the microrchida (MORC) family has begun to be residues and is predicted to be another building block in characterized. However, several families contain other the repertoire of zinc-binding structural modules. domains suggesting a relationship between the CW domain and either chromatin methylation status or Identification of the CW domain early embryonic development. We first noticed an unusual pattern of cysteine and tryptophan residues upon manual inspection of a pre- The term zinc finger was originally coined to describe a dicted protein sequence (CAB83140) that emerged from series of zinc-binding modules present in transcription one of our genetic screens (for auxin mutants) after no factor IIIA, but its definition has since expanded to include conserved domains were found searching the Pfam and a wide variety of compact domains whose structures are SMART databases [5–7]. We next performed iterative stabilized by the presence of one or more protein-chelated PSI–BLAST searches of the non-redundant protein zinc ions [1–3]. Grishin and colleagues have recently database using the described segment of sequence (of 61 undertaken a structural classification of zinc-binding amino acids) as a query with an inclusion cut-off score domains, and have found that there are at least eight of 0.005 [8]. From this analysis, 102 sequences were canonical protein-fold groups that fall under the umbrella retrieved, 36 of which were above the cut-off score of ‘zinc finger’ with the C H , the treble-cleft and the zinc- 2 2 (Figure 1). Of these 36 sequences, there were no false ribbon groups being the most highly represented in the positive identifications. Six additional candidate sequences current databases [4]. From a biological perspective, zinc emerged following manual inspection of the sequences fingers are perhaps even more diverse. Although they are that had been retrieved in the search but had fallen below most commonly known for interacting with nucleic-acid our prescribed threshold (Figure 1). The search converged templates, such as DNA and RNA, zinc fingers also after the fourth iteration, and the same group of proteins promote protein–protein interactions and interactions between proteins and small molecules. In these contexts, was retrieved regardless of which newly retrieved sequence was used as the query in subsequent searches. Corresponding author: Yunde Zhao ([email protected]). Domain boundaries were delimited by compiling all of the http://tibs.trends.com Update TRENDS in Biochemical Sciences Vol.28 No.11 November 2003 577 Mm AA017388 AMQVPTTIQCCW DL------- LKR TLPFQLSAVEEGYPVNWC V SMNP--DPEQDQCEAFELKQK Mm BAC36567 AMQVPTTIQCCW DL------- LKR TLPFQLSAVEEGYPINWC V SMNP--DPEQDQCEAFELKQK Mm XP139877 AMQVPTTIQCCW DL------- LKR TLPFQLSAVEEGYPINWC V SMNP--DPEQDQCEAFELKQK I Rn XP234834 AMQVPTTIQCCW DL------- LKR TLPFQLSAVEEGYPDNWC V SMNP--DPEQDQCEAFEQTQK Hs BAA74875 AMEIPTTIQCCW DL------- LKR TLPFQLSSVEKDYPDTWC V SMNP--DPEQDRCEASEQKQK Mm XP125977 AMEIPTTIQCCW DL------- LKR TLPFQLSSVETDYPDTWC V SMNP--DPEQDRCEASEQKQK Mm AAH30893 AMGIPFILQCCW DL------- LKR VLPSSSNYQEKGLPDLWC I ASNP--NNLENSCNQIERLPS Hs ADD43004 AMGIPFIIQCCW DL------- LKR VLPSSTNYQEKEFFDIWC I ANNP--NRLENSCHQVECLPS At CAB83140 VIIQEHWVACCW DK------- GKR LLPFGVFP--EDLPEKWC M TMLN-WLPGVNYCNVPEDETT II Os BAC16660 VVIEEHWVCCCW DI------- QKR LLPYKMNP--SLLPKKWC K SMLQ-WLPGMNRCEVSEDETT At BAC42503 SVDLDYWAQCCW ES------- EKR LLPYDLNT--EKLPDKWC L SMQT-WLPGMNHCGVSEEETT At CAB78615 SVDLDYWAQCCW ES------- EKR LLPYDLNT--EKLPDKWC L SMQT-WLPGMNHCGVSEEETT At AAB63089 SGEQERWATCCW DD------- SKR RLPVDA-----LLSFKWC T IDNV-WDVSRCSCSAPEESLK III At AAL32570 SGEQERWATCCW DD------- SKR RLPVDA-----LLSFKWC T IDNV-WDVSRCSCSAPEESLK Os BAC20873 SGENHQWAQCCW ED------- SKR KLPVDA-----LLPSKWC T SDNK-WDSERSSCDSAQEINM Os BAB64692 SELSETWVQCCW DA------- RKR RLLDGTA---LDSSTAWC F SMNP--DSARQKCSIPEESWD IV At CAB77566 DVESDIWMQCCW DS------- SKR RIIDEGV---SVTGSAWC F SNNN--DPAYQSCNDPEELWD At AAC34358 YSTESAWVRCCW DD------- FKR RIPASVVGS-IDESSRWC I MNNS--DKRFADCSKSQEMSN V At NP177854 YSTESAWVRCCW DD------- FKR RIPASVVGS-IDESSRWC I MNNS--DKRFADCSKSQEMSN VI At BAC43037 QCSAVNWLQCCW REEDTNGVI GKR RAPRS-----EVQTKDWECFCCFSWDPSRADCAVPQELET Pf CAD49237 1st IPDQDNWVQCCW DL------- EKR RLPQNINM--ENLPKVWC Y KLNN--DVRYNSCDIQEEVVV VII Py EAA17514 1st IQENDNWVQCCW DK------- EKR KLPSNTDI--SKLTNTWC Y SLNG--DTRYNSCEIEEEIVI Pf CAD49237 2nd VPVVVNWVQCCW EL------- KKR KLDAHINI--SLLPEKWC Y SLNF-WN-AYNNCDMEEEIYV Py EAA17514 2nd ---VVNWVQCCW EN------- KKR KVDAHVNV--TKLPDEWC Y SLNF-WN-KYNNCDAEEEIYI Hs BAC04028 FGQCLVWVQCCW SFPN----- GKR RLCGNIDP--SVLPDNWC S DQNTA-DVQYNRCDIPEETWT Hs CAB66669 FGQCLVWVQCCW SFPN----- GKR RLCGNIDP--SVLPDNWC S DQNT--DVQYNRCDIPEETWT Hs AAH02725 FGQCLVWVQCCW SFPN----- GKR RLCGNIDP--SVLPDNWC S DQNT--DVQYNRCDIPEETWT VIII Hs BAA91424 FGQCLVWVQCCW SFPN----- GKR RLCGNIDP--SVLPDNWC S DQNT--DVQYNRCDIPEETWT Rn XP222057 FGHCVIWVQCCW SSPK----- EKR QLRGDIDP--SVLPDDWC S DQNP--DLNYNRCDIPEESWT Hs XP087384 MYVNKVWVQCCW ENEN----- LKR LLSSEDSAK-VDHDEPWC Y FMNT--DSRYNNCSISEEDFP Rn XP236697 EFVNTMWVQCCW ENES----- LKR LLSPAAAAR-VDLGEPWC F SMNA--DSSYNSCAASEQDFP Hs CAD23056 KVPDQTWVQCCW DE------- LKR KLPGKIDP--SMLPARWC F YYNS--HPKYRRCSVPEEQEL Hs BAB71125 KVPDQTWVQCCW DE------- LKR KLPGKIDP--SMLPARWC F YYNS--HPKYRRCSVPEEQEL Mm BAC28032 RIPDQTWVQCCW DE------- LKR RLPGMVDP--STLPARWC F YYNP--HPKFKRCSVPEEQER IX Hs NP056173 KRPDQTWVQCCW DA------- LKR KLPDGM----DQLPEKWC Y SNNP--DPQFRNCEVPEEPED Hs BAA09485 KRPDQTWVQCCW DA------- LKR KLPDGM----DQLPEKWC Y SNNP--DPQFRNCEVPEEPED Rn XP221635 KRPDQTWVQCCW DA------- LKR KLPDGI----DQLPEKWC Y SNNP--DPQFRNCEVPEEPED X Lm CAB95222 DLKDDFWVQCCW DD------- NAH KLATRL----NPVPDTWC T GFLG------LSCSSRKKKHK CW consensus/70% ------WlQCD--------C-KWR-LP---------hP--W-C------------C---EE--- Mm NP758466 ---LPYWVQCT-----KPECGKWRQLTKEIQLT-PHMARTYRCGMKPNTITKPDTPDHCSFPEDL XI Rn XP225213 ---LPYWVQCT-----KPECRKWRQLTKEIQLT-PHMARTYRCGMKPNTITKPDTPDHCSFPEDL At AAM65572 ------TVQCA-------SCFKWRLMPSMQKYEEIREQLLENPFFCDTARE-WKPDISCDVPA Zm AKK40309 ------TVQCA-------KCFKWRLIPTKEKYEEIRERIIQEPFVCKRARE-WRPDITCNDPE XII At CAB87748 ------MVQCT-------DCKKWRLIPSMQHYNIIKETQLQTPFVCGTTSG-WTPNMSCNVPQ Zm AKK40305 -----YAVQCY-------KCYKWSTVPK-EEFETLRENFTKDPWFCSR-----RPDSSCEDDA Figure 1. Multiple sequence alignment of the described CW domains. The alignments were constructed with ClustalW (http://us.expasy.org/tools/) and further refined with guidance from the PSI–BLAST pairwise alignments [8]. Those sequences above the CW consensus line (70% of sequences, capital letters are specific amino acid residues, lowercase letters are: l, aliphatic hydrophobic; h, any hydrophobic) (protein families I–X) were retrieved from the non-redundant protein sequence database with