Secondary Structure Prediction for RNA Binding Domain in RNP Proteins Identifies Pap As the Main Structural Motif
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector Volume 257, number 2, 373-376 FEB 07796 November 1989 Secondary structure prediction for RNA binding domain in RNP proteins identifies Pap as the main structural motif A. Ghetti, C. Padovani, G. Di Cesare and C. Morandi rst~tuto di Science Bio~ogi~~e, university di Verona, Strada le Grazie, 37334 Verma, Italy Received 23 August 1989 In eukaryotic cells transcript processing is strictly dependent upon binding of specific proteins. Nuclear RNA binding proteins share a common domain, which is involved in RNA binding. In order to characterize RNP-RNA interactions we have performed a secondary structure prediction based both on statistical algorithms and comparative analysis of different proteins. A high conservation for secondary structure propensity between different RNPs was observed. Ribonucle~c protein; Structure prediction I. INTRODUCTION polypeptide) confers with the protein the affinity for single-stranded nucleic acids. Further analysis of UP1 In recent years, it has been demonstrated that sequence domain revealed that it contains a tandem eukaryotic mRNAs are always associated in the duplicate of 90 residues. cytoplasm as well as in the nucleoplasm with specific Later works identified the UPl’s subdomain in other proteins which play an important role in transcript nuclear proteins from different sources: polyadenylate metabolism [ 1,2]. binding proteins [lo-121, Cl hnRNP [13], nucleolin After transcription by Pal II, the synthesized [ 141, proteins bound to small nuclear RNAs (snRNP) heterogeneous nuclear RNA (hnRNA) tightly [15-171 and others. associates with a set of proteins called ‘heterogeneous A common characteristic of these proteins is their nuclear ribonucleic proteins’ (hnRNP) to form a ability to bind single-stranded nucleic acids in vitro and NAfprotein complex in a ‘nucleosome-like’ structure, RNA in vivo. sedimenting at 200-250 S. The 90 residue sequence, now called ‘RNA binding The hnRNA intimately associated with the hnRNP in domain’, is strikingly conserved from yeast to man and the nucleus undergoes several modifications, such as it has also recently been found in plants [18]. RNA bin- capping at the 5’ end, polyadenylation at the 3’ end ding domain contains two short stretches, whose degree and splicing, before being secreted through the nuclear of conservation through evolution is total; these pores into the cytoplasm [3-61. regions contact RNA as shown by Merril and The nuclear RNA binding proteins (hnRNP) include coworkers using UV crosslinking [19]. at least 20 protein species, only partially characterized, It has generally been assumed that hnRNP proteins which are for the majority basic and with molecular bind RNA regardless of nucleic acid sequence. Recent masses ranging from 32 to 120 kDa [7]. data suggest that this could not be true. Swanson and Recently several cDNA corresponding to these pro- Dreyfuss demonstrated that in vitro Al protein has a teins have been isolated and sequenced. The first was specific affinity for sequences located at the 3 ’ end of characterized by Riva and coworkers, during a research mammalian introns [20]. Other, more indirect evidence on single-stranded DNA binding proteins (ssDBP) in comes from studies on development in Drosophila; it eukaryotic cells [8]. This work demonstrated that has been reported that sex determination during human single-stranded DNA binding protein UP1 development is regulated at the level of RNA process- (identified by Alberts, see [9]) is a degradation product ing of particular transcripts. The proteins controlling of hnRNP protein Al. It has been hypothesized that this alternative splicing contain the RNA binding do- hnRNP proteins contain distinct domains and one of main that could interact with transcripts in a sequence- these (in the specific case the one corresponding to UPI specific way, thus locating different splicing sites [21-231. Correspondence address: C. Morandi, lstituto di Scienze Biologiche, In view of all these data it would be of great interest Universit& di Verona, Strada le Grazie, 37134 Verona, italy to define the three-dimensional structure of RNA bin- Published by Elsevier Science Publishers B. V. (Biomedical Division) 00145793/89/$3.50 0 1989 Federation of European Biochemical Societies 373 Volume 257, number 2 FEBS LETTERS November 1989 a b 1 SPKEPEQLRKLFIGGLSF-ETTDESLRSHFEQWGTLTDCVVMRDPNTKRS---RGFGFVTY-ATVEEVDAAMNARPHKVDGRVVEPKRAVSREDS It*****==== **IX* =====5_ ==== l ************ ===E== I*,******======== t*t~*,**C,tE======_____ ---__ ======x= X***C+**I*** ===z 2 RPGAHLTVKKIFVGGIKE-DTEEHHLRDYFEQYGKIEVIEIMTDR---GSGKKRGFAFVTF-DDHDSVDKIVIQKYHTVNGHNCEVRKALSKQEMASAS =======I l x.*+**x****, I=== ======** l *t**tt*+=~~=~ X****t**X*t**t --------ii--_----- *** .I********* =z===== ===E=E_______----- l *******.******* 3 SITEPEHMRKLFIGGLDY-RTTDENLKAHFEKWGNIVDVVVNKDPRTKRSS---GFGFITY-SHSSMIDEAQKSRPHKIDGRVVEPKRAVPRQD l **t*+t**** ====z= ==== *t***** =ij=z= ====z===5= l ***t**,~C=E===-_____ -_--_- ===z=== l *lt**+C cc=== 0 SPNAGATVKKLFVGALKD-DHOEOSIRDYFOHFGNI”DlNIVlDKETGKK---RGFAFVEF-DDYDPVDKVVLQKQHQLNGK~VDVUKALPKQN ====*I(*** l ****** =IE==E=.C*C =11== ****C***ttX* l ******** =======z=== l *****Cl**==== I=====_________-_- =z===== ====z= 5 KTDPRSMNSR”FIGNLNTLVVKKSD”EAIFSKYGKI”GCSV~-----------KGFAFVQYVNERNARAA”AGEDGRMIAGQ”LDINLAAEPKVNRG ====z- +*t**t*c*+* ==iL== =E=== l tlXIC*1*,ttt+l**==I== ======5=========*Clt,tX*X=E====_==== =======iis *x*,x**** ===E=== b EHPQASRCIG”FG--LNT-NTSQHKVRELFNKYGPIERIQM”IDAQTQRS---RGFCFIYFEKLSDARAAKDSCSGIE”DGRRIR”DFSITQRA ====== =xc=== I====*t*l*****C** ===== =========z t***tt*t =====z== ======**********t =cI==>=====D 7 NDPRASNT-NLI”NYLPQ-DNTDRELYALFRAIGPINTCRI~RDVKTGYSF---GYAFVDFTSEMDSQRAIKVL-NGIT”RNKRLKV =====z=== ******I*** ===1P z= -----**********I====_-__- ===2== =====El===,* ********XX ===x==== == ====z *ttt*,r*c=~= ==x=== 8 PGGESIKDTNLYVTNLPR-TITDDQLDTIFGKYGSI”QKNILRDKLTGRP---RGVAF”RYNKREEAQEAISAL-NN”lPEGGSQPLSVR ===s ===Ez=== z===== ==== l *******t* cc== =========== ttt***==== ======t**tt* ====== **c,*,*c**c* ===== ==:== 9 AQGDAFK-T-LF”AR”NY-DTTESKLRREFE”YGPIKRIHIVYNK~~EGSGKPRGVAFIEYEHERDMHSAVK~AD-GKKIDGRR”LVD”ERTVKG == = ===5= l t**llt**** zc===== ==3=*+***Xlt*,,C*t*t =======z= ==== = ==s===== l *******t =========c ======*,+t*t* ****XI ==Ez===i_ 10 SVENSSA-S-LYVGDLEP-SVSEAHLYDIFSPIGSVSSIRVCRDAITKTST---GYAYVNFNDHEAGRKAIEOL-NYTPIKGRLCRIMUSQ = ==== l ******3=P=== =======ExzS= ====I *t**t**t** =====z= l X**Ctt=r===l ======3==== I======***,******** c======== II LRKKGSG--NIFIKNLHP-DIDNKALYDTFSVFGDILSSKIATDE-NGKS---KGFGFVHFEEEGAAKEAIDAL-NGMLLNGQEIYVAPHL ===== l ** x1===_____--- **** a=== l ******t*.*t,** I =,=3=== ==3 _----_-----_--_- * l ****t =r====ii l *,,* -_-__-_-__--**,******t*r** ==l=ii===l= ‘2 ETKAHY-TNLYVKNINS-ETTDEOFQELFAKFQPIVSASLEKDAD-GK-L--KGFGFVNYEUHEDAVVKAVEALNDSELNGEKLYVGRANKKNERM ===a l t*tCX**XtXCllr*~~C**Cft === ItC*~,lt*tt.tt*t~*t~****~**~~**~ =r== ======= l * *t**Xtt**t*C=s=== l **** ======*Cll’tt~tt~*l*II ====z 13 EKfiAKYQGVNLF”KNLDD-S”DDEKLEEEFAPYGTITSAK”MRTE-NGKS---KGFGF”CFSTPEEATKAITEK-NQQIVAGKPLYVAIAQRUD”R == l l CtCXCX*t~r~+*~*C*r~~=~*~* I=== l ***(l****** **=====zz=rxs ==I=_____-_---__-_ *et* tXltCIXC*,I======P=====I== ______*,*******t====_---_- ==‘=D==**t**tt*t* I4 DRITRYQG”NLYVKNLDD-GIDDERLRKEFSPFGTITSAK”UME--GRKS---KGFGF”CFSSPEEATKAVTEM-NGRI”ATKPLY”ALAQRKEERQ ==========zz===P l ******* ***t**** ==cJ ***it******* *c===~_-_---___--- ===________=_------- l ********** =====x **Xl.****** III====~=== 15 GARFIKEF-TNVYIKNFGE-DMDDERLKDL---FGPALSVKVM~DESGK--L--KGFGFVSFERHEDAQKAVDEM-NCEL-NGKQIYVGRAQKKVERE l ****** I= t** *t*t*C*tt* t**=~==**t”*t l **************** l * ==z== *****=I ===== **t l ***,**t.* 1===I====== --..-_-t********C**,*-_---_ Pz== ======z= 16 PSLRKSG”GNIFIKNLDK-SIDNKALYNTFSAFGNILSCK”VCD-ENG-S---KGYGF”HFETQCAAERAIEKM-NGNLLNDRK”F-GRFKSRKERE ===x=== l ***tic===== ====3= -____------**C*L******+4t* l *******=== = ****t*=== ====_zzIzE ==‘===*“X******CC*+t x========= = 17 PSAPSYIMASLY”CDLHP-D”TEAMLYEKFSPAGPILSIRVCRDMITRRSL---GYAY”NFQQPADAERALDlM-NFD”IKG-K-FVRIfl~SQRD l *t****t*** =I===II~______ --_--- =E==== *****XC+** l t***,* = ===== l ***.**** =ii=s===IP==== = =======l****,*t*t* = _-__---------- = ===== IS EGSESTTPFNLFIGNLNP-NKSVAELK”AISEPFAKNDLA””-D”RTGTNR--K-FGY”DFESAEDLEKALELT-GLKVFGNEIKLEKPK =z=== t*t*t*t*tt**ttt**t == === === l *****.******* l *********** ======z l ********* ====z i(==z ====I*t**tt***ttt. =/===c 19 D~KKVRAARTLLAKNLSF-NITEDELKEVFEDALEI-RL-VSQD---GKS---KGIAYIEFK~EADAEKNLEEKQGAEI-DGRSVSLYYTGSKGQ l t,*****+*+tt* t**t*t*tC***~t~C l * l * ====*t*t*t~~tt*t*tt.**~*~ =51== *r****C.***X+====== l ************* == == ======**C*l*t**t*tt+*~~* z==15== 20 NSTWSGESKTLVLSNLSY-SATEETLOEVFEKATFI-KVPQ-NQQ--GKS---KGYAFIEFASFEDAKEALNSCNKMEIEGGRTIRLELQ~PRG ===== l ************** ==‘~ttttC*t+**tC~*t~*~~~* ==m= ====== l ****~*Xt*******= ---- === ======*tt*t**tt*~*X ====z~== 21 NARSQPSKTLFVKGLSE-DTTEETLKESFEGS”RA-R~VT--DRETGSS---KGFGFVDFNSEEDAKAAKEAMEDG~IDGNK”TLD~AKR~~E ====== +C*tt*,tttt*t**ttPn~~~ == ===%x l tt*******t*t*t ===z=== l ********** ==a==== = ====c= “*t*t”fttt.**t ===== 22 MAAADVEYRCFVGGLAW-ATSNESLENAFASYGEILDSKVITDRETGRS---RGFGFVTFSSENS~LDAIENMN~KSLD~RNIT”NQAQSR t*X*+*tt*I**X.*===rn l *~*tt**t****+t == =z== ====== l ** l * l ******** ===== **“***.+*****C === =z=== Fig. 1. Secondary structure prediction for RNA binding domain of RNP proteins. Aligned sequences are shown together with secondary structure prediction; below each sequence