
SPLIT GENES AND RNA SPLICING

Nobel Lecture (Physiology or Medicine 1993), December 8, 1993, by

PHILLIP A. SHARP
Department of Biology and the Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA

INTRODUCTION

By the late 70s the physical structure of a gene was firmly established from work in bacteria. The sequences of the gene, the RNA and the protein were colinearly organized and expressed. Since the science of genetics suggested that genes in eukaryotic organisms behaved similarly to those of prokaryotic organisms, it was naturally assumed that this bacterial gene structure was universal. It followed that if the gene structure was the same, then the mechanisms of regulation were probably very similar, and thus what was true of a bacterium would be true of an elephant. However, many descriptive biochemical aspects of the genetic material and its expression in cells with nuclei suggested that the simple molecular biology of gene expression in bacteria might not be universal. First, obviously, RNAs transcribed from nuclear genes are physically separated from the translational machinery in the cytoplasm. Thus, the nuclear compartment could be the site of selective RNA processing and transport. Second, the DNA content of eukaryotic germ cells varied significantly between organisms without an apparent variation in the number of genes. Some organisms appeared to have ten times as much DNA as was required to encode all of their proteins. Third, the previously described phenomenon of heterogeneous nuclear RNA (hnRNA) suggested that long RNAs were transcribed from diverse nuclear sequences (1). These hnRNAs had a short half-life relative to cytoplasmic mRNAs and thus could potentially be precursors to mRNAs.

Furthermore, both the long hnRNA and the shorter mRNA appeared to have modified 5' and 3' termini in common: an m⁷GpppX cap (2, 3, 4) and polyadenylation tracts (5, 6, 7), respectively. The meaning of these observations concerning hnRNAs remained controversial because it was not possible to establish a precursor-product relationship between the nuclear RNA population and the cytoplasmic mRNA. Whether or not the structure of genes in eukaryotic cells was the same as that in bacteria was not really questioned at that time. The important issue was to establish the exact biochemical pathway between a gene in the nucleus and its mRNA in the cytoplasm. This hypothetical pathway was pictured as beginning with initiation of transcription by RNA polymerase II (B), which would continue until transcription was completed beyond the terminus of the gene. The nuclear precursor RNA transcribed from the gene was potentially processed, and the mRNA selectively transported to the cytoplasm. Regulation of gene expression, the basis of almost all interesting biology, including cancer, cellular responses to infection, and development, would primarily result from changes in the rates or efficiencies of the various steps in the pathway. Thus, understanding the pathway of eukaryotic gene expression promised new insights into the mechanisms of regulation and the biochemical events controlling diverse biological and biomedical phenomena.

BACKGROUND

The late stage of adenovirus infection was chosen as the best system for studying the pathway nuclear gene → nuclear precursor RNA → cytoplasmic mRNA → protein. We had previously established several restriction endonuclease cleavage maps of the viral genome (Fig. 1; 8) and had used fragments of the genome to map the positions of cytoplasmic mRNAs generated during the early and late stages of infection (9). We had also determined the abundance (in copies per cell) of both nuclear and cytoplasmic RNAs (10), which suggested that RNAs from both compartments could be obtained in adequate amounts to compare their structures directly using the electron microscope and the then recently developed RNA·DNA hybridization methods. Furthermore, these earlier studies established that sets of viral RNA sequences were restricted to the nucleus, suggesting a selection in processing and/or transport of only certain RNA sequences to the cytoplasm (11). Finally, several studies had established that long nuclear RNA was transcribed from the adenovirus genome during the late stages of infection (12) and that the stable cytoplasmic RNAs were shorter than the predominant nuclear RNAs. Thus, the production of RNAs during the late stage of adenovirus infection presented a paradigm for the heterogeneous nuclear RNA phenomenon associated with cellular genes.

[Figure: restriction map of the Ad2 genome, 0 to 100 map units, with the hexon mRNA body marked]

Fig. 1: Map of cleavage sites for restriction endonucleases EcoRI and HindIII. The approximately 35,000 bp of adenovirus 2 DNA was assigned as 100 map units. The positions of cleavage sites are denoted by vertical lines, and fragments are lettered on the basis of length. The r and l viral strands are transcribed to the right and left, respectively. The sequences constituting the body of the mRNA specifying the abundant hexon (II) protein are encompassed by the bar spanning the boundary of EcoRI fragments A and B.

Fig. 2: Electron micrographs of hybrids of hexon mRNA and fragments of Ad2 DNA (13). Examples of R-loop hybrids observed after incubation of hexon mRNA and duplex HindIII A fragment DNA are shown in A and B and diagrammed schematically in C. Similarly, two examples of hybrids of hexon mRNA and the single-stranded HindIII A fragment are shown in D and E. A schematic of the hybrid structure shown in E is given in F. The single-stranded RNA at the end of the hybrid region is represented by a wave-like line. In A, B, D and E, the positions of the RNA tails at the 5' and 3' ends of the hybrids are denoted by arrows. An example of a hybrid between single-stranded EcoRI A DNA and hexon RNA is shown in G and diagrammed in H. The hybrid region is indicated by a heavy line; loops A, B and C (single-stranded, unhybridized DNA) are joined by hybrid regions resulting from annealing of upstream DNA sequences to the 5' tail of hexon mRNA. Bars on micrographs represent 0.1 µm.

DISCOVERY OF RNA SPLICING

Comparison of the structure of a cytoplasmic mRNA to that of a nuclear precursor RNA (13) required the purification of a specific, homogeneous mRNA. The most abundant mRNA, which encoded the adenovirus hexon protein, was separated from other viral mRNAs by gel electrophoresis and used in electron microscope mapping studies (14). Ray White, as a fellow at Stanford, was the first to recognize that RNA·DNA hybrids were more stable than DNA·DNA duplexes in high concentrations of formamide (15). This phenomenon was characterized physically by Davidson's lab (16) and was the basis of a convenient R-looping technique, whereby RNA forms a hybrid with one DNA strand and displaces the other strand into an easily observed loop. The adenovirus mRNA for hexon protein was mapped by the R-loop method to the HindIII A fragment of adenovirus 2 (13). Inspection of the R-loops between the hexon mRNA and the HindIII A DNA fragment revealed RNA tails at both the 5' and 3' ends of the hybrid (Fig. 2 A, B and C). The single-stranded RNA tail at the 3' end was expected, as the RNA was known to be polyadenylated post-transcriptionally. The single-stranded RNA tail at the 5' end of the hybrid was not expected; however, this 5' tail could have been displaced from the RNA·DNA hybrid by formation of duplex DNA through a process called branch migration. In fact, similar 5' tails had been observed previously in R-loop mapping of adenovirus mRNA and had been ascribed to such branch migration (17, 18).
Arguing against branch migration, however, was the finding that the 5' tails were of relatively uniform length, 170 nucleotides. To eliminate the possibility that competing DNA sequences were displacing the RNA, we used a denatured single strand of the HindIII A fragment (Fig. 2 D, E and F) to form an RNA·DNA hybrid. Surprisingly, the 5' RNA tail still did not hybridize with the adjacent viral sequences, suggesting that these RNA sequences were derived from other DNA sequences. Could the DNA sequences transcribed to form the 5' tail (the leader RNA sequence) be located upstream of the body of the hexon mRNA, perhaps as part of the long nuclear RNA? To test this possibility, a strand of the EcoRI A DNA fragment was hybridized with the hexon mRNA. This fragment contained all of the linear viral sequences that could have been transcribed by the polymerase before it encountered the body of the hexon mRNA. Surprisingly, and wonderfully, the leader sequences hybridized to three short tracts of DNA, creating three loops of different sizes, A, B and C, of intervening single-stranded DNA (Fig. 2 G and H). The lengths of these loops mapped the positions of the leader sequences L1, L2 and L3 to approximately 16.9, 19.8 and 26.9 map units on the genome, respectively. The distances between the bases of the three loops permitted an estimate of the lengths of the two internal leaders as 80 and 110 nucleotides, respectively. The 5'-proximal leader was estimated to be quite short, but longer than fifteen nucleotides.

RNA splicing was the mechanism we proposed for generation of the final hexon mRNA (13). It was known that the nucleus of virus-infected cells contained long RNAs transcribed from the viral sequences between the 5'-most leader, L1, located at 17 map units, and the body of the hexon mRNA, located between 51.7 and 61.3 map units (19, 20). Thus, the long nuclear RNA probably contained sequences for all three leaders, L1, L2 and L3, as well as for the body of the mRNA. These sequences were conjectured to be joined by excision of the intervening sequences and ligation of the flanking RNAs, a process dubbed RNA splicing (Fig. 3). In fact, the nuclear RNAs were quite abundant and were easily visualized by electron microscopy of RNA·DNA hybrids (21). Analysis of the structure of these hybrids revealed potential intermediates, in which only L1 had been spliced to L2, and in which only L1, L2 and L3 had been joined by splicing. The RNA splicing hypothesis reconciled many paradoxes. Both the longer nuclear RNAs and the shorter cytoplasmic mRNAs could share 5' cap termini, m⁷GpppX, and 3' poly(A) tracts (22), because internal sequences were removed from the nuclear precursor. More specifically for adenovirus, previous evidence had suggested that many different viral mRNAs appeared to share a common 5'-terminal sequence (23). This sequence, a long T1 oligonucleotide, contained the cap and an apparently unique sequence eleven residues in length. If many of the late viral mRNAs were processed by splicing from the same type of precursor RNA, then they could share a common sequence at their 5' termini. More importantly, the RNA splicing hypothesis provided an explanation for the hnRNA phenomenology associated with cellular genes. Heterogeneous nuclear RNA transcribed from diverse cellular genes could be processed by RNA splicing into shorter cytoplasmic mRNAs. Thus, most cellular genes probably contained sequences that were removed by RNA splicing; that is, they were split genes.

[Figure: map positions 17, 20, 27 and 52; leaders L1, L2, L3; mature mRNA]

Fig. 3: Proposed RNA splicing mechanism for synthesis of mRNA for hexon protein. A long nuclear precursor RNA is transcribed from 16.9 map units through the poly(A) addition site at the end of the body of hexon mRNA. The region of the Ad2 genome from which the precursor RNA is transcribed is shown at the top of the figure. The four RNA segments in the cytoplasmic mRNA are processed from this precursor by excision of intervening sequences (denoted by dashed arrows).
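The map-unit coordinates quoted in this section can be turned into approximate base-pair distances by simple proportion. The short Python sketch below assumes the Fig. 1 convention that roughly 35,000 bp of Ad2 DNA equals 100 map units (about 350 bp per map unit); the function name is illustrative, and the results inherit the uncertainty of both the genome length and the electron-microscope measurements.

```python
# Sketch: approximate conversion of adenovirus 2 map units to base pairs.
# Assumption (from Fig. 1): ~35,000 bp of Ad2 DNA = 100 map units.

GENOME_BP = 35_000   # approximate length of the Ad2 genome in base pairs
GENOME_MU = 100.0    # the whole genome is assigned 100 map units


def map_units_to_bp(mu: float) -> int:
    """Return the approximate base-pair coordinate for a map-unit position."""
    return round(mu * GENOME_BP / GENOME_MU)


# Positions quoted in the text (map units)
leaders = {"L1": 16.9, "L2": 19.8, "L3": 26.9}
body_start, body_end = 51.7, 61.3  # body of the hexon mRNA

for name, mu in leaders.items():
    print(f"{name}: {mu} map units ~ {map_units_to_bp(mu):,} bp")

# Genomic span encoding the hexon mRNA body
body_bp = map_units_to_bp(body_end) - map_units_to_bp(body_start)
# Span from L1 to the end of the body (the precursor drawn in Fig. 3)
precursor_bp = map_units_to_bp(body_end) - map_units_to_bp(leaders["L1"])

print(f"hexon mRNA body: ~{body_bp:,} bp")
print(f"L1 to end of body: ~{precursor_bp:,} bp")
```

Under these assumptions, L1 sits near 5,900 bp, the hexon body spans about 3,400 bp, and the region from L1 through the end of the body is roughly 15,500 bp, more than four times the length of the body alone.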

SPLIT GENES: INTERVENING SEQUENCES OR INTRONS IN CELLULAR GENES

Shortly after the discovery of RNA splicing and split genes in adenovirus, a number of cellular genes were also shown to have introns, or intervening sequences. For example, the globin genes contained two intervening sequences (24, 25), the ovalbumin gene was split into eight sets of sequences (26), and the immunoglobulin genes contained both short and long introns (27). In fact, the average cellular gene contains approximately eight introns, and the primary transcription unit is typically four times larger than the final mRNA. Shortly thereafter, it was recognized that there was a limited set of conserved sequences at each intron boundary (28). Interestingly, these consensus sequences were common to the introns of vertebrate, plant and yeast cells (29), suggesting that the splicing process was evolutionarily general. Introns in yeast are generally shorter and have more highly conserved sequences at their boundaries. Phylogenetic comparison of the sequences of homologous genes from a variety of organisms revealed that intron sequences had drifted much more rapidly than exon sequences. This suggested that intron sequences are generally not functional, at least in the sense of requiring long tracts of specific sequence. Furthermore, the length of introns in homologous genes varied significantly during evolution, suggesting little constraint. Finally, it was clear that specific introns could be lost during evolution. The mechanism responsible for the exact deletion of introns is probably related to gene conversion using a cDNA copy of the mRNA or of a partially spliced intermediate RNA. This process has been documented for the removal of introns from yeast genes (30) and raises the question of why introns persisted during evolution.
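At the level of sequence bookkeeping, splicing removes the intervening sequences and joins the flanking exons in order. The Python sketch below illustrates only that bookkeeping, on a hypothetical toy precursor whose introns begin with GU and end with AG; the function names, sequences and coordinates are illustrative assumptions, and the sketch says nothing about how splice sites are actually recognized.

```python
# Sketch: RNA splicing as sequence bookkeeping (excise introns, join exons).
# The toy precursor and exon coordinates are hypothetical.


def splice(precursor: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments [start, end) in 5'-to-3' order, dropping the sequences between them."""
    prev_end = 0
    for start, end in exons:
        # Exons must be in order, non-overlapping, and inside the precursor
        if not (prev_end <= start < end <= len(precursor)):
            raise ValueError(f"invalid exon ({start}, {end})")
        prev_end = end
    return "".join(precursor[start:end] for start, end in exons)


def introns(precursor: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the intervening sequences between consecutive exons."""
    return [precursor[left_end:right_start]
            for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


# Hypothetical precursor: exon 1 + intron 1 + exon 2 + intron 2 + exon 3
precursor = "AUGGCA" + "GUAAGUCCUAACAG" + "GAUCCU" + "GUGAGUUUUCAG" + "CUGUAA"
exons = [(0, 6), (20, 26), (38, 44)]

mrna = splice(precursor, exons)
print(len(precursor), len(mrna))  # 44 18
print(mrna)                       # AUGGCAGAUCCUCUGUAA
for intron in introns(precursor, exons):
    # Each toy intron starts with GU and ends with AG
    print(intron, intron.startswith("GU") and intron.endswith("AG"))
```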

MUTATIONS IN CELLULAR GENES

Many human diseases are caused by mutations that interfere with RNA splicing. Approximately one quarter of all mutations in the human globin genes underlying thalassemia are mutations in sequences specifying correct RNA splicing. Thus, conserving information for accurate splicing is a constraint on genetic systems. The limited nature of the mutations altering splicing in the globin family is interesting (31). All of the characterized mutations either alter the conserved 5' or 3' splice site sequences at the boundaries of the intron, or are changes within introns that generate new consensus splice sites (Fig. 4). The former case is simple: mutation of the highly conserved GU or AG sequences at the boundaries of the intron commonly inactivates splicing at that site, and this frequently leads to the activation of a nearby cryptic splice site. The latter case is more complicated. Here, a mutational change within an intron (CTGT) activates this site as a 5' splice site and, as a consequence, an upstream AG is activated as a 3' splice site for further splicing. This results in the generation of a short exon from sequences that were previously intronic (Fig. 4). Thus, mutations can alter both the position of splice sites and the number of exons to generate novel proteins.

Mistakes are probably not uncommon in the splicing of RNA from complex genes. Exons can occasionally be skipped and, in some cases, circular RNAs can be generated (32, 33). Since specific functions have not been assigned to many of these RNAs, their generation may reflect noise in the system. Interestingly, it has recently been proposed that cells contain an mRNA surveillance system that destroys RNAs as they enter the cytoplasm if they contain an open reading frame interrupted by a translational termination signal (34). This system would probably degrade most mRNAs with splicing errors as they are transported to the cytoplasm.
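The surveillance idea can be illustrated with a toy check for a stop codon that interrupts the reading frame. The Python sketch below assumes, for simplicity, an RNA consisting only of coding sequence that starts at its first nucleotide; real mRNAs have untranslated regions, and this toy rule is not a description of how the cellular system actually decides. The sequences (including the retained intron) and function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(cds: str) -> Optional[int]:
    """Index of the first stop codon in frame with position 0, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def orf_interrupted(cds: str) -> bool:
    """Toy rule: True if a stop codon appears before the final codon of the sequence."""
    stop = first_in_frame_stop(cds)
    return stop is not None and stop < len(cds) - 3


correct = "AUGGCA" + "GAUCCU" + "CUGUAA"                   # exons correctly joined
mis_spliced = "AUGGCA" + "GUAAGUUAGCAG" + "GAUCCUCUGUAA"  # hypothetical retained intron

print(first_in_frame_stop(correct), orf_interrupted(correct))          # 15 False
print(first_in_frame_stop(mis_spliced), orf_interrupted(mis_spliced))  # 12 True
```

In this toy case, retaining the intron brings an in-frame UAG into the message, so the reading frame ends early and the RNA would be flagged.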

Fig. 4: β-Globin mutations. β-Globin genes of thalassemia patients possessing mutations that affect RNA splicing (31). The single-base changes in four independent patients are indicated by arrows on the diagram of the two-intron structure of the β-globin gene. Three of the mutations alter conserved sequences of the 5' splice site, and a fraction of the mRNA from these mutant genes is processed at cryptic 5' splice sites. The fourth mutation is within the sequences of the second intron and creates a 5' splice site at this position. This results in the inclusion of the indicated sequences as a new exon. Exon sequences are represented by rectangles, intron sequences by lines.
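How a single base change inside an intron can create a new 5' splice site can be sketched with a simple motif scan. The Python example below uses the commonly cited intron-start consensus GURAGU (R = A or G) as a stand-in for a real splice-site model, together with a hypothetical intron fragment; it is not one of the patient mutations in Fig. 4, and genuine site selection depends on considerably more sequence context than this six-nucleotide pattern.

```python
import re

# Simplified 5' splice-site model: an intron start matching GURAGU (R = A or G).
# The lookahead lets overlapping candidates be reported.
FIVE_PRIME_SITE = re.compile(r"(?=GU[AG]AGU)")


def candidate_5p_sites(rna: str) -> list[int]:
    """Positions where an intron could begin under the toy GURAGU rule."""
    return [m.start() for m in FIVE_PRIME_SITE.finditer(rna)]


def point_mutation(rna: str, pos: int, base: str) -> str:
    """Return a copy of `rna` with a single base substituted at `pos`."""
    return rna[:pos] + base + rna[pos + 1:]


wild_type = "UUCAGCAAGUACCUU"               # hypothetical intron fragment
mutant = point_mutation(wild_type, 5, "U")  # C -> U at position 5

print(candidate_5p_sites(wild_type))       # []
print(mutant, candidate_5p_sites(mutant))  # UUCAGUAAGUACCUU [4]
```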

W HY SPLIT GE NES A N D R N A SPLICI N G?

It is difficult to account confidently for why introns have been conserved during evolution within the genes of most, if not all, eukaryotes. Clearly the intron-exon structure of genes has been very important in the generation of new genes during evolution. For example, the fibronectin gene of man is composed of three types of exons, each of which encodes a specific type of protein folding domain (Fig. 5; 35). These same three protein-folding domains and corresponding exon structures are found in other genes, some of which encode cell surface receptors and blood coagulation proteins. Thus, the fibronectin gene was created by tandem and dispersed duplication of exon units using breakage and joining within the intron sequences. Since these exon units are assembled precisely by RNA splicing and encode functional protein domains, the final protein is stable and has multiple functions. This theme of duplication and utilization of an exon unit has been a common mechanism for the generation of new genes encoding many cell surface receptors and other types of proteins in vertebrates. Thus, the presence of exon units that correspond to functional protein domains has been critical for the evolution of complex organisms.

[Figure: fibronectin protein and fibronectin gene. Protein subdomain type I, also found in tissue plasminogen activator; type II, also found in blood coagulation proteins; type III, also found in cell surface receptors and other extracellular matrix proteins.]

Fig. 5: Fibronectin gene evolved by exon duplication. The large extracellular matrix protein fibronectin is primarily composed of three types of protein domains, I, II and III, denoted as tilted ellipsoids, vertical ellipsoids and circles, respectively. As indicated at the bottom, homologous domains are found in other cellular proteins. Each of these domains is encoded by a defined exon pattern of one or two exons, denoted as vertical rectangles in the middle. These exon units are duplicated in the fibronectin gene and also in the other genes containing these domains. In two cases (larger rectangles), the intron separating the typical two-exon structure of the type III subdomain has been deleted. These patterns suggest that descendants of a common progenitor exon configuration were used to generate parts of these genes (36).

The ability to select alternative combinations of exons during RNA splicing, generating proteins with different functions, is clearly critical for the viability of many vertebrate organisms. Approximately one of every twenty genes is expressed by alternative pathways of RNA splicing in different cell types or growth states. For example, nuclear precursor RNAs from the fibronectin gene are alternatively spliced into mRNAs that encode 20 different proteins. These proteins have slightly different functions reflecting their variant structures. Furthermore, a set of exons is not included in mRNAs processed in liver cells but is included in mRNAs in other cell types (Fig. 6). The fibronectin secreted by the liver circulates more readily in the bloodstream because, compared with the other forms of fibronectin, it has fewer protein domains that are recognized by receptors on the cell surface (36). The fibronectin secreted by other cells is more typically found in the extracellular matrix of solid tissue. A similar variation in cellular adhesion is observed with alternative splicing patterns in another part of the fibronectin mRNA. Thus, through alternative splicing, the same set of gene sequences can be utilized for different functions.

Alternative splicing of precursor RNAs in different cell types must reflect differences in the factors regulating the splicing process. Only in a few cases have these factors been identified, primarily by the analysis of mutants that are defective in the regulatory step. The development of sexual dimorphism in the fruit fly Drosophila is regulated at the level of alternative RNA splicing by expression of a cascade of genes. Early in development, the female- or male-specific pattern of splicing is set by a reading of the ratio of sex chromosomes to autosomes. After this switch is set, the developmental process is controlled by an autoregulatory process that maintains the male or female splicing pattern in many, if not all, cells (37). Interestingly, most of the proteins encoded by these sex-regulating genes are members of a family whose common features are one or more RNA recognition subdomains and a pronounced tract of the repeating amino acid sequence Arg-Ser (38). It is likely that these factors interact directly, through the Arg-Ser tract, with other proteins critical for formation of the splicing machinery.

Fig. 6: Alternative splicing of RNA encoding fibronectin. The exons denoted EIIIB and EIIIA are spliced into the mature RNA synthesized in many cell types, including fibroblasts. However, in the liver these two exons are not included (they are skipped) in synthesis of the mRNA. Furthermore, five variations of the splicing pattern of the V region are shown. All of these splicing variations are found in most cell types. The fibronectin proteins encoded by the first and third mRNAs bind differentially to receptors on lymphocytes. In total, at least twenty different proteins are synthesized from the single fibronectin gene.
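The figure of twenty fibronectin proteins is consistent with simple combinatorics if one assumes that the EIIIB exon, the EIIIA exon and the V region are spliced independently: two choices for each EIII exon times five V-region variations gives 2 × 2 × 5 = 20. The Python sketch below enumerates these combinations; the V-region labels are hypothetical, and the independence assumption is a simplification rather than something established in the text.

```python
from itertools import product

# Two choices for each EIII exon (Fig. 6); five V-region variations.
EIIIB = ("EIIIB+", "EIIIB-")                    # exon included (+) or skipped (-)
EIIIA = ("EIIIA+", "EIIIA-")
V_REGION = ("V-a", "V-b", "V-c", "V-d", "V-e")  # hypothetical labels for the five variants

# Assumes the three regions are spliced independently of one another
isoforms = list(product(EIIIB, EIIIA, V_REGION))
print(len(isoforms))  # 20

# Assuming all five V-region forms also occur in liver, skipping both EIII exons
# leaves five combinations
liver_forms = [iso for iso in isoforms if iso[0] == "EIIIB-" and iso[1] == "EIIIA-"]
print(len(liver_forms))  # 5
```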

SPLIT GENES IN THE PROGENOTE?

The split gene structure must be very old, clearly predating the divergence of the organisms that gave rise to plants, yeast and man (29). First, as will be described later, a complex splicing machinery, which involves 50 to 100 or more proteins and five small RNAs, exists in the nucleus of all eukaryotes. This machinery is highly conserved and could not have arisen separately in each of the various lineages. Second, introns are conserved in positions within homologous genes found in the lineages of plants and animals. This evidence establishes that split genes and RNA splicing were present in common progenitor organisms a billion years ago. Several scientists have speculated that genes originally evolved as exons and that the progenote organism from which current prokaryotic and eukaryotic organisms evolved may have had a split gene structure (39-42). These primordial exons are pictured as encoding sequences for stable protein folding domains. Assembly of a number of exon sequences by RNA splicing would be expected to produce a protein composed of stable folding domains that have a high probability of being functional, either structurally or catalytically. If genes originally evolved in this fashion, the positions of introns in relation to protein secondary structure might not be random. Evidence to support this hypothesis has been sought in the exon-intron structure of evolutionarily old proteins critical for energy metabolism. For example, the enzyme pyruvate kinase (PK) is conserved in structure and function in bacteria, yeast and chicken (Fig. 7; 43).
The gene is not interrupted by introns in bacteria and yeast, but contains ten introns in chicken, nine within the coding sequences. The positions of these nine introns are not random when compared to the protein structure. First, in the N-terminal part of the protein, the first three introns are positioned between repeating structural motifs of α-helix and β-sheet. Second, in the mononucleotide binding fold, PK shares a common intron position with one other old metabolic enzyme, alcohol dehydrogenase: the eighth intron of PK occupies a position similar to that of an intron in the dehydrogenase. Since the generation of these two enzymes from a common protein subdomain clearly predated the evolutionary divergence of prokaryotic and eukaryotic organisms, these results suggest that the progenitor organisms for both prokaryotic and eukaryotic organisms may have had a split gene structure more typical of a current eukaryotic cell (for discussion see 44).

Fig. 7: The structure of pyruvate kinase and intron positions. Pyruvate kinase is shown schematically on the left as a protein with secondary and tertiary structure. On the right, this structure is expanded at the positions of introns in the tertiary structure of the protein. Note that the first three introns all fall between α-helix (barrels) and β-sheet (arrows) repeats. Intron 8 occupies approximately equivalent positions in the mononucleotide binding fold of PK and of maize alcohol dehydrogenase. Reprinted with permission from ref. 44.

IS THE GENE AN EXON?

The gene was first defined genetically as a unit of inheritance associated with a locus on a chromosome. The chemical definition of a gene has become much more difficult, however, as the complexity of genetic information and its modifications has been discovered. The existence of alternative splicing of exons, where the information in an exon unit can be optionally expressed in some cells and not in others, suggests that an exon might correspond to a gene. That is, an exon corresponds to the minimal amount of information that is expressed as a discrete unit. This concept becomes particularly relevant when the process of trans-splicing is considered (45). In this case, exons transcribed from different loci, and in many cases from different chromosomes, are joined by RNA splicing. Trans-splicing of exons and introns has been established in parasitic trypanosomes (46) and the nematode C. elegans (47), and has been suggested for some human genes (48). Clearly, in the case of trans-splicing, both the unit of inheritance and the locus on a chromosome can correspond to a single exon.

It is unlikely, however, that the current working definition of a gene as a linear collection of exons joined by RNA splicing will be radically altered in the near future. This is probably wise, as the existence of multiple processes collectively called RNA editing further complicates the biochemical definition of a gene (49). However, given the possibility that the earliest unit of genetic information may have evolved as an exon, the general concept of exons as gene units may be more valid than any other proposal.

SPLICING OF NUCLEAR PRECURSORS TO mRNAs

The development of a reaction composed of soluble cellular components that accurately processed precursors to mRNAs (pre-mRNAs) was critical for advancing our understanding of splicing (50). When combined with the use of highly radioactive pre-mRNA substrates, a biochemical analysis of the splicing process became feasible (51, 52). Not surprisingly, kinetic RNA intermediates in the splicing reaction were soon identified (53, 54). Surprisingly, these intermediates had a lariat structure in which the 5'-most nucleotide of the intron was joined by a 2'-5' phosphodiester bond to an adenosine within the intron (55). Since the adenosine is covalently bonded through both 3'-5' and 2'-5' phosphodiester linkages, this forms an RNA branch. The existence of such branches had just been described from nuclease digestion studies of total nuclear RNA from human cells (56). Formation of the branch appears simultaneously with cleavage at the 5' splice site and generates the lariat RNA, which typically migrates more slowly than the pre-mRNA during electrophoresis through a tight-porosity polyacrylamide gel. The splicing of pre-mRNA proceeds in two steps (Fig. 8, left). As mentioned above, the first step consists of cleavage at the 5' splice site with concomitant formation of the branch. At this stage, the 5' exon RNA has a 3' hydroxyl group, and the lariat intermediate RNA contains the intron and 3' exon. The second step consists of cleavage of the RNA at the 3' splice site with concomitant joining of the two exons. The intron is released as a lariat RNA and is reasonably stable in reactions in vitro.
This contrasts with the situation in vivo, where intron RNAs are almost always rapidly degraded. The fact that the intermediate state, consisting of two RNAs, was efficiently converted to the final products strongly suggested that these RNAs remain bound in a complex. The complex was identified by its rate of sedimentation in a glycerol gradient, 60S, and was designated a spliceosome or splicing body (57, 58). As anticipated from earlier work suggesting the importance of small nuclear ribonucleoprotein particles in splicing, the spliceosome contained the small nuclear RNAs (snRNAs) U2, U4, U5 and U6 and, under certain conditions, U1 (59). Thus, the spliceosome, much like a ribosome, contains a substrate RNA and a number of stable cellular RNA-protein components.
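The two-step pathway can be traced on a toy sequence, following only which nucleotides end up in which RNA. The Python sketch below is illustrative and makes no claim about the chemistry beyond the text: the class `Lariat`, the functions `first_step` and `second_step`, and the example sequences are all hypothetical, chosen to resemble yeast consensus elements (GUAUGU at the 5’ splice site, UACUAAC at the branch, CAG at the 3’ splice site).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lariat:
    """Branched RNA: the intron's 5'-terminal nucleotide is joined 2'-5' to the branch A.

    loop: intron nucleotides from the 5' end through the branch adenosine,
          closed into a ring by the 2'-5' phosphodiester bond
    tail: every nucleotide 3' of the branch adenosine
    """
    loop: str
    tail: str


def first_step(pre_mrna: str, exon1_len: int, intron_len: int, branch: int):
    """Step 1: cleavage at the 5' splice site with concomitant branch formation."""
    exon1 = pre_mrna[:exon1_len]
    intron = pre_mrna[exon1_len:exon1_len + intron_len]
    exon2 = pre_mrna[exon1_len + intron_len:]
    if intron[branch] != "A":
        raise ValueError("the branch nucleotide must be an adenosine")
    # Exon 1 is released with a free 3'-OH; intron + exon 2 form the lariat intermediate.
    return exon1, Lariat(loop=intron[:branch + 1], tail=intron[branch + 1:] + exon2)


def second_step(exon1: str, intermediate: Lariat, intron_len: int):
    """Step 2: cleavage at the 3' splice site with concomitant joining of the exons."""
    remaining = intron_len - len(intermediate.loop)  # intron nucleotides left in the tail
    exon2 = intermediate.tail[remaining:]
    excised = Lariat(loop=intermediate.loop, tail=intermediate.tail[:remaining])
    return exon1 + exon2, excised


# Hypothetical substrate with yeast-like elements: GUAUGU ... UACUAAC ... CAG.
exon1, exon2 = "AUGCAG", "GCAUAA"
intron = "GUAUGU" + "UUU" + "UACUAAC" + "UUUCAG"
branch = intron.index("UACUAAC") + 5  # the branch A is the 6th nucleotide of UACUAAC

free_exon1, intermediate = first_step(exon1 + intron + exon2, len(exon1), len(intron), branch)
mrna, excised = second_step(free_exon1, intermediate, len(intron))
print(mrna)                                   # AUGCAGGCAUAA
print(excised.loop + excised.tail == intron)  # True
```

The intermediate carries exon 2 in its tail, which is why the text describes it as containing "the intron and 3’ exon"; the second step simply transfers that exon onto exon 1 and leaves the intron as a lariat.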

[Figure 8 column headings: Nuclear pre-mRNA (spliceosome); self-splicing (group I, group II)]

Fig. 8: Comparison of self-splicing and nuclear pre-mRNA splicing mechanisms. The first column outlines the mRNA precursor splicing mechanism. The shaded circle represents a multicomponent complex, the spliceosome, which promotes the splicing reaction. The second column outlines the splicing mechanism of self-splicing introns of the group I type. This process is catalyzed by RNA structures within the intron (dark semicircle), which contains a guanosine binding site, and utilizes a guanosine (G) factor in the first step. The third column outlines the splicing mechanism of self-splicing introns of the group II type. This process is also catalyzed by RNA structures within the intron but utilizes, instead of a cofactor, an adenosine residue (A) within the intron to form a lariat RNA. All three mechanisms proceed by two steps, reaction at the 5’ splice site and then reaction at the 3’ splice site. The fate of the phosphates at the 5’ and 3’ splice sites is indicated.

GROUP I, GROUP II AND THE SPLICEOSOME

Comparison of the two steps in splicing by the spliceosome to the RNA-catalyzed self-splicing reactions of group I and II introns shows some striking similarities (Fig. 8). In all three cases, the first step is cleavage at the 5’ splice site. In group I introns, this cleavage requires a guanosine which specifically occupies a binding site in the catalytic intron sequences (60). The 3’ hydroxyl group on this guanosine is activated and, through a transesterification reaction, displaces the 3’ hydroxyl of the 5’ exon. Group II self-splicing introns cleave at the 5’ splice site by activating the 2’ OH at the branch site, producing a lariat RNA (61, 62) much like that produced by the spliceosome. The second step for all three processes involves a reaction at the 3’ splice site to join the exons and displace the intron. The simplest mechanism for the second step of splicing of the group I and II introns is a single transesterification reaction. This has been proven to be the case for reactions by group I introns.

The similarities between the spliceosomal process and the self-splicing introns suggested that these reactions might be evolutionarily related. This is particularly the case for the group II and spliceosome reactions. Thus, the snRNAs in the spliceosome might be pictured as a group II intron in pieces, where these RNAs form the catalytic sites for both reactions. The proteins in the spliceosome could be necessary for recognition of the precursor RNA, for arranging the snRNAs in catalytic structures and for rearranging the components to facilitate completion of the process. Early in evolution all introns might have been self-splicing. Then, during the passage of time, trans-acting components may have developed which executed the splicing of introns by recognition of sequences at the splice sites, thus permitting atrophy of the cis-sequence within each intron (63). Trans-acting segments of group II introns which complement the splicing of introns with partial catalytic structure have been documented (see reference 63). Years after the discovery of the lariat intermediate in the cis-splicing process, studies of the trans-splicing reaction revealed a branched intermediate state. Thus, the chemistry of the trans-splicing reaction is very similar to that of the cis-splicing process (64). It is therefore highly likely that the cis- and trans-splicing processes are variations of a single fundamental mechanism. Trypanosome parasites, which exclusively synthesize mRNA by trans-splicing of a short leader RNA, apparently do not express U1 snRNA or U5 snRNA, but do express U2, U4 and U6 snRNAs (65).
The sequence complementarity between the 5’ end of U1 snRNA and the consensus sequences at the 5’ splice site led to the hypothesis that this interaction was critical for splicing (66, 67). This hypothesis has been confirmed by a number of studies, including the inhibition of splicing in vitro by addition of antisera specific for U1 snRNP (68). After the discovery of lariat RNA, a consensus sequence was recognized in the region flanking the branch site. This consensus sequence is complementary to a conserved internal sequence in U2 snRNA, and mutational analysis has shown that this interaction is also critical in splicing (69). U1 and U2 snRNAs recognize consensus sequences at the 5’ splice site and branch site as early steps in the splicing reaction (Fig. 9A). At the same stage of splicing, U4 and U6 snRNAs are bound to one another through an extended region of complementarity (70). U6 snRNA is unique relative to the other snRNAs in that it is not bound by the core peptides recognized by the Sm lupus antisera, it is transcribed by a different polymerase, and it has a different type of modification at its 5’ terminus. U6 snRNA is also the most conserved member of the snRNA family and may have a unique catalytic role. The U4/U6 snRNP forms a specific complex with U5 snRNP, and this tri-snRNP complex is thought to bind to the other components in formation of the spliceosome (71).

Fig. 9: RNA interactions between spliceosomal snRNAs and pre-mRNA substrates. (A) (Top) Base-pairing interactions between U1 and U2 snRNAs and pre-mRNA are indicated on the left and right of the intron, respectively. (Bottom) Extensive base pairing between U4 and U6 snRNAs. (B) Interactions between U1, U2, U5, and U6 snRNAs and pre-mRNA. In both A and B, pre-mRNA consensus sequences and snRNA sequences are those of S. cerevisiae; uppercase nucleotides are highly conserved between S. cerevisiae and the known sequences of other organisms (excluding trypanosomes, which do not have a GUAGUA sequence in U2 snRNA). Different shaded areas highlight sequences in U2 and U6 snRNAs that change base-pairing partners during the spliceosome cycle. Internal snRNA secondary structures that do not change between A and B are shown as stylized stems and loops. Asterisks indicate snRNA positions at which mutations specifically block the second step of splicing.

After formation of the spliceosome, there are major rearrangements of the snRNAs (Fig. 9B). Both genetic and biochemical experiments show that U6 and U2 snRNAs are paired through an extensive tract of complementarity in the spliceosome (72, 73). Formation of the U2 and U6 structure requires dissociation of U4 snRNA from U6 snRNA. In fact, U4 snRNA can be released from the active spliceosome after this transition. The base pairing between U1 snRNA and the 5’ splice site sequence is probably also disrupted before the first step of splicing. The 5’ splice site almost certainly pairs with another region of U6 snRNA (74, 75). U5 snRNA is thought to be important in recognition of the exon sequences immediately flanking the splice sites. Mutations in these flanking sequences frequently have significant effects on splicing efficiency, and these effects can be reduced by complementary changes in U5 snRNAs (76).
It is possible that there are further rearrangements in the secondary structures formed by the snRNAs between the first and second steps in splicing. There are mutational changes in U2 and U6 (positions denoted by asterisks in Fig. 9) which only inactivate the second step (72, 73). Inspection of the network of complementarity formed between snRNAs and the pre-mRNA in the spliceosome reveals a concentration of snRNA structure, perhaps tertiary, near the branch site and 5’ splice site sequences. This arrangement of snRNAs might form the catalytic site for the first step. Perhaps further rearrangements would form another catalytic site for the second step.
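The base-pairing arguments above (U1 snRNA with the 5’ splice site, U2 snRNA with the branch site) come down to checking antiparallel complementarity. The Python sketch below only counts Watson-Crick pairs (optionally G·U) and says nothing about pairing stability. The function names are hypothetical, and the sequences are assumptions not taken from Fig. 9: the mammalian 5’ splice site consensus CAG|GUAAGU against unmodified human U1 snRNA nucleotides 3-11 (ACUUACCUG), and the yeast branch site UACUAAC against the U2 sequence GUAGUA, with the branch adenosine left unpaired as a bulge.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def antiparallel_pairs(target: str, snrna: str, allow_wobble: bool = False) -> int:
    """Count base pairs when target (5'->3') is aligned antiparallel to snrna (5'->3')."""
    if len(target) != len(snrna):
        raise ValueError("segments must be the same length")
    allowed = (WATSON_CRICK | WOBBLE) if allow_wobble else WATSON_CRICK
    # Reversing the snRNA segment lines up the target's 5' end with the snRNA's 3' end.
    return sum((a, b) in allowed for a, b in zip(target, reversed(snrna)))


def pairs_with_bulge(target: str, snrna: str, bulge: int, allow_wobble: bool = False) -> int:
    """Same count, with one target nucleotide (e.g. the branch A) left unpaired."""
    return antiparallel_pairs(target[:bulge] + target[bulge + 1:], snrna, allow_wobble)


# Mammalian 5' splice site consensus (exon CAG | intron GUAAGU) vs. U1 snRNA nt 3-11.
print(antiparallel_pairs("CAGGUAAGU", "ACUUACCUG"))     # 9
# Yeast branch site UACUAAC vs. U2 GUAGUA, with the branch A (index 5) bulged.
print(pairs_with_bulge("UACUAAC", "GUAGUA", bulge=5))  # 6
# A hypothetical G->A change at intron position +5 loses one pair.
print(antiparallel_pairs("CAGGUAAAU", "ACUUACCUG"))     # 8
```

Counting pairs this way mirrors the logic of the compensatory-mutation experiments described above: a splice-site change that breaks a pair can, in principle, be offset by a complementary change in the snRNA.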

FORMATION OF THE SPLICEOSOME

A number of stable complexes containing snRNAs and pre-mRNA have been partially characterized and placed in a spliceosome cycle (Fig. 10; 77). The commitment complex, CC, forms on the pre-mRNA by recognition of the 5’ splice site sequence by U1 snRNP and of sequences encompassing the branch site and 3’ splice site (78, 79, 80). For mammalian systems, a pyrimidine tract near the 3’ splice site and branch site is particularly important. The subsequent binding of U2 snRNP results in formation of the very stable A complex. The stable binding of U2 snRNP to the pre-mRNA depends critically on a protein, U2AF (81), which recognizes the polypyrimidine tract through prototypical RNA-binding domains and signals interactions with the other splicing components through a tract of Arg-Ser amino acid repeats. Interestingly, U1 snRNP, which is also required for stable U2 snRNP binding, has a bound protein with a tract of Arg-Ser repeats, the U1-70 kd polypeptide. Whether U2AF and U1-70 kd directly communicate across the intron is not clear. The B1 spliceosome forms when the U4/U6/U5 tri-snRNP complex binds the A complex. It is probably in the B1 complex that U4 snRNA dissociates from U6 snRNA and perhaps U1 snRNA from the 5’ splice site. These events define the B2 complex, which is the precursor of the C1 complex. The first step of splicing has occurred in the C1 complex. In generation of the C2 complex, it is likely that there are further rearrangements that are necessary to catalyze the second step in splicing.
The C2 complex dissociates and the newly joined exons are released from the snRNP-intron complex I. In the cell, the spliced product migrates to the cytoplasm. The lariat intron is retained with the snRNPs and this I complex turns over. The snRNPs are recycled for further splicing in vivo, while the intron RNA is degraded. Since all snRNAs are very stable, a particular snRNA likely participates in many splicing cycles.

Fig. 10: Schematic representation of the spliceosome cycle in terms of the role of small nuclear ribonucleoprotein (snRNP) particles in pre-mRNA splicing. Pre-mRNA (top line), containing two exons separated by an intron, enters splicing complexes with snRNPs and exits as mRNA (bottom line) and excised lariat intron (left border). Other, non-snRNP factors are required for spliceosome formation, but have been omitted for simplicity. CC, A, B1, B2, C1, C2, and I represent complexes within the splicing pathway that have been distinguished biochemically and/or genetically. 5’ SS, 3’ SS, bs, and Py indicate 5’ and 3’ splice sites, branch site, and polypyrimidine tract, respectively. The individual snRNPs indicated are U1, U2, U4, U5, and U6.

Fig. 11: Transitions in the spliceosome cycle which require a PRP protein. The particular PRP mutant is listed beside the arrow indicating the transition in the cycle in vitro which requires the mutant protein.

Assembly and rearrangements of complexes as elaborate as those in the spliceosome cycle would be expected to require a number of proteins. Sophisticated genetic analysis has identified a number of yeast gene products that are important for splicing either in vivo or in vitro (82, 83). Temperature-sensitive mutations in PRP (precursor RNA processing) genes cause the accumulation of unspliced pre-mRNA in the nucleus and/or generate extracts defective for splicing in vitro (Fig. 11). Surprisingly, it is estimated that at least a hundred different genes encode products important for RNA splicing, about 2% of the total yeast genome. Thus, the splicing apparatus is a significant component of the nucleus. The amino acid sequences predicted for some of the PRP genes suggest that they have functions such as RNA binding and RNA and protein-protein interactions. Matching these hypothetical functions to processes in spliceosomes will be challenging. Reassuringly, the steps in the spliceosome cycle where particular PRP proteins are required (Fig. 11) are consistent with the cycle as defined by kinetic and biochemical methods. Most transitions between specific forms of the spliceosome require one or more specific proteins. Furthermore, a number of PRP mutants are defective in splicing because of their inability to reassemble snRNPs for further splicing.
Thus, both genetic and biochemical results prove that the spliceosome cycle is the process responsible for excision of introns from split genes.
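The ordered complexes of the spliceosome cycle behave like a simple state machine, and a mutant that cannot carry out one transition should accumulate the complex preceding it. The Python sketch below encodes only the order of complexes and events given in the text; the names `Complex`, `NEXT` and `run_cycle` are hypothetical, and it deliberately does not assign particular PRP proteins to transitions, since that information lives in Fig. 11.

```python
from enum import Enum


class Complex(Enum):
    PRE_MRNA = "free pre-mRNA"
    CC = "commitment complex"
    A = "A complex"
    B1 = "B1 complex"
    B2 = "B2 complex"
    C1 = "C1 complex"
    C2 = "C2 complex"
    I = "intron-snRNP complex I"


# Each complex maps to (event that moves it on, next complex), following the text.
NEXT = {
    Complex.PRE_MRNA: ("U1 snRNP recognizes the 5' splice site", Complex.CC),
    Complex.CC: ("stable U2 snRNP binding, dependent on U2AF", Complex.A),
    Complex.A: ("U4/U6/U5 tri-snRNP joins", Complex.B1),
    Complex.B1: ("U4 leaves U6; U1 perhaps leaves the 5' splice site", Complex.B2),
    Complex.B2: ("first step: 5' cleavage and lariat formation", Complex.C1),
    Complex.C1: ("rearrangements needed for the second step", Complex.C2),
    Complex.C2: ("second step; spliced exons released", Complex.I),
}


def run_cycle(blocked=frozenset()):
    """Walk the cycle from free pre-mRNA; a complex in `blocked` cannot move on."""
    state = Complex.PRE_MRNA
    path = [state]
    while state in NEXT and state not in blocked:
        _event, state = NEXT[state]
        path.append(state)
    return path


print(" -> ".join(c.name for c in run_cycle()))
# PRE_MRNA -> CC -> A -> B1 -> B2 -> C1 -> C2 -> I
print(run_cycle(blocked={Complex.B1})[-1].name)  # B1
```

Blocking a single transition stops the walk at the preceding complex, which is the kind of accumulation used to place mutant proteins in the cycle.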

THE CHEMISTRY OF THE SPLICEOSOME

Both steps in the spliceosome process involve reactions between hydroxyl groups and a phosphodiester bond. The products and substrates in both steps contain the same number of covalent bonds, and thus each step could be accomplished with a single transesterification reaction. This has been shown to be the case for group I self-splicing introns by analysis of the stereochemistry of the two reactions (Fig. 12; 84, 85, 86). Phosphorothioates in diester bonds of RNA are chiral, having either an Rp or Sp stereochemistry depending upon the position of the sulfur atom. The specific stereochemistry can be determined by its sensitivity to particular nucleases. If a chiral phosphorothioate group participates in a single transesterification reaction, its stereochemistry is inverted, for example from Rp to Sp. This fact permits counting of the number of transesterification reactions in a process and, thus, the detection of any potential transient intermediate. As implied above, Rp and Sp phosphorothioates are not equally active in many catalytic sites. This is commonly interpreted as reflecting the unique role of one of the oxygen groups on the phosphate in interaction with a metal ion, a role which could not be equally well fulfilled by a sulfur group.
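The counting argument rests on parity: if every transesterification inverts the chiral phosphorus once, an odd number of reactions inverts the configuration and an even number retains it. The Python sketch below encodes just that rule; the names `Config`, `after_reactions` and `reaction_count_parity` are hypothetical, and the sketch assumes, as the text does, that each reaction at the phosphorus proceeds with inversion.

```python
from enum import Enum


class Config(Enum):
    RP = "Rp"
    SP = "Sp"

    def inverted(self) -> "Config":
        return Config.SP if self is Config.RP else Config.RP


def after_reactions(start: Config, n_transesterifications: int) -> Config:
    """Configuration after n transesterifications, each inverting the phosphorus once."""
    if n_transesterifications < 0:
        raise ValueError("the number of reactions cannot be negative")
    return start if n_transesterifications % 2 == 0 else start.inverted()


def reaction_count_parity(start: Config, observed: Config) -> str:
    """Inversion implies an odd number of reactions; retention implies an even number."""
    return "odd" if observed is not start else "even"


print(after_reactions(Config.RP, 1).value)              # Sp  (a single transesterification)
print(after_reactions(Config.RP, 2).value)              # Rp  (e.g. via a covalent intermediate)
print(reaction_count_parity(Config.RP, Config.SP))      # odd
```

Because the measurement only reveals parity, an observed inversion is read as one reaction under the simplest interpretation; retention would instead point to a hidden intermediate that adds a second reaction.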

[Figure 12 panels: First step, Rp active; second step, Sp active; a binding site is shown in each panel]

Fig. 12: The stereochemistry of the first and second step reactions of the group I self-splicing site. A sulfur atom is indicated at the positions where it does not inhibit the reactions: the Rp and Sp positions for the first and second steps, respectively (84, 85, 86). The interactions between the oxygen group on the phosphate and the Mg++ ions are hypothetical, proposed to explain why a phosphorothioate with the sulfur atom in this position is not reactive.

These principles of phosphorothioate chemistry have been analyzed for the group I self-splicing intron and have now been studied in the spliceosome. The two steps of splicing by the group I intron are probably executed as a forward and reverse reaction at a single catalytic site (Fig. 12). In the first step, the guanosine cofactor is bound to the G binding site and its 3’ OH group is activated for the transesterification reaction. In this reaction, the Rp chiral phosphorothioate is active while the Sp is not. This suggests that the oxygen group in the equivalent Sp position is uniquely recognized in this reaction, perhaps by a metal ion. Consistent with a single transesterification mechanism during the first step, the Rp chiral center is converted to an Sp phosphorothioate product (84). In the second step, the guanosine at the 3’ splice site occupies the G binding site and the 3’ OH of the 5’ exon is activated for the transesterification reaction. As would necessarily be the case by the principle of microscopic reversibility, the Sp chiral phosphorothioate is active in this reaction and the Rp is not (86). The Sp substrate phosphodiester bond generates an Rp product. Thus, stereochemical analysis of the group I process strongly supports the hypothesis of a single catalytic center for both steps. The stereochemistry of the two steps in the spliceosome process was investigated by synthesis of substrate RNAs containing either an Rp or Sp phosphorothioate at either the 5’ or 3’ splice site (87).
The particular stereoisomer was incorporated by a combination of chemical synthesis, oligo-primed transcription and ligation of RNAs by use of a bridging oligodeoxynucleotide and T4 DNA ligase (88). Splicing of the phosphorothioate-containing RNA substrates was tested in nuclear extracts after different periods of incubation. Analysis of the substrate and product RNAs showed that (a) both steps in splicing involve a single transesterification reaction with inversion of the chiral stereochemistry of the active phosphorothioate, and (b) the Sp diastereomer was the active phosphorothioate in both steps, while the Rp diastereomer was not (87).

[Figure 13 panels: Group I (left) and Spliceosome (right), each showing the 1st and 2nd steps]

Fig. 13: Chemical mechanisms and stereochemical configurations of the two steps of pre-mRNA splicing (right) and group I intron self-splicing (left). First-step differences between spliceosomal and group I splicing include opposite phosphorothioate diastereomer preferences, different reaction nucleophiles (2’-OH versus 3’-OH), and differential placement (5’ versus 3’) of the conserved guanosine (G) with respect to the phosphate at the 5’ splice site. Second-step similarities include preference for the Sp phosphorothioate diastereomer, use of a 3’-OH as the nucleophile and a conserved guanosine (G) attached through a 3’ oxygen to the phosphate at the 3’ splice site. The phosphorothioate diastereomer (Rp or Sp) which is not significantly inhibitory is indicated for the substrate for each step. The diastereomer that results is indicated for the product of each step. Small dashed arrows indicate the potential (but not yet observed) reversibility of each step of nuclear pre-mRNA splicing. The dark hemicircle, indicated in the spliceosome for the second step, designates a hypothetical site similar to that of group I introns.

A comparison of the stereochemistry and critical components in the two steps in splicing for the group I self-splicing process and the spliceosome process is informative (Fig. 13). As discussed before, the two steps in the group I case are forward and reverse reactions within a single catalytic center. This is almost certainly not the situation for the spliceosome process. In both steps of the latter process, the Sp phosphorothioate was active; thus, the second step is not a reverse of the first step. These results are most consistent with the proposal that the spliceosome generates two different catalytic centers for the two steps. The constituents in these centers may partially overlap, but the centers must be distinct. This is also consistent with the different chemical nature of reaction components in the two steps. A 2’ hydroxyl on an adenosine is activated in the first step, while a 3’ hydroxyl group on the 5’ exon is activated during the second step. Furthermore, the conserved guanosine residue is linked to the activated phosphate by a 5’ bond in the first step and a 3’ bond in the second. Thus, the spliceosome process, and probably the group II self-splicing process, must involve two distinct catalytic sites.
The catalytic site in the spliceosome responsible for the second step is probably similar to that of the group I self-splicing introns. These two sites share several common characteristics. Both catalyze the transesterification reaction (a) using an Sp, and not an Rp, phosphorothioate at the active site, (b) with the activated phosphate linked through the 3’ position to a conserved guanosine residue, and (c) by activating a 3’ hydroxyl group on the 5’ exon. These similarities are probably determined by some common chemical structure shared by the two catalytic sites. Perhaps this common micro-structure reflects a shared RNA tertiary structure for the catalytic centers and a shared evolutionary origin.
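The single-site versus two-site argument can be written as one test: if the second step were the first step run in reverse at the same site, microscopic reversibility says it should accept the configuration opposite to the one preferred in the first step. The Python sketch below applies that test to the active diastereomers stated in the text; the names `ACTIVE` and `step2_is_reverse_of_step1` are hypothetical, and the encoding simplifies the chemistry to a comparison of Rp/Sp labels.

```python
INVERT = {"Rp": "Sp", "Sp": "Rp"}

# Phosphorothioate diastereomer that is active (not inhibitory) in each step,
# as summarized in the text for Fig. 13.
ACTIVE = {
    "group I intron": {"step 1": "Rp", "step 2": "Sp"},
    "spliceosome": {"step 1": "Sp", "step 2": "Sp"},
}


def step2_is_reverse_of_step1(active: dict) -> bool:
    """At a single site, the reverse reaction should prefer the inverse of the
    configuration preferred by the forward reaction."""
    return active["step 2"] == INVERT[active["step 1"]]


for system, active in ACTIVE.items():
    if step2_is_reverse_of_step1(active):
        verdict = "consistent with one site used forward and in reverse"
    else:
        verdict = "points to two distinct catalytic sites"
    print(f"{system}: {verdict}")
# group I intron: consistent with one site used forward and in reverse
# spliceosome: points to two distinct catalytic sites
```

The test can only rule a single reversible site in or out; it does not by itself show how much the two spliceosomal sites overlap.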

NUCLEAR STRUCTURE AND RNA SPLICING

The processing of introns occurs both from nascent RNA which is being extended by polymerase and from post-transcriptional precursor RNA (89). Thus, the splicing apparatus must be located proximal to the gene, as well as in the regions between the gene and the nuclear pores. After completion of splicing, the mRNA is transported to the cytoplasm through one of the approximately 4,000 nuclear pores of a typical mammalian cell (90). The mechanism by which nuclear RNAs are transported from sites of transcription and splicing to pores remains a mystery. The subnuclear locations of a number of proteins and RNAs important for RNA splicing have been studied by light and electron microscopy. The most striking feature is the concentration of snRNPs, U2, U4/6 and U5 in 20-50 speckled structures and also in 1-5 foci called coiled bodies (Fig. 14; 91, 92). This contrasts with the subnuclear location of U1 snRNP, which, although also concentrated in speckles and foci, is more uniformly dispersed throughout the nucleus.

[Figure 14 labels: poly A+ RNA, U2, SC35 (speckles); coiled bodies, U2, U5]

Fig. 14: The nucleus. Subnuclear localization of splicing factors. Specific antisera and in situ hybridization methods have been used to localize components of the spliceosome and splicing factors in the nucleus of mammalian cells. The outline of the cell is shown without structure in the cytoplasm. Three nuclear pores are diagrammed in the nuclear membrane. The large dark structures are nucleoli, the sites of ribosomal RNA synthesis. Some of the components of the speckles, regions containing pre-mRNA, are listed. The coiled bodies are different and do not appear to contain pre-mRNAs (93). The general shadowing represents the nuclear distribution of U1 snRNP and U2AF.

These speckled structures correspond to regions described by electron microscopists as interchromatin granule clusters and the perichromatin fibril network. High-resolution in situ hybridization methods and pulse-labeling studies locate newly synthesized RNA in the speckled regions and also in curvilinear tracks which extend from the gene toward the nuclear periphery (93, 94). The further the nuclear precursor RNA is located along the tracks which lead from its gene of origin, the lower the relative concentration of intron sequences as compared to exon sequences. This suggests that the precursor RNA moves through the speckled structures as it is being processed and transported (94). Active spliceosomes are probably concentrated in these regions. Several proteins important for RNA splicing are also concentrated in speckled structures with the snRNPs. These include the Arg/Ser proteins SF2/ASF (95, 96) and SC35 (97). Surprisingly, these proteins, as well as the snRNPs in active spliceosomes, are probably attached to an operationally defined nuclear matrix. This matrix is the structure that remains in the nucleus after almost all of the chromatin proteins and DNA are extracted by sequential treatments with DNase I and high salt (98).
The integrity of the fibrillar matrix which remains is sensitive to RNase, suggesting a major role for RNA in its structure. Spliceosomes have been shown to be associated with the nuclear matrix, as it has been possible to process pulse-labeled endogenous precursor RNA extracted as part of the matrix (98). Active spliceosomes may be associated with the nuclear matrix through interactions with proteins in the Arg/Ser family (99, 100). A panel of monoclonal antibodies has been generated to components in the nuclear matrix of human cells. Three of these monoclonal antibodies specifically stain the matrix and extensively co-localize with the snRNPs in the speckled structures. Surprisingly, these three antibodies specifically immunoprecipitate spliceosome complexes which contain exon sequences (101). Thus, the antibodies immunoprecipitate a fraction of the precursor RNA, the lariat intermediate RNA and the associated 5’ exon, and the spliced exon product RNAs. The antibodies do not immunoprecipitate the intron-containing spliceosome complex I, even though it contains most of the snRNPs and associated protein components. These remarkable results suggest that the matrix components bind to exon RNA sequences that are specifically assembled into splicing complexes. The proteins recognized by the monoclonal antibodies to the nuclear matrix are most likely members of the Arg/Ser family of proteins or are associated with the Arg/Ser proteins. Precursor RNAs are not immunoprecipitated from reactions containing nuclear extracts depleted of Arg/Ser proteins.
Further addition of preparations of purified Arg/Ser proteins will block specific immunofluorescence staining of subnuclear structures. These results are consistent with a model where the Arg/Ser family of proteins are associated with or form the nuclear matrix (101). Newly synthesized RNA sequences are recognized by Arg/Ser proteins such as SC35, ASF, the U1 70 kd and U2AF and become associated with the matrix, perhaps through interactions of the Arg/Ser tracts. On this structure, complete spliceosomes would form encompassing the matrix-proximal splice sites and execute splicing. The exon product RNAs would remain matrix-bound and move by some unknown mechanism to the nuclear pore.

CONCLUSION

The discovery of split genes and RNA splicing has been critical for studies of the biology of eukaryotic organisms. Gene regulation is central to all biological phenomena, and RNA splicing is important in the regulation of genes, particularly when precursor RNAs are processed by alternative pathways to generate mRNAs encoding different proteins. The mechanism of splicing by the spliceosome is probably related to the self-splicing process of group I and II introns. The spliceosome process is old in an evolutionary sense, perhaps as old as the ribosomal process responsible for translation. Thus, the eukaryotic cell can be conjectured as consisting of two compartments: the nucleus, where the spliceosome processes RNA precursors by RNA catalysis, and the cytoplasm, where the ribosome translates mRNAs by RNA catalysis. The distinct subnuclear locations of spliceosome-related components suggest a compartmentalized organization for the nucleus. Further studies of RNA splicing and transport hold promise of revealing the nature of the organizational specificity of the nucleus.

REFERENCES

1. Darnell, J. E., Jr. (1975) Harvey Lectures, 69, 1-47.
2. Furuichi, Y., Morgan, M., Muthukrishnan, S. and Shatkin, A. J. (1975) Proc. Natl. Acad. Sci. USA, 72, 362-366.
3. Wei, C. M. and Moss, B. (1974) Proc. Natl. Acad. Sci. USA, 71, 3014-3018.
4. Rottman, F., Shatkin, A. and Perry, R. P. (1974) Cell, 3, 197-199.
5. Edmonds, M., Vaughan, M. H., Jr. and Nakazato, H. (1971) Proc. Natl. Acad. Sci. USA, 68, 1336-1340.
6. Lee, S. Y., Mendecki, J. and Brawerman, G. (1971) Proc. Natl. Acad. Sci. USA, 68, 1331-1335.
7. Darnell, J. E., Jr., Wall, R. and Tushinski, R. J. (1971) Proc. Natl. Acad. Sci. USA, 68, 1321-1325.
8. Pettersson, U., Mulder, C., Delius, H. and Sharp, P. A. (1973) Proc. Natl. Acad. Sci. USA, 70, 200-204.
9. Sharp, P. A., Gallimore, P. H. and Flint, S. J. (1974) Cold Spring Harbor Symp. Quant. Biol., 34, 457-474.
10. Flint, S. J., Gallimore, P. H. and Sharp, P. A. (1975) J. Mol. Biol., 96, 47-68.
11. Flint, S. J. and Sharp, P. A. (1976) J. Mol. Biol., 106, 749-771.
12. Bachenheimer, S. and Darnell, J. E. (1975) Proc. Natl. Acad. Sci. USA, 72, 4445-4449.
13. Berget, S. M., Moore, C. and Sharp, P. A. (1977) Proc. Natl. Acad. Sci. USA, 74, 3171-3175.
14. Lewis, J., Atkins, J. F., Anderson, C., Baum, P. R. and Gesteland, R. F. (1975) Proc. Natl. Acad. Sci. USA, 72, 4445-4449.
15. Thomas, M., White, R. L. and Davis, R. W. (1976) Proc. Natl. Acad. Sci. USA, 73, 2294-2298.
16. Casey, J. and Davidson, N. (1977) Nucleic Acids Res., 4, 1539-1552.
17. Westphal, H., Meyer, J. and Maizel, J. (1976) Proc. Natl. Acad. Sci. USA, 73, 2069-2071.
18. Chow, L. T., Roberts, J. M., Lewis, J. B. and Broker, T. R. (1977) Cell, 11, 819-836.
19. Goldberg, S., Weber, J. and Darnell, J. E., Jr. (1977) Cell, 10, 617-622.
20. Weber, J., Jelinek, W. and Darnell, J. E., Jr. (1977) Cell, 10, 611-616.
21. Berget, S. M. and Sharp, P. A. (1979) J. Mol. Biol., 129, 547-565.
22. Perry, R. P. and Kelley, D. E. (1976) Cell, 8, 433-442.
23. Gelinas, R. E. and Roberts, R. J. (1977) Cell, 11, 533-544.
24. Jeffreys, A. J. and Flavell, R. A. (1977) Cell, 12, 1097-1108.
25. Tilghman, S. M., Tiemeier, D. C., Seidman, J. G., Peterlin, B. M., Sullivan, M., Maizel, J. V. and Leder, P. (1978) Proc. Natl. Acad. Sci. USA, 75, 725-729.
26. Breathnach, R., Mandel, J.-L. and Chambon, P. (1977) Nature, 270, 314-319.
27. Tonegawa, S., Maxam, A. M., Tizard, R., Bernard, O. and Gilbert, W. (1978) Proc. Natl. Acad. Sci. USA, 75, 1485-1489.
28. Breathnach, R. and Chambon, P. (1981) Annu. Rev. Biochem., 50, 349-384.
29. Padgett, R. A., Grabowski, P. J., Konarska, M. M., Seiler, S. and Sharp, P. A. (1986) Annu. Rev. Biochem., 55, 1119-1150.
30. Fink, G. R. (1987) Cell, 49, 5-6.
31. Treisman, R., Orkin, S. and Maniatis, T. (1983) Nature, 302, 591-596.
32. Nigro, J. M., Cho, K. R., Fearon, E. R., Kern, S. E., Ruppert, J. M., Oliner, J. D., Kinzler, D. W. and Vogelstein, B. (1991) Cell, 64, 607-613.
33. Cocquerelle, C., Daubersies, P., Majerus, M.-A., Kerckaert, J.-P. and Bailleul, B. (1992) EMBO J., 11, 1095-1098.
34. Pulak, R. and Anderson, P. (1993) Genes Dev., 7, 1885-1897.
35. Patel, R. S., Odermatt, E., Schwarzbauer, J. E. and Hynes, R. O. (1986) EMBO J., 6, 2565-2572.
36. Hynes, R. O. (1989) Fibronectin, Springer-Verlag, New York.
37. Baker, B. S. (1989) Nature, 340, 521-524.
38. Tian, M. and Maniatis, T. (1992) Science, 256, 237-240.
39. Gilbert, W. (1978) Nature, 271, 501.
40. Darnell, J. E., Jr. (1978) Science, 202, 1257.
41. Doolittle, W. F. (1978) Nature, 272, 581.
42. Blake, C. C. F. (1978) Nature, 273, 267.
43. Stone, E. M. and Schwartz, R. J., editors (1990) Intervening Sequences in Evolution and Development, Oxford University Press, New York.
44. Lonberg, N. and Gilbert, W. (1985) Cell, 40, 81-90.
45. Agabian, N. (1990) Cell, 61, 1157-1160.
46. Borst, P. (1986) Annu. Rev. Biochem., 55, 701-732.
47. Krause, M. and Hirsh, D. (1987) Cell, 49, 753-761.
48. Vellard, M., Sureau, A., Soret, J., Martinerie, C. and Perbal, B. (1992) Proc. Natl. Acad. Sci. USA, 89, 2511-2515.
49. Benne, R., van den Burg, J., Brakenhoff, J. P. J., Sloof, P., van Boom, J. H. and Tromp, M. C. (1986) Cell, 46, 819-826.
50. Padgett, R. A., Hardy, S. F. and Sharp, P. A. (1983) Proc. Natl. Acad. Sci. USA, 80, 5230-5234.
51. Green, M. R., Maniatis, T. and Melton, D. A. (1983) Cell, 32, 681-694.
52. Hernandez, N. and Keller, W. (1983) Cell, 35, 89-99.
53. Grabowski, P. J., Padgett, R. A. and Sharp, P. A. (1984) Cell, 37, 415-427.
54. Krainer, A. R., Maniatis, T., Ruskin, B. and Green, M. R. (1984) Cell, 36, 993-1005.
55. Konarska, M. M., Grabowski, P. J., Padgett, R. A. and Sharp, P. A. (1985) Nature, 313, 552-557.
56. Wallace, J. C. and Edmonds, M. (1983) Proc. Natl. Acad. Sci. USA, 80, 950-954.
57. Grabowski, P. J., Seiler, S. R. and Sharp, P. A. (1985) Cell, 42, 345-353.
58. Brody, E. and Abelson, J. (1985) Science, 228, 963-967.
59. Steitz, J. A., Black, D. L., Gerke, V., Parker, K. A., Kramer, A., Frendewey, D. and Keller, W. (1988) In Structure and function of major and minor small nuclear ribonucleoprotein particles (ed. M. L. Birnstiel), pp. 115-154, Springer-Verlag, New York.
60. Cech, T. R. (1985) Cell, 43, 713-716.
61. Peebles, C. L., Perlman, P. S., Mecklenburg, K. L., Petrillo, M. L., Tabor, J. H., Jarrell, K. A. and Cheng, H.-L. (1986) Cell, 44, 213-223.
62. van der Veen, R., Arnberg, A. C., van der Horst, G., Bonen, L., Tabak, H. F. and Grivell, L. A. (1986) Cell, 44, 225-234.
63. Sharp, P. A. (1991) Science, 254, 663.
64. Hannon, G. J., Maroney, P. A., Denker, J. A. and Nilsen, T. W. (1990) Cell, 61, 1247-1255.
65. Watkins, K. R., Dungan, J. M. and Agabian, N. (1993) Cell, in press.
66. Lerner, M. R., Boyle, J. A., Mount, S. M., Wolin, S. L. and Steitz, J. A. (1980) Nature, 283, 220-224.
67. Rogers, J. and Wall, R. (1980) Proc. Natl. Acad. Sci. USA, 77, 1877-1879.
68. Padgett, R. A., Mount, S. M., Steitz, J. A. and Sharp, P. A. (1983) Cell, 35, 101-107.
69. Parker, R., Siliciano, P. G. and Guthrie, C. (1987) Cell, 49, 229-239.
70. Guthrie, C. and Patterson, B. (1988) Annu. Rev. Genet., 22, 387-419.
71. Konarska, M. M. and Sharp, P. A. (1987) Cell, 49, 763-774.
72. Madhani, H. D. and Guthrie, C. (1992) Cell, 71, 803-817.
73. McPheeters, D. S. and Abelson, J. (1992) Cell, 71, 819-831.
74. Sawa, H. and Abelson, J. (1992) Proc. Natl. Acad. Sci. USA, 89, 11269-11273.
75. Wassarman, D. A. and Steitz, J. A. (1992) Science, 257, 1918-1925.
76. Newman, A. and Norman, C. (1992) Cell, 68, 743-754.
77. Moore, M. J., Query, C. C. and Sharp, P. A. (1993) In The RNA World, Cold Spring Harbor Laboratory Press, pp. 303-357.
78. Seraphin, B. and Rosbash, M. (1991) EMBO J., 10, 1209-1216.
79. Michaud, S. and Reed, R. (1991) Genes Dev., 5, 2534-2546.
80. Jamison, S. F. and Garcia-Blanco, M. A. (1992) Proc. Natl. Acad. Sci. USA, 89, 5482-5486.
81. Ruskin, B., Zamore, P. D. and Green, M. R. (1988) Cell, 52, 207-219.
82. Ruby, S. W. and Abelson, J. (1988) Trends Genet., 7, 79-85.
83. Guthrie, C. (1991) Science, 253, 157-163.
84. McSwiggen, J. A. and Cech, T. R. (1989) Science, 244, 679-683.
85. Rajagopal, J., Doudna, J. A. and Szostak, J. W. (1989) Science, 244, 692-694.
86. Suh, E.-R. and Waring, R. B. (1992) Nucleic Acids Res., 20, 6303-6309.
87. Moore, M. J. and Sharp, P. A. (1993) Nature, 365, 364-368.
88. Moore, M. J. and Sharp, P. A. (1992) Science, 256, 992-997.
89. Beyer, A. L. and Osheim, Y. N. (1988) Genes Dev., 2, 754-765.
90. Mehlin, H., Daneholt, B. and Skoglund, U. (1992) Cell, 69, 605-613.
91. Spector, D. L. (1993) Annu. Rev. Cell Biol., 9, 265-315.
92. Nyman, U., Hallman, H., Hadlaczky, G., Pettersson, I., Sharp, G. and Ringertz, N. R. (1986) J. Cell Biol., 102, 137-144.
93. Carter, K. C., Bowman, D., Carrington, W., Fogarty, K., McNeil, J. A., Fay, F. S. and Lawrence, J. B. (1993) Science, 259, 1330-1334.
94. Xing, Y., Johnson, C. V., Dobner, P. R. and Lawrence, J. B. (1993) Science, 259, 1326-1330.
95. Ge, H. and Manley, J. L. (1990) Cell, 62, 25-34.
96. Krainer, A. R., Conway, G. C. and Kozak, D. (1990) Cell, 62, 35-42.
97. Fu, X.-D. and Maniatis, T. (1990) Nature, 343, 437-441.
98. Zeitlin, S., Wilson, R. C. and Efstratiadis, A. (1989) J. Cell Biol., 108, 765-777.
99. Roth, M. B., Murphy, C. and Gall, J. G. (1990) J. Cell Biol., 111, 2217-2223.
100. Roth, M. B., Zahler, A. M. and Stolk, J. A. (1991) J. Cell Biol., 115, 587-596.
101. Blencowe, B. J., Nickerson, J. A., Issner, R., Penman, S. and Sharp, P. A. (1993) in preparation.