MOLECULAR AND CELLULAR BIOLOGY, Aug. 1984, p. 1460-1468
Vol. 4, No. 8
0270-7306/84/081460-09$02.00/0
Copyright © 1984, American Society for Microbiology

Sequences on the 3' Side of Hexanucleotide AAUAAA Affect Efficiency of Cleavage at the Polyadenylation Site

MOSHE SADOFSKY AND JAMES C. ALWINE*

Department of Microbiology, School of Medicine/G2, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Received 3 April 1984/Accepted 8 May 1984

The hexanucleotide AAUAAA has been demonstrated to be part of the signal for cleavage and polyadenylation at appropriate sites on eucaryotic mRNA precursors. Since this sequence is not unique to polyadenylation sites, it cannot be the entire signal for the cleavage event. We have extended the definition of the polyadenylation cleavage signal by examining the cleavage event at the site of polyadenylation for the simian virus 40 late mRNAs. Using viable mutants, we have determined that deletion of sequences between 3 and 60 nucleotides on the 3' side of the AAUAAA decreases the efficiency of utilization of the normal polyadenylation site. These data strongly indicate a second major element of the polyadenylation signal. The phenotype of these deletion mutants is an enrichment of viral late transcripts longer than the normally polyadenylated RNA in infected cells. These extended transcripts appear to have an increased half-life due to the less efficient cleavage at the normal polyadenylation site. The enriched levels of extended transcripts in cells infected with the deletion mutants allowed us to examine regions of the late transcript which normally are difficult to study. The extended transcripts have several discrete 3' ends which we have analyzed in relation to polyadenylation and other RNA processing events. Two of these ends map to nucleotides 2794 and 2848, which lie within a region of extensive secondary structure which marks the putative processing signal for the formation of the simian virus 40-associated small RNA. A third specific 3' end reveals a cryptic polyadenylation site at approximately nucleotides 2980 to 2985, more than 300 nucleotides beyond the normal polyadenylation site. This site appears to be utilized only in mutants with debilitated normal sites. The significance of sequences on the 3' side of an AAUAAA for efficient polyadenylation at a specific site is discussed.
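The positional argument used in this paper, that a polyadenylation signal lies roughly 11 to 30 bases 5' of the cleavage site, can be checked mechanically against mapped 3' ends. The Python sketch below is illustrative only and is not the authors' analysis: the helper name `compatible_signals`, the inclusive window, and the convention of measuring from the first base of the hexanucleotide are assumptions. It applies the window to the two candidate hexanucleotides reported near the cryptic site (AUUAAA at 2964 and ACUAAA at 2994) and to the 3' end mapped at 2980 to 2985.

```python
from typing import Dict, List, Tuple

# Hypothetical helper (not from the paper): which candidate poly(A)
# hexanucleotides lie 11 to 30 nt 5' of a mapped 3' end?
# Assumption: distance = cleavage position minus the first base of the
# hexanucleotide, in SV numbering; the window is treated as inclusive.


def compatible_signals(
    signals: Dict[str, int],
    end_range: Tuple[int, int],
    window: Tuple[int, int] = (11, 30),
) -> List[str]:
    """Return labels of signals lying within `window` nt upstream of at
    least one position in the mapped 3'-end range `end_range`."""
    lo, hi = window
    first_end, last_end = end_range
    hits = []
    for label, pos in signals.items():
        # Smallest and largest upstream distances over the mapped range.
        nearest = first_end - pos
        farthest = last_end - pos
        # Keep the signal if [nearest, farthest] overlaps [lo, hi].
        if farthest >= lo and nearest <= hi:
            hits.append(label)
    return hits


# Positions reported in the text (SV numbering).
candidates = {"AUUAAA@2964": 2964, "ACUAAA@2994": 2994}
mapped_end = (2980, 2985)  # alternative 3' end, mapped less precisely

print(compatible_signals(candidates, mapped_end))  # ['AUUAAA@2964']

# Distance of the alternative end from the normal late site at 2674.
print(mapped_end[0] - 2674)  # 306
```

Under these assumptions only the AUUAAA at 2964 fits; the ACUAAA at 2994 lies 3' of the mapped end and gives a negative distance. The second print reproduces the "more than 300 nucleotides" separating the normal late site from the cryptic site.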

The 3' ends of polyadenylated messages in higher eucaryotes appear to be generated by cleavage from much larger primary transcripts. The specificity of cleavage and polyadenylation is partly determined by the sequence AAUAAA (17, 37), or close homologs, found 11 to 30 bases 5' to the actual cleavage site. This mechanism has been demonstrated in viral (1, 3, 11, 18, 19, 28, 29, 34) and cellular (22, 26) genes. Although the necessity for the AAUAAA sequence has been clearly demonstrated (17, 24, 33), this sequence alone cannot be sufficient to signal the processing, because it also occurs within coding regions of messages, such as within the simian virus 40 (SV40) early coding region (43), the adenovirus type 12 E1A unit (36), and the chicken ovalbumin gene (31).

To understand better the mechanism underlying the utilization of specific AAUAAA signals as polyadenylation sites, we chose to study the polyadenylation of the late messages of SV40. The 3' ends of SV40 late mRNAs (see Fig. 1) are formed specifically at nucleotide 2674 (SV numbering) (43) by cleavage of much larger nuclear transcripts (3, 28, 29). The region of the viral genome surrounding the polyadenylation site is untranslated in both the early and the late senses; for this reason it is amenable to extensive mutation and deletion analysis without loss of viral viability. This feature allowed Fitzgerald and Shenk (17) to demonstrate that deletions surrounding the AAUAAA of the late polyadenylation site permitted polyadenylation to occur, but deletion of the hexanucleotide prevented polyadenylation at this site. Since these studies analyzed only polyadenylated RNAs, the effect of the deletions on the efficiency of polyadenylation at the normal site could not be determined. Using the mutants of Fitzgerald and Shenk (17), we have found that in both wild type and mutants the cleavage at the normal polyadenylation site is relatively efficient. However, cells infected with certain mutants, which contain deletions on the 3' side of the AAUAAA, generated enriched levels of late RNAs which extend beyond the normal polyadenylation cleavage site. The enrichment of the population of primary, extended transcripts in the mutants suggests that the normal processing of these RNAs has been made less efficient by the specific deletions. These observations imply that specific sequences downstream from the AAUAAA are part of the processing recognition signal.

The enriched levels of extended transcripts have allowed us to examine regions of the primary late transcript which are normally difficult to detect. By mapping distinct 3' ends of the extended transcripts, several important observations have been made. (i) Through alteration of the efficient polyadenylation at the normal site, we have been able to detect an alternative polyadenylation site within the extended region at approximately nucleotide 2980. The resulting polyadenylated RNA is transported to the cytoplasm. (ii) We have located the processing site for the 5' end of the SV40-associated small RNA (SAS RNA; 4, 5, 6, 30). This site lies within a region of extensive secondary structure indicative of a processing signal.

* Corresponding author.

MATERIALS AND METHODS

Cells and infection conditions. All experiments were performed with the established line CV-1P of African green monkey kidney cells grown in Dulbecco modified Eagle medium supplemented with glutamine (2 mM), penicillin (100 U/ml), streptomycin (100 µg/ml), and 10% fetal bovine serum for propagation or 2% serum for maintenance and viral infection. T75 flasks with a confluent monolayer of cells were infected with a specific virus at 10 PFU per cell for 2 h in 3 ml of medium (2% fetal bovine serum) at room temperature with rocking. Cells were then fed and incubated at 37°C.

Viruses. The wild-type SV40 strain 776 was used. Deletion mutant strains dl882 (41) and dl1455, dl1457, dl1458, dl1453, and dl1465 (17) were generously provided by T. Shenk. Deletion strains dl1263 and dl1265 (13, 14) were gifts of C. Cole. The double deletion strain dl1465-1263 was constructed by ligating the BclI-to-BglI B fragment (nucleotides 2771 to 5235) of dl1263 to the corresponding A fragment (nucleotides 5236 to 2770) of dl1465. Viral stocks were prepared after plaque purification.

RNA and DNA extraction. RNA was extracted from cells between 40 and 48 h after infection either as total cellular RNA or as separate nuclear and cytoplasmic pools by the method of Villareal (47). The RNA preparations were freed of DNA by treatment with RNase-free DNase prepared by the method of Tullis and Rubin (45). Viral DNA was extracted by the method of Hirt (25), followed by CsCl equilibrium gradient centrifugation in the presence of ethidium bromide (38).

Selection of polyadenylated RNA. For some experiments, polyadenylated RNA was selected by column chromatography on oligodeoxythymidylate cellulose (8). The RNA preparation was passed through the column three times in loading buffer (0.5 M NaCl, 10 mM Tris-hydrochloride [pH 7.5]), and the flowthrough was designated the nonpolyadenylated pool. The retentate was eluted in 10 mM Tris-hydrochloride (pH 7.5), and the whole process was repeated for a total of three selections. The final elution was designated the polyadenylated pool. Between samples, the column was cleared with 0.1 M NaOH and then reequilibrated with loading buffer.

DNA hybridization probe preparation with T4 DNA polymerase. The T4 polymerase replacement synthesis technique of O'Farrell et al. (35) was adapted to label DNA for S1 probes. By this technique a segment of DNA is labeled to the same extent as by nick-translation methods without leaving a residual nick that would interfere with nuclease S1 analysis. The resulting probe incorporates considerably more label than the equivalent terminally labeled probe and is correspondingly more sensitive. By choosing conditions such that the labeled portion of the probe is fully protected in the RNA-DNA hybrid, the intensity of the resulting signal is solely dependent on the number of hybrid molecules formed and not on the size of the additionally protected regions, in contrast to the situation with uniformly labeled probes. Thus, under appropriate conditions, better quantitation is possible.

Restriction enzyme-digested DNA was adjusted to 1x in T4 polymerase buffer (33 mM Tris-acetate [pH 7.9], 66 mM potassium acetate, 10 mM magnesium acetate, 0.1 mg of bovine serum albumin per ml, 0.5 mM dithiothreitol), and T4 polymerase (Amersham Corp.) was added to a concentration of 1.25 U/µg of DNA and incubated at 37°C. Under these conditions, the enzyme acts as a 3' exonuclease with a removal rate of ca. 7 to 10 nucleotides per min from each 3' end. After sufficient digestion, three unlabeled deoxynucleoside triphosphates (dNTPs) were added to a final concentration of 100 nM. The fourth dNTP was supplied as [α-32P]dNTP (3,000 Ci/mmol) and added in excess of that required for resynthesis. The polymerization reaction was incubated for twice the duration of the digestion and then chased with unlabeled dNTP for a similar period to ensure uniform 3' ends. Typically, 1 µg of the BamHI probe, described below, was prepared by digestion for 4 min before resynthesis with 50 µCi of dTTP. The resulting probe can be used directly or further treated with a specific restriction enzyme to generate a molecule labeled on only one strand. The probes are purified by gel electrophoresis to eliminate unincorporated material and to isolate specific restriction fragments.

Nuclease S1 analysis. S1 analysis followed the procedure of Berk and Sharp (10), with hybridization temperatures optimized for each probe. The probe generated from the EcoRI site was hybridized at 53°C. The BamHI probe was hybridized at 46°C.

Computer analysis of DNA sequences. RNA secondary structure was deduced from a dot-matrix analysis of the corresponding DNA sequences by using the programs of Fristensky et al. (20), as modified for an Apple II Plus computer with Epson MX-80 printer.

RESULTS

Experimental design. To establish the effect of specific deletions on the efficiency of polyadenylation processing at the SV40 late RNA polyadenylation site, we analyzed the normally polyadenylated RNA as well as the extended primary transcripts by quantitative nuclease S1 hybridization analysis (10). However, we found that this analysis was not straightforward, due to the marked change in the AT content of the genomic region surrounding the late polyadenylation site (71% AT within the 160 nucleotides preceding the AAUAAA signal and 53% AT within the 150 nucleotides following it). The unique BamHI site at base no. 2533 (Fig. 1 and 2) is only 141 bases upstream from the polyadenylation site in the wild-type virus and less in the case of many of the deletions. The richness of AT content on the 5' side of the polyadenylation signal caused hybrid instability in this region. Thus, a DNA probe extending from the BamHI site was effective in quantitating and mapping the extended transcripts but formed short, unstable hybrids with the normally polyadenylated RNAs. Therefore, an alternative probe extending from the EcoRI site, containing more sequence homology with the polyadenylated RNAs, was used to quantitate these RNAs. Although the extended transcripts were detected with the EcoRI probe, they could be mapped and quantitated more readily with the smaller BamHI probe. Using this set of probes labeled at their 3' ends (see above), we were able to map the extended transcripts and quantitate them relative to the normally polyadenylated RNA discussed below.

Analysis of wild-type and deletion mutant late transcripts. The deletion mutants surrounding the late polyadenylation site which we used in this study are diagrammed in Fig. 1 and are shown in sequence form in Fig. 2. The initial isolation and characterization of these mutants have been described elsewhere (14, 17, 41). Most of these deletions cluster around the AAUAAA hexanucleotide of the late polyadenylation site, with the exception of dl1263 and dl1265, which begin at nucleotides 2798 (Fig. 1; see Fig. 5) and 2682 (Fig. 1 and 2), respectively.

[Fig. 1 maps not reproduced. Panel A: SV40 genome (nucleotides 1 to 5243) with restriction sites KpnI (294), EcoRI (1782), PstI (1988), BamHI (2533), BclI (2770), and TaqI (4739). Panel B: nucleotides 2533 to 3000, showing the late poly(A) site (2674), the dl1265 and dl1263 deletions, the SAS RNA, and 3' ends at 2728, 2794, 2848, and 2980 to 2985.]

FIG. 1. Maps of the SV40 genome. (A) Map of the entire SV40 genome linearized at the origin of replication. The nucleotide numbering is the SV numbering described previously (43). The early and late mRNAs are drawn above and below the numbered line, respectively. In addition, two features discussed in this paper, the region of extended late transcripts and the position of the SAS RNA (at approximately nucleotides 2848 to 2913), are shown. Several restriction enzyme cleavage sites are indicated, and of these the EcoRI and BamHI sites are of specific relevance to these studies. The bold segment indicates the region expanded in (B). (B) Expanded drawing of the sequence between nucleotides 2533 and 3000. The general features are as described for (A). The regions deleted by deletion mutants dl1263 and dl1265 are specifically indicated. In addition, the general area of deletions within the late polyadenylation region is shown. These deletion mutants are shown in more detail in Fig. 2, which is the sequence of the stippled region. The asterisks and arrows denote the nucleotide position of specific 3' ends of extended transcripts which have been mapped and analyzed in this paper.

[Fig. 2 sequence not reproduced: nucleotides ca. 2533 to 2720, from the BamHI site through the late poly(A) site, with An(L), An(E), and the deletions of dl882, dl1453, dl1458, dl1465, and dl1265 marked.]

FIG. 2. Sequence of the polyadenylation region. The stippled region of Fig. 1B is shown here as sequence. The late RNA is the same as the top strand reading left to right. The late AATAAA signal is overlined, and the predominant wild-type site of late polyadenylation, An(L), is shown. The early RNA is the same as the bottom strand. The two possible early AATAAA signals are underlined, and the predominant site of wild-type early polyadenylation, An(E), is noted. The sequences deleted by seven deletion mutants used in these studies are noted as dashed lines. Note that mutants dl1457, dl1458, and dl1465 have deletions on each side of the late AATAAA.

Total cellular RNA was isolated from cells infected for 40 h with deletion mutants or wild-type SV40 strain 776. By using nuclease S1 hybridization analysis (10), the late RNA was quantitatively analyzed (with probe excess hybridization conditions) with a variety of labeled DNA probes. First, each RNA was hybridized with a wild-type SV40 776 DNA probe which had been linearized and 3' end labeled at the EcoRI site with T4 polymerase (Fig. 3A). As discussed above, this probe will form stable hybrids with both the correctly polyadenylated and the extended late transcripts. With wild-type RNA, it is clear that virtually all of the late transcripts end at the normal polyadenylation site; this fact is represented by the band migrating at 892 nucleotides in Fig. 3A. Hybridizations of the various deletion mutant RNAs with this probe are represented by bands migrating slightly smaller than 892 nucleotides, due to the disruption of the nuclease S1-protected region by the specific deletions. Thus, for the deletion mutants, this band is representative of total late RNA, both normally polyadenylated transcripts and extended transcripts, since all will end at the site of the deletion in this S1 analysis with a heterologous DNA probe. Measurement of the relative intensities of these bands in other experiments shows total late RNA levels are similar for the various mutants and the wild type (data not shown). This fact indicates that the transcription rate of the late
region is similar among all the viruses. Therefore, the relative intensities of the bands in Fig. 3A allow us to normalize the total quantities of late SV40 RNA when comparing results in the subsequent experiments. The bands migrating below the expected band in each lane in Fig. 3A represent artifacts of the nuclease S1 analysis.

Additional samples of the same RNA preparations were next hybridized to probes of their homologous DNA linearized and 3' end labeled at the BamHI site. Figure 3B shows the results of these analyses. As described above, this probe forms short unstable hybrids with the normally polyadenylated RNAs (141 nucleotides of homology with wild-type RNAs and less with the deletion mutants); thus, these bands are very weak and underrepresented in Fig. 3B. However, in each of the deletion mutant lanes (and in dl1458, not shown in Fig. 3) one or more prominent bands are clearly seen migrating at a size much larger than 141 nucleotides. These bands vary in size as predicted by the extent of the specific deletion. They clearly represent extended transcripts with discrete ends. Brackets in Fig. 3B indicate families of bands generated by extended transcripts sharing the same discrete end. The detailed mapping of these and other ends is shown in Fig. 4, discussed below. Analogous bands can be weakly detected in equivalent amounts of the wild-type RNA (Fig. 3B, lane 776); however, the weakness of the signal indicates that the abundance of extended transcripts in the wild type must be very low compared with that in the mutants. These results demonstrate that deletions around the late polyadenylation site correlate with an increase in the amount of extended transcripts present in the infected cells. This result suggests that the deletions have lowered the efficiency of utilization of the polyadenylation site, which results in a longer half-life of the extended precursor transcripts. Table 1 shows a quantitative comparison of the data after densitometric analysis, with correction for the differences in actual amount of SV40 mRNA hybridized as determined from the data in Fig. 3A. The results are expressed as the percentage of late RNA extending beyond the normal late polyadenylation site.

The results indicate that a deletion of as few as three bases (dl882) to the 3' side of the AAUAAA increases the level of extended transcripts 5-fold and that increasing the size of this deletion increases the levels as much as 60-fold. Furthermore, the results with the set of dl1453, dl1457, and dl1465 (Fig. 2), which share the same deletion on the 3' side but differ in the extent of the deletion on the 5' side of the AAUAAA, indicate that adding deletions on the immediate 5' side of the AAUAAA does not seem to alter the abundance of the extended transcripts. The lower percentage of extended transcripts in dl1465 is insignificant, since twofold differences are within the error of this analysis. This result suggests that the effect is mediated predominantly or totally by sequences 3' to the hexanucleotide signal. Finally, dl1265 also showed an increase in the percentage of extended transcripts despite the fact that this deletion is further downstream (Fig. 2), beginning 8 nucleotides beyond the site of cleavage and polyadenylation as mapped in the wild type. Because deletions over the range of 3 to 60 bases downstream from the AAUAAA signal alter the efficiency of cleavage, we suggest that sequences within this region form part of the signal for cleavage at a specific polyadenylation site (see below).

[Fig. 3 autoradiograms, panels A and B, not reproduced.]

FIG. 3. Effects of deletions near AAUAAA on abundance of extended late transcripts. (A) Quantitation of late RNA from cells infected with wild-type or polyadenylation region deletion mutants. RNA was extracted from CV-1P cells infected for 40 h (see the text). One microgram of total cellular RNA from each infection was analyzed by the nuclease S1 procedure with a wt776 viral DNA probe 3' end labeled at the EcoRI site. Protected fragments were electrophoresed on a denaturing 5% acrylamide gel. The numbers of the lanes denote either wild-type virus (776) or the number of the deletion mutant used in the infection. Lane m shows nucleotide size markers, and lane u shows the uninfected cell RNA control. (B) Five micrograms of the same RNA preparations examined in (A) were analyzed by the nuclease S1 procedure with homologous viral DNA probes labeled at the BamHI site (see the text). The brackets indicate abundant families of extended transcripts found in the deletion mutants which have coterminal 3' ends but differ in size in relation to the size of the deletion in the corresponding viral DNA.

Mapping the discrete ends of the extended transcripts. The extended transcripts which accumulate in the deletion mutant infections appear to have discrete 3' ends (Fig. 3B). These ends could represent nuclease S1 artifacts generated within regions of the hybrid rich in dA-rU base pairs, or they may represent authentic processing or termination sites. We therefore mapped the positions of the 3' ends. For these experiments we utilized wild-type RNA as a control and dl1465 as our test deletion mutant RNA to be mapped. We substantially increased the amount of wild-type RNA in this hybridization analysis to more easily detect and map the extended transcripts from the wild-type infection. As a result, the signal strengths of wild-type lanes in Fig. 4 are as strong as those of the mutant. Figure 4A shows an example of a nuclease S1 hybridization experiment in which the
extended transcript bands were sized with the BamHI linear probe. In other experiments (not shown), the samples were electrophoresed next to a sequence ladder of DNA end labeled at the BamHI site, enabling us to map the 3' ends to the nucleotide. In these detailed experiments, more discrete ends could be detected. We have mapped and further analyzed only the major bands indicated in Fig. 4A. The ends of these transcripts mapped at nucleotides 2728, 2794, and 2848. In addition, a fourth band is mapped less precisely, between nucleotides 2980 and 2985. These positions are indicated in Fig. 4A and shown schematically in Fig. 1B. Additional minor bands as well as significant hybridization at higher molecular weights can also be noted, but will not be discussed further here.

[Fig. 4 autoradiograms, panels A and B, not reproduced; mapped 3' ends at nucleotides 2728, 2794, 2848, and 2980 are labeled at right.]

FIG. 4. Characterization of extended transcripts. (A) Total cellular RNA (T) from wt776- or dl1465-infected cells, as well as polyadenylated (+) and nonpolyadenylated (-) fractions, were analyzed by the nuclease S1 procedure. In addition, the dl1465 RNA was further fractionated into nuclear (N) and cytoplasmic (C) components. The DNA probe was the homologous viral DNA 3' end labeled at the BamHI site. The amount of wild-type RNA used in this analysis was increased severalfold to detect the low abundance of wild-type extended transcripts. This procedure provided bands of similar signal intensity between the wt776 and dl1465 samples. Lines connect wt776 and dl1465 bands which have coterminal 3' ends but which differ in absolute size by the size of the dl1465 deletion. The nucleotide positions of these 3' ends are noted on the right. Lanes M are the nucleotide size markers. (B) Extended transcripts in RNA of cells infected with dl1465 or the double deletion mutant dl1465-1263 are compared by nuclease S1 analysis with homologous viral DNA probes 3' end labeled at the BamHI site. Lines connect bands with coterminal 3' ends which differ in absolute size by the size of the dl1263 deletion.

The nucleotide sequences in the regions of the mapped 3' ends of the extended transcripts were examined to determine whether the specific 3' ends indicated possible processing or termination sites or possible regions rich in dA-rU base pairing, which would produce artifacts in the S1 analysis. Such examination indicates that the 2728 end is within a region of dA-rU richness and is therefore probably an artifact of the analysis. However, the other ends cannot be accounted for by artifactual mechanisms. The ends mapping at nucleotides 2794 and 2848 lie within a region of strong potential secondary structure and may generate the 5' end of the SAS RNA, discussed below. The 3' end mapping between nucleotides 2980 and 2985 lies within a region containing a possible polyadenylation hexanucleotide. We show that this alternative polyadenylation signal is utilized by the mutants.

TABLE 1. Relative abundance of extended transcripts^a

Virus     Extended transcripts (% of viral late RNA)
wt776     0.1
dl882     0.5
dl1453    6.3
dl1455    6.0
dl1457    6.2
dl1465    3.0
dl1265    2.3

^a Quantitative densitometry of multiple exposures of Fig. 3B was used to determine the relative abundance of extended transcripts from the deletion mutants as compared with the wild type. The raw data were corrected for differences in input RNA levels, as calculated by densitometric analysis of Fig. 3A, and in the specific activities of the various probes. The fraction of viral late RNA represented as extended transcripts is based on data which estimate the wild-type level of extended transcripts at 0.1% (not shown).

[Fig. 5 secondary-structure drawing (nucleotides 2759 to 2928) not reproduced.]

FIG. 5. Potential secondary structure within the extended transcript. The sequence of the extended transcript region between nucleotides 2759 and 2928 is shown as a possible secondary structure aligned for maximal base pairing. Asterisks indicate G-U base pairs. Arrows at 2794 and 2848 show cleavage sites discussed in the text. The SAS RNA is predominantly contained within the large loop; its 5' and 3' ends map at approximately nucleotides 2848 and 2913 (small arrow), respectively. The heavy line outlines the sequences deleted in mutant dl1263.

Secondary structure determines the processing of the 5' end of the SAS RNA. By using computer sequence analysis, we have generated a proposed secondary structure that maximizes base pairing for the RNA that spans the region from base 2760 to 2930 (Fig. 5). The calculated thermodynamic stability (39) of this structure is -47 kcal (-196.6 kJ). Within this proposed secondary structure are located two of the precisely mapped 3' ends of the extended transcripts, 2794 and 2848, exactly opposite each other. In addition, the site at base 2848 is located within a few bases of the approximated 5' end of the SAS RNA (6), a 64-nucleotide RNA (4, 5, 6, 30) previously shown to be processed from the extended late transcripts (4). These correlations may indicate the sites utilized by the cleavage mechanism which generates the SAS RNA. To obtain further evidence that the secondary structure plays a functional role in the processing at this site, we introduced the dl1263 (13, 14) deletion into this region. dl1263 is a 33-base viable deletion which is mapped graphically in Fig. 1 and shown at the nucleotide level as the overlined sequence in Fig. 5. This deletion removes much of one side of the stem and the entire loop of the secondary structure, which we suggest determines the site of the cleavages. Thermodynamic calculations on the resulting RNA predict that the secondary structure would be unstable above the bifurcation shown in Fig. 5. Although this deletion removes none of the SAS RNA sequences, it is known to greatly lower or abolish production of the SAS RNA in dl1263-infected cells (6). We introduced the dl1263 deletion into a dl1465 background so that we would be able to produce increased amounts of extended transcripts which include the dl1263 defect. This double deletion allows us to determine whether the lack of SAS RNA correlated with a failure to generate the extended transcripts with 3' ends at 2794 and 2848. Total RNA from cells infected with the virus dl1465-1263 was analyzed by hybridizing with a homologous DNA probe labeled at the BamHI site. Similar RNA from dl1465-infected cells was analyzed with a dl1465 DNA probe. Figure 4B shows the results; the lane of total dl1465 RNA (T) should be compared with the dl1465-1263 lane. It is apparent that the 3' end mapping to 2794 has been completely eliminated by the presence of the dl1263 deletion and that the 2848 3' end has been greatly reduced in quantity relative
to the parallel dl1465 lanes. Since the deletion reduces the stability of the potential secondary structure without removing the specific sites of cleavage, we suggest that the reduction in cleavage demonstrates that the secondary structure serves as a signal for the processing events at sites 2794 and 2848 and is therefore responsible for the formation of the 5' end of the SAS RNA.

Alternative polyadenylation site for late RNA. We examined the extended transcripts of both wild-type RNA and dl1465 by oligodeoxythymidylate cellulose chromatography followed by nuclease S1 analysis to determine whether any of the distinct 3' ends were polyadenylated. In the case of dl1465, the RNA was also fractionated into separate nuclear and cytoplasmic pools (Fig. 4, lanes marked N or C). Each RNA preparation was hybridized to a homologous BamHI probe. It is evident (Fig. 4A) that the bulk of the nuclear extended transcripts do not partition specifically by fractionation on the oligodeoxythymidylate column (lanes marked plus [+] are the RNAs retained on the column), indicating that they do not have extensive polyadenylic acid tails and may be partially retained by internal adenylate tracts. However, a band mapping between nucleotides 2980 and 2985 in the dl1465 RNA is specifically retained in the nuclear fraction and, moreover, is detected in relative abundance in the cytoplasmic polyadenylated fraction (Fig. 4A, lane dl1465 C+). No equivalent polyadenylated species is apparent in the corresponding wild-type RNA lane, despite the fact that other nonpolyadenylated extended transcripts can be detected in the wild-type samples under these conditions of increased RNA in the hybridization mixture. This lack of utilization of the alternative site in the presence of the wild-type normal site is addressed below. Examination of the sequences surrounding nucleotide 2980 reveals two hexanucleotides that could serve as polyadenylation signals: AUUAAA at 2964 and ACUAAA at 2994. The mapping data suggest that the 2964 signal is utilized. There is precedent for the use of AUUAAA as a polyadenylation signal in adenovirus type 2 E3-1 (42), mouse pancreatic α-amylase (44), and chicken lysozyme (27).

DISCUSSION

Sequences 3' to AAUAAA affect efficiency of cleavage at the polyadenylation site. Current understanding of the mechanism of 3' end processing of eucaryotic mRNA features the production of a precursor extending well beyond the ultimate processed end, followed by cleavage and polyadenylation at a well-defined site. The hexanucleotide AAUAAA is essential in designating this site but is apparently not sufficient to specify cleavage, since it can also exist within messages. Polyadenylation of the SV40 late mRNAs conforms to this general model. The library of existing SV40 deletion mutants allowed us to explore whether sequences surrounding the AAUAAA contribute to the processing of the late messages. Using nuclease S1 analysis, we have shown that variable sized deletions within a region between 3 and 60 nucleotides downstream of the AAUAAA decrease the efficiency of cleavage at the normal late polyadenylation site. This region includes sequences beyond the actual cleavage site which would exist only in the precursor RNA. These data begin to define a region on the 3' side of the AAUAAA which appears to be an additional element of the polyadenylation signal. Previous data of Cole and Santangelo (15) have shown that the segment of SV40 DNA ending 104 base pairs downstream of the late AAUAAA is sufficient for efficient late mRNA polyadenylation. These data define a preliminary upper limit to the region in which additional signal elements may exist. Since our experiments in viruses were limited to mutants which remained viable, we feel that our data indicate the existence of a major element in the polyadenylation signal, but at this point they do not precisely locate it, since its total deletion may be inviable. We therefore predict that a major element of the polyadenylation signal may exist between 60 and 104 nucleotides downstream of the AAUAAA for SV40 late polyadenylation. This area is presently being studied under conditions which do not require viral viability.

Extended late transcripts are produced due to deletions which affect efficient late RNA polyadenylation. Cells infected with the deletion mutants which affected the efficiency of late polyadenylation were found to contain enriched levels of extended late transcripts. These transcripts appear to have increased half-lives due to the inefficient cleavage at the normal polyadenylation site. Extended transcripts of the SV40 late region have been reported previously (3, 29, 48); however, the increased amount of these transcripts present in the mutant-infected cells allowed us to easily map and characterize their distinct 3' ends. For the transcripts whose 3' ends were studied, we found that cleavage can account for their formation. This is discussed below.

[Fig. 6 map not reproduced: nucleotides 2533 to 3010, showing the BamHI site, the late poly(A) site at 2674, the SAS RNA, the AUUAAA signal at 2964 to 2969, the alternative 3' end at 2980 to 2985, and open reading frames of 12, 66, and 42 amino acids.]

FIG. 6. Coding regions within the extended transcript region. The major late polyadenylation site is indicated at 2674, and the alternative site is indicated at ca. 2980 to 2985. Potential coding regions within the region between the two polyadenylation sites are noted, including the number of amino acids (aa) encoded. Methionines are indicated by asterisks.

Extended transcripts reveal an alternative late polyadenylation site. One of the extended transcripts we characterized is itself polyadenylated at an alternative site ca. 300 nucleotides downstream from the normal site (Fig. 6). In Fig. 4B, this alternative site is represented by a distinct band in the dl1465 cytoplasmic polyadenylic acid lane, with its 3' end mapping to approximately nucleotides 2980 to 2985. In the equivalent wild-type sample, no such band is detected, despite the fact that other, nonpolyadenylated extended transcripts are seen (in these experiments the amount of wild-type RNA in the hybridization samples was substantially increased to detect the low levels of wild-type extended transcripts). This lack of alternative-site utilization in the presence of a wild-type normal site implies either that the kinetics of normal-site utilization are so rapid that a less recognizable downstream site is rarely used or that a specific mechanism may exist that scans the entire precursor RNA and chooses a best site to the exclusion of all other sites. Since the signal AAUAAA can be found internal to messages, such a scanning model must be able to bypass unused signals 5' or 3' to the correct end. Merely choosing the first site from either direction is ruled out. In addition, once a polyadenylation event takes place, the utilization of other potential sites must be inhibited. For example, in the present case the utilization of the alternative polyadenylation site in the mutant transcripts results in a polyadenylated RNA which still contains the polyadenylation site at the normal position. To maintain this species, processing at the normal position must not occur.

structure (Fig. 5). This deletion would destabilize the structure but does not disturb the actual cleavage sites or the SAS RNA sequences. It has been shown that the SAS RNA is not
This could be accomplished by a produced in d11263-infected cells (6), indicating that the mechanism which either directs only one polyadenylation deletion has affected its 5' end formation. The present data event per RNA or rapidly removes the newly polyadenylated confirm this: by introducing the d1263 deletion into a d1465 RNA from the pool of polyadenylation substrates. The latter background, we show that the resulting extended transcripts mechanism could be accomplished by coupling the poly- are depleted in termini at 2848 and 2794. These results adenylation reaction to the transport apparatus. The possi- suggest that a deletion which alters and destabilizes the bility of such coupling has been suggested by Villareal and putative secondary structure but does not remove the cleav- White (48), who have constructed a late region deletion age sites results in an inability to utilize these sites. At this mutant in which the mutant RNA fails to be both properly time we cannot determine whether the same secondary transported and polyadenylated. The utilization of alterna- structure is involved in the cleavage which generates the 3' tive polyadenylation sites within a single precursor RNA end of the SAS RNA at approximately nucleotide 2913 (Fig. may be a level of control. Many well- 5). The apparent lack of an extended transcript ending at studied genes have multiple polyadenylation sites; for exam- 2913 may imply that the SAS RNA is formed from a ple, the adenovirus major late transcript (19), the mouse precursor cleaved further downstream, or that the 5' cleav- DHFR gene (40), the chicken ovomucoid gene (21), the age always precedes the 3' event, thus rendering the latter mouse alpha-amylase gene (23, 43), and the human and invisible to analysis with our 3'-end-labeled probes. mouse collagen genes (2, 32). 
In addition, the mouse immunoglobulin M (16) and the chicken vimentin (12) genes are examples which suggest that the use of variable poly- ACKNOWLEDGMENTS adenylation sites may be developmentally regulated or tissue We thank Sherri Adams. Janis Keller. Susan Carswell. Chris specific. The availability of alternative polyadenylation sites Dabrowski, and Libby Blakenhorn for helpful discussion and good in the SV40 late messages, as well as the ability to manipu- nature. Tom Shenk and Chuck Cole for deletion mutants, Edna late their utilization, provides a model system in which the Matta for technical assistance, and Kathryn Vance for moral mechanism of site choice can be studied. The utilization of support. the alternative polyadenylation site in the late mRNAs may This investigation was supported by Public Health Service grant CA28379-04, awarded by the National Cancer Institute, and by allow new coding regions to be expressed. Figure 6 shows Biomedical Science Research Grant S07-RR-05415-22. awarded by the additional region contained in the extended transcript. the Biomedical Research Support Grant Program, Division of Re- As noted, there are three possible coding regions which search Resources. National Institutes of Health. M.S. is a medical could be translated into of 12, 42, or 66 amino acids. scientists trainee supported by Public Health Service grant 5-T32- Since we have noted that this extended transcript does not GM-07170 from the National Institutes of Health. appear to be present in wild type, we might conclude that these coding regions are fortuitous. However, we cannot LITERATURE CITED rule out that this transcript may be produced by the wild 1. Acheson, N. H. 1978. Polyoma giant RNAs contain tandem type under certain conditions; nor can we rule out the repeats of the nucleotide sequence on the entire viral genome. possibility of a wild-type RNA species extending between Proc. Natl. Acad. Sci. U.S.A. 75:4754-4758. nucleotides 2675 and 2985. 
In other words, a transcript could 2. Aho, S., V. Tate, and H. Boedtker. 1983. Multiple 3' ends of the exist whose 5' end is produced through cleavage events at chicken proa2(1) collagen gene. Nucleic Acids Res. 11:5443- the normal late polyadenylation site and whose 3' end is 5450. formed at the alternate polyadenylation site. 3. Aloni, Y. 1974. Biogenesis and characterization of SV40 and Secondary structure in the extended transcript RNA deter- polyoma RNAs in productively infected cells. Cold Spring mines other sites and the end of the Harbor Symp. Quant. Biol. 39:165-178. cleavage defines 5' SAS 4. Alwine, J. C. 1982. Hybrid selection of small RNAs by using RNA. In addition to the polyadenylated extended transcript, simian virus 40 DNA: evidence that the simian virus 40- Fig. 4 shows several other distinct bands which represent associated small RNA is synthesized by specific cleavage from extended transcripts with discrete, nonpolyadenylated 3' large viral transcripts. J. Virol. 43:987-996. ends. The mapped ends of several of these RNAs are noted 5. Alwine, J. C., R. Dhar, and G. Khoury. 1980. A small RNA on Fig. 4. The ends mapping at nucleotides 2794 and 2848 induced late in simian virus 40 lytic infection can associate with were of particular interest because they occur directly early viral mRNAs. Proc. Natl. Acad. Sci. U.S.A. 77:1379- opposite each other on the predicted secondary structure 1383. that may form within the extended transcripts (Fig. 5). This 6. Alwine, J. C., and G. Khoury. 1980. Simian virus 40-associated structure was determined base and small RNA: mapping on the simian virus 40 genome and by maximizing pairing characterization of its synthesis. J. Virol. 36:701-708. includes a stem previously reported (46). It has been suggest- 7. Apirion, D. 1983. RNA processing in a unicellular microorgan- ed that secondary structure of RNA is involved in RNA ism: implications for eukaryotic cells. Prog. Nucleic Acid Res. 
processing in both procaryotes and eucaryotes (7, 9). In this Mol. Biol. 30:1-40. regard, several pieces of evidence favor interpreting the 8. Aviv, H., and P. Leder. 1972. Purification of biologically active structure in Fig. 5 as a processing site and the 3' termini that globin mRNA by chromatography on oligothymidylic acid- map within it as the specific sites of cleavage. The 3' end at cellulose. Proc. Natl. Acad. Sci. U.S.A. 69:1408-1412. nucleotide 2848 falls very close to the predicted 5' end of the 9. Balmain, A., L. Frew, G. Cole, R. Krumlauf, A. Ritchie, and SAS RNA (5, 6, 30), a small stable RNA shown to be a G. D. Birnie. 1982. Transcription of repeated sequences of the mouse Bi family in Friend erythroleukaemic cells. Intermolecu- product of processing from late primary transcripts (4). lar duplex formation between polyadenylated and non-poly- Thus, the 3' end mapping at 2848 is likely to represent the adenylated nuclear RNAs. J. Mol. Biol. 160:163-179. cleavage that generates the 5' end of the SAS RNA. This 10. Berk, A. J., and P. Sharp. 1977. Sizing and mapping of early possibility is supported by the analysis of deletion mutant adenovirus mRNAs by gel electrophoresis of S1 endonuclease d11263 (14), which deletes 33 bases within the secondary digested hybrids. Cell 12:721-732. 1468 SADOFSKY AND ALWINE MOL. CELL. BIOL.

11. Birg, F., J. Favaloro, and R. Kamen. 1977. Analysis of polyoma virus nuclear RNA by mini-blot hybridization. Proc. Natl. Acad. Sci. U.S.A. 74:3138-3142.
12. Capetanaki, Y. G., J. Ngai, C. N. Flytzanis, and E. Lazarides. 1983. Tissue-specific expression of two mRNA species transcribed from a single vimentin gene. Cell 35:411-420.
13. Cole, C. N., L. V. Crawford, and P. Berg. 1979. Simian virus 40 mutants with deletions at the 3' end of the early region are defective in adenovirus helper function. J. Virol. 30:683-691.
14. Cole, C. N., T. Landers, S. P. Goff, S. Manteuil-Brutlag, and P. Berg. 1977. Physical and genetic characterization of deletion mutants of simian virus 40 constructed in vitro. J. Virol. 24:277-294.
15. Cole, C. N., and G. M. Santangelo. 1983. Analysis in Cos-1 cells of processing and polyadenylation signals by using derivatives of the herpes simplex virus type 1 thymidine kinase gene. Mol. Cell. Biol. 3:267-279.
16. Early, P., J. Rogers, M. Davis, K. Calame, M. Bond, R. Wall, and L. Hood. 1980. Two mRNAs can be produced from a single immunoglobulin μ gene by alternative RNA processing pathways. Cell 20:313-319.
17. Fitzgerald, M., and T. Shenk. 1981. The sequence 5'-AAUAAA-3' forms part of the recognition site for polyadenylation of late SV40 mRNAs. Cell 24:251-260.
18. Ford, J. P., and M.-T. Hsu. 1978. Transcription pattern of in vivo-labeled late simian virus 40 RNA: equimolar transcription beyond the mRNA 3' terminus. J. Virol. 28:795-801.
19. Fraser, N. W., J. R. Nevins, E. Ziff, and J. E. Darnell. 1979. The major late adenovirus type 2 transcription unit: termination is downstream from the last poly(A) site. J. Mol. Biol. 129:643-656.
20. Fristensky, B., J. Lis, and R. Wu. 1982. Portable microcomputer software for nucleotide sequence analysis. Nucleic Acids Res. 10:6451-6463.
21. Gerlinger, P., A. Krust, M. LeMeur, F. Perrin, M. Cochet, F. Gannon, D. Dupret, and P. Chambon. 1982. Multiple initiation and polyadenylation sites for the chicken ovomucoid transcription unit. J. Mol. Biol. 162:345-364.
22. Groudine, M., A. Larsen, and H. Weintraub. 1981. α-Globin-gene switching during the development of chicken embryos: expression and chromosome structure. Cell 24:333-344.
23. Hagenbuchle, O., R. Bovey, and R. A. Young. 1980. Tissue-specific expression of mouse α-amylase genes: nucleotide sequence of isoenzyme mRNAs from pancreas and salivary gland. Cell 21:179-187.
24. Higgs, D. R., S. E. Y. Goodburn, J. Lamb, J. B. Clegg, D. J. Weatherall, and N. J. Proudfoot. 1983. α-Thalassemia caused by a polyadenylation signal mutation. Nature (London) 306:398-400.
25. Hirt, B. 1967. Selective extraction of polyoma DNA from infected mouse cell cultures. J. Mol. Biol. 26:365-369.
26. Hofer, E., and J. E. Darnell. 1981. The primary transcription unit of the mouse β-major globin gene. Cell 23:585-593.
27. Jung, A., A. E. Sippel, M. Grez, and G. Schutz. 1980. Exons encode functional and structural units of chicken lysozyme. Proc. Natl. Acad. Sci. U.S.A. 77:5759-5763.
28. Khoury, G., P. Howley, D. Nathans, and M. Martin. 1975. Posttranscriptional selection of simian virus 40-specific RNA. J. Virol. 15:433-437.
29. Lai, C. J., R. Dhar, and G. Khoury. 1978. Mapping the spliced and unspliced late lytic SV40 RNAs. Cell 14:971-982.
30. Mark, D., and P. Berg. 1979. A third splice site in SV40 early RNA. Cold Spring Harbor Symp. Quant. Biol. 44:55-62.
31. McReynolds, L., B. W. O'Malley, A. D. Nisbet, J. E. Fothergill, D. Givol, S. Fields, M. Robertson, and G. G. Brownlee. 1978. Sequence of chicken ovalbumin mRNA. Nature (London) 273:723-728.
32. Meyers, J. C., L. A. Dickson, W. J. deWet, M. P. Bernard, M.-L. Chu, M. D. Liberto, G. Pepe, F. O. Sangiorgi, and F. Ramirez. 1983. Analysis of the 3' end of the human pro-α2(I) collagen gene. Utilization of multiple polyadenylation sites in cultured fibroblasts. J. Biol. Chem. 258:10128-10135.
33. Montell, C., E. F. Fisher, M. H. Caruthers, and A. J. Berk. 1983. Inhibition of cleavage but not polyadenylation by a point mutation in the mRNA 3' consensus sequence AAUAAA. Nature (London) 305:600-605.
34. Nevins, J. R., J. M. Blanchard, and J. E. Darnell. 1980. Transcription units of adenovirus type 2: transcription beyond the poly(A) addition site in early regions 2 and 4. J. Mol. Biol. 144:377-386.
35. O'Farrell, P. H., E. Kutter, and M. Nakanishi. 1980. A restriction map of the bacteriophage T4 genome. Mol. Gen. Genet. 179:421-435.
36. Perricaudet, M., J.-M. leMoullec, P. Tiollais, and U. Petterson. 1980. Structure of two adenovirus type 12 transforming polypeptides and their evolutionary implications. Nature (London) 288:174-176.
37. Proudfoot, N., and G. G. Brownlee. 1976. 3' Non-coding region sequences in eukaryotic messenger RNA. Nature (London) 263:211-214.
38. Radloff, R., W. Bauer, and J. Vinograd. 1967. A dye-buoyant-density method for the detection and isolation of closed circular duplex DNA: the closed circular DNA in HeLa cells. Proc. Natl. Acad. Sci. U.S.A. 57:1514-1521.
39. Salser, W. 1977. Globin mRNA sequences: analysis of base-pairing and evolutionary implications. Cold Spring Harbor Symp. Quant. Biol. 42:985-1002.
40. Setzer, D., M. McGrogan, and R. T. Schimke. 1982. Nucleotide sequence surrounding multiple polyadenylation sites in the mouse dihydrofolate reductase gene. J. Biol. Chem. 257:5143-5147.
41. Shenk, T. E., J. Carbon, and P. Berg. 1976. Construction and analysis of viable deletion mutants of simian virus 40. J. Virol. 18:664-671.
42. Stalhandske, P., H. Persson, M. Perricaudet, L. Philipson, and U. Petterson. 1983. Structure of three spliced mRNAs from region E3 of adenovirus type 2. Gene 22:157-165.
43. Tooze, J. (ed.). 1981. Molecular biology of tumor viruses. Part 2. DNA tumor viruses. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
44. Tosi, M., R. A. Young, O. Hagenbuchle, and U. Schibler. 1981. Multiple polyadenylation sites in a mouse α-amylase gene. Nucleic Acids Res. 9:2313-2322.
45. Tullis, R. H., and J. Rubin. 1980. Calcium protects DNase I from proteinase K: a new method for removal of contaminating RNase from DNase I. Anal. Biochem. 107:260-264.
46. van Heuverswyn, H., C. Cole, P. Berg, and W. Fiers. 1979. Nucleotide sequence analysis of two simian virus 40 mutants with deletions in the region coding for the carboxyl terminus of the T antigen. J. Virol. 30:936-941.
47. Villarreal, L. P. 1981. A perinuclear extract contains a unique set of viral transcripts late in SV40 infection. Virology 113:663-671.
48. Villarreal, L. P., and R. T. White. 1983. A splice junction deletion deficient in the transport of RNA does not polyadenylate nuclear RNA. Mol. Cell. Biol. 3:1381-1388.
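The Results and Discussion rely on locating candidate polyadenylation signal hexanucleotides along a transcript. Examples are AUUAAA at 2964 and ACUAAA at 2994 near the alternative site, and the internal AAUAAA signals that a scanning model would have to bypass. The Python sketch below illustrates only that first step: listing every candidate hexamer with its coordinate. It is a hypothetical illustration, not the authors' method, and it says nothing about which site is actually cleaved. The function name `find_signal_hexamers`, the choice of the three hexamers, and the toy input are assumptions; the toy sequence is not SV40 sequence.

```python
import re

# Hexamers treated as candidate poly(A) signals in this sketch (an assumption):
# the canonical AAUAAA plus the two variants named in the text, AUUAAA and ACUAAA.
CANDIDATE_SIGNALS = ("AAUAAA", "AUUAAA", "ACUAAA")


def find_signal_hexamers(seq, start_coord=1, signals=CANDIDATE_SIGNALS):
    """Return (coordinate, hexamer) pairs for every candidate signal in seq.

    seq may be RNA or DNA (T is read as U). start_coord is the coordinate
    assigned to seq[0], so hits can be reported in genome numbering.
    A zero-width lookahead is used so that overlapping hits are all reported.
    """
    rna = seq.upper().replace("T", "U")
    pattern = re.compile("(?=(" + "|".join(map(re.escape, signals)) + "))")
    return [(m.start() + start_coord, m.group(1)) for m in pattern.finditer(rna)]


if __name__ == "__main__":
    # Toy input only (hypothetical); this is NOT the SV40 sequence.
    toy = "GGAUUAAAGCCACUAAAG"
    for coord, hexamer in find_signal_hexamers(toy, start_coord=100):
        print(coord, hexamer)  # prints "102 AUUAAA", then "111 ACUAAA"
```

Because the search is purely sequence based, it reports internal hexamers as readily as functional ones. That is exactly why, as argued above, the hexamer alone cannot specify the cleavage site. A hit list of this kind is only a starting point to be compared with mapping data such as the S1 analyses described here.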