Quick viewing(Text Mode)

The Interrupted Gene

The Interrupted Gene

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 14 NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

A self-splicing© Jones in & an Bartlett rRNA of the Learning, large ribosomal LLC subunit. © Kenneth Eward/Photo© Researchers, Jones &Inc. Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION The Interrupted Edited by Donald Forsdyke www © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALEchapter chapterOR DISTRIBUTION outline outline NOT FOR SALE OR DISTRIBUTION

4.1 Introduction • The positions of are usually conserved when homologous are compared between different or- 4.2 An Consists of and Introns ganisms. The lengths of the corresponding introns may • Introns are removed by RNA splicing, which occurs in vary greatly, though. © Jones & Bartlettcis inLearning, individual RNA LLC molecules. © Jones & Bartlett• Introns Learning, usually do not encodeLLC . NOT FOR SALE• ORMutations DISTRIBUTION in exons can affect polypeptide sequence; NOT FOR4.5 SALE ORSequences DISTRIBUTION Under Negative Selection mutations in introns can affect RNA processing and hence may influence the sequence and/or production Are Conserved but Introns Vary of a polypeptide. • Comparisons of related genes in different species show 4.3 that the sequences of the corresponding exons are Exon and Intron Base Compositions Differ usually conserved but the sequences of the introns are • The four “rules”© forJones DNA base & compositionBartlett areLearning, the first LLC much less similar.© Jones & Bartlett Learning, LLC and second parityNOT rules, FOR the clusterSALE rule, OR and DISTRIBUTIONthe GC • Introns evolve muchNOT more FOR rapidly SALE than exons OR DISTRIBUTION rule. Exons and introns can be distinguished on the ­because of the lack of selective pressure to produce a basis of all rules except the first. ­polypeptide with a useful sequence. • The second parity rule suggests an extrusion of struc- 4.6 tured stem-loop segments from duplex DNA, which Exon Sequences Under Positive Selection Vary would be greater in introns. but Introns Are Conserved © Jones• The rules & relateBartlett to genomic Learning, characteristics, LLC or ©• UnderJones positive & Bartlett selection an Learning, individual with LLC an NOT“ ­pressures,”FOR SALE that constitute OR DISTRIBUTION the phenotype. NOT­advantageous FOR SALE mutation OR survives DISTRIBUTION (i.e., is able to 4.4 ­produce more fertile progeny) relative to others Organization of Interrupted Genes May Be ­without the mutation. ­Conserved • Due to intrinsic genomic pressures, such as that which • Introns can be detected when genes are compared with conserves the potential to extrude stem-loops from du- their RNA transcription products by either ­restriction plex DNA, introns evolve more slowly than exons that © Jones & Bartlettmapping, Learning, electron microscopy,LLC or . © Jones & Bartlettare under Learning, positive selection LLC pressure. NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

8181 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. © Jones & Bartlettchapter Learning, outline, continuedLLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION 4.7 Genes Show a Wide Distribution of Sizes Due • The exons of some genes appear homologous to ­Primarily to Intron Size and Number Variation the exons of others, suggesting a common exon ­ancestry. • Most genes are uninterrupted in Saccharomyces ­cerevisiae but are ­interrupted© in Jonesmulticellular & .Bartlett Learning, LLC4.10 Members of a Gene Family© Jones Have a Common& Bartlett Learning, LLC • Exons are usually short,NOT typically FOR encodingSALE fewerOR thanDISTRIBUTION ­Organization NOT FOR SALE OR DISTRIBUTION 100 amino acids. • A set of homologous genes should share common • Introns are short in unicellular/oligocellular eukary- ­features that preceded their evolutionary separation. otes but can be many kilobases (kb) in multicellular • All globin genes have a common form of organization ­eukaryotes. with three exons and two introns, suggesting that they • The overall length of a gene is determined largely by are descended from a single ancestral gene. ©its Jones introns. & Bartlett Learning, LLC • Intron© Jones positions & in Bartlettthe gene Learning, family are highly LLC 4.8 SomeNOT DNA FOR Sequences SALE Encode OR DISTRIBUTIONMore Than One variable,NOT which FOR suggests SALE that OR introns DISTRIBUTION do not separate functional domains. Polypeptide 4.11 There Are Many Forms of Information in DNA • The use of alternative initiation or termination codons allows multiple variants of a polypeptide chain. • Genetic information includes not only that related to • Different polypeptides can be produced from the same characters corresponding to the conventional pheno- © Jones & Bartlettsequence Learning, of DNA when theLLC mRNA is read in different © Jones & Bartletttype, but also Learning, that related to LLC characters (pressures) NOT FOR SALEreading OR frames DISTRIBUTION (as two overlapping genes). NOT FOR SALEcorresponding OR DISTRIBUTION to the genome “phenotype.” • Otherwise identical polypeptides, differing by the • In certain contexts, the definition of the gene can be presence or absence of certain regions, can be gener- seen as reversed from “one gene–one ” to “one ated by differential (alternative) splicing when certain protein–one gene.” exons are included or excluded. This may take the form • Positional information may be important in of including or excluding individual exons, or of choos- ­development. ing between alternative© Jones exons. & Bartlett Learning, LLC • Sequences transferred ©“horizontally” Jones & from Bartlett other Learning, LLC 4.9 Some Exons Can BeNOT Equated FOR with SALE Protein ORFunctional DISTRIBUTION ­species to the germlineNOT could FORland in intronsSALE or OR DISTRIBUTION ­intergenic DNA and then transfer “vertically” through Domains the generations. Some of these sequences may be • Proteins can consist of independent functional mod- ­involved in ­intracellular nonself-recognition. ules, the boundaries of which, in some cases, can be 4.12 Summary equated with those of exons. © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

is ­operational with respect to phenotype. 4.1 Introduction ­Protein-encoding sequences can be inter- The simplest form of a gene is a length of DNA rupted, as can the 59 and 39 sequences (UTRs) © Jones & Bartlett Learning,that directly LLC corresponds to its polypeptide© Jones & Bartlettthat flank Learning, the ­protein-encoding LLC sequences NOT FOR SALE OR DISTRIBUTION­product. Bacterial genes are almost NOTalways FOR of SALEwithin OR mRNA. DISTRIBUTION The interrupting sequences this type, in which a continuous sequence of are removed from the primary (RNA) 3N bases encodes a polypeptide of N amino ­transcript (or ­pre-mRNA) during gene acids. However, in eukaryotes ribosomal expression, generating an mRNA that includes (rRNAs), transfer RNAs (tRNAs), and mes- a continuous base sequence ­corresponding to senger© Jones RNAs (mRNAs)& Bartlett are Learning, first synthesized LLC the polypeptide product© Jones as determined & Bartlett by Learning,the LLC as longNOT precursor FOR SALE transcripts OR DISTRIBUTION that are subse- . The sequencesNOT FOR of SALEDNA compris OR DISTRIBUTION- quently shortened (see the chapter titled RNA ing an interrupted protein-encoding gene are ­Splicing and Processing). Thus eukaryotic genes divided into the two categories­ depicted in are much longer than the functional transcripts FIGURE 4.1: they produce. It is reasonable to assume that • Exons are the sequences retained in the © Jones &the Bartlett shortening Learning, involved LLC a trimming of addi- © Jonesmature & RNABartlett product. Learning, A mature LLC tran- NOT FOR SALEtional, perhapsOR DISTRIBUTION regulatory, sequences at the NOT scriptFOR SALEstarts and OR ends DISTRIBUTION with exons that 59 and/or 39 ends of transcripts, leaving the correspond to the 59 and 39 ends of the rRNA or protein-encoding­ sequence of the RNA. precursor intact. • Introns are the intervening sequences However, as it happens a eukaryotic that are removed when the primary © Jones & Bartlett Learning,gene can LLC include additional sequences© Jones that & BartlettRNA Learning, transcript LLCis processed to give the NOT FOR SALE OR DISTRIBUTIONlie both within and outside the regionNOT thatFOR SALE ORmature DISTRIBUTION RNA product.

82 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. The exon sequences are in the same order in EXON 1 EXON 2 EXON 3 © Jones the& Bartlett gene and Learning,in the RNA, butLLC an interrupted DNA© Jones &INTRON Bartlett 1 Learning,INTRON LLC 2 NOT FORgene SALE is longer OR DISTRIBUTION than its mature RNA product NOT FOR SALE OR DISTRIBUTION because of the presence of the introns. RNA synthesis pre-mRNA The processing of interrupted genes INTRON 1 INTRON 2 requires an additional step that is not necessary­ in uninterrupted genes.© Jones The DNA & ofBartlett an inter -Learning, LLC Splicing© Jones removes & Bartlett Learning, LLC rupted gene is transcribed to an RNA copy introns (a transcript) that is exactlyNOT complementaryFOR SALE OR to DISTRIBUTIONmRNA NOT FOR SALE OR DISTRIBUTION the original DNA sequence. This RNA is only a precursor, though; it cannot yet be used to Protein synthesis produce a polypeptide. First, the introns must Protein be removed© Jones from & the Bartlett RNA to giveLearning, a messenger LLC © Jones & Bartlett Learning, LLC RNA that consists only of a series of exons. This processNOT is called FOR RNA SALE splicing OR (seeDISTRIBUTION the chapter NOT FOR SALE OR DISTRIBUTION Length of precursor RNA (not mRNA) defines region of gene titled Genes Encode RNAs and Polypeptides) and involves precisely deleting the introns from the and then joining the ends of Individual coding regions are separated in gene © Jones &the Bartlett RNA on eitherLearning, side of LLCeach intron to form FIGURE© Jones 4.1 Interrupted & Bartlett genes are Learning, expressed via LLCa precursor RNA. a covalently intact molecule (see the chapter Introns are removed when the exons are spliced together. The mRNA NOT FOR SALE OR DISTRIBUTION has NOTonly the FOR sequences SALE of the OR exons. DISTRIBUTION titled RNA Splicing and Processing). The original eukaryotic gene comprises the region in the genome between points Genomic DNA ­corresponding to the 59 and 39 terminal bases Exon 1 Intron 1 Exon 2 Intron 2 Exon 3 of mature RNA. We ©know Jones that &transcription Bartlett Learning, A LLC B © Jones & BartlettC Learning, LLC starts at the DNA templateNOT FOR corresponding SALE OR to DISTRIBUTION NOT FOR SALE OR DISTRIBUTION the 59 end of the mRNA and usually extends A–B B–C beyond the complement to the 39 end of the mRNA mature RNA, which is generated by cleavage Exon 1 Exon 2 Exon 3 A B C of the 39 extension. The gene is also considered to include© Jones the regulatory & Bartlett regions Learning, on both sides LLC A–B © Jones B–C & Bartlett Learning, LLC of the gene that are required for the initiation NOT FOR SALE OR DISTRIBUTION FIGURE 4.2 Exons remainNOT in theFOR same SALE order in OR mRNA DISTRIBUTION as in DNA, but and (sometimes) termination of transcription. distances along the gene do not correspond to distances along the mRNA or polypeptide products. The distance from A–B in the gene is smaller than the distance from B–C, but the distance from A–B in the mRNA (and 4.2 An Interrupted Gene polypeptide) is greater than the distance from B–C. © Jones & BartlettConsists Learning, of Exons LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION (as ­NOTdetermined FOR bySALE recombination OR DISTRIBUTION analysis) do and Introns not correspond to the distances between sites in the processed mRNA. The length of a gene is Key concepts defined by the length of the primary mRNA tran- • Introns are removed by RNA splicing, which occurs script instead of the length of the mature mRNA. in cis in individual RNA molecules. © Jones & Bartlett Learning,All exons LLC of a gene are on one RNA ©molecule, Jones & Bartlett Learning, LLC • Mutations in exons can affect polypeptide and their splicing together is an intra­ molecular sequence; mutations inNOT introns FOR can affect SALE RNA OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION processing and hence may influence the sequence reaction. There is usually no joining of exons and/or production of a polypeptide. carried by different RNA molecules, so there is rarely cross-splicing of sequences. (However, How does the existence of introns change our in a process known as trans-splicing, sequences view© of Jones the gene? & BartlettDuring splicing, Learning, the exons LLC from different mRNAs© Jones are ligated & Bartletttogether into Learning, LLC are alwaysNOT FORjoined SALE together OR in DISTRIBUTIONthe same order a single molecule forNOT translation.) FOR SALE OR DISTRIBUTION they are found in the original DNA, so the cor- Mutations that directly affect the sequence respondence between the gene and polypeptide of a polypeptide must occur in exons. What sequences is maintained. FIGURE 4.2 shows that are the effects of mutations in the introns? the order of exons in a gene remains the same The introns are not part of the mature mRNA, © Jones &as Bartlettthe order ofLearning, exons in the LLC processed mRNA, so mutations© Jones in& Bartlettthem cannot Learning, directly affectLLC NOT FORbut SALE the distancesOR DISTRIBUTION between sites in the gene theNOT polypeptide FOR sequence.SALE OR However, DISTRIBUTION they may

4.2 An Interrupted Gene Consists of Exons and Introns 83 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. affect the processing of the mRNA production equal amounts of A and T, and equal amounts © Jones & Bartlett Learning,by inhibiting LLC the splicing of exons. A© mutation Jones & Bartlettof C and G, Learning, in each single LLC strand of the duplex. NOT FOR SALE OR DISTRIBUTIONof this sort acts only on the allele thatNOT carries FOR it. SALELike the OR first DISTRIBUTION parity rule, this extends to oligo- Mutations that affect splicing are usually nucleotide sequences: For example, in a very deleterious. The majority are single-base sub- long strand there are approximately equal num- stitutions at the junctions between introns and bers of AC and TG dinucleotides. The reasons for the existence of this rule are not clear, but exons.© Jones They may & causeBartlett an exon Learning, to be left outLLC of © Jones & Bartlett Learning, LLC the product, cause an intron to be included, or sequencing of many has shown it to be makeNOT splicing FOR occur SALE at a different OR DISTRIBUTION site. The most nearly universally NOTtrue. TheFOR ­second SALE parity OR rule DISTRIBUTION common outcome is a termination codon that applies more closely to introns than to exons, shortens the polypeptide sequence. Thus, intron partly due to a further rule—that purines tend mutations may affect not only the ­production to cluster on one DNA strand and pyrimidines tend to ­cluster on the other. This cluster rule as © Jones &of Bartlett a polypeptide, Learning, but also LLC its sequence. About © Jones & Bartlett Learning, LLC 15% of the point mutations that cause human applied to exons is that the purines, A and G, NOT FOR diseasesSALE ORdisrupt DISTRIBUTION splicing. tendedNOT to FOR be ­clustered SALE in OR one DISTRIBUTION DNA strand of the DNA duplex (usually the nontemplate strand) Some eukaryotic genes are not interrupted and these are complemented by clusters of the and, like prokaryotic genes, correspond directly pyrimidines, T and C, in the template strand. with the polypeptide product. In the yeast The fact that in single-stranded DNA an Saccharomyces­ cerevisiae, most genes are unin- © Jones & Bartlett Learning, LLC © Jones & Bartlettoligonucleotide Learning, is accompanied LLC in series by terrupted. In multicellular eukaryotes most NOT FOR SALE OR DISTRIBUTION NOT FOR SALEequal quantitiesOR DISTRIBUTION of its reverse complementary genes are interrupted, and the introns are usu- oligonucleotide suggests that duplex DNA has ally much longer than exons, so that genes are the potential to extrude folded stem-loop struc- considerably larger than their coding regions. tures, the stems of which can display base parity and the loops of which can display some degree © Jones & Bartlett Learning, LLC of base clustering. Indeed,© Jones the potential& Bartlett for such Learning, LLC 4.3NOT Exon FOR andSALE Intron OR DISTRIBUTION Base secondary structureNOT is found FOR to SALE be greater OR DISTRIBUTIONin Compositions Differ introns than exons, especially in exons under positive selection pressure (see the section titled Key concepts Exon Sequences Under Positive Selection Vary but • The four “rules” for DNA base composition are Introns Are Conserved later in this chapter). © Jones & Bartlettthe first and Learning, second parity LLC rules, the cluster rule, © Finally,Jones there& Bartlett is the GC Learning, rule, which LLC is that NOT FOR SALEand the OR GC rule.DISTRIBUTION Exons and introns can be distin- theNOT overall FOR proportion SALE OR of GDISTRIBUTION1C in a genome guished on the basis of all rules except the first. (GC content) tends to be a species-specific ­character • The second parity rule suggests an extrusion of (although individual genes within that genome structured stem-loop segments from duplex DNA, tend to have distinctive values). The GC con- which would be greater in introns. tent tends to be greater in exons than in introns. • The rules relate to genomic characteristics, or ­Chargaff’s four rules are seen to relate to charac- © Jones & Bartlett Learning,“pressures,” LLC that constitute the genome ­phenotype.© Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALEters or “pressures”OR DISTRIBUTION that are intrinsic to the genome, contributing to what was termed the genome In the 1940s initiated ­studies of ­phenotype (see the ­section titled There Are Many DNA base composition that led to four “rules,” Forms of Information in DNA later in this chapter). beginning with the first parity rule for duplex DNA (see the chapter titled Genes Are DNA). This rule applies© Jones to most ®ions Bartlett of DNA, Learning, including LLCboth © Jones & Bartlett Learning, LLC 4.4 Organization of exonsNOT and FOR introns. SALE Base AOR in oneDISTRIBUTION strand of the NOT FOR SALE OR DISTRIBUTION duplex is matched by a complementary base (T) Interrupted Genes in the other strand, and base G in one strand of the duplex is matched by a complementary base May Be Conserved (C) in the other strand. By extension, the rule Key concepts © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC applies not only to single bases, but also to dinu- • Introns can be detected when genes are compared NOT FOR SALEcleotides, OR trinucleotides, DISTRIBUTION and oligonucleotides. NOTwith their FOR RNA transcriptionSALE OR products DISTRIBUTION by either restric- Thus, GT pairs with its reverse complement AC, tion mapping, electron microscopy, or ­sequencing. and ATG pairs with its reverse complement CAT. • The positions of introns are usually conserved In addition to the well known first parity rule, when homologous genes are compared between later work by Chargaff led him to propose a sec- different organisms. The lengths of the corre- © Jones & Bartlett Learning,ond parity LLC rule. The little-known second© Jones parity & Bartlettsponding Learning, introns may vary LLC greatly. NOT FOR SALE OR DISTRIBUTIONrule is that, to a close approximation,NOT there FORare SALE• Introns OR usually DISTRIBUTION do not proteins.

84 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. When a gene is uninterrupted, the restriction protists, and one metazoan (a sea anemone), © Jones map& Bartlett of its DNA Learning, corresponds LLC with the map of and© in Jones chloroplast & Bartlett genes. Genes Learning, with introns LLC NOT FORits SALE mRNA. OR When DISTRIBUTION a gene possesses an intron, haveNOT been FOR found SALE in every OR DISTRIBUTIONclass of eukary- the map at each end of the gene corresponds to otes, archaea, bacteria, and bacteriophages, the map at each end of the message sequence. although they are extremely rare in ­prokaryotic Within the gene, however, the maps diverge genomes. because additional regions© Jones that &are Bartlett found in Learning, Some LLC interrupted genes have ©only Jones one & Bartlett Learning, LLC the gene are not represented in the mature or a few introns. The globin genes provide a mRNA. Each such regionNOT correspondsFOR SALE to OR an DISTRIBUTIONmuch studied example (see the section­ NOT titled FOR SALE OR DISTRIBUTION intron. The example in FIGURE 4.3 compares Members of a Gene ­Family Have a Common Orga- the restriction maps of a b-globin gene and its nization later in this chapter). The two general mRNA. There are two introns, each of which classes of globin gene, a and b, share a common contains© Jones a series & of Bartlett restriction Learning, sites that LLCare organization. They originated© Jones from & Bartlett an ancient Learning, LLC absent from the cDNA. The pattern of restric- gene duplication event and are described as tion sitesNOT in FOR the exons SALE is the OR same DISTRIBUTION in both the paralogous­ genes,NOT or paralogs FOR SALE. The consis OR -DISTRIBUTION cDNA and the gene. The finer comparison of tent structure of mammalian globin genes is the base sequences of a gene and its mRNA per- evident from the “generic” globin gene pre- mits precise identification of introns. An intron sented in FIGURE 4.4. © Jones &usually Bartlett has no Learning, open reading LLC frame. An intact ©Introns Jones are & foundBartlett at homologousLearning, LLCposi- reading frame is created in an mRNA sequence tions (relative to the coding sequence) in all NOT FORby SALE the removal OR DISTRIBUTION of the introns from the primary knownNOT active FOR globin SALE genes, OR including DISTRIBUTION those of transcript. ­mammals, birds, and frogs. Although intron The structures of eukaryotic genes show lengths vary, the first intron is always fairly extensive variation. Some genes are uninter- short and the second is usually longer. Most rupted and their sequences© Jones are colinear& Bartlett with Learning, of the variationLLC in the lengths of© different Jones & Bartlett Learning, LLC those of the corresponding mRNAs. Most ­globin genes results from length variation ­multicellular eukaryoticNOT genes FOR are ­intSALEerrupted, OR DISTRIBUTIONin the ­second intron. For example,NOT the secFOR- SALE OR DISTRIBUTION but the introns vary enormously in both ond intron in the mouse a-globin gene is ­number and size. only 150 bp of the total 850 bp of the gene, Genes encoding polypeptides, rRNA, or whereas the homologous intron in the mouse tRNA© may Jones all have & Bartlett introns. Learning,Introns also LLCare major b-globin gene© Jones is 585 bp& Bartlettof the total Learning, LLC found in mitochondrial genes of plants, fungi, 1382 bp. The difference in length of the genes NOT FOR SALE OR DISTRIBUTION is much greater thanNOT that FOR of theirSALE mRNAs OR DISTRIBUTION (a-globin mRNA 5 585 bases; Genomic DNA Sites cleaved by restriction b-globin mRNA 5 620 bases). Exon 1 Intron 1 Exon 2 Intron 2 Exon 3 The example of dihydrofo- late reductase (DHFR), a some- © Jones & Bartlett Learning, LLC © Joneswhat & Bartlett larger gene, Learning, is shown LLC in NOT FOR SALE OR DISTRIBUTION NOT FORFIGURE­ SALE 4.5 OR. The DISTRIBUTION mammalian DHFR gene is organized­ into cDNA six exons that correspond to a cDNA map corresponds to ­2000-base mRNA. The gene itself exon1 + exon2 + exon3 of genomic map is long because the introns are © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC FIGURE 4.3 Comparison of the restriction maps of cDNA and genomic DNA very long. In three mammal spe- for mouse b-globin shows thatNOT the geneFOR has SALE two introns OR that DISTRIBUTION are not present­ cies the exons are essentiallyNOT FORthe SALE OR DISTRIBUTION in the cDNA. The exons can be aligned exactly between cDNA and the gene. same and the relative positions of

Intron length 116–130 573–904 © Jones & BartlettExon 1 Intron Learning, 1 Exon 2 LLC Intron 2 © JonesExon & Bartlett3 Learning, LLC NOTExon FOR length SALE 142–145 OR DISTRIBUTION222 NOT FOR216–255 SALE OR DISTRIBUTION

Contains 5 UTR Amino acids Coding 105–end + coding 1–30 31–104 + 3 UTR © Jones & BartlettFIGURE Learning, 4.4 All functional LLC globin genes have an interrupted© Jones structure & Bartlett with three Learning,exons. The LLC NOT FOR SALE lengthsOR DISTRIBUTION indicated in the figure apply to the mammalianNOT b-globin FOR genes. SALE OR DISTRIBUTION

4.4 Organization of Interrupted Genes May Be Conserved 85 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. ␤–globin Major © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 5Ј Exon Exon Exon 3Ј NOT FOR SALE OR DISTRIBUTION NOT FOR SALE5 ORЈ DISTRIBUTION1 2 3

Exon 1

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC Exon 2 NOT1 2 FOR 3 SALE 4 5 OR DISTRIBUTION6 Exons NOT FOR SALE OR DISTRIBUTION

0 5 10 15 20 25 30 kb FIGURE 4.5 Mammalian genes for DHFR have the same relative organization of rather short exons and very long ␤ –globin Minor © Jones &introns, Bartlett but vary Learning, extensively inLLC the lengths of introns. © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION Exon 3

the introns are unaltered, but the lengths of 3Ј individual introns vary extensively, resulting © Jones & Bartlett Learning, LLC © Jones & BartlettFIGURE 4.6 Learning, The sequences LLC of the mouse bmaj- and bmin- in a ­variation in the length of the gene from globin genes are closely related in coding regions but differ NOT FOR SALE OR DISTRIBUTION25 to 31 kb. NOT FOR SALEin the flanking OR DISTRIBUTION UTRs and the long intron. Data provided by The globin and DHFR genes are examples , . of a general phenomenon: genes that share a common­ ancestry have similar organizations with conservation of the positions (of at least some) of © Jones & Bartlett Learning, LLC both copies, with© substitutions Jones & Bartlett restricted Learning, in LLC the introns. the exons by the need to encode a functional NOT FOR SALE OR DISTRIBUTION polypeptide. NOT FOR SALE OR DISTRIBUTION As we will see in the chapter titled Genome 4.5 Exon Sequences Under Evolution, where we consider the evolution of Negative Selection the genome, exons can be considered basic © Jones & Bartlett Learning, LLC building© Jones blocks & thatBartlett may be Learning, assembled inLLC vari- Are Conserved but ous combinations. It is possible for a gene to NOT FOR SALE ORIntrons DISTRIBUTION Vary haveNOT some FOR exons SALE related OR to DISTRIBUTION those of another gene, with the remaining exons unrelated. Key concepts ­Usually, in such cases, the introns are not • Comparisons of related genes in different species related at all. Such homologies between genes show that the sequences of the corresponding may result from duplication and translocation © Jones & Bartlett Learning,exons areLLC usually conserved but the sequences© Jones of & Bartlettof individual Learning, exons. LLC NOT FOR SALE OR DISTRIBUTIONthe introns are much less similar. NOT FOR SALEThe OR homology DISTRIBUTION between two genes can be • Introns evolve much more rapidly than exons plotted in the form of a dot matrix compari- because of the lack of selective pressure to pro- duce a polypeptide with a useful sequence. , as in FIGURE 4.6. A dot is placed in each position that is identical in both genes. The Is a single-copy completely dots form a solid line on the diagonal of the unique© Jones among & other Bartlett genes Learning, in its genome? LLC matrix if the two© sequences Jones & are Bartlett completely Learning, LLC TheNOT answer FOR depends SALE on OR how DISTRIBUTION “completely identical. If they NOTare not FOR identical, SALE the OR line DISTRIBUTION unique” is defined. Considered as a whole, the is broken by gaps that lack homology and gene is unique, but its exons may be related is displaced laterally or vertically by nucle- to those of other genes. As a general rule, otide deletions or insertions in one or the other sequence. © Jones &when Bartlett two genesLearning, are related, LLC the relationship © Jones & Bartlett Learning, LLC between their exons is closer than the rela- When the two mouse b-globin genes are NOT FOR SALEtionship OR between DISTRIBUTION their introns. In an extreme comparedNOT FOR in SALEthis way, OR a DISTRIBUTIONline of ­homology case, the exons of two genes may encode the extends through the three exons and the small same polypeptide sequence while the introns intron. The line disappears in the flanking UTRs are different. This situation can result from and in the large intron. This is a typical ­pattern in related genes; the coding sequences and © Jones & Bartlett Learning,the duplication LLC of a common ancestral© Jones gene & Bartlett Learning, LLC f­ollowed by unique base substitutions in areas of introns adjacent to exons retain their NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

86 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. ­similarity, but there is greater divergence in lon- 4.6 © Jones ger& Bartlett introns and Learning, in the regions LLC on either side of © JonesExon & Sequences Bartlett Learning, Under LLC NOT FORthe SALE coding OR sequence. DISTRIBUTION NOTPositive FOR SALE Selection OR DISTRIBUTION Vary The overall degree of divergence between two homologous exons in related genes but Introns Are Conserved ­corresponds to the differences between the Key concepts polypeptides. It is mostly a result of base sub- © Jones & Bartlett Learning,• Under LLCpositive selection an individual with© Jonesan & Bartlett Learning, LLC stitutions. In the translated regions, changes advantageous mutation survives (i.e., is able to in exon sequences areNOT constrained FOR SALE by selec OR- DISTRIBUTIONproduce more fertile progeny) relative toNOT others FOR SALE OR DISTRIBUTION tion against mutations that alter or destroy the without the mutation. function of the polypeptide. In other words, the • Due to intrinsic genomic pressures, such as that exon sequences are conserved by the negative which conserves the potential to extrude stem-loops ­selection of individuals in which the sequences from duplex DNA, introns evolve more slowly than © Jones & Bartlett Learning, LLC exons that are under ©positive Jones selection & Bartlett pressure. Learning, LLC have changed (have not been conserved) to NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION result in a phenotype that is less able to sur- A mutation that confers a more advanta- vive and produce fertile progeny. Many of geous phenotype to an organism, relative to the preserved changes do not affect codon ­individuals in the same population without the meanings because they change a codon into mutation, may result in the preferential survival © Jones &another Bartlett for theLearning, same amino LLC acid (i.e., they (positive© Jones selection &) ofBartlett that organism. Learning, Pathogenic LLC are ­synonymous substitutions). In this case, NOT FOR SALE OR DISTRIBUTION bacteriaNOT are FOR killed SALE by an OR antibiotic, DISTRIBUTION but a bac- the polypeptide will not change and negative terium with a mutation that confers antibiotic selection will not ­operate on the phenotype resistance survives (i.e., is positively selected). conferred by the polypeptide. Similarly, there Mutations conferring­ venom-resistance­ to prey are higher rates of change in untranslated of venomous snakes can result in the positive regions of the gene (specifically, those that are © Jones & Bartlett Learning,selection LLC of that prey relative to its fellows© Jones that & Bartlett Learning, LLC transcribed to the 59 UTR [leader] and 39 UTR succumb to the poison­ (i.e., are negatively [trailer] of the mRNA).NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION selected). Likewise, a snake that, when con- In homologous introns, the pattern of fronted by a venom-resistant prey population, divergence involves both changes in length has a mutation that enhances the power of its (due to deletions and insertions) and base venom, will be positively selected. This can trig- substitutions. Introns evolve much more rap- © Jones & Bartlett Learning, LLC ger an attack–defense© Jones cycle—an & Bartlett“arm’s race” Learning, LLC idly than exons when the exons are under negativeNOT selection FOR SALE pressure. OR When DISTRIBUTION a gene is between two protagonistNOT species.FOR SALE OR DISTRIBUTION compared among different species, there are In such situations the pattern of exon con- instances where its exons are homologous but servation and intron variation seen in genes its introns have diverged so much that very lit- under negative selection can be reversed because tle homology is retained. Although mutations exons evolve faster than introns. Thus, a plot © Jones &in certainBartlett intron Learning, sequences LLC (branch site, splic- similar© Jones to Figure & 4.6Bartlett will have Learning, lines in introns LLC NOT FORing SALE ­junctions, OR DISTRIBUTION and perhaps other sequences andNOT gaps in FOR exons. SALE Another OR way DISTRIBUTION of showing this influencing splicing) will be subject to selec- is to plot base substitutions along the length of tion, most intron mutations are expected to be a gene. FIGURE 4.7 shows a plot of the substitu- selectively neutral. tions observed when two snake venom alkaline In general, mutations occur at the same rate phosphatase genes are compared. The protein- in both exons and introns,© Jones but exon & Bartlettmutations Learning, encoding LLC parts of exons (2, 3, and the first© Jones half of & Bartlett Learning, LLC are eliminated more NOTeffectively FOR by SALE selection. OR DISTRIBUTIONexon 4) have many base substitutionsNOT (i.e., they FOR SALE OR DISTRIBUTION However, because of the low level of functional are varying), whereas the three introns have constraints, introns may more freely accumu- relatively few (i.e., they are conserved). late point substitutions and other changes. What is being conserved in introns? First, Indeed, it is sometimes possible to locate exons intron sequences needed for RNA ­splicing—the in uncharted© Jones sequences & Bartlett by virtue Learning, of their conLLC- 59 and 39 splice sites© andJones the branch & Bartlett site—are Learning, LLC servationNOT relative FOR SALE to introns OR (see DISTRIBUTION the chapter conserved (see the NOTchapter FOR titled SALE RNA Splicing OR DISTRIBUTION The Content of the Genome). From this description and Processing). In addition to these, base order it is all too easy to conclude that introns do not has been adapted to promote the potential of have a sequence-specific function.­ Genes under the duplex DNA in the region to extrude stem– positive selection, however, cast a different­ light loop structures (fold potential). Thus, a plot of © Jones &on Bartlettthe problem. Learning, LLC base© order–dependent Jones & Bartlett fold potentialLearning, along LLC the NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

4.6 Exon Sequences Under Positive Selection Vary but Introns Are Conserved 87 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC–20 100 NOT FOR SALE OR DISTRIBUTION 1 NOT2 FOR SALE3 OR DISTRIBUTION4 80 –15 Substitutions 60 –10 Substitutions 40 © Jones &–5 Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 20 NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

Fold Potential 0 0

–20 5 Fold Potential

5´ UTR 3´ UTR –40 10 © Jones & Bartlett Learning, LLC500 1000 1500© Jones 2000 & Bartlett 2500 Learning, LLC NOT FOR SALE OR DISTRIBUTION Length of SequenceNOT (bases) FOR SALE OR DISTRIBUTION

FIGURE 4.7 The sequences of snake venom phospholipase genes differ in coding regions, but are closely related in introns and flanking regions. Fold potential (here the contribution of base order to the potential to extrude stem–loop structures) is low (more positive) in the protein-encoding © Jones & Bartlett Learning, LLCexons and high (more negative) in© introns. Jones The positions& Bartlett of the fourLearning, exons are shown LLC as numbered boxes. Modified from D. R. Forsdyke, Conservation of Stem-Loop Potential in Introns of Snake NOT FOR SALE OR DISTRIBUTIONVenom Phospholipase A2 Genes: NOTAn Application FOR ofSALE FORS-D OR Analysis, DISTRIBUTION Mol. Biol. Evol., vol. 12(6), pp. 1157–1165, by permission of Oxford University Press.

length© Jones of the gene& Bartlett shows thatLearning, fold potential LLC selection. Indeed, the© Joneslow (more & positive)Bartlett value Learning, LLC (measuredNOT FOR in negative SALE ORunits) DISTRIBUTION is high (more of fold potential in NOTan exon FOR provides SALE evaluation OR DISTRIBUTION negative) in introns, and low (more positive) of the extent to which it has been under posi- in exons (Figure 4.7). This reciprocal relation- tive selection, without the need to compare two ship between substitution frequency and the sequences (the classical way of determining if contribution of base order to fold potential is a selection is positive or negative). © Jones &characteristic Bartlett Learning, of DNA sequences LLC under ­positive © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION 4.7 Genes Show a Wide Distribution of Sizes Due 96% S. cerevisiae Primarily to Intron Size 80 © Jones & Bartlett Learning, LLC © Jones & Bartlettand Learning, Number LLC Variation NOT FOR60 SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION 40 Key concepts • Most genes are uninterrupted in Saccharomyces 20 cerevisiae but are interrupted in multicellular eukaryotes. 30 © JonesD. melanogaster & Bartlett Learning, LLC • Exons are usually short,© Jones typically & encoding Bartlett fewer Learning, LLC than 100 amino acids.

Percent 20 17% NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION • Introns are short in unicellular/oligocellular 10 eukaryotes but can be many kb in multicellular eukaryotes. 15 Mammals • The overall length of a gene is determined largely by its introns. 10 © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 6% 5 NOT FOR SALE OR DISTRIBUTION FIGURENOT 4.8 FOR compares SALE the OR organization DISTRIBUTION of genes in a yeast, an insect, and mammals. In the yeast 12 10 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 <40 >60 Saccharomyces cerevisiae, the majority of genes Number of exons (.96%) are uninterrupted, and those that have FIGURE 4.8 Most genes are uninterrupted in yeast, but most genes are interrupted exons generally have three or fewer. There are © Jonesin flies & andBartlett mammals. Learning, (Uninterrupted LLC genes have only one exon and are© totaled Jones in & Bartlettvirtually noLearning, S. cerevisiae LLC genes with more than NOT FORthe leftmost SALE column OR in DISTRIBUTION red.) NOT FOR SALEfour exons. OR DISTRIBUTION

88 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. In insects and mammals the situation is © Jones & Bartlett Learning, LLC © 6Jones & Bartlett Learning, LLC reversed. Only a few genes have uninterrupted 5 NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION coding sequences (6% in mammals). Insect 4 genes tend to have a small number of exons, Yeast 3 typically fewer than 10. Mammalian genes are 2 split into more pieces and some have more 1 than 60 exons. Approximately© Jones 50%& Bartlett of mam -Learning, LLC © Jones & Bartlett Learning, LLC malian genes have more than 10 introns. If we 6 ­examine the effect of NOTintron FOR number SALE variation OR DISTRIBUTION5 NOT FOR SALE OR DISTRIBUTION on the total size of genes, we see in FIGURE 4.9 4 Fly that there is a striking difference between yeast 3 and multicellular eukaryotes. The average yeast % exons 2 gene ©is 1.4Jones kb long, & andBartlett very few Learning, are longer than LLC 1 © Jones & Bartlett Learning, LLC 5 kb. The predominance of interrupted genes in multicellularNOT FOR eukaryotes, SALE however,OR DISTRIBUTION means that 6 NOT FOR SALE OR DISTRIBUTION the gene can be much larger than the sum total 5 4 of the exon lengths. Only a small percentage of Human genes in flies or mammals are shorter than 2 3 © Jones &kb, Bartlett and most Learning, have lengths LLC between 5 kb and © 2Jones & Bartlett Learning, LLC 100 kb. The average human gene is 27 kb long. 1 NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION The gene, with a length of 2000 kb, 100 300 500 700 900 is the longest known human gene. Exon length in nucleotides The switch from largely uninterrupted FIGURE 4.10 Exons encoding polypeptides are to largely interrupted genes seems to have ­usually short. occurred with the evolution© Jones of &multicellular Bartlett Learning, LLC © Jones & Bartlett Learning, LLC eukaryotes. In fungi other than S. cerevisiae, the majority of genes NOTare interrupted, FOR SALE but theyOR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION have a relatively small number of exons (,6) FIGURE 4.10 shows that exons encoding and are fairly short (,5 kb). In the fruit fly, stretches of protein tend to be fairly small. In gene sizes have a bimodal distribution—many multicellular eukaryotes, the average exon codes for 50 amino acids, and the general dis- are short© Jones but some & Bartlett are quite Learning, long. With LLCthis © Jones & Bartlett Learning, LLC increase in the length of the gene due to the tribution is consistent with the hypothesis that increasedNOT number FOR SALE of introns, OR DISTRIBUTIONthe correlation genes have evolvedNOT by the FOR gradual SALE addition OR of DISTRIBUTION between and organism complexity exon units that encode short, functionally inde- becomes weak. pendent protein domains (see the Genome Evo- lution chapter). There is no significant difference in the average size of exons in ­different mul- © Jones & Bartlett Learning, LLC ticellular© Jones eukaryotes, & Bartlett although Learning, the size rangeLLC NOT FOR SALE OR DISTRIBUTIONS. cerevisiae is smallerNOT FOR in vertebrates SALE OR for DISTRIBUTIONwhich there are 50 few exons longer than 200 bp. In yeast, there 40 30 are some longer exons that represent uninter- 20 rupted genes for which the coding sequence is 10 intact. There is a tendency for exons containing © JonesD. melanogaster & Bartlett Learning,untranslated LLC 59 and 39 regions to be longer© Jones than & Bartlett Learning, LLC 50 those that encode proteins. 40 NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION 30 FIGURE 4.11 shows that introns vary widely Percent 20 in size among multicellular eukaryotes. (Note 10 that the scale of the x-axis differs from that of 50 Mammals Figure 4.10.) In worms and flies, the ­average 40© Jones & Bartlett Learning, LLC intron is not longer© than Jones the exons. & Bartlett There areLearning, LLC 30 20NOT FOR SALE OR DISTRIBUTION no very long intronsNOT in worms, FOR SALEbut flies OR con DISTRIBUTION- 10 tain many. In vertebrates, the size distribution < 0.5 < 1 < 2 < 5 < 10 < 25 < 50 < 100 > 100 is much wider, extending from approximately Size of gene (kb) the same length as the exons (,200 bp) up to FIGURE 4.9 Yeast genes are short, but genes in flies and 60 kb in extreme cases. (Some fish, such as fugu, © Jones &mammals Bartlett have a Learning, dispersed bimodal LLC distribution extending have© compressedJones & genomesBartlett with Learning, shorter introns LLC NOT FORto SALE very long OR sizes. DISTRIBUTION andNOT intergenic FOR regions SALE than OR mammals DISTRIBUTION have.)

4.7 Genes Show a Wide Distribution of Sizes Due Primarily to Intron Size and Number Variation 89 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. 20 Very long genes are the result of very © Jones & Bartlett Learning, LLC © Jones & Bartlettlong introns, Learning, not the resultLLC of encoding lon- NOT FOR SALE OR DISTRIBUTION15 NOT FOR SALEger ­products. OR DISTRIBUTION There is no correlation between Worm total gene size and total exon size in multicel- 10 lular eukaryotes, nor is there a good correlation between gene size and number of exons. The 5 © Jones & Bartlett Learning, LLC size of a gene is therefore© Jones determined & Bartlett ­prima Learning,­rily LLC by the lengths of its individual introns. In 20 NOT FOR SALE OR DISTRIBUTION ­mammals and insects,NOT theFOR “average” SALE geneOR DISTRIBUTIONis 15 approximately 53 that of the total length of Fly its exons.

% introns 10 © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 5 4.8 NOT FOR SALE OR DISTRIBUTION NOT SomeFOR SALE DNA OR Sequences DISTRIBUTION 20 Encode More Than 15 Human One Polypeptide 10 © Jones & Bartlett Learning, LLC © Jones & BartlettKey concepts Learning, LLC 5 • The use of alternative initiation or termination NOT FOR SALE OR DISTRIBUTION NOT FOR SALEcodons OR allows DISTRIBUTION multiple variants of a polypeptide 1 5 10 15 20 25 30 chain. Intron length in kb • Different polypeptides can be produced from the FIGURE 4.11 Introns range from very short to very long. same sequence of DNA when the mRNA is read in different reading frames (as two overlapping © Jones & Bartlett Learning, LLC genes). © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION • Otherwise identicalNOT polypeptides, FOR differingSALE byOR the DISTRIBUTION presence or absence of certain regions, can be Full-length protein generated by differential (alternative) splicing when certain exons are included or excluded. This may take the form of including or excluding indi- vidual exons, or of choosing between alternative © Jones & Bartlett Learning, LLC ©exons. Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION Part-length protein Many structural genes consist of a sequence that encodes a single polypeptide, although the gene may include noncoding regions at both ends and introns within the . However, © Jones & Bartlett Learning, LLC © Jones & Bartlettthere are someLearning, cases in whichLLC a single sequence NOT FOR SALE OR DISTRIBUTION NOT FOR SALEof DNA OR encodes DISTRIBUTION more than one polypeptide. In one simple example, a single DNA START Triplet codons STOP sequence may have two alternative start codons Alternative START in the same reading frame (see ­FIGURE 4.12). FIGURE 4.12 Two proteins can be generated from a single gene by starting Thus, under different conditions one or the (or terminating) expression© Jonesat different & points. Bartlett Learning, LLC other of the start codons© Jones may be& used,Bartlett ­allowing Learning, LLC NOT FOR SALE OR DISTRIBUTION the production of eitherNOT a shortFOR form SALE of the OR poly DISTRIBUTION- peptide or a full-length form, where the short form is the last portion of the full-length form. START An actual overlapping gene occurs Codons used for protein 1 when the same sequence of DNA encodes © Jones & Bartlett Learning, LLC two© Jonesnonhomologous & Bartlett proteins Learning, because LLC it uses BasesNOT FOR SALE OR DISTRIBUTION moreNOT than FOR one SALE reading OR frame. DISTRIBUTION Usually, a cod- ing DNA sequence is read in only one of the three potential reading frames. In some viral Codons used for protein 2 and mitochondrial genes, however, there is START some overlap­ between two adjacent genes that © Jones &FIGURE Bartlett 4.13 Learning,Two genes may LLC overlap by reading the same DNA ©sequence Jones in & Bartlettare read inLearning, different reading LLC frames, as illus- NOT FOR SALEdifferent frames.OR DISTRIBUTION NOT FOR SALEtrated ORin FIGURE DISTRIBUTION 4.13. The length of overlap is

90 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. W X ␣ Z Exon 1 Intron Exon 2 Intron Exon 3 © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTIONOne mRNA has the ␣ exon NOT FOR SALE OR DISTRIBUTION RNA synthesis Exons W X ␣ ␤ Z

Only introns spliced out One mRNA has the ␤ exon 5Ј 3Ј W © XJones ␤ Z & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC FIGURE 4.14 AlternativeNOT splicing FOR generates SALE the a OR and DISTRIBUTION NOT FOR SALE OR DISTRIBUTION b variants of troponin T. 5Ј 3Ј mRNA

C ­usually© short,Jones so that& Bartlett most of the Learning, DNA sequence LLC N © Jones & Bartlett Learning, LLC encodesNOT a unique FOR SALEpolypeptide OR sequence.DISTRIBUTION NOTLong FOR protein SALEhas all 3 exonsOR DISTRIBUTION In some cases, genes can be nested. This Introns + exon 2 spliced out occurs when a complete gene is found within 5Ј 3Ј the intron of a larger “host” gene. Nested genes often lie on the strand opposite to that of the © Jones &host Bartlett gene. Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALEIn some OR genes DISTRIBUTION there are switches in the NOT FOR 5SALEЈ OR DISTRIBUTION3Ј mRNA pathway for splicing the exons that result in alternative patterns of . A single C gene may generate a variety of mRNA products N that differ in their exon content. Certain exons Short protein has only exon 1 + exon 3 may be optional; in other© Jones words, & they Bartlett may be Learning, FIGURE 4.15 LLC uses the same© Jones pre-mRNA & to Bartlett generate mRNAs Learning, LLC included or spliced out.NOT There FOR also may SALE be a ORpair DISTRIBUTIONthat have different combinations of exons. NOT FOR SALE OR DISTRIBUTION of exons treated as mutually exclusive—one or the other is included in the mature transcript, but not both. The alternative proteins have one primary transcript can be spliced in either of part in common and one unique part. two ways. In the first (more standard) path- In© someJones cases, & Bartlett the alternative Learning, means LLC of way, two introns are© spliced Jones out & and Bartlett the three Learning, LLC expressionNOT FORdo not SALE affect theOR sequence DISTRIBUTION of the exons are joined together.NOT FOR In the SALE second ORpath -DISTRIBUTION polypeptide. For example, changes that affect way, the second exon is excluded as if a single the 59 UTR or the 39 UTR may have regula- large intron is spliced out. This intron consists of tory consequences, but the same polypeptide is intron 1 1 exon 2 1 intron 2. In effect, exon 2 made. In other cases, one exon is substituted for has been treated in this pathway as if it were © Jones &another, Bartlett as in Learning, FIGURE 4.14. LLCIn this example, the part© of Jones a single & intron. Bartlett The Learning,pathways produce LLC NOT FORpolypeptides SALE OR produced DISTRIBUTION by the two mRNAs con- twoNOT polypeptides FOR SALE that areOR the DISTRIBUTION same at their tain sequences that overlap extensively, but are ends, but one has an additional sequence in the different within the alternatively spliced region. middle. (Other types of combinations that are The 39 half of the troponin T gene of rat muscle produced by alternative splicing are discussed contains five exons, but only four are used to in the RNA Splicing and Processing chapter). construct an individual© mRNA. Jones Three & Bartlett exons (W ,Learning, Sometimes LLC two alternative splicing© Jones path- & Bartlett Learning, LLC X, and Z) are includedNOT in all mRNAs.FOR SALE However, OR DISTRIBUTIONways operate simultaneously, withNOT a certain FOR SALE OR DISTRIBUTION in one alternative splicing pattern the a exon proportion of the primary RNA transcripts is included between X and Z, whereas in the being spliced in each way. However, sometimes other pattern it is replaced by the b exon. The the pathways are alternatives that are expressed a and b forms of troponin T therefore differ in under different conditions, for example, one the sequence© Jones of the& Bartlett amino acids Learning, between W LLCand in one cell type and© one Jones in another & Bartlett cell type. Learning, LLC Z, dependingNOT FOR on which SALE of theOR alternative DISTRIBUTION exons So, alternativeNOT (or FORdifferential) SALE ORsplic DISTRIBUTION- (a or b) is used. Either one of the a and b exons ing can generate different polypeptides with can be used in an individual mRNA, but both related sequences from a single stretch of DNA. cannot be used in the same mRNA. It is curious that the multicellular eukaryotic FIGURE 4.15 shows that alternative splicing genome is often extremely large with long © Jones &can Bartlett lead to the Learning, inclusion LLCof an exon in some genes© Jonesthat are &often Bartlett widely Learning, dispersed along LLC a NOT FORmRNAs SALE while OR leavingDISTRIBUTION it out of others. A single chromosome,NOT FOR but SALE at the OR same DISTRIBUTION time there may

4.8 Some DNA Sequences Encode More Than One Polypeptide 91 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. be multiple products from a single locus. Due we can equate the functional domains of cur- © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning,to alternative LLC splicing, there are ,15% more rent proteins with the individual exons of the NOT FOR SALE OR DISTRIBUTIONpolypeptides than genes in flies and worms,NOT FORbut SALEcorresponding OR DISTRIBUTION genes, then this would suggest it is estimated that the majority of human genes selective interdomain interruptions rather than are alternatively spliced (see the chapter titled random ones. Genome Sequences and Gene Numbers). In some cases there is a clear relationship © Jones & Bartlett Learning, LLC between the structures© Jones of a gene & andBartlett its ­prote Learning,in LLC product, but these may be special cases. The 4.9NOT Some FOR SALEExons OR Can DISTRIBUTION Be example par excellenceNOT is provided FOR SALE by the ORimmu DISTRIBUTION- noglobulin () proteins—an extracel- Equated with Protein lular system for self-/nonself-­discrimination Functional Domains that aids in the elimination of foreign patho- gens. Immunoglobulins are encoded by genes © Jones & KeyBartlett concepts Learning, LLC © Jones & Bartlett Learning, LLC in which every exon corresponds exactly to NOT FOR SALE• Proteins OR can DISTRIBUTIONconsist of independent functional a NOTknown FOR functional SALE protein OR DISTRIBUTION domain. Banks modules, the boundaries of which, in some cases, can be equated with those of exons. of alternate sequence domains are tapped so • The exons of some genes appear homologous to that each cell acquires the ability to secrete a the exons of others, suggesting a common exon cell-­specific immunoglobulin with distinctive © Jones & Bartlett Learning,ancestry. LLC © Jones & Bartlettbinding capacity Learning, for a foreignLLC antigen that the organism may one day encounter again (see NOT FOR SALE OR DISTRIBUTIONThe issue of the evolution of interruptedNOT genes FOR SALEthe chapter OR DISTRIBUTION titled Somatic Recombination and is more fully considered in the Genome ­Evolution Hypermutation in the Immune System). FIGURE 4.16 chapter. If proteins evolve by recombining ­compares the structure of an immunoglobulin parts of ancestral proteins that were originally with its gene. separate, the accumulation of protein domains © Jones & Bartlett Learning, LLC An immunoglobulin© Jones is a &tetramer Bartlett of twoLearning, LLC is likely to have occurred sequentially, with light chains and two heavy chains that cova- oneNOT exon addedFOR atSALE a time. OR Each DISTRIBUTION addition would lently bond to generateNOT aFOR protein SALE with severalOR DISTRIBUTION have to improve upon the advantages of prior distinct domains. Light chains and heavy chains additions in a sequence of positive selection differ in structure, and there are several types events. Are the different function-encoding of heavy chains. Each type of chain is pro- © Jones &segments Bartlett from Learning, which these LLC genes may have duced© Jones from a& gene Bartlett that has Learning, a series of LLC exons originally been pieced together reflected in corresponding to the structural domains of NOT FOR theirSALE present OR DISTRIBUTIONstructures? If a protein sequence theNOT protein. FOR SALE OR DISTRIBUTION were randomly interrupted, sometimes the In many instances, some of the exons of interruption would intersect a domain and a gene can be identified with particular func- sometimes it would lie between domains. If tions. In secretory proteins, such as insulin, © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTIONL exon V-J exon C exonNOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOTLight chainFOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION Leader Variable Constant Hinge Constant 2 Constant 3 Protein domains Heavy chain

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

L exon V-D-J exon C1 exon Hinge exon C2 exon C3 exon FIGURE 4.16 Immunoglobulin light chains and heavy chains are encoded by genes whose structures © Jones & Bartlett Learning,(in their LLC expressed forms) correspond to© the Jones distinct & domains Bartlett in the Learning, protein. Each protein LLC domain NOT FOR SALE OR DISTRIBUTIONcorresponds to an exon; introns are numberedNOT FORI1 to I5. SALE OR DISTRIBUTION

92 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. the first exon that encodes for the N-­terminal 4.10 © Jones region& Bartlett of the Learning, polypeptide LLC often specifies a © Jones Members & Bartlett of a Learning, Gene LLC NOT FORsignal SALE sequence OR DISTRIBUTION needed for transfer across a NOTFamily FOR SALE Have OR a DISTRIBUTION membrane. Common Organization The view that exons are the functional building blocks of genes is supported by cases Key concepts in which two genes ©may Jones share &some Bartlett related Learning, • A set ofLLC homologous genes should share common© Jones & Bartlett Learning, LLC exons but also have unique exons. FIGURE 4.17 features that preceded their evolutionary separation. summarizes the relationshipNOT FOR between SALE ORthe DISTRIBUTION• All globin genes have a common form of organizationNOT FOR SALE OR DISTRIBUTION receptor for human LDL (plasma low den- with three exons and two introns, suggesting that sity lipoprotein) and other proteins. The LDL they are descended from a single ancestral gene. receptor gene has a series of exons related • Intron positions in the actin gene family are highly variable, which suggests that introns do to the exons of the EGF (epidermal growth © Jones & Bartlett Learning, LLC not separate functional© Jonesdomains. & Bartlett Learning, LLC factor) precursor gene and another series of NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION exons related to those of the blood protein Many genes in a multicellular eukaryotic complement factor C9. Apparently, the LDL genome are related to others in the same receptor gene evolved by the assembly of genome, either in series (nonallelic) or in parallel modules for its various functions. These mod- (allelic). A gene family is defined as a group of © Jones &ules Bartlett are also Learning, used in different LLC combinations genes© Jonesthat encode & Bartlett related or Learning, identical products LLC in other proteins. NOT FOR SALE OR DISTRIBUTION as aNOT result FORof gene-duplication SALE OR DISTRIBUTION events. After the Exons tend to be fairly small and are first duplication event, the two copies are iden- around the size of the smallest polypeptide that tical, but then they diverge as different muta- can assume a stable folded structure (,20 to tions accumulate in them. Further ­duplications 40 residues). It may be that proteins were orig- and divergence extend the family. The globin inally assembled from© ratherJones small & Bartlett ­modules. Learning, genes are LLC an example of a family that© Jones can be & Bartlett Learning, LLC Each individual module need not correspond NOT FOR SALE OR DISTRIBUTIONdivided into two subfamilies (a globinNOT and FOR b SALE OR DISTRIBUTION to a current function; several modules could globin), but all its members have the same basic have combined to generate a new functional structure and function (see the unit. Larger genes tend to have more exons, chapter). In some cases, we can find genes that which is consistent with the view that ­proteins are more distantly related but that still can be acquire© Jones multiple & functionsBartlett Learning,by successively LLC recognized as having© common Jones ancestry.& Bartlett Such Learning, a LLC ­adding appropriate modules. NOT FOR SALE OR DISTRIBUTION group of gene familiesNOT is called FOR a superfamilySALE OR DISTRIBUTION. This suggestion might explain another A fascinating case of evolutionary conser- aspect of protein structure. It appears that the vation is presented by the a and b globins and sites represented at exon–intron boundaries­ two other proteins related to them. Myoglo- often are located at the surface of a pro- bin is a monomeric oxygen-binding protein tein. As modules are added to a protein, the © Jones & Bartlett Learning, LLC in .© Jones Its &amino Bartlett acid sequence Learning, suggests LLC a ­connections—at least of the most recently common (though ancient) origin with a and b NOT FORadded SALE modules—could OR DISTRIBUTION tend to lie at the surface. NOT FOR SALE OR DISTRIBUTION globins. Leghemoglobins are oxygen-binding proteins present in legume plants; like myoglo- bin, they are monomeric and share a common LDL receptor origin with the other heme-binding proteins. C9 complement EGF© precursorJones & Bartlett Learning,Together, LLC the globins, myoglobins, and© Jones leghe- & Bartlett Learning, LLC homology homology NOT FOR SALE OR DISTRIBUTIONmoglobins make up the globin superfamily—NOT FOR SALE OR DISTRIBUTION a set of gene families all descended from an ancient common ancestor. Both a- and b-globin genes have three exons and two introns in conserved positions © Jones & Bartlett Learning, LLC (see Figure 4.4). The© Jonescentral exon & Bartlett represents Learning, LLC NOT FOR SALE OR DISTRIBUTION the heme-binding domainNOT FOR of the SALE globin chain.OR DISTRIBUTION There is a single myoglobin gene in the and its structure is essentially the EGF precursor same as that of the globin genes. The con- FIGURE 4.17 The LDL receptor gene consists of 18 exons, some of which are related to EGF precursor exons and some served three-exon structure therefore predates © Jones &of whichBartlett are related Learning, to the C9 bloodLLC complement gene. the© common Jones ancestor& Bartlett of the Learning, myoglobin LLC and NOT FORTriangles SALE mark OR the DISTRIBUTION positions of introns. globinNOT genes. FOR SALE OR DISTRIBUTION

4.10 Members of a Gene Family Have a Common Organization 93 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. Leghemoglobin genes contain three introns, Common insulin gene (chicken and rat) © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC the first and last of which are homo­logous to the Leader Coding Coding NOT FOR SALE OR DISTRIBUTIONtwo introns in the globin genes. ThisNOT remark FOR- SALEexon OR Intron DISTRIBUTION exon Intron exon able similarity suggests an exceedingly ancient origin for the interrupted structure of heme- binding proteins, as illustrated in ­FIGURE 4.18. The© central Jones intron & Bartlett of leghemoglobin Learning, separates LLC © Jones & Bartlett Learning, LLC two exons that together encode the sequence Delete intron NOT FOR SALE OR DISTRIBUTION Leader NOT FOR SALE OR DISTRIBUTION corresponding to the single central exon in exon Intron Coding exon globin; the functional heme-binding­ domain is split into two by an intron. Could the central exon of the globin gene have been derived by Second insulin gene in rat © Jones &a Bartlettfusion of twoLearning, central exons LLC in the ancestral FIGURE© Jones 4.19 The & rat Bartlett insulin gene Learning, with one intron LLC evolved gene? Or, is the single central exon the ancestral by loss of an intron from an ancestor with two introns. NOT FOR fSALEorm? In OR this DISTRIBUTION case, an intron must have been NOT FOR SALE OR DISTRIBUTION inserted into it early in plant evolution. Orthologous genes, or orthologs, are genes that are homologous (homologs) due these cases, there must have been exten- sive deletion or insertion of introns during © Jones & Bartlett Learning,to speciation; LLC in other words, they are© Jonesrelated & Bartlett Learning, LLC genes in different species. Comparison of evolution. A well characterized case is that NOT FOR SALE OR DISTRIBUTIONorthologs that differ in structure mayNOT provide FOR SALEof the OR actin DISTRIBUTION genes. The common features information about their evolution. An exam- of actin genes are an untranslated leader of ple is insulin. Mammals and birds have only ,100 bases, a coding region of 1200 bases, one gene for insulin, except for rodents, which and a trailer of ,200 bases. Most actin genes have introns, and their positions can be have© two. Jones FIGURE­ & 4.19Bartlett illustrates Learning, the structures LLC © Jones & Bartlett Learning, LLC of these genes. aligned with regard to the coding sequence NOTWe use FOR the principle SALE ORof parsimony DISTRIBUTION in com- (except for a singleNOT intron FOR sometimes SALE OR found DISTRIBUTION paring the organization of orthologous genes in the leader). by assuming that a common feature predates the FIGURE 4.20 shows that almost every evolutionary separation of the two species. In chick- actin gene is different in its pattern of intron ­positions. Among all the genes being com- © Jones &ens, Bartlett the single Learning, insulin gene LLC has two introns; © Jones & Bartlett Learning, LLC one of the two homologous rat genes has the pared, introns occur at 19 different sites. NOT FOR sameSALE structure. OR DISTRIBUTION The common structure implies However,NOT FOR the SALE range ofOR intron DISTRIBUTION number per that the ancestral insulin gene had two introns. gene is zero to six. How did this situation However, because the second rat gene has only arise? If we suppose that the ancestral actin one intron, it must have evolved by a gene gene had introns, and that all current actin duplication in rodents that was followed by genes are related to it by loss of introns, differ- © Jones & Bartlett Learning,the precise LLC removal of one intron from© Jones one of & Bartlettent introns Learning, have been LLClost in each evolution- NOT FOR SALE OR DISTRIBUTIONthe homologs. NOT FOR SALEary branch. OR DISTRIBUTION Probably some introns have been The organizations of some orthologs show lost entirely, so the ancestral gene could well extensive discrepancies between species. In have had 20 introns or more. The alternative is to suppose that a process of intron inser- tion continued independently in the different © Jones & Bartlett Learning, LLC lineages. © Jones & Bartlett Learning, LLC Heme-bindingNOT FOR domain SALE OR DISTRIBUTION Whether intronsNOT were FOR present SALE in OR actin DISTRIBUTION genes early or late, there appears to have been Globin E I E I E no consistent influence from actin protein domains or subdomains as to where introns should locate. On the other hand, when © Jones & Bartlett Learning, LLC exons© Jones are under & Bartlett negative Learning, selection (resulting LLC NOT FOR SALEE OR I E DISTRIBUTIONE I E inNOT homology FOR conservation), SALE OR DISTRIBUTION in-series recom- Extra intron Leghemoglobin bination between members of an expanding­ gene family (that could cause a contrac- Domain is divided tion in family size) would be decreased by FIGURE 4.18 The exon structure of globin genes cor- intron diversification (resulting in loss of © Jones & Bartlett Learning,responds toLLC protein function, but leghemoglobin© Jones has an & Bartlettsome homology), Learning, and LLC introns would come NOT FOR SALE OR DISTRIBUTIONextra intron in the central domain. NOT FOR SALEto reside OR where DISTRIBUTION this could best be achieved.

94 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC S. pombe NOT FOR SALE OR DISTRIBUTIONS. cerevisiae NOT FOR SALE OR DISTRIBUTION Acanthamoeba Thermomyces C. elegans D. melanogaster A6 A1 A4 © JonesA2 & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC Sea urchinNOT C FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION J Chick muscle Rat muscle Rat cytoplasmic Human smooth muscle Human cardiac muscle © Jones & BartlettSoybean Learning, LLC © Jones & Bartlett Learning, LLC NOTFIGURE FOR 4.20 SALE Actin genesOR DISTRIBUTIONvary widely in their organization. The sites of intronsNOT are FOR indicated SALE by OR DISTRIBUTION dark boxes. The bar at the top summarizes all the intron positions among the different orthologs.

Alleles would have similar exons and introns, 4.11 There Are Many Forms © Jones &so Bartlettin-parallel Learning, interallelic LLC recombination © Jones & Bartlett Learning, LLC NOT FOR(as SALE in meiosis)­ OR DISTRIBUTION would be unimpaired until NOTof FOR Information SALE OR DISTRIBUTION in DNA speciation occurred—a process that could be accompanied by intron relocations. The Key concepts relationships between the intron locations • Genetic information includes not only that related among different species could then be used to characters corresponding to the conventional to construct a phylogenetic© Jones tree & illustratingBartlett Learning, phenotype, LLC but also that related to charac©- Jones & Bartlett Learning, LLC the evolution of the NOTactin gene.FOR SALE OR DISTRIBUTIONters (pressures) corresponding to the genomeNOT FOR SALE OR DISTRIBUTION “­phenotype.” The relationship between individual exons and functional protein domains is • In certain contexts, the definition of the gene can be seen as reversed from “one gene–one protein” somewhat erratic. In some cases there is a to “one protein–one gene.” clear 1:1 relationship; in others no pattern • Positional information may be important in can be© Jonesdiscerned. & BartlettOne possibility Learning, is that LLCthe ­development. © Jones & Bartlett Learning, LLC removalNOT of FOR introns SALE has fused OR DISTRIBUTIONthe previously • Sequences transferredNOT “horizontally” FOR fromSALE other OR spe- DISTRIBUTION adjacent exons. This means that the intron cies to the germline could land in introns or inter- must have been precisely removed without genic DNA and then transfer “vertically” through changing the integrity of the coding region. the generations. Some of these sequences may be involved in intracellular nonself-recognition. An alternative is that some introns arose © Jones &by Bartlettinsertion Learning,into an exon LLC encoding a single © Jones & Bartlett Learning, LLC The term genetic information can include all NOT FORdomain. SALE ORTogether DISTRIBUTION with the variations that NOT FOR SALE OR DISTRIBUTION we see in exon placement in cases such as information that passes “vertically” through the the actin genes, the conclusion is that intron germline, not just genic information. The word positions can evolve. “gene” and its adjective “genic” have different The correspondence of at least some meanings in different contexts, but in most exons with protein ©domains Jones and & Bartlett the pres -Learning,circumstances LLC there is little confusion© Jones when & Bartlett Learning, LLC ence of related exonsNOT in differentFOR SALE proteins OR DISTRIBUTIONcontext is considered. In situations inNOT which FOR a SALE OR DISTRIBUTION leave no doubt that the duplication and sequence of DNA is responsible for production ­juxtaposition of exons have played impor- of one particular polypeptide, current usage tant roles in evolution. It is possible that the regards the entire sequence of DNA—from the number of ancestral exons—from which all first point represented in the messenger RNA proteins© Jones have been& Bartlett derived Learning, by duplication, LLC to the last point corresponding© Jones & to Bartlett its end—as Learning, LLC variation,NOT andFOR recombination—could SALE OR DISTRIBUTION be rela- comprising the “gene”:NOT exons, FOR introns, SALE and OR all. DISTRIBUTION tively small, perhaps as little as a few thou- When sequences encoding polypeptides sand. The idea that exons are the building overlap or have alternative forms of expres- blocks of new genes is consistent with the sion, we may reverse the usual description of “introns early” model for the origin of genes the gene. Instead of saying “one gene–one poly- © Jones &encoding Bartlett proteins Learning, (see the LLC Genome Evolution peptide,”© Jones we may & Bartlett describe theLearning, relationship LLC as NOT FORchapter). SALE OR DISTRIBUTION “oneNOT polypeptide–one FOR SALE gene.” OR DISTRIBUTIONSo we regard the

4.11 There Are Many Forms of Information in DNA 95 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. sequence involved in production of the poly- stage of development, which in turn leads to © Jones & Bartlett Learning,peptide (includingLLC introns and exons)© Jonesas con- & Bartletta further stage,Learning, and so LLCon, until all the tissues NOT FOR SALE OR DISTRIBUTIONstituting the gene, while recognizingNOT that partFOR SALEof the ORadult DISTRIBUTION are formed and functioning. The of this same sequence also belongs to the gene molecular of this regulatory network is of another polypeptide. This allows the use of still somewhat unknown, but we assume that descriptions such as “overlapping” or “alterna- it consists of genes that encode products (often tive”© genes. Jones & Bartlett Learning, LLC protein, but sometimes© Jones RNA) that& Bartlett can influence­ Learning, LLC We can now see how far we have come the expression of other genes. fromNOT the oneFOR gene–one SALE enzymeOR DISTRIBUTION hypothesis of Although suchNOT a series FOR ofSALE interactions OR DISTRIBUTION the early part of the 20th century. The driving­ is almost certainly the means by which the question at that time was the nature of the gene. developmental program is executed, we can It was thought that genes represented “fer- ask whether it is entirely sufficient. One spe- © Jones &ments” Bartlett (enzymes), Learning, but what LLC was the fundamen- cific© Jonesquestion & concerns Bartlett the Learning, nature and LLCrole of tal nature of ferments? Once it was discovered­ positional­ information. We know that all parts of NOT FOR thatSALE most OR genes DISTRIBUTION encode proteins, the paradigm a NOTfertilized FOR egg SALE are not OR equal; DISTRIBUTION one of the fea- became fixed as the concept that every genetic tures responsible for development of different unit functions through the synthesis of a par- tissue parts from different regions of the egg is ticular protein. Either directly or indirectly, location of information (presumably specific © Jones & Bartlett Learning,protein-encoding LLC pressure was responsible© Jones for & Bartlettmacromolecules) Learning, within LLC the cell. what we can now refer to as the conventional We do not fully understand how these NOT FOR SALE OR DISTRIBUTIONphenotype. We now recognize that geneticNOT units FOR SALEparticular OR regions DISTRIBUTION are formed, though particu- encoding polypeptides may also include infor- lar examples have been well studied (see the mation corresponding to the genome phenotype, mRNA Stability and Localization chapter). We manifestations of which include fold pressure, assume, however, that the existence of posi- purine-loading© Jones (AG)& Bartlett ­pressure, Learning, and GC pressure LLC. tional information© in Jones the egg &leads Bartlett to the Learning,dif- LLC There may be conflict between different pres- ferential expression of genes in the cells ­making sures,NOT such FOR as competition SALE OR for spaceDISTRIBUTION in the gam- up the tissues formedNOT from FOR these SALE regions. OR This DISTRIBUTION ete that will transfer genomic information to the leads to the development of the adult organism,­ next generation. For example, a protein might which in the next generation leads to the function most efficiently with the basic amino ­development of an egg with the appropriate © Jones &acid Bartlett lysine (codonLearning, AAA) LLC in a certain position, positional© Jones information. & Bartlett Learning, LLC but GC pressure might require the substitution This possibility of positional information NOT FOR ofSALE another OR basic DISTRIBUTION , such as arginine suggestsNOT FOR that SALEsome information OR DISTRIBUTION needed for (codon CGG). Alternatively, fold pressure might development of the organism is contained in require the ­corresponding to fold a form that we cannot directly attribute to a into a stem-loop structure where CCG would sequence of DNA (although the expression pair with the antiparallel arginine codon. A of particular sequences may be needed to © Jones & Bartlett Learning,lysine codon LLC in this position would disrupt© Jones the & Bartlett­perpetuate Learning, the positional LLC information). Put in a NOT FOR SALE OR DISTRIBUTIONstructure, so again a less efficient polypeptideNOT FOR SALEmore generalOR DISTRIBUTION way, we might ask the following: would have to suffice. If we have the entire sequence of DNA compris- The conventional phenotype, however, ing the genome of some organism and interpret remains the central paradigm of molecular it in terms of proteins and regulatory regions, biology: A genic DNA sequence either directly could we in principle construct an organism (or encodes© Jones a particular & Bartlett polypeptide Learning, or is adjaLLC- even a single living© cell) Jones by controlled & Bartlett expres Learning,- LLC centNOT to the FOR segment SALE that ORactually DISTRIBUTION encodes that sion of the proper NOTgenes? FOR SALE OR DISTRIBUTION polypeptide. How far does this paradigm take Once tissues and organs have developed, us beyond explaining the basic relationship they not only have to be maintained, but also between genes and proteins? protected against potential pathogens. Groups The development of multicellular organ- of variable genes have diversified in the germ- © Jones &isms Bartlett requires Learning, the use ofLLC different genes to line,© Jones and continue & Bartlett to diversify Learning, somatically, LLC to NOT FOR SALE­generate OR the DISTRIBUTIONdifferent cell phenotypes of each allowNOT multicellular FOR SALE organisms OR DISTRIBUTION to (1) respond tissue. The expression of genes is determined extracellularly by the synthesis of immunoglo­ by a regulatory network that takes the form of bulin directed against pathogens, a cascade. Expression of the first set of genes and (2) “remember” past pathogens so that at the start of embryonic development leads to future responses will be faster and stronger © Jones & Bartlett Learning,expression LLC of the genes involved in© the Jones next & Bartlett­(immunological Learning, memory; LLC see the chapter titled NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

96 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. Somatic Recombination in the Immune System). sequence is removed as an intron in some situ- © Jones Should& Bartlett it escape Learning, such extracellular LLC defenses, ations© Jones but retained & Bartlett as an exon Learning, in others. LLC NOT FORthough, SALE theOR nucleicDISTRIBUTION acid of a pathogenic NOTOften, FOR when SALE the organizations OR DISTRIBUTION of ortholo- virus could gain entry to cells and intracellular gous genes are compared, the positions of introns defenses would be needed. are conserved. In genes under negative selection We know that in bacteria infected by pressure, intron sequences vary—and may even bacteriophages (see ©the Jones chapter & titled Bartlett Phage Learning, appear unrelated—althoughLLC exon ©sequences Jones & Bartlett Learning, LLC Strategies), host defenses include rapid local or remain closely related. This conservation of genome-wide transcriptionNOT ofFOR DNA SALE (which OR has DISTRIBUTIONexons, which allows the conservationNOT of impor FOR- SALE OR DISTRIBUTION been documented in eukaryotes in response tant phenotypic characters, can be used to iden- to environmental insult or infection) to pro- tify related genes in different species. In genes duce “antisense” transcripts that are capable of under positive selection pressure, however, base-pairing with pathogen “sense” transcripts © Jones & Bartlett Learning, LLC exon sequences vary,© although Jones intron & Bartlett sequences Learning, LLC to form double-stranded RNAs. These RNAs can remain more similar. This conservation of then NOTact as anFOR alarm SALE signal OR to trigger DISTRIBUTION secondary introns relates to charactersNOT FORcorresponding SALE ORto the DISTRIBUTION defenses (see the example of bacterial CRIS- genome phenotype, such as fold pressure, which PRs discussed in the Regulatory RNA chapter). may relate to error correction in DNA. The host could store a “memory” of previ- Some genes share only some of their exons ous ­intracellular invaders by converting some © Jones & Bartlett Learning, LLC with© otherJones genes, & Bartlett suggesting Learning, that they LLChave pathogen transcripts into DNA through reverse been assembled by addition of exons represent- NOT FORtranscription SALE OR and DISTRIBUTION inserting them into its genome ingNOT functional FOR “modular SALE OR units” DISTRIBUTION of the protein. in an inactive form for future rapid transcription Such modular exons may have been incorpo- of antisense RNAs in times of active infection rated into a ­variety of different proteins and by that pathogen. Thus, some pathogen nucleic sometimes correspon­ d to functional domains of acid might enter the germline “horizontally” © Jones & Bartlett Learning,those proteins. LLC The idea that genes have© Jones been & Bartlett Learning, LLC (within a generation) and the parental memory assembled by sequential addition of exons is of the pathogen couldNOT subsequently FOR SALE be trans OR- DISTRIBUTIONconsistent with the hypothesis thatNOT introns FOR SALE OR DISTRIBUTION ferred “vertically” to offspring. The diversity of were present in the genes of ancestral organ- some elements found within introns and extra- isms, thus facilitating the assembly process. genic DNA (see the chapter titled Transposable Some of the relationships between homologous ­Elements and Retroviruses) could in part reflect © Jones & Bartlett Learning, LLC genes can be explained© Jones by loss &of intronsBartlett from Learning, LLC such past pathogen attacks. There is recent evi- the ancestral genes, with different introns being denceNOT of such FOR inherited SALE antiviral OR DISTRIBUTION immunity in lost in different linesNOT of descent. FOR SALE OR DISTRIBUTION several and plant species. References 4.12 Summary 4.1 Introduction © Jones &Most Bartlett eukaryotic Learning, genomes LLC contain genes that © Jones & Bartlett Learning, LLC NOT FORare SALE interrupted OR DISTRIBUTION by intron sequences. The pro- ReviewsNOT FOR SALE OR DISTRIBUTION portion of interrupted genes is low in some Crick, F. (1979) Split genes and RNA splicing. fungi, but few genes are uninterrupted in mul- Science 204, 264–271. ticellular eukaryotes. The size of a gene is deter- Harris, H. (1994) An RNA heresy in the fifties. mined primarily by the lengths of its introns. Trends Biochem. Sci. 19, 303–305. Hong, X., Schofield, D. G., and Lynch, M. (2006). The range of gene sizes© in Jones mammals & isBartlett generally Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTIONIntron size, abundance, and distributionNOT FOR SALE OR DISTRIBUTION from 1 to 100 kb, but there are some that are within untranslated regions of genes. Mol. Biol. even larger. Evol. 23, 2392–2404. Introns are found in all classes of eukary- otic genes, both those encoding protein ­products Research and those encoding independently functioning Glover, D. M., and Hogness, D. S. (1977). A novel RNAs.© TheJones structure & Bartlett of an interrupted Learning, gene LLC is arrangement of the© Jones8S and 28S & sequencesBartlett in Learning, a LLC repeating unit of D. melanogaster rDNA. Cell 10, the sameNOT in FORall tissues: SALE Exons OR are DISTRIBUTION spliced together NOT FOR SALE OR DISTRIBUTION 167–176. in RNA in the same order as they are found Scherrer, K., et al. (1970). Nuclear and cytoplas- in DNA, and the introns, which usually have mic messenger-like RNAs and their relation to no coding function, are removed from RNA by the active messenger RNA in polyribosomes of splicing. Some genes are expressed by alter- © Jones & Bartlett Learning, LLC ©HeLa Jones cells. Cold& Bartlett Spring Harb. Learning, Symp. Quant. LLC Biol. native splicing patterns, in which a ­particular 35, 539–554. NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

4.12 References 97 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. © Jones & Bartlett Learning,4.2 AnLLC Interrupted Gene Consists of© Exons Jones & Bartlett4.6 Exon Learning, Sequences LLCUnder Positive Selection and Introns Vary but Introns Are Conserved NOT FOR SALE OR DISTRIBUTION NOT FOR SALEForsdyke, OR D. DISTRIBUTION R. (1995). Conservation of stem-loop Review potential in introns of snake venom phos- Forsdyke, D. R. (2011). Exons and introns. In: pholipase A genes: an application of FORS-D Evolutionary Bioinformatics, 2nd ed. Springer, 2 analysis. Mol. Biol. Evol. 12, 1157–1165. New York, pp. 249–266. (See also http://post Forsdyke, D. R. (1995). Reciprocal relationship .queensu.ca/ forsdyke/introns.htm.) © Jones &, Bartlett Learning, LLC between stem-loop© Jones potential & and Bartlett substitu- Learning, LLC 4.3NOT Exon FOR and SALE Intron Base OR Compositions DISTRIBUTION Differ tion density in retroviralNOT FOR quasispecies SALE under OR DISTRIBUTION positive Darwinian selection. J. Mol. Evol. 41, Reviews 1022–1037. (See http://post.queensu Forsdyke, D. R., and Mortimer, J. R. (2000). .ca/,forsdyke/hiv01.htm.) Chargaff’s legacy. Gene 261, 127–137. (See http:// Forsdyke, D. R. (1996). Stem-loop potential in © Jones & Bartlettpost.queensu.ca/ Learning,,forsdyke/bioinfo2.htm.) LLC © MHCJones genes: & aBartlett new way ofLearning, evaluating posi LLC- Forsdyke, D. R., and Bell, S. J. (2004). Purine- tive Darwinian selection. Immunogenetics 43, NOT FOR SALEloading, OR stem-loops, DISTRIBUTION and Chargaff’s second NOT182–189. FOR SALE OR DISTRIBUTION parity rule: a discussion of the application of elementary principles to early chemical observa- 4.7 Genes Show a Wide Distribution of Sizes tions. Applied Bioinformatics 3, 3–8. (See http:// Due Primarily to Intron Size and Number post.queensu.ca/,forsdyke/bioinfo5.htm.) Variation © Jones & Bartlett Learning, LLC © Jones & BartlettHawkins, J. Learning, D. (1988). A surveyLLC of intron and NOT FOR SALE OR DISTRIBUTIONResearch NOT FOR SALEexon OR lengths. DISTRIBUTION Nucleic Acids Res. 16, 9893–9905. Babak, T., Blencowe, B. J., and Hughes, T. R. Naora, H., and Deacon, N. J. (1982). Relationship (2007). Considerations in the identification between the total size of exons and introns in of functional RNA structural elements in ge- protein-coding genes of higher eukaryotes. nomic alignments. BMC Bioinf. 8, article num- Proc. Natl. Acad. Sci. USA 79, 6196–6200. ber 33. Bechtel,© Jones J. M., et & al. Bartlett (2008). Genomic Learning, mid-range LLC 4.8 Some DNA Sequences© Jones Encode & Bartlett More Than Learning, LLC NOTinhomogeneity FOR SALE correlates OR with DISTRIBUTION an abundance One PolypeptideNOT FOR SALE OR DISTRIBUTION of RNA secondary structure. BMC 9, article number 284. Research Bultrini, E., et al. (2003). Pentamer vocabularies Pan, Q., et al. (2008). Deep surveying of alterna- characterizing introns and intron-like inter- tive splicing complexity in the human tran- scriptome by high-throughput sequencing. © Jones & Bartlettgenic tracts Learning, from Caenorhabditis LLC elegans and © Jones & Bartlett Learning, LLC . Gene 304, 183–192. Nature 40, 1413–1415. NOT FOR Ko,SALE C. H., OR et al. DISTRIBUTION (1998). U-richness is a defining Sultan,NOT M., FOR et al. SALE (2008). AOR global DISTRIBUTION view of gene feature of plant introns and may function as activity and alternative splicing by deep se- an intron recognition signal in maize. Plant quencing of the human transcriptome. Science Mol. Biol. 36, 573–583. 321, 956–960. Zhang, C., Li, W.-H., Krainer, A. R., and Zhang, 4.9 Some Exons Can Be Equated with Protein © Jones & Bartlett Learning,M. Q. LLC (2008). RNA landscape of evolution© Jones for & BartlettFunctional Learning, Domains LLC NOT FOR SALE OR DISTRIBUTIONoptimal exon and intron discrimination.NOT Proc. FOR SALE OR DISTRIBUTION Natl. Acad. Sci. USA 105, 5797–5802. Reviews Blake, C. C. (1985). Exons and the evolution of 4.4 Organization of Interrupted Genes proteins. Int. Rev. Cytol. 93, 149–185. May Be Conserved Doolittle, R. F. (1985). The genealogy of some Review recently evolved vertebrate proteins. Trends © Jones & Bartlett Learning, LLC Biochem. Sci. 10, ©233–237. Jones & Bartlett Learning, LLC Fedoroff,NOT N. FOR V. (1979). SALE On spacers. OR DISTRIBUTION Cell 16, 697–710. NOT FOR SALE OR DISTRIBUTION Research 4.10 Members of a Gene Family Have Berget, S. M., Moore, C., and Sharp, P. (1977). a Common Organization Spliced segments at the 59 terminus of adeno- Review virus 2 late mRNA. Proc. Natl. Acad. Sci. USA Dixon, B., and Pohajdek, B. (1992). Did the © Jones & Bartlett74, 3171–3175. Learning, LLC © ­ancestralJones globin& Bartlett gene of Learning,plants and animals LLC NOT FOR SALEChow, L. OR T., Gelinas, DISTRIBUTION R. E., Broker, T. R., and NOTcontain FOR only SALE two introns? OR DISTRIBUTION Trends Biochem. Sci. Roberts, R. J. (1977). An amazing sequence 17, 486–488. arrangement at the 59 ends of adenovirus 2 mRNA. Cell 12, 1–8. Research Jeffreys, A. J., and Flavell, R. A. (1977). The rab- Matsuo, K., et al. (1994). Short introns bit b-globin gene contains a large insert in the ­interrupting the Oct-2 POU domain may © Jones & Bartlett Learning,coding LLC sequence. Cell 12, 1097–1108.© Jones & Bartlettprevent Learning, recombination LLC between POU family NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

98 chapter 4 The Interrupted Gene © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION. members without interfering with potential ­unicellular state: implications of emerging © Jones & BartlettPOU domain Learning, “shuffling” LLC in ­evolution. Biol. ©­genomic Jones data. & TrendsBartlett Immunol Learning,. 23, 575–579. LLC NOT FOR SALEChem. ORHopp-Seyler DISTRIBUTION 375, 675–683. NOT(See http://post.queensu.ca/ FOR SALE OR DISTRIBUTION,forsdyke/­ Weber, K., and Kabsch, W. (1994). Intron posi- theorimm.htm.) tions in actin genes seem unrelated to the sec- Jeffares, D. C., Penkett, C. J., and Bähler, J. ondary structure of the protein. EMBO. J. 13, (2008). Rapidly regulated genes are intron 1280–1286. poor. Trends in Genetics 24, 375–378. © Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC 4.11 Research There Are ManyNOT Forms FOR of Information SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION in DNA Bertsch, C., Beuve, M., Dolja, V. V., Wirth, M., Pelsy, F., Herrbach, E., and O. Lemaire. (2009). Reviews Retention of the virus-derived sequences in Barrangou, R., et al. (2007). CRISPR provides the nuclear genome of grapevine as a potential acquired resistance against viruses in prokar­ pathway to virus resistance. Biology Direct 4, 21. yotes.© Jones Science 315,& Bartlett 1709–1712. Learning, LLC Flegel, T. W. (2009). Hypothesis© Jones for & heritable, Bartlett anti- Learning, LLC Bernardi,NOT G., FOR and Bernardi, SALE G. OR (1986). DISTRIBUTION viral immunity inNOT crustaceans FOR and SALE insects. OR DISTRIBUTION Compositional constraints and genome evolu- Biology Direct 4, 32. tion. J. Mol. Evol. 24, 1–11. Saleh, M.-C., Tassetto, M., van Rij, R. P., Goic. B., Forsdyke, D. R. (2011). Evolutionary Bioinformatics, Gausson, V. Berry, B., Jacquier, C., Antoniewski, 2nd ed. Springer, New York, 509 pp. C., and R. Andino. (2009). Antiviral immunity Forsdyke, D. R., Madill, C. A., and Smith, S. D. in Drosophila requires systemic RNA interference © Jones & Bartlett(2002). Immunity Learning, as a function LLC of the ©spread. Jones Nature & 458,Bartlett 346-350. Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

© Jones & Bartlett Learning, LLC © Jones & Bartlett Learning, LLC NOT FOR SALE OR DISTRIBUTION NOT FOR SALE OR DISTRIBUTION

References 99 © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION.