
Expression of tandem duplicates is often greater than twofold

David W. Loehlin^a,b and Sean B. Carroll^a,b,1

^aHoward Hughes Medical Institute, University of Wisconsin-Madison, Madison, WI 53706; and ^bLaboratory of Cell and Molecular Biology, University of Wisconsin-Madison, Madison, WI 53706

Contributed by Sean B. Carroll, April 13, 2016 (sent for review March 11, 2016; reviewed by Daniel L. Hartl and Harmit S. Malik)

Tandem gene duplication is an important mutational process in evolutionary adaptation and human disease. Hypothetically, two tandem gene copies should produce twice the output of a single gene, but this expectation has not been rigorously investigated. Here, we show that tandem duplication often results in more than double the gene activity. A naturally occurring tandem duplication of the Alcohol dehydrogenase (Adh) gene exhibits 2.6-fold greater expression than the single-copy gene in transgenic Drosophila. This tandem duplication also exhibits greater activity than two copies of the gene in trans, demonstrating that it is the tandem arrangement and not copy number that is the cause of overactivity. We also show that tandem duplication of an unrelated synthetic reporter gene is overactive (2.3- to 5.1-fold) at all sites in the genome that we tested, suggesting that overactivity could be a general property of tandem gene duplicates. Overactivity occurs at the level of RNA transcription, and therefore tandem duplicate overactivity appears to be a previously unidentified form of position effect. The increment of surplus gene expression observed is comparable to many regulatory mutations fixed in nature and, if typical of other genomes, would shape the fate of tandem duplicates in evolution.

tandem duplication | gene expression | position effect | gene structure

Significance

Differences among individuals and species originate from changes to the genome. Yet our knowledge of the principles that might allow prediction of the effects of any particular mutation is limited. One such prediction might be that duplicating a gene would double the gene's output. We show that this is actually not the case in Drosophila flies. Instead, in almost all of the cases we tested (using a naturally occurring and an artificially constructed tandem duplicate gene), we observed that the output of the duplicated genes was greater than double the output of single copies, as much as five times greater. This finding suggests that tandem duplicate genes could have disproportionate effects when they occur.

Author contributions: D.W.L. and S.B.C. designed research; D.W.L. performed research; D.W.L. contributed new reagents/analytic tools; D.W.L. analyzed data; and D.W.L. and S.B.C. wrote the paper.
Reviewers: D.L.H., Harvard University; and H.S.M., Fred Hutchinson Cancer Research Center.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
Data deposition: The DNA sequences reported in this paper have been deposited in the GenBank database [accession no. KU559568 (Drosophila virilis Adh locus)].
^1To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1605886113/-/DCSupplemental.

Evolutionarily and medically relevant phenotypes often derive from quantitative changes in gene expression. It is becoming increasingly appreciated that relatively modest changes in gene expression or activity can have meaningful effects. For example, alleles with 1.1- to 1.6-fold effects on transcription or enzyme activity (1) have been identified that show evidence for selection, including Adh in Drosophila melanogaster and Lactase and Prodynorphin in humans (1–3). Furthermore, most transcriptional variation in Drosophila is on the order of twofold or less (4). Understanding the mutational basis of these activity changes is a necessary step to predict phenotypes based on genomic sequences.

One simple way for gene activity to double is through tandem gene duplication. Gene duplication is a common mutational process, occurring with estimated rates of 10⁻⁹ to 10⁻⁷ new duplicates per gene per generation in flies, worms, and (5, 6). Gene duplication has been of long-standing interest in evolution because, once genes have duplicated, one copy may acquire a novel function (7, 8), and many genes involved in physiological and developmental diversification occur as tandem duplicates in gene complexes. However, relatively little is known empirically about the first step in this process: the immediate phenotypic consequences of a single gene duplication. This may be due to the difficulty of isolating the effects of increased copy number from any potential contribution of subsequent sequence divergence to gene expression of a duplicate pair.

Here, we uncovered an effect of tandem duplications on gene activity in the Drosophila melanogaster genome that is greater than twofold. We suggest that this phenomenon, which we refer to as "tandem duplicate overactivity," may be a previously unidentified type of position effect on gene expression.

Results and Discussion

Tandem Duplication of Adh Is Overactive. We encountered the possibility that tandem gene duplicates might not simply produce a twofold increase in gene output in the course of pursuing the genetic basis of the sixfold greater ADH activity in brewery-adapted Drosophila virilis relative to its sibling Drosophila americana (Fig. 1A). Two copies of the entire D. virilis gene, including all known regulatory elements, occur within a 7-kb tandem duplication, whereas the orthologous sequence in D. americana is single copy (9). We cloned the duplicated Adh region from D. virilis and found that the two duplicate copies in our laboratory strain were nearly identical, with only three distinguishing single-nucleotide changes located distal to the unit (Fig. 1B). We therefore presumed that the tandem duplication would account for twofold higher activity, with the remaining threefold change in activity accounted for by subsequent changes in regulatory or coding sequences.

We tested this presumption by inserting duplicate and single-copy D. virilis Adh transgenes (Fig. 1B) into an inbred Adh-null D. melanogaster recipient line at a specific chromosomal insertion site (attP ZH-86Fb), followed by measurement of ADH activity from whole-fly homogenates. We were surprised to observe 2.6-fold higher ADH enzyme activity from the duplicates than from the single copy of D. virilis Adh (Fig. 1C). The difference between single and duplicate was significantly greater than expected (t test, P = 0.0005; see Tables S1–S14 for details of underlying mixed-effects models). In addition, we tested for any effect of the between-copy nucleotide changes by engineering a construct where the left and the right copies were identical (Fig. 1B). ADH activity from this identical-duplicate construct was indistinguishable from the original cloned duplicate (t test, P = 0.64;

Fig. 1C). These results suggested that duplication of the Adh gene itself might be the source of the excess 60% activity.

Overactivity Depends on Tandem Arrangement. This unexpected observation prompted us to examine whether the surplus activity could be due to nonadditive scaling of gene expression. Specifically, the duplication-bearing flies contain four copies of the D. virilis Adh gene per cell, whereas the singletons carry two copies. We reasoned that comparing ADH activity in flies with an equal number of gene copies per cell but in different configurations would control for nonadditive scaling of gene expression. We crossed single and duplicate inserted flies back to the Adh-null transgene insertion line, crossed F1 siblings, and compared F2 flies that were singleton homozygotes (two copies per cell) with flies that were duplicate heterozygotes (two copies per cell) (Fig. 2). Duplicate heterozygotes had 50% higher ADH activity than singleton homozygotes, significantly different from the null expectation of equal activity (t test, P = 0.0013). This demonstrates that two gene copies arranged in tandem behave differently than two copies in trans.

Tandem Duplicates of a Synthetic Reporter Gene Are Overactive at All Sites Tested. We next considered whether this overactivity could be a general property of tandem duplicates. If so, overactivity would not be limited to the Adh gene, and it should not be limited to one chromosomal location. We tested the first hypothesis by constructing duplications of an unrelated gene, the well-studied synthetic reporter gene vgQ-lacZ that consists of the β-galactosidase reporter gene linked to the ∼800-bp quadrant enhancer of the D. melanogaster vestigial gene (10). We inserted single and duplicate constructs into the same insertion site used above and then measured β-galactosidase activity in third-instar wing imaginal disk cells. The activity of duplicate transgenes relative to singletons was again significantly greater than twofold (t test, P = 0.01; Fig. 3A), even though the gene, tissue, and measurement assay used were completely different.

We examined whether duplicate overactivity was dependent on chromosomal position by inserting single and duplicate vgQ-lacZ transgenes at six additional sites. We selected attP insertion sites that are commonly used by Drosophila researchers because of their faithful expression of transgenes. The vgQ-lacZ duplicates were significantly overactive at all sites (Fig. 3 and Tables S1–S14), with duplicate activity ranging from 2.3-fold (in four of seven sites) to 5.1-fold higher than singletons. Even considering the possibility that the insertion sites selected are a biased sample of the genome, this result suggests that overactivity is common and could have a typical value. It also suggests that the degree of overactivity is influenced by location.

[Figure 1. Panels A and C, y-axis: ADH activity (ΔAbs340 per min per mg soluble protein); panel C is at site ZH-86Fb with a dashed "2x Single" line. Panel B: schematic with 1-kb scale bar of the Dup, Ident_dup, and Single constructs.]

Fig. 1. Tandem duplication of Adh from D. virilis is overactive. (A) ADH enzyme activity is sixfold higher in D. virilis than D. americana. Boxplots show median and interquartile range with thin lines extending to the lesser of 1.5× the interquartile range or the data extremes. n = 15 samples. (B) Schematic of the tandem duplicated Adh locus in D. virilis ("Dup"). Vertical bars delimit the duplicated region. Ovals mark the three nucleotides that distinguish the left copy from the right copy. Also shown are engineered constructs with SNPs removed ("Ident_dup") and the isolated single copy ("Single"). (C) ADH activity of D. melanogaster flies (ZH-86Fb attP site, Adh-null) transformed with D. virilis Single and Dup constructs. Dashed line shows predicted twofold mean activity of the Single construct. Error bars show 95% confidence interval of means (Tables S1–S14). Sample sizes for this and subsequent plots are in Tables S1–S14. We verified that assay measurements scaled one-to-one with homogenate concentration (Fig. S1).

[Figure 2. Site ZH-86Fb. y-axis: ADH activity (ΔAbs340 per min per mg soluble protein). x-axis: Dup hom, Dup het, Single hom, Single het, Uninserted, annotated with Adh copies per cell (in cis for Dup het, in trans for Single hom).]

Fig. 2. Excessive ADH activity is due to the tandem duplication, not copy number per cell. F2 homozygotes and heterozygotes from crosses back to the Adh-null transgene insertion line were extracted using the high-throughput procedure. Compare the Single homozygote and the Dup heterozygote, each of which bears two copies of the Adh gene but in different configurations.

5988–5992 | PNAS | May 24, 2016 | vol. 113 | no. 21 www.pnas.org/cgi/doi/10.1073/pnas.1605886113
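The copy-number control behind the F2 comparison can be made explicit with a short sketch. The following Python is illustrative only: the class and function names are hypothetical, and the copy counts for the Single heterozygote (one) and the uninserted line (zero) are inferred from the genotypes rather than stated in the text. It enumerates the five F2 genotypes, counts transgene copies per cell, and picks out the pair that has equal copy number but differs in whether the copies sit in tandem.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Genotype:
    label: str
    copies_per_insert: int   # 2 for the tandem Dup construct, 1 for Single, 0 if uninserted
    inserted_homologs: int   # 2 = homozygote, 1 = heterozygote, 0 = uninserted

    @property
    def copies_per_cell(self) -> int:
        return self.copies_per_insert * self.inserted_homologs

    @property
    def has_tandem_pair(self) -> bool:
        return self.copies_per_insert == 2 and self.inserted_homologs > 0


# Hypothetical encoding of the genotypes compared in the F2 cross.
GENOTYPES = [
    Genotype("Dup hom", 2, 2),
    Genotype("Dup het", 2, 1),
    Genotype("Single hom", 1, 2),
    Genotype("Single het", 1, 1),
    Genotype("Uninserted", 0, 0),
]


def copy_matched_controls(genotypes):
    """Pairs with equal copies per cell that differ only in tandem arrangement."""
    return [
        (a.label, b.label)
        for a, b in combinations(genotypes, 2)
        if a.copies_per_cell == b.copies_per_cell
        and a.has_tandem_pair != b.has_tandem_pair
    ]


def additive_ratio(a, b):
    """Activity ratio a/b expected if output scaled only with copies per cell."""
    return a.copies_per_cell / b.copies_per_cell
```

Under these assumptions, `copy_matched_controls(GENOTYPES)` returns only the pair ("Dup het", "Single hom"), and `additive_ratio` for that pair is 1.0. The text reports 50% higher activity for the duplicate heterozygote, which is why copy number alone cannot explain the surplus.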

[Figure 3. Top: schematic of the Dup (vgQ-lacZ vgQ-lacZ) and Single (vgQ-lacZ) constructs and a map of the attP insertion sites on Chr. X, 2, 3, and 4. Panels A–G, y-axis: β-galactosidase activity (ΔAbs574 per hr ×10³); x-axis: Dup, Single; dashed "2x Single" line. Panel H: mean Dup activity versus mean Single activity, with regions marked >2-fold and <2-fold.]

Fig. 3. Duplicate overactivity is not limited to Adh and varies with genomic position. vgQ-lacZ "Single" and tandem duplicate ("Dup") constructs were inserted in the following attP sites: (A) ZH-86Fb (same site and genetic background as Fig. 1); (B) ZH-22A; (C) ZH-68E; (D) ZH-51D; (E) VK00037; (F) VK00033; and (G) attP40 (same site and genetic background as Fig. 4). β-Galactosidase activity was measured from wing imaginal discs. Dashed line shows predicted twofold activity of Single construct at each site. (H) Summary of preceding panels. Each point represents mean β-galactosidase activity from Dup and Single inserts at each site. Dashed line indicates a twofold activity difference.

Although overactivity was common in our observations, we note that it was not universal across chromosome locations. When the Adh Single and Dup constructs were inserted in the attP40 site (in an Adh-null background), Adh duplicate activity relative to singleton activity was not significantly different from twofold (t test, P = 0.19; Fig. 4A). If Adh duplicates in this site are merely additive, two copies of the gene should produce the same output regardless of whether or not the genes are in tandem configuration. To test this hypothesis, we crossed single and duplicate inserted flies back to the Adh-null transgene insertion line, crossed F1 siblings, and compared the activity of F2 duplicate heterozygotes to singleton homozygotes. These genotypes were indistinguishable from one another (t test, P = 0.66; Fig. 4B), indicating additivity. In contrast, however, the vgQ-lacZ duplicate insertions in this site were overactive (Fig. 3G). Therefore, duplicates do have the capacity to behave additively, but this appears to be influenced by both chromosomal location and some aspect of the duplicated sequence.

Duplicate Overactivity Is Transcriptional. Our experiments indicate that duplicate overactivity is not the result of raw scaling of gene number per cell and is influenced by chromosome position. These observations are not consistent with a posttranscriptional mechanism. Instead, they suggest that overactivity should be manifest at the transcript level. To test this prediction, we isolated RNA from single and duplicate inserted flies and conducted quantitative real-time PCR measurements calibrated with a standard curve and a control gene RP49 (Fig. S1). In the overactive ZH-86Fb site, duplicate flies expressed virilis Adh RNA transcript levels that were 3.7-fold higher than singleton flies, significantly greater than the additive expectation of twofold (t test, P = 0.002; Fig. 5A). In contrast, in the additive attP40 site, the difference in Adh transcript levels was not significantly different from twofold (t test, P = 0.70; Fig. 5B). Duplicate overactivity (and its absence) therefore manifests at both the protein and transcript levels.

The Biological Significance and Potential Mechanisms Underlying Overactivity. It may be asked whether it is biologically significant that a mutation changes activity by 2.6-fold rather than 2.0-fold. Evidence from functional and population studies in flies and humans suggests that fractional differences in gene expression of this magnitude (60%) can have phenotypic effects and show signatures of selection (1–3). It is therefore likely that duplicate overactivity can contribute meaningfully to phenotypes when large changes in activity are advantageous. The sixfold difference in ADH activity between alcohol-resistant D. virilis and alcohol-sensitive D. americana is among the largest seen

between sibling Drosophila species (9, 11). Gene duplication and duplicate overactivity appear to be able to account for a portion of this difference, but we caution that we did not measure the level of overactivity at the native D. virilis Adh locus. Additional sequence divergence at Adh or in trans may also contribute to the difference in ADH levels.

At the population scale, however, most duplicates occur at low allele frequencies, suggesting that there is generally negative selection against large changes in gene activity (5, 6). When functionally redundant duplicated genes are retained (12), their joint expression levels often evolve to be comparable to that of single-copy genes. These observations suggest that duplicate overactivity might often be suppressed or masked by selection for mutations that reduce gene activity.

There are hints in the literature that gene duplicate overactivity may occur in other contexts. In the first-described case of gene duplication, Sturtevant (13) observed that Bar duplicate heterozygotes suppress eye facet formation 1.5-fold more than singleton homozygotes, a similar ratio to what we observed here with Adh. In mosquitoes, a tandemly duplicated block of P450 genes exhibits 25- to 50-fold higher transcription (14). In addition, in tumor cells as well as human populations, some duplicated genes also show possible nonadditive expression relative to single copies (15–17). However, we note that detection of duplicate overactivity requires that one control for additional potential regulatory substitutions in cis and in trans, which may impose a practical limit on studies of duplicate overactivity to fresh duplicates or to transgenic experiments.

Tandem duplicate overactivity appears to be a previously unknown form of position effect, in this case one in which gene expression levels are affected by the presence of an adjacent duplicate gene. The greater than twofold increase in transcription from a tandem duplicate could arise from aspects of various known regulatory mechanisms. Some of the possibilities we can envision include the following: (i) more frequent rebinding of transcription factors because the local concentration of binding sites is higher in tandemly arranged duplicates (18); (ii) more efficient looping of DNA due to clusters of transcription factors binding to identical sites on both gene copies (18); or (iii) more effective remodeling of chromatin to a favorable state for transcription (19). Any of these mechanisms could be enhanced or attenuated by neighboring sequences (i.e., by classical position effects), which could account for the observed dependence of the degree of overactivity on chromosomal position. However, the nearly universal positive overactivity observed here suggests that this is not just the influence of classical position effects, which we would expect to affect both single and duplicate genes in similar ways. Instead, some aspect of the duplicated sequence itself appears to generate a synergistic effect on expression whenever two identical genes are adjacent to each other.

[Figure 4. Site attP40. y-axis: ADH activity (ΔAbs340 per min per mg total protein). Panel A x-axis: Dup, Single, with a dashed "2x Single" line. Panel B x-axis: Dup hom, Dup het, Single hom, Single het, Uninserted, annotated with Adh copies per cell (in cis, in trans).]

Fig. 4. Tandem duplicate activity is simply additive for Adh in one genomic position. (A) ADH activity for virilis Adh Single and Dup constructs in the attP40 site in Adh-null background. (B) Differences in ADH activity are proportional to copy number per cell. F2 heterozygous and homozygous males from crosses back to the Adh-null transgene insertion line were generated as in Fig. 2.

[Figure 5. Top: map of the D. virilis Adh insertion sites (A) ZH-86Fb and (B) attP40 on Chr. X, 2, 3, and 4. Panels A and B, y-axis: relative mRNA expression (fold change); x-axis categories: Ident_dup, Single, Dup, Single.]

Fig. 5. Overactivity is associated with increased transcription. Expression of Adh relative to control gene RP49 was measured with quantitative real-time PCR. (A) Adh transcription from duplicates in the ZH-86Fb site is overactive. (B) Adh transcription from duplicates in the attP40 site is additive. Dashed line shows predicted twofold expression of Single construct.

Conclusion

The discovery of the overactivity of tandem duplicates in Drosophila, despite many decades of the study of gene duplication, underscores how our understanding of the quantitative factors that govern gene expression is incomplete. We hope that this study will prompt similar quantitative analyses of gene duplicates in other genomes to ascertain to what degree overactivity is a general phenomenon. Uncovering such potential general principles is a necessary step toward the goal of using genome sequences to understand and predict phenotypes.

Methods

We investigated the contribution of tandem duplications to phenotypes (enzyme activity and mRNA levels) using transgenes in Drosophila. Transgenic lines with Adh and vgQ-lacZ single or tandem duplicate insertions were produced using the PhiC31-attP system as described in SI Methods and Table S15. This transgenic system allows different transgenes (e.g., Single and Duplicate) to be inserted into the same chromosomal site in identical genetic background. ADH enzyme activity and mRNA level were measured from homogenates of whole flies, whereas β-galactosidase activity was measured from dissected wandering third-instar wing imaginal discs using protocols described in SI Methods and Fig. S2. Assays were checked for linearity and one-to-one scaling using standard curves shown in Fig. S1. The experimental design had a nested structure: enzyme activity and mRNA levels were measured from a large number of samples (i.e., 12–96) from a small number (i.e., 2–4) of replicate transgenic lines. Therefore, we analyzed the data with a mixed-effects model (described in more detail in SI Methods and with details on sample size, model parameters, and estimated effects for each experiment presented in Tables S1–S14). Tests of the null hypothesis of twofold difference were calculated with t tests, using the SEs from the mixed-effects models and degrees of freedom corresponding to the number of transgenic lines.

ACKNOWLEDGMENTS. We thank Kathy Vaccaro for superb technical support in producing transgenic lines; Nicholas Keuler for guidance on the statistical analysis; Greg Wray for discussion; Fiona Ukken, Victoria Kassner, and Jane Selegue for technical advice; and Henry Chung, Matt Giorgianni, and Noah Dowell for advice and helpful comments on the manuscript. D.W.L. is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation. S.B.C. is a Howard Hughes Medical Institute Investigator.
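The test of the twofold null hypothesis described in Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the mixed-effects model yields a mean and SE for each construct, that the two estimates are independent (so the SE of Dup − 2 × Single is the root sum of squares), and that the degrees of freedom are supplied by the user. The function names are hypothetical, and the p-value comes from a simple Simpson integration of the Student's t density so that the sketch needs only the standard library.

```python
import math


def t_two_sided_p(t_stat: float, df: float, steps: int = 2000) -> float:
    """Two-sided p-value for Student's t, by Simpson integration of the pdf."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3                # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)


def twofold_test(mean_dup, se_dup, mean_single, se_single, df):
    """Test H0: mean_dup == 2 * mean_single (assumes independent estimates)."""
    excess = mean_dup - 2.0 * mean_single
    se_excess = math.sqrt(se_dup ** 2 + (2.0 * se_single) ** 2)
    t_stat = excess / se_excess
    fold = mean_dup / mean_single
    return fold, t_stat, t_two_sided_p(t_stat, df)


# Hypothetical inputs for illustration only; these are not values from the paper.
fold, t_stat, p_value = twofold_test(
    mean_dup=2.6, se_dup=0.1, mean_single=1.0, se_single=0.05, df=4
)
```

Framing the null as Dup − 2 × Single = 0 keeps the test linear in the model estimates. A test on the ratio itself would need a delta-method or log-scale SE, which is a different design choice.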

1. Stam LF, Laurie CC (1996) Molecular dissection of a major gene effect on a quantitative trait: The level of alcohol dehydrogenase expression in Drosophila melanogaster. Genetics 144(4):1559–1564.
2. Tishkoff SA, et al. (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39(1):31–40.
3. Babbitt CC, et al. (2010) Multiple functional variants in cis modulate PDYN expression. Mol Biol Evol 27(2):465–479.
4. Coolon JD, McManus CJ, Stevenson KR, Graveley BR, Wittkopp PJ (2014) Tempo and mode of regulatory evolution in Drosophila. Genome Res 24(5):797–808.
5. Katju V, Bergthorsson U (2013) Copy-number changes in evolution: Rates, fitness effects and adaptive significance. Front Genet 4:273.
6. Rogers RL, et al. (2015) Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans. PLoS One 10(7):e0132184.
7. Ohno S (1970) Evolution by Gene Duplication (Springer, New York).
8. Force A, et al. (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151(4):1531–1545.
9. Nurminsky DI, Moriyama EN, Lozovskaya ER, Hartl DL (1996) Molecular phylogeny and genome evolution in the Drosophila virilis species group: Duplications of the alcohol dehydrogenase gene. Mol Biol Evol 13(1):132–149.
10. Kim J, et al. (1996) Integration of positional signals and regulation of wing formation and identity by Drosophila vestigial gene. Nature 382(6587):133–138.
11. Mercot H, Defaye D, Capy P, Pla E, David JR (1994) Alcohol tolerance, ADH activity, and ecological niche of Drosophila species. Evolution 48(3):746–757.
12. Qian W, Liao B-Y, Chang AY-F, Zhang J (2010) Maintenance of duplicate genes and their functional redundancy by reduced expression. Trends Genet 26(10):425–430.
13. Sturtevant AH (1925) The effects of unequal crossing over at the Bar locus in Drosophila. Genetics 10(2):117–147.
14. Wondji CS, et al. (2009) Two duplicated P450 genes are associated with pyrethroid resistance in Anopheles funestus, a major malaria vector. Genome Res 19(3):452–459.
15. Faust JB, Meeker TC (1992) Amplification and expression of the bcl-1 gene in human solid tumor cell lines. Cancer Res 52(9):2460–2463.
16. Perry GH, et al. (2007) Diet and the evolution of human amylase gene copy number variation. Nat Genet 39(10):1256–1260.
17. Handsaker RE, et al. (2015) Large multiallelic copy number variations in humans. Nat Genet 47(3):296–303.
18. Feuerborn A, Cook PR (2015) Why the activity of a gene depends on its neighbors. Trends Genet 31(9):483–490.
19. Gross DS, Chowdhary S, Anandhakumar J, Kainth AS (2015) Chromatin. Curr Biol 25(24):R1158–R1163.
20. Wittkopp PJ, Vaccaro K, Carroll SB (2002) Evolution of yellow gene regulation and pigmentation in Drosophila. Curr Biol 12(18):1547–1556.
21. Gibson DG, et al. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6(5):343–345.
22. Ordway AJ, Hancuch KN, Johnson W, Williams TM, Rebeiz M (2014) The expansion of body coloration involves coordinated evolution in cis and trans within the pigmentation regulatory network of Drosophila prostipennis. Dev Biol 392(2):431–440.
23. Ashburner M (1989) Drosophila: A Laboratory Manual (Cold Spring Harbor Lab Press, Cold Spring Harbor, NY), pp 317–318.
24. Bustin SA, et al. (2009) The MIQE guidelines: Minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55(4):611–622.
25. Galecki A, Burzykowski T (2013) Linear Mixed-Effects Models Using R: A Step-By-Step Approach (Springer, New York), pp 478–480.
26. Wickham H (2009) ggplot2: Elegant Graphics for Data Analysis (Springer, New York).
