Duplicative and conservative transpositions of Larval serum protein 1 genes in the genus Drosophila

Josefa González, Ferran Casals and Alfredo Ruiz

Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession nos. AY561258 and AY561259.

Running head: transpositions in Drosophila

Keywords: gene transposition, duplication, Lsp1 genes, Drosophila, genome evolution.

Corresponding author: Dr. Alfredo Ruiz, Departament de Genètica i de Microbiologia, Facultat de Ciències-Edifici C, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain.

Phone: 93-5812729; FAX: 93-5812387; E-mail: [email protected].

ABSTRACT

Interspecific comparative molecular analyses of transposed genes and their flanking regions can help to elucidate the time, direction and mechanism of gene transposition. In the Drosophila melanogaster genome, three Larval serum protein 1 (Lsp1) genes (α, β and γ) are present and each of them is located on a different chromosome, suggesting multiple transposition events. We have characterized the molecular organization of Lsp1 genes in D. buzzatii, a species of the Drosophila subgenus, and in D. pseudoobscura, a species of the Sophophora subgenus. Our results show that only two Lsp1 genes (β and γ) exist in these two species. The same chromosomal localization and genomic organization, different from that of D. melanogaster, is found in both species for the Lsp1β and Lsp1γ genes. Overall, at least two duplicative and two conservative transpositions are necessary to explain the present chromosomal distribution of Lsp1 genes in the three Drosophila species. Clear evidence for the involvement of snRNA genes in the transposition of Lsp1β in Drosophila has been found. We suggest that an ectopic exchange between highly similar snRNA sequences was responsible for the transposition of this gene. We have also identified the putative cis-acting regulatory regions of these genes, which seemingly transposed along with the coding sequences.

INTRODUCTION

Sequence analysis of genomes has revealed that gene transposition contributes significantly to the reorganization of eukaryotic genomes. Gene transposition refers to the movement of relatively small genomic segments, containing one or a few genes, from one chromosomal position to another. This movement may or may not be accompanied by the duplication of the genomic segment; the two processes may be denoted as duplicative and conservative transposition, respectively. In nematodes, gene transposition seems to be the most frequent kind of genome rearrangement (COGHLAN and WOLFE 2002), whereas duplication of chromosomal segments encompassing a few genes followed by differential gene loss is a common cause of gene order changes in yeasts (LLORENTE et al. 2000; FISCHER et al. 2001). In plants, repeated rounds of large-scale genome duplication followed by selective gene loss are the main factors in genome evolution. Chromosomal rearrangements were thought to be only a minor factor in the divergence of plant genomes (KU et al. 2000). However, when more detailed comparisons were performed many chromosomal rearrangements were found. For example, the comparison of genome sequences of rice to orthologous regions from other grass species revealed that numerous local rearrangements, including transpositions of single genes to different chromosomes, have occurred (BENNETZEN and MA 2003). Transposition has also played a significant role in the evolution of the mammalian genome. Segmental duplications originated from the duplicative transposition of small portions of chromosomal material represent ∼5% of the human genome (LANDER et al. 2001; EICHLER 2001; BAILEY et al. 2002) and at least 1.2% of the mouse genome (CHEUNG et al. 2003). Some of the duplicated segments in the human genome are associated with rapid gene innovation and chromosomal rearrangement in the genomes of man and the great apes (SAMONTE and EICHLER 2001; LOCKE et al. 2003; ARMENGOL et al. 2003).

In Drosophila, segmental duplications seem to be rare in comparison to the number found in mammalian genomes (LANDER et al. 2001; CELNIKER et al. 2002). Also, detailed analyses by in situ hybridization show that the gene content of chromosomal elements is generally conserved and suggest that gene transpositions are relatively scarce (GONZÁLEZ et al. 2002; RANZ et al. 2003) in relation to paracentric inversions, which have been traditionally considered as the chief type of chromosomal change (KRIMBAS and POWELL 1992; POWELL 1997). However, recent sequence analyses comparing five different Drosophila species point to similar numbers of inversions and gene transpositions (BERGMAN et al. 2002), and a number of new genes originated by retroposition have been unveiled (BETRÁN et al. 2002).

Diverse molecular mechanisms (sometimes poorly understood) may be responsible for gene transposition events. A common mechanism for gene transposition is retroposition, which implies reverse transcription of RNA and insertion of the resulting cDNA into a different genome site. In humans, the long interspersed element (LINE) L1 often associates 3’ flanking DNA as a read-through transcript and carries the non-L1 sequence to a new genomic location, a process termed L1-mediated transduction (MORAN et al. 1999; LANDER et al. 2001). Seemingly the LINE machinery can also act in trans on cellular RNA substrates, giving rise to the trans-mobilization of genomic DNA, processed pseudogenes and occasionally new functional genes (ESNAULT et al. 2000; BETRÁN et al. 2002; EJIMA and YANG 2003; LONG et al. 2003). Another mechanism of transposition is transposon-mediated excision-and-insertion of genomic segments. For instance, in D. melanogaster, Foldback elements flanking relatively large genomic segments are able to transport these segments to sites far away in the genome, forming the so-called “giant transposons” (CHIA et al. 1985; LOVERING et al. 1991). Excision and insertion of these giant transposons is mediated by homologous recombination involving the Foldback sequences at the transposon termini.

Transposable elements seem to be implicated also in the origin of segmental duplications in humans. Duplication junctions have been found to be enriched for Alu short interspersed element (SINE) sequences with a significant proportion of all segmental duplications ending within Alu sequences (BAILEY et al. 2003). This observation suggests Alu-Alu homologous recombination as the most likely mechanism for these rearrangements. A similar mechanism has previously been shown to generate small duplications, deletions and inversions in diverse organisms.

Larval serum protein 1 (Lsp1) genes provide one of the few examples of gene transposition in the genus Drosophila. In D. melanogaster, each of the three Lsp1 genes is located on a different chromosome: Lsp1α on chromosome X, Lsp1β on chromosomal arm 2L and Lsp1γ on chromosomal arm 3L (ROBERTS and EVANS-ROBERTS 1979; SMITH et al. 1981). These chromosomal arms correspond to Muller’s elements A, B and D, respectively (Table 1). BROCK and ROBERTS (1983) mapped Lsp1 genes by in situ hybridization in nine other species of the Sophophora subgenus (including D. pseudoobscura) and five species of the Drosophila subgenus (including D. hydei as a representative species of the repleta group). In the melanogaster subgroup species they were able to localize the three Lsp1 genes: Lsp1α on element A, Lsp1β on element B and Lsp1γ on element D. In all the other species, both Lsp1α and Lsp1β hybridized to the same polytene band of Muller’s element E, suggesting a gene exchange between elements. No hybridization of the Lsp1γ gene was observed although γ-like proteins were detected with specific antibodies. To determine Lsp1 gene number, BROCK and ROBERTS (1983) performed Southern analyses and concluded that at least two genes, one α-like and one β-like, were present in all the species analyzed. Their data also suggested that the ancestor of the genus Drosophila probably had its Lsp1 genes on element E.

Recently, we localized Lsp1α by in situ hybridization to chromosome 2 (Muller’s element E) of D. repleta and D. buzzatii (GONZÁLEZ et al. 2002; RANZ et al. 2003), corroborating their results.

In this work, Lsp1 genes and their flanking sequences have been cloned and sequenced in D. buzzatii, a species belonging to the repleta group of the Drosophila subgenus which diverged from D. melanogaster 40-62 myr ago (BEVERLY and WILSON 1984; RUSSO et al. 1995). In addition, the genome sequence of D. pseudoobscura (available at www.hgsc.bcm.tmc.edu) has been searched and Lsp1 genes annotated in this species, which diverged from D. melanogaster ∼30 myr ago (THROCKMORTON 1975) and belongs to the same subgenus as D. melanogaster (Sophophora). The aims of this study are: (i) to determine beyond doubt the number and localization of Lsp1 genes in these two Drosophila species, because Southern analyses and in situ hybridization results for members of gene families may be misleading (BACHTROG and CHARLESWORTH 2003); (ii) to ascertain the number and type of transposition events undergone by Lsp1 genes during the evolution of the genus Drosophila; (iii) to uncover the molecular mechanism of transposition, in particular to test the hypothesis of an involvement of transposable elements; and (iv) to identify putative regulatory sequences of Lsp1 genes and determine whether these regulatory sequences transposed along with the coding sequences or were recruited ex novo at the new chromosomal location.

MATERIALS AND METHODS

Drosophila stocks: Two lines of D. buzzatii were used: line j-19 from Ticucho (Argentina) is fixed for chromosomal arrangement 2j and line jq7-4 from Otamendi (Argentina) is fixed for arrangement 2jq7. The genome sequence of D. pseudoobscura comes from inbred line MV2-25 (www.hgsc.bcm.tmc.edu).

Screening of genomic libraries: Two different lambda genomic libraries were screened by plaque hybridization: the j-19 library (CÁCERES et al. 2001) and the jq7-4 library (CASALS et al. 2003). Both libraries were previously amplified as described in SAMBROOK et al. (1989). DNA from positive phages was digested and the resulting fragments were subcloned into Bluescript II SK vector (Stratagene) after gel purification. The j-19 library was screened with a 0.7 kb BamHI fragment of the D. melanogaster Lsp1α gene (BROCK and ROBERTS 1983). Six positive phages were recovered and one of them, λj-19/8, was partially sequenced and found to contain the 5’ region of Lsp1β (Figure 1a). Another lambda phage, λj-19/25, containing the 3’ region of this gene had been previously isolated in our laboratory (CASALS et al. 2003). None of the six positive phages contained Lsp1γ. In order to clone this gene the j-19 library was screened with a 0.6 kb HindIII-SalI fragment of D. buzzatii Lsp1β (Figure 1a). Nine positive phages were identified but again none of them included Lsp1γ. To clone D. buzzatii Lsp1γ, a different library (jq7-4) was screened with the same fragment of D. melanogaster Lsp1α previously used to screen the j-19 library. Five positive phages were identified and one of them, λjq7-4/12, contained the 3’ end of the Lsp1γ gene. A 0.75 kb SalI-ClaI fragment of D. buzzatii Lsp1γ was used to screen the same library in order to clone the other end of the gene (Figure 2a). Seventeen positive phages were recovered and one of them, λjq7-4/27, included almost the entire Lsp1γ gene.

Southern analysis: Southern hybridization was performed as described in SAMBROOK et al. (1989). Probes were labeled by random priming with digoxigenin-11-dUTP and hybridization was carried out overnight in standard buffer with 50% formamide at 42° for homologous probes and at 37° for heterologous probes. Stringency washes were performed with 2X SSC 0.1% SDS and 0.1X SSC 0.1% SDS solutions at 68° and 50° for homologous and heterologous hybridizations, respectively.

DNA sequencing and sequence analysis: Sequences were obtained with an ABI 373A (Perkin-Elmer) automated DNA sequencer using M13 universal forward and reverse primers. A few internal primers were also designed when necessary. Nucleotide sequences were assembled using GeneToolLite software (BioTools). Similarity searches in the GenBank/EMBL and in the Drosophila pseudoobscura genome project (available at www.hgsc.bcm.tmc.edu) databases were performed using blastn and fasta3 programs. Multiple sequence alignments were obtained with ClustalW (THOMPSON et al. 1994) and DiAlign (MORGENSTERN 1999). PAML software (YANG 1997) was used to estimate the number of synonymous (dS) and non-synonymous (dN) substitutions per site. This software avoids using reconstructed ancestral sequences to estimate dS and dN for lineages in a phylogeny by using a maximum likelihood approach. Different codon-based likelihood models that allow for different dN/dS ratios among evolutionary lineages can be devised. The models can then be compared to test the neutral prediction that the dN/dS ratio is identical among lineages (YANG and BIELAWSKI 2000).

In situ hybridization: In situ hybridization of DNA probes was carried out as described in MONTGOMERY et al. (1987). Hybridization temperature was 37°. Probes were labelled with biotin-16-dUTP by nick translation and detection was done using the ABC-Elite kit from Vector Laboratories. Hybridization signals were localized using the cytological maps of D. buzzatii (RUIZ and WASSERMAN 1993) and photographs were taken with a phase contrast Nikon Optiphot-2 microscope at 600X magnification.

RESULTS

Lsp1 gene number: Screening of two genomic libraries of D. buzzatii with different probes allowed us to isolate clones containing the Lsp1β and Lsp1γ genes (see MATERIALS AND METHODS). In order to determine the number of Lsp1 genes present in the D. buzzatii genome, a Southern analysis was performed (Figure S1 available as online supplementary material at http://www.genetics.org/supplemental/). Genomic DNA from D. buzzatii and D. melanogaster (as control) was digested with EcoRI and HindIII and hybridized with the same three probes used to screen the D. buzzatii libraries (see MATERIALS AND METHODS). The D. melanogaster Lsp1α probe hybridized to 6-kb and 4.1-kb fragments of D. melanogaster genomic DNA corresponding to Lsp1α and Lsp1β, respectively (BROCK and ROBERTS 1983). In D. buzzatii this probe hybridized to a single fragment of 3.1 kb corresponding to Lsp1β. The D. buzzatii Lsp1β probe hybridized to 6-kb and 4.1-kb fragments of D. melanogaster genomic DNA corresponding to Lsp1α and Lsp1β, respectively (BROCK and ROBERTS 1983), and to a 3.1-kb fragment of D. buzzatii genomic DNA corresponding to Lsp1β. Finally, the D. buzzatii Lsp1γ probe hybridized to the 4.1-kb fragment corresponding to Lsp1β and to 1.8-kb and 1.4-kb fragments both corresponding to Lsp1γ of D. melanogaster (BROCK and ROBERTS 1983), and also to a 3.1-kb fragment corresponding to Lsp1β and a 1-kb fragment corresponding to Lsp1γ of D. buzzatii. Overall, the results of Southern analyses indicated that only two genes, Lsp1β and Lsp1γ, are present in the D. buzzatii genome. No Lsp1α gene was detected by Southern analysis in the D. buzzatii genome.

Similarity searches against the D. pseudoobscura genome sequence database were carried out (Table 2). Two sequences with significant similarity were found when Lsp1β was used as query. The most similar sequence, included in Contig 815_Contig 5737, was considered to correspond to the D. pseudoobscura Lsp1β ortholog. When the database was searched with the coding sequence of Lsp1γ as query the same two hits were recovered. This time the most similar sequence, included in Contig 1500_Contig 3546, was considered to correspond to the D. pseudoobscura ortholog of Lsp1γ. No new sequences were identified when the D. melanogaster Lsp1α sequence was used as query, the only significant hits being those previously identified as Lsp1β and Lsp1γ. We conclude that only Lsp1β and Lsp1γ are present in the D. pseudoobscura genome.

Organization of the Lsp1β and Lsp1γ genomic regions in D. buzzatii and D. pseudoobscura: In D. buzzatii, the Lsp1β gene was localized by in situ hybridization, using the λj-19/25 phage as probe, to polytene band D3c of chromosome 2 (Figure S2 available as online supplementary material at http://www.genetics.org/supplemental/). This chromosome is homologous to chromosomal arm 3R of D. melanogaster (Muller’s element E; Table 1).

Overall, 9,464 bp including the entire D. buzzatii Lsp1β gene and its flanking regions were sequenced (accession number: AY561258). In this species, Lsp1β is flanked by CG12492 and Pellino (Figure 1a). Between Lsp1β and Pellino there is a snRNA:U1 gene, orthologous to snRNA:U1:95Ca of D. melanogaster.

In D. pseudoobscura, the Lsp1β gene is included within Contig 815_Contig 5737, which has been putatively assigned to chromosome XR, homologous to D. melanogaster chromosomal arm 3L (Muller’s element D; Table 1). As in D. buzzatii, Lsp1β is flanked by CG12492 and Pellino, but in D. pseudoobscura there are three snRNA:U1 genes between Lsp1β and Pellino (Figure 1b). In D. melanogaster, CG12492, snRNA:U1:95Ca and Pellino are localized in chromosomal arm 3R (Figure 1c) while Lsp1β is located at 21D-E in chromosomal arm 2L (Figure 1d). The alignments of D. pseudoobscura and D. melanogaster genome sequences, available at http://pipeline.lbl.gov/pseudo/, show that Lsp1β and its flanking genes are in fact included in Contig 7891_Contig 7492, assigned to chromosome 2 (Muller’s element E; Table 1). We concluded that this must be the correct localization based on the fact that CG12492 and Pellino are found on element E in both D. melanogaster and D. buzzatii.

In situ hybridization of the λjq7-4/27 phage, containing the entire Lsp1γ coding sequence, to the polytene chromosomes of D. buzzatii allowed us to map this gene to band C2g of chromosome 3 (Figure S2). This chromosome is homologous to chromosomal arm 2L of D. melanogaster (Muller’s element B; Table 1). In total, 10,817 nucleotides including the entire Lsp1γ gene and its flanking regions have been sequenced in D. buzzatii (accession number: AY561259). Upstream of Lsp1γ, 4.5 kb from the ATG codon, there is a Leucyl transfer RNA (Leu-tRNA) gene (Figure 2a).

In D. pseudoobscura, Lsp1γ is found within Contig 1500_Contig 3546, which belongs to chromosome 4, homologous to chromosomal arm 2L of D. melanogaster (Muller’s element B; Table 1). Analysis of the D. pseudoobscura genomic sequence revealed that 4.9 kb upstream from Lsp1γ there is an Aspartic acid transfer RNA (Asp-tRNA) gene. Both Lsp1γ and Asp-tRNA are nested inside the first intron of the gene Sema-1a (Figure 2b). In both D. pseudoobscura and D. buzzatii, we detected eight short sequences, 28 to 59 nucleotides long, scattered along the 5-kb region upstream of the Lsp1γ coding sequence, that are highly conserved in D. melanogaster (86-97% nucleotide identity). Another four short sequences, 41 to 55 nucleotides long, also highly conserved between the three species (91-98% nucleotide identity), were found downstream of Lsp1γ in D. pseudoobscura and D. buzzatii (Figure 2).

The conservation in number and relative position of these 12 highly conserved non-coding sequences flanking Lsp1γ in both D. buzzatii and D. pseudoobscura led us to conclude that the molecular organization of this gene region is the same in the two species, i.e., Lsp1γ is nested within an intron of Sema-1a in both cases. On the other hand, in D. melanogaster (Figure 2c, d), the Sema-1a gene is located in band 29E1-3 on chromosomal arm 2L (Muller’s element B; Table 1) and inside its first intron there is also an Asp-tRNA gene (Figure 2c) but not the Lsp1γ gene, which is located on chromosomal arm 3L (Muller’s element D; Table 1).

Molecular structure of Lsp1β and Lsp1γ genes in D. buzzatii and D. pseudoobscura: In the two species, both Lsp1β and Lsp1γ are made up of two exons separated by a small intron (Table 3). The alignment of D. buzzatii, D. pseudoobscura and D. melanogaster nucleotide sequences shows a 79.4% nucleotide identity for Lsp1β and 75.7% nucleotide identity for Lsp1γ. For both genes, the intron is placed at precisely the same site in the three species and has a similar length (63-71 nucleotides for Lsp1β and 57-65 nucleotides for Lsp1γ) but shows little nucleotide conservation other than the donor (G·T·[A/G]·A·G·[T/C]) and acceptor (C·A·G) splice sites (DELANEY et al. 1986).
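The identity percentages quoted above are column-wise summaries of a multiple alignment. The following is a minimal sketch of how such a figure can be computed, assuming that a column counts as identical only when every sequence carries the same nucleotide and that columns containing a gap are skipped (the text does not state how gaps were treated). The function name and the toy sequences are hypothetical and purely illustrative; they are not Lsp1 data.

```python
def percent_identity(aligned_seqs, gap="-"):
    """Percentage of gap-free alignment columns in which all sequences agree.

    ASSUMPTION: columns containing a gap are excluded from the comparison;
    the source does not say how gapped columns were handled.
    """
    lengths = {len(s) for s in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    compared = 0
    identical = 0
    for column in zip(*aligned_seqs):
        if gap in column:
            continue  # skip columns with at least one gap
        compared += 1
        if len(set(column)) == 1:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Hypothetical three-sequence alignment (not real Lsp1 sequence).
toy_alignment = [
    "ATGGCCAAGT",
    "ATGGCTAAGT",
    "ATGGCCAA-T",
]
toy_identity = percent_identity(toy_alignment)
print(f"identity over gap-free columns: {toy_identity:.1f}%")
```

Requiring agreement across all three species at once is stricter than averaging pairwise identities, so the two conventions can give different percentages for the same alignment.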

Both Lsp1β and Lsp1γ possess a TATA box which is localized in D. buzzatii and D. pseudoobscura at the same nucleotide position as in D. melanogaster (-32 to -26). Sequence similarity extends for several nucleotides on either side of the TATA box (Figure 3). The 5’ UTR of D. buzzatii Lsp1β is 89 bp long, the first 21 nucleotides being identical to those in D. melanogaster and D. pseudoobscura. The 5’ UTR of D. buzzatii Lsp1γ is 68 bp long and 17 of the first 21 nucleotides are identical to those in D. melanogaster and D. pseudoobscura. In both cases, another block with significant sequence similarity is found further downstream (Figure 3). A polyadenylation signal (AATAAA) is located about 100 bp downstream from the stop codon in Lsp1β and Lsp1γ. In Lsp1β, but not in Lsp1γ, the sequence around this signal is highly conserved in the three species (Table 4a). Apart from this, no other conserved sequences have been found in the 3’ region of Lsp1 genes.

We searched for putative regulatory sequences in the 5’ regions of Lsp1 genes in D. buzzatii and D. pseudoobscura following the criteria devised by BERGMAN and KREITMAN (2001). Comparison of the 5’ ends of Lsp1β led to the identification of three conserved sequences starting at sites -126, -189 and -683 of D. buzzatii (Table 4b, c, d).

Another three conserved sequences starting at sites -71, -181 and -379 of D. buzzatii were found in the 5’ region of Lsp1γ (Table 4e, f, g).

Molecular evolution of the Lsp1 genes: The coding sequences of Lsp1β and Lsp1γ from D. buzzatii, D. melanogaster and D. pseudoobscura, and that of Lsp1α from D. melanogaster, were aligned using ClustalW. The numbers of synonymous (dS) and non-synonymous (dN) substitutions per site for pairwise comparisons between the three species were estimated using maximum likelihood methods (YANG 1997). The dN/dS ratios for Lsp1β (0.0226-0.0449) are similar to those for Lsp1γ (0.0209-0.0512). Comparisons of D. melanogaster Lsp1α with Lsp1β (0.0470-0.0545) and Lsp1γ (0.0367-0.0386) of the other two species also yielded similar results (Table 5). Overall, the dN/dS ratios were low, suggesting a relatively high degree of functional constraint on these genes in the three species analyzed.

Two different methods, Neighbour-Joining and UPGMA, were used to construct phylogenetic trees using PHYLIP software (FELSENSTEIN 1989). In addition to the seven above-mentioned sequences, we included in the trees those of arylphorin, an Lsp1-like gene of Calliphora vicina (NAUMANN and SCHELLER 1991), and Lsp2 of D. melanogaster (ADAMS et al. 2000). Alternative models for the evolution of the Lsp genes were then tested using maximum likelihood methods (YANG 1997; BIELAWSKI and YANG 2003). First we tested for the constancy of evolutionary rates by comparing both trees: that produced with the UPGMA method, which assumes a molecular clock, and that built with the Neighbour-Joining method, assuming no clock. In both cases a single dN/dS ratio for all lineages was considered. The difference between the likelihoods of the two trees was significant (2∆l = 32.64; df = 7; P<0.005), indicating that the model assuming no clock provides a significantly better fit to the data (Figure 4). We then tested for homogeneity in the dN/dS ratio between lineages by comparing the model assuming no clock and a single dN/dS ratio with a free-ratio model, which assumes an independent dN/dS ratio for each lineage. The result was significant (2∆l = 83.18; df = 14; P<0.0001), indicating that the dN/dS ratios are heterogeneous among lineages. Finally we compared a model assuming several dN/dS ratios (one background ratio and one ratio for each of the lineages leading to the Lsp1 genes in D. melanogaster) with the single-ratio model. Again the difference was statistically significant (2∆l = 21.86; df = 3; P<0.005), indicating a better fit to the data for the model assuming several dN/dS ratios.
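Each of the three comparisons above is a likelihood ratio test: twice the difference in log-likelihood between nested models (2∆l) is referred to a chi-square distribution with the stated degrees of freedom. The sketch below recomputes the tail probabilities from the reported statistics. It assumes the usual chi-square approximation; the helper `chi2_sf` is a hypothetical stand-in written here with the standard library only, and is not part of PAML.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with a = df / 2 and
    z = x / 2, starting from the closed forms for df = 1 and df = 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # df = 1
    for _ in range((int(df) - 1) // 2):
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Test statistics (2 * delta log-likelihood) and df as reported in the text.
reported_tests = [
    ("clock vs. no clock", 32.64, 7),
    ("single ratio vs. free ratio", 83.18, 14),
    ("single ratio vs. several ratios", 21.86, 3),
]
p_values = {name: chi2_sf(stat, df) for name, stat, df in reported_tests}

for name, p in p_values.items():
    print(f"{name}: P = {p:.2e}")
```

Under this approximation all three tail probabilities fall below the thresholds quoted in the text.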

DISCUSSION

Lsp1 gene number: In D. melanogaster the LSP-1 protein is made up of three subunits encoded by three Lsp1 genes: Lsp1α, Lsp1β and Lsp1γ (ROBERTS and EVANS-ROBERTS 1979). However, in D. buzzatii, a species of the Drosophila subgenus, and in D. pseudoobscura, a species of the Sophophora subgenus, only two Lsp1 genes seem to be present: Lsp1β and Lsp1γ. Two different genomic libraries of D. buzzatii have been screened using as probes a fragment of D. melanogaster Lsp1α and a fragment of D. buzzatii Lsp1β or Lsp1γ. Overall, four different library screenings have been carried out and in every case all positive clones contained either Lsp1β or Lsp1γ. No Lsp1α gene was found. To corroborate this result, genomic DNA was digested with restriction enzymes and hybridized with the same three probes used to screen the libraries, and again the results were in agreement with the existence of only two genes, Lsp1β and Lsp1γ, in the D. buzzatii genome. Besides, the D. pseudoobscura genome has been recently sequenced to approximately 7-fold coverage and is available at http://www.hgsc.bcm.tmc.edu. Similarity searches against this database have been performed, allowing us to annotate Lsp1β and Lsp1γ but not Lsp1α. Thus, our results indicate that Lsp1α is not present in D. buzzatii or D. pseudoobscura. The most parsimonious explanation for this observation is that the duplicative transposition which gave rise to Lsp1α took place in the lineage leading to D. melanogaster after the divergence of the D. pseudoobscura lineage ∼30 myr ago (but see below). The fact that D. melanogaster Lsp1α is not dosage compensated although it is X-linked is in agreement with a recent duplicative transposition onto this chromosome (ROBERTS and EVANS-ROBERTS 1979).

BROCK and ROBERTS (1983) reported that in eight species of the Drosophila and Sophophora subgenera Lsp1α and Lsp1β map to the same polytene band in chromosomal element E, except in D. pseudoobscura where they observed an extra signal in element B (see Table 1). They concluded that there were at least two genes, one α-like and one β-like, given that Lsp1γ could not be localized by in situ hybridization. Contrasting results have been obtained in this work. We have unambiguously localized Lsp1β to chromosomal element E and Lsp1γ to element B in both D. buzzatii and D. pseudoobscura. According to our results the extra signal in D. pseudoobscura element B likely corresponds to Lsp1γ. A plausible explanation for their results is that both the Lsp1α and Lsp1β probes were cross-hybridizing to Lsp1β, as is suggested by the fact that the signals obtained with the Lsp1β probe were stronger than those obtained with the Lsp1α probes (BROCK and ROBERTS 1983).

Likewise, our previous in situ hybridization of D. melanogaster Lsp1α to D. buzzatii (and D. repleta) chromosomes (GONZÁLEZ et al. 2002; RANZ et al. 2003) must be re-interpreted as due to cross-hybridization with Lsp1β (Figure S2).

Larval serum proteins (LSP) belong to the hemocyanin superfamily and are thought to act as storage proteins that provide amino acids and energy during non-feeding periods of immature or adult development (BURMESTER et al. 1998). In D. melanogaster LSP-1 is a heterohexamer of randomly associated subunits encoded by the Lsp1α, Lsp1β and Lsp1γ genes. The lack of one subunit in D. buzzatii and D. pseudoobscura does not imply that the LSP-1 protein will not be functional. In fact, it has been reported that an inbred stock of D. melanogaster lacking the γ chain is viable under laboratory conditions, suggesting that a subunit-specific function for the LSP-1 monomers does not exist (BROCK and ROBERTS 1980). The polypeptides encoded by Lsp1β and Lsp1γ of D. melanogaster, D. pseudoobscura and D. buzzatii and by Lsp1α of D. melanogaster were aligned with the ClustalW algorithm (Figure S3 available as online supplementary material at http://www.genetics.org/supplemental/). These seven sequences have a similar length (772-789 aa), 49% of the amino acids are identical and 33% of the amino acid substitutions are conservative or semi-conservative. However, different subunits can accumulate different specific amino acids and this could allow the organism to modulate the availability of these amino acids in different developmental processes (MASSEY et al. 1997). The three subunits of the LSP-1 protein of D. melanogaster, as well as similar proteins in other Diptera, are enriched in aromatic amino acids (BURMESTER et al. 1998). Aromatic residues are thought to serve as precursors for quinones, which play a role in cuticle hardening during metamorphosis (BURMESTER et al. 1998). The polypeptides encoded by Lsp1α and Lsp1β are also enriched in methionine, but that encoded by Lsp1γ is not. The same pattern is observed in D. pseudoobscura and D. buzzatii. The role of methionine in Drosophila development is not clear (MASSEY et al. 1997). Gene duplication is considered a major force in gene family expansion and gene innovation. After duplication, one copy may become silenced (nonfunctionalization) or assume a novel function (neofunctionalization), or both copies may split the multiple functions of the ancestral gene (subfunctionalization) (LYNCH and CONERY 2000). Gene amplification of highly expressed functions often leads to highly conserved paralogs in microbial genomes (HOOPER and BERG 2003). The latter seems to be the case of the Lsp1 genes in Drosophila. The protein encoded by these genes is accumulated to high levels by feeding larvae (MASSEY et al. 1997) and gene amplification has not resulted in different discernible functions.

Genome organization and number of transposition events: In both D. buzzatii and D. pseudoobscura, Lsp1β is flanked by the CG12492 and Pellino genes on chromosomal element E (Figure 1a and 1b). The homologous region in D. melanogaster, where CG12492 and Pellino are found, is also located on element E. This region shows no sequence similarity to any Lsp1 gene whatsoever (Figure 1c). In D. melanogaster, Lsp1β is located on element B (Figure 1d). In the three species a snRNA:U1 gene is closely linked to the Lsp1β gene. Given that D. buzzatii and D. pseudoobscura, which belong to two different subgenera separated by 80-124 myr (BEVERLY and WILSON 1984; SPICER 1988), show the same genomic organization for Lsp1β, this is likely to be the ancestral organization of the genus and therefore Lsp1β would have conservatively transposed from element E to element B in the lineage leading to D. melanogaster. The maximum size of the transposed region is 5.7 kb and the only protein-coding gene included is Lsp1β.

In D. buzzatii and D. pseudoobscura, Lsp1γ is flanked by a tRNA gene and both genes are located inside the first intron of Sema-1a on element B (Figure 2a and 2b). In D. melanogaster, Sema-1a is located on element B while Lsp1γ is located on element D (Figure 2c and 2d). Because the genomic organization of Lsp1γ in D. buzzatii is the same as in D. pseudoobscura, this is likely to be the ancestral organization of the genus and thus the Lsp1γ gene would have conservatively transposed from element B to element D in the lineage leading to D. melanogaster. The maximum size of the transposed region is 5.9 kb and includes the putative regulatory regions located upstream of the gene. Another transposition event where the nested organization was the ancestral one had been previously reported in Drosophila (NEUFELD et al. 1991).

Given that Lsp1β and Lsp1γ are present on different chromosomes in the three species studied, the duplicative transposition that originated these two genes must have occurred before the divergence of these species (40-62 myr ago). The chromosomal localization of the ancestral Lsp1 gene in the genus Drosophila is unknown. There is no reason to believe that this chromosome was element E, as suggested by BROCK and ROBERTS (1983), because in both D. buzzatii and D. pseudoobscura Lsp1β is located on element E and Lsp1γ is located on element B. Therefore, the ancestral Lsp1 gene could have been located on either of these two elements. In any case, at least two conservative transpositions are needed to explain the current localization of these two genes in D. melanogaster (see above). Another duplicative transposition gave rise to D. melanogaster Lsp1α. The absence of this gene in D. pseudoobscura indicates that this transposition likely occurred after the divergence between D. melanogaster and D. pseudoobscura. However, the Neighbour-Joining tree of Lsp1 sequences (Figure 4) suggests that this transposition occurred before the divergence between D. melanogaster and D. pseudoobscura. If this were the case, this gene would have been lost in the D. pseudoobscura lineage. Overall, at least two duplicative and two conservative transpositions are needed to explain the present localization of Lsp1 genes (Figure 5).
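The reasoning that leads to the two conservative transpositions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary restates the Muller element assignments given in the text for Lsp1β and Lsp1γ, while the function name `infer_conservative_transpositions` and the rule "an element shared by all species other than D. melanogaster is taken as ancestral" are hypothetical shorthand for the argument, not a published method.

```python
# Muller element carrying each gene in each species, as stated in the text.
# Gene names are spelled out in ASCII for convenience.
LOCATIONS = {
    "Lsp1beta": {
        "D. buzzatii": "E",
        "D. pseudoobscura": "E",
        "D. melanogaster": "B",
    },
    "Lsp1gamma": {
        "D. buzzatii": "B",
        "D. pseudoobscura": "B",
        "D. melanogaster": "D",
    },
}


def infer_conservative_transpositions(locations, derived="D. melanogaster"):
    """Return {gene: (ancestral_element, derived_element)} for genes that moved.

    Assumption (illustrative rule only): when all species other than
    `derived` agree on an element, that element is treated as ancestral.
    """
    moved = {}
    for gene, by_species in locations.items():
        others = {el for sp, el in by_species.items() if sp != derived}
        if len(others) != 1:
            continue  # no unambiguous ancestral state, so no inference
        ancestral = others.pop()
        if by_species[derived] != ancestral:
            moved[gene] = (ancestral, by_species[derived])
    return moved


if __name__ == "__main__":
    for gene, (src, dst) in infer_conservative_transpositions(LOCATIONS).items():
        print(f"{gene}: element {src} -> element {dst}")
```

Under this rule both genes are flagged as having moved in the D. melanogaster lineage, which corresponds to the two conservative transpositions counted above; the duplicative events are not modelled here.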

Mechanism of transposition: As stated before, the position of the unique intron of Lsp1β and Lsp1γ and the 5’ putative regulatory sequences are conserved in the three species analyzed. This allows us to rule out retroposition (BETRÁN et al. 2002) as the mechanism of transposition. The analysis of the flanking regions of both genes provides no evidence that transposition was mediated by transposable elements (TEs). A single TE copy was found in the original position of Lsp1β in D. melanogaster (Figure 1c) but there are no indications of its involvement in the transposition. However, taking into account the divergence time between the species analyzed, mobile elements could have played a role in the origin of the transposition and then been lost by deletion or excision. The most striking feature of Lsp1 genes in the three species analyzed is that they are very close to snRNA or tRNA genes. Both are repetitive genes that are scattered in the genome. However, the probability that Lsp1 genes are close to snRNA or tRNA genes just by chance seems very low. A recent review of Release 3 of the D. melanogaster genome sequence found 290 tRNA genes and only 28 snRNA genes in the euchromatin (MISRA et al. 2002), i.e. about one tRNA gene per 0.4 Mb and one snRNA gene per 4.3 Mb of euchromatic DNA. Repeated genes, e.g. tRNA and ribosomal protein genes, have been previously implicated in the origin of chromosomal rearrangements in yeasts (SZANKASI et al. 1986; KELLIS et al. 2003).
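The gene spacings quoted above follow from a simple division, sketched below. The gene counts (290 tRNA, 28 snRNA) come from the text; the euchromatin size of 120 Mb is an assumption chosen because it is consistent with both quoted spacings, and the function name `mb_per_gene` is hypothetical.

```python
def mb_per_gene(euchromatin_mb, gene_count):
    """Average spacing between genes of one class, in Mb per gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return euchromatin_mb / gene_count


# Assumed euchromatin size in Mb (not stated in the text; an approximate
# value that reproduces the spacings quoted for both gene classes).
ASSUMED_EUCHROMATIN_MB = 120.0

TRNA_SPACING = mb_per_gene(ASSUMED_EUCHROMATIN_MB, 290)   # tRNA genes
SNRNA_SPACING = mb_per_gene(ASSUMED_EUCHROMATIN_MB, 28)   # snRNA genes

if __name__ == "__main__":
    print(f"tRNA:  one gene per {TRNA_SPACING:.1f} Mb")
    print(f"snRNA: one gene per {SNRNA_SPACING:.1f} Mb")
```

With genes this sparse, and the transposed segments being under 6 kb, finding an snRNA or tRNA gene immediately next to each Lsp1 gene is hard to attribute to chance alone.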

We propose that an ectopic exchange between snRNA:U1 sequences mediated the Lsp1β transposition in Drosophila. Three observations indicate that snRNA:U1 genes are implicated in this transposition. First, they are present in the original location on chromosomal element E as well as in the destination site on element B. Second, snRNA:U1 genes show a high level of nucleotide identity among the three species (98%), suggesting that they might act as substrates for ectopic exchanges. Finally, the snRNA:U1 gene represents the downstream boundary of the transposed chromosomal segment (Figure 1), which would be expected if ectopic recombination actually took place. To our knowledge, snRNA genes have not been previously implicated in the generation of rearrangements. Nevertheless, there is no reason why they could not act as substrates for ectopic recombination in a similar manner to tRNAs in yeasts or Alu sequences in humans. As a matter of fact, the mechanism for gene transposition in Drosophila could be quite similar to that originating segmental duplications in humans (see Figure 6 in BAILEY et al. 2003).

The evidence for the implication of tRNA genes in the transposition of Lsp1γ is weak. The tRNA genes found near Lsp1γ in D. buzzatii and D. pseudoobscura belong to different isoacceptor types and no tRNA gene was found near Lsp1γ in D. melanogaster. In addition, these tRNAs lie outside the transposed chromosomal segment. Therefore, an involvement of tRNA genes in the transposition of Lsp1γ is unlikely and the mechanism remains uncertain in this case.

5’ noncoding conserved sequences: Transposition implies the localization of the transposed gene in a new genomic environment. The probability of success of a transposition should be higher when it includes the regulatory regions than when it does not. In the latter case the transposed gene would need to recruit new regulatory regions in order to be functional (BETRÁN et al. 2002). Comparative sequence analysis allows the identification of conserved DNA sequences in noncoding regions, which are considered putative cis-acting regulatory elements (BERGMAN and KREITMAN 2002). The comparison of the 5’ ends of the Lsp1α, Lsp1β and Lsp1γ genes of D. melanogaster led to the identification of two such conserved sequences (DELANEY et al. 1986). These sequences are also conserved in Lsp1β (Table 4b and 4c) and Lsp1γ (Table 4e and 4f) in the three species analyzed. These two regions are more conserved between Lsp1α and Lsp1β than between Lsp1β and Lsp1γ, which is in agreement with the origin of Lsp1α from a duplication of Lsp1β (SMITH et al. 1981). Two conserved sequences not previously described in D. melanogaster have also been identified. One is exclusive to Lsp1β and the other exclusive to Lsp1γ (Table 4d and 4g), and both are highly conserved in the three species analyzed. All conserved sequences except that located at -181 of D. buzzatii Lsp1γ fulfil the requirements used by BERGMAN and KREITMAN (2001) to identify noncoding conserved blocks. The mean size of these five blocks is 13 bp, similar to the modal size of 11 bp reported by these authors, and three of them are included in the conserved blocks that they have described. They follow the pattern described for cis-regulatory elements in Drosophila, i.e. highly conserved sequences separated by unalignable gaps (BERGMAN and KREITMAN 2001).

The 5’ non-translated regions of the RNA show two highly conserved sequences common to the three Lsp1 genes in the three species analyzed (Figure 3). Although the significance of these homologies in the 5’ UTR is not clear, it has been suggested that the synthesis of LSP-1 protein may be under translational control (POWELL et al. 1984). All the signals necessary for the correct tissue- and temporal-specific expression of Lsp1α and Lsp1β in D. melanogaster lie within 1,650 and 2,250 bp upstream of these two genes, respectively (JOWETT 1985; DAVIES et al. 1986). More than 2 kb upstream of the Lsp1β and Lsp1γ genes have been analyzed in this work and we are quite confident that the majority, if not all, of the cis-acting regulatory sequences have been identified. The fact that the transposition of Lsp1 genes included not only the genes but also the regulatory regions has probably played a role in the success of these transpositions.

Molecular evolution: Maximum likelihood methods of phylogenetic inference were used to test alternative models for the evolution of Lsp1 genes (YANG 1997). Figure 4 shows the phylogenetic tree that provides a significantly better fit to the data. The tree is in agreement with the appearance of Lsp1α from a duplication of Lsp1β in the Sophophora subgenus.

The ratio of non-synonymous (amino acid-changing; dN) to synonymous (silent; dS) substitution rates is a measure of selective pressure on a protein: if amino acid changes are neutral dN/dS = 1, if they are mostly deleterious dN/dS < 1 and if they offer a selective advantage dN/dS > 1 (YANG AND BIELAWSKI 2000; BIELAWSKI AND YANG 2003). A model assuming free dN/dS ratios for different lineages provided a significantly better fit to the data than a model with a single ratio for all lineages. This led us to test whether a model assuming four different dN/dS ratios, one background ratio and one ratio for each of the three lineages leading to D. melanogaster Lsp1 genes, was significantly different from the single-ratio model. According to our observations, it is in these three lineages that the Lsp1 genes have transposed, and we wanted to test whether the change of chromosomal location has affected the molecular evolution of these genes. Again the results were significant, indicating that the model assuming four different dN/dS ratios fits the data better. The estimates obtained for the dN/dS ratios in this model were 0.0518 for the background ratio, and 0.0769, 0.0598 and 0.0231 for the lineages leading to Lsp1α, Lsp1β and Lsp1γ of D. melanogaster, respectively. These results suggest that changes in the selection regime (degree of functional constraint) are associated with Lsp1 transpositions. However, dN/dS ratios do not show a consistent pattern, being higher than the background ratio for Lsp1α and lower than the background ratio for Lsp1γ. In addition, the background ratio is itself heterogeneous, indicating that other factors besides transposition have an effect on the molecular evolution of Lsp1 genes.
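The interpretation of the four-ratio model can be made concrete with a short sketch. The dN/dS estimates are those reported above; the function names `selection_regime` and `relative_to_background` are hypothetical, and the code only classifies the point estimates. It does not reproduce the likelihood ratio tests, which require the fitted log-likelihoods.

```python
import math

# dN/dS estimates from the four-ratio model, as reported in the text.
BACKGROUND = 0.0518
BRANCH_RATIOS = {"Lsp1alpha": 0.0769, "Lsp1beta": 0.0598, "Lsp1gamma": 0.0231}


def selection_regime(ratio):
    """Classify a dN/dS point estimate using the usual thresholds."""
    if math.isclose(ratio, 1.0):
        return "neutral"
    return "positive selection" if ratio > 1.0 else "purifying selection"


def relative_to_background(branches, background):
    """Report whether each branch ratio lies above or below the background."""
    result = {}
    for gene, ratio in branches.items():
        if ratio > background:
            result[gene] = "higher"
        elif ratio < background:
            result[gene] = "lower"
        else:
            result[gene] = "equal"
    return result


if __name__ == "__main__":
    comparison = relative_to_background(BRANCH_RATIOS, BACKGROUND)
    for gene, ratio in BRANCH_RATIOS.items():
        print(f"{gene}: dN/dS = {ratio:.4f}, {selection_regime(ratio)}, "
              f"{comparison[gene]} than background")
```

All three lineages remain far below 1, so the differences reflect changes in the strength of purifying selection rather than adaptive evolution, and the direction of the change differs between Lsp1α and Lsp1γ.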

ACKNOWLEDGMENTS

We thank L. Sánchez (Centro de Investigaciones Biológicas, CSIC, Madrid) for providing the D. melanogaster Lsp1α clone and E. Hasson (Universidad de Buenos Aires) for D. buzzatii stocks. We also thank B. Negre for technical advice and helpful discussion. This work was supported by grant BMC2002-01708 from the Dirección General de Investigación (Ministerio de Ciencia y Tecnología, Spain) awarded to A.R.

LITERATURE CITED

ADAMS, M. D., S. E. CELNIKER, R. A. HOLT, C. A. EVANS, J. D. GOCAYNE et al.,

2000 The genome sequence of Drosophila melanogaster. Science 287: 2185-2195.

ARMENGOL, L., M. A. PUJANA, J. CHEUNG, S. W. SCHERER, and X. ESTIVILL, 2003

Enrichment of segmental duplications in regions of breaks of synteny between the

human and mouse genomes suggests their involvement in evolutionary

rearrangements. Hum. Mol. Genet. 12: 2201-2208.

BACHTROG, D., AND B. CHARLESWORTH, 2003 On the genomic location of the

exuperantia1 gene in Drosophila miranda: the limits of in situ hybridization

experiments. Genetics 164: 1237-40.

BAILEY, J. A., Z. GU, R. A. CLARK, K. REINERT, R. V. SAMONTE et al., 2002 Recent

segmental duplications in the human genome. Science 297: 1003-1007.

BAILEY, J.A., G. LIU, and E. E. EICHLER, 2003 An Alu transposition model for the origin

and expansion of human segmental duplications. Am. J. Hum. Genet. 73: 823-834.

BENNETZEN, J. L., AND J. MA, 2003 The genetic colinearity of rice and other cereals on

the basis of genomic sequence analysis. Curr. Opin. Plant Biol. 6: 128-133.

BERGMAN, C. M., AND M. KREITMAN, 2001 Analysis of conserved noncoding DNA in

Drosophila reveals similar constraints in intergenic and intronic sequences. Genome

Res. 11: 1335-1345.

BERGMAN, C. M., B. D. PFEIFFER, D. E. RINCON-LIMAS, R. A. HOSKINS, A.

GNIRKE et al., 2002 Assessing the impact of comparative genomic sequence data on

the functional annotation of the Drosophila genome. Genome Biol. 3:

RESEARCH0086-0086.

BETRÁN, E., K. THORNTON and M. LONG, 2002 Retroposed new genes out of the X in

Drosophila. Genome Res. 12: 1854-1859.

BEVERLEY, S. M., AND A. C. WILSON, 1984 Molecular evolution in Drosophila and

higher Dipterans. II. A time scale for fly evolution. J. Mol. Evol. 21: 1-12.

BIELAWSKI, J. P., AND Z. YANG, 2003 Maximum likelihood methods for detecting

adaptive evolution after gene duplication. J Struct Funct Genomics 3: 201-212.

BROCK, H. W., and D. B. ROBERTS, 1980 Comparison of the Larval serum proteins of

Drosophila melanogaster using one and two-dimensional peptide mapping. Eur. J. Biochem.

106: 129-135.

BROCK, H. W., AND D. B. ROBERTS, 1983 Location of the LSP-1 genes in Drosophila

species by in situ hybridization. Genetics 103: 75-92.

BURMESTER, T., H. C. MASSEY, JR., S. O. ZAKHARKIN and H. BENES, 1998 The

evolution of hexamerins and the phylogeny of insects. J. Mol. Evol. 47: 93-108.

CÁCERES, M., M. PUIG and A. RUIZ, 2001 Molecular characterization of two natural

hotspots in the Drosophila buzzatii genome induced by transposon insertions. Genome

Res. 11: 1353-1364.

CASALS, F., M. CACERES and A. RUIZ, 2003 The Foldback-like transposon Galileo is

involved in the generation of two different natural chromosomal inversions of

Drosophila buzzatii. Mol. Biol. Evol. 20: 674-685.

CELNIKER, S. E. et al. 2002 Finishing a whole-genome shotgun: Release 3 of the

Drosophila melanogaster euchromatin genome sequence. Genome Biology 3:

research0079.1-0079.14.

CHEUNG, J., M. D. WILSON, J. ZHANG, R. KHAJA, J. R. MACDONALD et al., 2003

Recent segmental duplications in the mouse genome. Genome Biology 4: R47.

CHIA, W., S. McGILL, R. KARP, D. GUBB, and M. ASHBURNER, 1985 Spontaneous

excision of a large composite transposable element of Drosophila melanogaster.

Nature 316: 81-83.

COGHLAN, A., and K. H. WOLFE, 2002 Fourfold faster rate of genome rearrangement in

nematodes than in Drosophila. Genome Res. 12: 857-867.

DAVIES, J. A., C. F. ADDISON, S. J. DELANEY, C. SUNKEL and D. M. GLOVER, 1986

Expression of the prokaryotic gene for chloramphenicol acetyl transferase in

Drosophila under the control of Larval serum protein 1 gene promoters. J. Mol. Biol.

189: 13-24.

DELANEY, S. J., D. F. SMITH, A. McCLELLAND, C. SUNKEL and D. M. GLOVER,

1986 Sequence conservation around the 5' ends of the Larval serum protein 1 genes

of Drosophila melanogaster. J. Mol. Biol. 189: 1-11.

EICHLER, E. E., 2001 Recent duplication, domain accretion and the dynamic mutation of

the human genome. Trends Genet. 17: 661-669.

EJIMA, Y., AND L. YANG, 2003 Trans mobilization of genomic DNA as a mechanism for

retrotransposon-mediated exon shuffling. Hum Mol Genet. 12: 1321-1328.

ESNAULT, C., J. MAESTRE AND T. HEIDMANN, 2000 Human LINE retrotransposons

generate processed pseudogenes. Nat Genet. 24: 363-367.

FELSENSTEIN, J., 1989 PHYLIP: phylogeny inference package (Version 3.2). Cladistics:

164-166.

FISCHER, G., C. NEUVÉGLISE, P. DURRENS, C. GAILLARDIN, and B. DUJON, 2001

Evolution of gene order in the genomes of two related yeast species. Genome Res. 11:

2009-2019.

GONZÁLEZ, J., J. M. RANZ and A. RUIZ, 2002 Chromosomal elements evolve at different

rates in the Drosophila genome. Genetics 161: 1137-1154.

HOOPER, S. D., AND O. G. BERG, 2003 On the nature of gene innovation: duplication

patterns in microbial genomes. Mol. Biol. Evol. 20: 945-954.

JOWETT, T., 1985 The regulatory domain of a larval serum protein gene in Drosophila

melanogaster. EMBO J. 4: 3789-3795.

KELLIS, M., N. PATTERSON, M. ENDRIZZI, B. BIRREN AND E. S. LANDER, 2003

Sequencing and comparison of yeast species to identify genes and regulatory

elements. Nature 423: 241-254.

KRIMBAS, C. B., AND J. R. POWELL, 1992 Drosophila Inversion Polymorphism. CRC

Press, Boca Raton, Fla.

KU, H., T. VISION, J. LIU, AND S. D. TANKSLEY, 2000 Comparing sequenced segments

of the tomato and Arabidopsis genomes: large-scale duplication followed by selective

gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97: 9121-9126.

LANDER, E. S., L. M. LINTON, B. BIRREN, C. NUSBAUM, M.C. ZODY et al., 2001

Initial sequencing and analysis of the human genome. Nature 409: 860-921.

LLORENTE, B., A. MALPERTUY, C. NEUVÉGLISE, J. De MONTIGNY, M. AIGLE et

al., 2000 Genomic exploration of the Hemiascomycetous yeasts: 18. Comparative

analysis of chromosome maps and synteny with Saccharomyces cerevisiae. FEBS

Lett. 47: 101-112.

LOCKE, D. P., N. ARCHIDIACONO, D. MISCEO, M. F. CARDONE, S. DESCHAMPS et

al., 2003 Refinement of a chimpanzee pericentric inversion breakpoint to a segmental

duplication cluster. Genome Biology 4: R50.

LONG, M., E. BETRÁN, K. THORNTON, AND W. WANG, 2003 The origin of new genes:

glimpses from the young and old. Nat. Rev. Genetics 4: 865-875.

LOVERING, R., N. HARDEN, AND M. ASHBURNER, 1991 The molecular structure of

TE146 and its derivatives in Drosophila melanogaster. Genetics 128: 357-372.

LYNCH, M., AND J. S. CONERY, 2000 The evolutionary fate and consequences of

duplicate genes. Science 290: 1151-1155.

MASSEY, H. C., JR., J. KEJZLAROVA-LEPESANT, R. L. WILLIS, A. B.

CASTLEBERRY and H. BENES, 1997 The Drosophila Lsp-1 beta gene. A structural

and phylogenetic analysis. Eur. J. Biochem. 245: 199-207.

MISRA, S., M. A. CROSBY, C. J. MUNGALL, B. B. MATTHEWS, K. S. CAMPBELL et

al., 2002 Annotation of the Drosophila melanogaster euchromatic genome: a

systematic review. Genome Biol. 3: research0083.1-0083.22.

MONTGOMERY, E., B. CHARLESWORTH AND C. H. LANGLEY, 1987 A test for the

role of natural selection in the stabilization of transposable element copy number in a

population of Drosophila melanogaster. Genet Res. 49: 31-41.

MORAN, J. V., R. J. DeBERARDINIS AND H. H. KAZAZIAN, JR., 1999 Exon shuffling by L1

retrotransposition. Science 283: 1465-1467.

MORGENSTERN, B., 1999 DIALIGN 2: improvement of the segment-to-segment approach

to multiple sequence alignment. Bioinformatics 15: 211-218.

NAUMANN, U., AND K. SCHELLER, 1991 Complete cDNA and gene sequence of the

developmentally regulated arylphorin of Calliphora vicina and its homology to insect

hemolymph proteins and arthropod hemocyanins. Biochem. Biophys. Res. Commun.

177: 963-972.

NEUFELD, T. P., R. W. CARTHEW AND G. M. RUBIN, 1991 Evolution of gene position:

chromosomal arrangement and sequence comparison of the Drosophila melanogaster

and Drosophila virilis sina and Rh4 genes. Proc. Natl. Acad. Sci. USA 88: 10203-

10207.

POWELL, D., J. D. SATO, H. W. BROCK and D. B. ROBERTS, 1984 Regulation and

synthesis of the Larval serum proteins of Drosophila melanogaster. Dev. Biol. 102:

206-215.

POWELL, J. R., 1997 Progress and Prospects in Evolutionary Biology: the Drosophila

Model. Oxford University Press, New York.

RANZ, J. M., J. GONZÁLEZ, F. CASALS, AND A. RUIZ, 2003 Low occurrence of gene

transposition events during the evolution of the genus Drosophila. Evolution 57: 1325-

1335.

ROBERTS, D. B., and S. EVANS-ROBERTS, 1979 The X-linked alpha-chain gene of

Drosophila LSP-1 does not show dosage compensation. Nature 280: 691-692.

RUIZ, A., AND M. WASSERMAN, 1993 Evolutionary cytogenetics of the Drosophila

buzzatii species complex. Heredity 70: 582-596.

RUSSO, C. A. M., N. TAKEZAKI, AND M. NEI, 1995 Molecular phylogeny and

divergence times of Drosophilid species. Mol. Biol. Evol. 12: 391-404.

SAMBROOK, J., E. F. FRITSCH, AND T. MANIATIS, 1989 Molecular cloning, a

laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

SAMONTE, R. V., AND E. E. EICHLER, 2001 Segmental duplications and the evolution of

the primate genome. Nat. Rev. Genet. 3: 65-72.

SMITH, D. F., A. MCCLELLAND, B. N. WHITE, C. F. ADDISON and D. M. GLOVER,

1981 The molecular cloning of a dispersed set of developmentally regulated genes

which encode the major larval serum protein of D. melanogaster. Cell 23: 441-449.

SPICER, G. S., 1988 Molecular evolution among some Drosophila species groups as

indicated by two-dimensional electrophoresis. J. Mol. Evol. 27: 250-260.

SZANKASI, P., C. GYSLER, U. ZEHNTNER, U. LEUPOLD, J. KOHLI, et al., 1986

Mitotic recombination between dispersed but related tRNA genes of

Schizosaccharomyces pombe generates a reciprocal translocation. Mol. Gen. Genet.

202: 394-402.

THOMPSON, J. D., D. G. HIGGINS AND T. J. GIBSON, 1994 CLUSTAL W: improving

the sensitivity of progressive multiple sequence alignment through sequence

weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids

Res. 22: 4673-4680.

THROCKMORTON, L. H., 1975 The phylogeny, ecology and geography of Drosophila, pp.

421-469 in Handbook of Genetics, edited by R. C. KING. Plenum Press, New York.

YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum

likelihood. Comput. Appl. Biosci. 13: 555-556.

YANG, Z., AND J. P. BIELAWSKI, 2000 Statistical methods for detecting molecular

adaptation. TREE 15: 496-503.

Table 1

Chromosomal homologies between D. melanogaster, D. pseudoobscura and D. buzzatii, the three Drosophila species analyzed in this study (from Powell 1997).

                                              Muller’s element
Species           Subgenus    Species group   A    B    C    D    E    F
D. melanogaster   Sophophora  melanogaster    X    2L   2R   3L   3R   4
D. pseudoobscura  Sophophora  obscura         XL   4    3    XR   2    5
D. buzzatii       Drosophila  repleta         X    3    5    4    2    6

Table 2

Results of the Blast search against D. pseudoobscura genome database as of 28-08-2003 (www.hgsc.bcm.tmc.edu) using as queries the coding sequences of Lsp1α, Lsp1β and Lsp1γ.

Query   Blast hits                Score (bits)   E value
Lsp1β   Contig 815_Contig 5737    2044           0.0
        Contig 1500_Contig 3546   222            3e-56
Lsp1γ   Contig 1500_Contig 3546   1237           0.0
        Contig 815_Contig 5737    121            9e-26
Lsp1α   Contig 815_Contig 5737    1269           0.0
        Contig 1500_Contig 3546   111            9e-23

Table 3

Molecular structure of Lsp1 genes in D. buzzatii and D. pseudoobscura compared to those of D. melanogaster.

              D. buzzatii (a)       D. pseudoobscura (b)  D. melanogaster (c)
Gene region   Lsp1β      Lsp1γ      Lsp1β      Lsp1γ      Lsp1α      Lsp1β      Lsp1γ
5’ UTR        89         68         82         88         88         85         82
Exon 1        210        210        210        207        210        210        207
Intron 1      63         57         71         64         67         67         65
Exon 2        2157       2112       2154       2115       2241       2160       2112
3’ UTR        108        127        117        139        69         151        105

(a) Data from this work.
(b) Our annotation of the Drosophila pseudoobscura genome (Contig 7891_Contig 7492 for Lsp1β and Contig 1500_Contig 3546 for Lsp1γ of the D. pseudoobscura database).
(c) Genomic clones AE003489 (Lsp1α), AE003588 (Lsp1β) and AE003467 (Lsp1γ) from the Berkeley Drosophila Genome Project database.

Table 4

Conserved sequences between D. melanogaster (Dm), D. pseudoobscura (Dp) and D. buzzatii (Db) in the 3’ UTR (a) and 5’ region (b, c and d) of Lsp1β and in the 5’ region (e, f and g) of Lsp1γ. Blocks inside these sequences that fulfil the requirements of Bergman and Kreitman (2001) to identify noncoding conserved sequences are shown in boldface.

Lsp1β

a) Dm +2638 GCAAAAAGTCTAATAAACTTTCGAAAA +2664
            ********** ****************
   Dp +2618 GCAAAAAGTCAAATAAACTTTCGAAAA +2644
            ********* *****************
   Db +2611 GCAAAAAGTGAAATAAACTTTCGAAAA +2637

b) Dm -216  AATAGAAGTCTGGCT--TTGATAAG -194
            *  * **  **** *   *******
   Dp -191  ACGACAAC-CTGGATGGCTGATAAG -168
            * * ***  ****  *  *******
   Db -189  ATGGCAA--CTGGGCGATTGATAAG -167

c) Dm -130  AGCACCTGAGATACACCC -113
            * *********** **
   Dp -111  ACCACCTGAGATAGACTT -94
            * ****** **** ***
   Db -126  AGCACCTGCGATACACTC -109

d) Dm -411  TATCTACATTTTTGAGGA -394
            ** * ********  * *
   Dp -512  TAGCAACATTTTTCTGCA -495
            ******************
   Db -683  TAGCAACATTTTTCTGCA -666

Lsp1γ

e) Dm -380  AATTAAACCTGAAC--TGATATG -360
            ** ***** *****  **
   Dp -494  AAATAAACTTGAACTTTGCATAA -472
            *  **** ****   **  **
   Db -379  ATGTAAA-TTGACGATT--ATCG -360

f) Dm -180  ACCACCTGAATTGAGGC -168
              *******  **
   Dp -299  CGCACCTGACGTGCACT -178
              **    *  **  *
   Db -181  GCCAGGGTAGTTGAGCC -165

g) Dm -74   GTCTGCCGCTGATATGGTGCA -54
            ******  *************
   Dp -126  GTCTGCGACTGATATGGTGCA -106
            *****   *************
   Db -71   GTCTGATGCTGATATGGTGCA -51

Table 5

Maximum likelihood estimates of the number of synonymous (dS) and nonsynonymous (dN) substitutions per site in the coding regions of Lsp1 genes for pairwise comparisons between D. melanogaster, D. pseudoobscura and D. buzzatii.

Comparison                          t       k       S      N       dS       dN      dN/dS

Lsp1α-Lsp1β
  D. melanogaster-D. melanogaster   1.0643  1.5939  247.8  1729.2   2.2380  0.0849  0.0379
  D. melanogaster-D. buzzatii       1.2360  1.1993  309.2  1667.8   2.0358  0.1109  0.0545
  D. melanogaster-D. pseudoobscura  1.0806  1.6403  269.3  1707.7   2.0375  0.0957  0.0470

Lsp1α-Lsp1γ
  D. melanogaster-D. melanogaster   5.5285  1.4217  316.1  1660.9  10.1718  0.2578  0.0253
  D. melanogaster-D. buzzatii       4.4263  1.3910  321.9  1655.1   7.6226  0.2800  0.0367
  D. melanogaster-D. pseudoobscura  3.0028  1.2382  247.5  1729.5   6.2989  0.2429  0.0386

Lsp1β
  D. melanogaster-D. buzzatii       1.4840  1.9491  268.1  1708.9   3.1884  0.0720  0.0226
  D. melanogaster-D. pseudoobscura  0.4923  1.6360  179.9  1797.1   1.2450  0.0559  0.0449
  D. buzzatii-D. pseudoobscura      0.9964  1.8803  286.8  1690.2   1.9216  0.0624  0.0325

Lsp1γ
  D. melanogaster-D. buzzatii       1.5423  1.7122  336.8  1640.2   2.5579  0.0944  0.0369
  D. melanogaster-D. pseudoobscura  1.1835  1.4705  252.3  1724.7   2.7054  0.0565  0.0209
  D. buzzatii-D. pseudoobscura      0.8167  1.4804  261.9  1715.1   1.5395  0.0787  0.0512

t = branch length; k = transition/transversion rate ratio; S = synonymous positions; N = non-synonymous positions; dN/dS = ratio of non-synonymous to synonymous substitution rates
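As a consistency check on Table 5, the last column can be recomputed from the dS and dN columns. The sketch below transcribes the twelve rows of the table; the function name `max_ratio_discrepancy` and the tolerance of 1e-4 are assumptions made for illustration, the latter allowing for the four-decimal rounding of the tabulated values.

```python
# (comparison, dS, dN, reported dN/dS) transcribed from Table 5.
ROWS = [
    ("Lsp1a-Lsp1b Dm-Dm", 2.2380, 0.0849, 0.0379),
    ("Lsp1a-Lsp1b Dm-Db", 2.0358, 0.1109, 0.0545),
    ("Lsp1a-Lsp1b Dm-Dp", 2.0375, 0.0957, 0.0470),
    ("Lsp1a-Lsp1g Dm-Dm", 10.1718, 0.2578, 0.0253),
    ("Lsp1a-Lsp1g Dm-Db", 7.6226, 0.2800, 0.0367),
    ("Lsp1a-Lsp1g Dm-Dp", 6.2989, 0.2429, 0.0386),
    ("Lsp1b Dm-Db", 3.1884, 0.0720, 0.0226),
    ("Lsp1b Dm-Dp", 1.2450, 0.0559, 0.0449),
    ("Lsp1b Db-Dp", 1.9216, 0.0624, 0.0325),
    ("Lsp1g Dm-Db", 2.5579, 0.0944, 0.0369),
    ("Lsp1g Dm-Dp", 2.7054, 0.0565, 0.0209),
    ("Lsp1g Db-Dp", 1.5395, 0.0787, 0.0512),
]


def max_ratio_discrepancy(rows):
    """Largest absolute difference between dN/dS recomputed and as reported."""
    return max(abs(dn / ds - reported) for _, ds, dn, reported in rows)


if __name__ == "__main__":
    for name, ds, dn, reported in ROWS:
        print(f"{name:20s} recomputed {dn / ds:.4f}  reported {reported:.4f}")
    print("largest discrepancy:", max_ratio_discrepancy(ROWS))
```

Every pairwise ratio is well below 1, in line with the strong purifying selection inferred from the branch model.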

Figure legends

Figure 1. Genome organization of the Lsp1β region in D. buzzatii, D. pseudoobscura and D. melanogaster. Arrows indicate the start point and direction of transcription. (a) Annotation and chromosomal localization of the 9,464 bp sequenced in D. buzzatii. The restriction map includes EcoRI (E), HindIII (H) and SalI (S) sites. The λ phages used to sequence this region are represented as lines: solid lines are sequenced regions and discontinuous lines are regions cloned but not sequenced. The short thick segment represents the probe used to screen the D. buzzatii libraries. (b) Annotation of the homologous region in D. pseudoobscura. (c) Fragment of the AE003745 genomic clone of D. melanogaster where the genes flanking Lsp1β in D. buzzatii and D. pseudoobscura are located. The chromosomal localization of these genes is also given. Note that in this species Lsp1β is not present between CG12492 and Pellino. (d) Genomic organization and chromosomal localization of the Lsp1β region in D. melanogaster. This region is included in the genomic clone AE003588 of this species.

Figure 2. Genome organization of the Lsp1γ region in D. buzzatii, D. pseudoobscura and D. melanogaster (for symbols see legend for Figure 1). (a) Annotation and chromosomal localization of the 10,817 bp sequenced in D. buzzatii. The restriction map includes EcoRI (E), HindIII (H), SalI (S) and ClaI (C) sites. (b) Annotation of the homologous region in D. pseudoobscura. Note that Lsp1γ is nested within the Sema-1a gene. (c) Fragment of the AE003621 genomic clone of D. melanogaster containing the Sema-1a gene. Note that Lsp1γ is not present within the first intron. (d) Fragment of the D. melanogaster AE003467 genomic clone, which contains Lsp1γ.

Figure 3. ClustalW alignment of the 5’ UTR and close upstream sequences of Lsp1α in D. melanogaster (Dm) and of Lsp1β and Lsp1γ in D. melanogaster, D. pseudoobscura (Dp) and D. buzzatii (Db). Conserved regions are enclosed in rectangles. Stars mark nucleotides identical in the seven sequences and bold letters mark the most common nucleotide at each position. The TATA box is underlined. +1 marks the first nucleotide of the 5’ UTR. The first AUG codon is in bold and indicated by dots beneath the sequence.

Figure 4. Neighbour-Joining tree of Lsp1 genes from D. melanogaster, D. pseudoobscura and D. buzzatii. Arylphorin from C. vicina and Lsp2 from D. melanogaster are also included. The tree was constructed with an alignment of the coding sequence of the genes.

Figure 5. Phylogenetic relationships and karyotypic organization of D. melanogaster and D. buzzatii. The present chromosomal localization of Lsp1 genes in D. melanogaster and D. buzzatii is shown. At least two duplicative and two conservative transpositions are required to explain the evolution of these genes from an ancestral Lsp1 gene.
