Proc. Natl. Acad. Sci. USA
Vol. 92, pp. 9171-9175, September 1995
Evolution

Geographical structuring in the mtDNA of Italians

GUIDO BARBUJANI*†‡, GIORGIO BERTORELLE†, GIULIA CAPITANI*, AND ROSARIA SCOZZARI§

*Dipartimento di Scienze Statistiche, Università di Bologna, via Belle Arti 41, 40126 Bologna, Italy; †Dipartimento di Biologia, Università di Padova, via Trieste 75, 35121 Padua, Italy; and §Dipartimento di Genetica e Biologia Molecolare, Università di Roma "La Sapienza," Piazzale Aldo Moro 7, 00100 Rome, Italy

Communicated by Robert R. Sokal, State University of New York, Stony Brook, NY, June 30, 1995

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Abbreviation: AIDA, autocorrelation index for DNA analysis.

‡To whom reprint requests should be sent at the * address.

ABSTRACT Geographical patterns of mtDNA variation were studied in 12 Italian samples (1072 individuals) by two different spatial autocorrelation methods. Separate analyses of the frequencies of 12 restriction morphs show North-South clines, differences between Sardinia and the mainland populations, and the effects of isolation by distance. A recently developed autocorrelation statistic summarizing molecular similarity at all sites (AIDA; autocorrelation index for DNA analysis) confirms the presence of a clinal pattern; differences between random pairs of haplotypes tend to increase with their geographical distance. The partition of gene diversity, however, reveals that most variability occurs within populations, whereas differences between populations are minor (GST = 0.057). When the data from the 12 samples are pooled, two descriptors of genetic variability (number of polymorphic sites and average sequence difference between pairs of individuals) do not behave as expected under neutrality. The presence of clinal patterns, Tajima's tests, and a simulation experiment agree in suggesting that population sizes increased rapidly in Italy and Sicily but not necessarily so in Sardinia. The distribution of pairwise sequence differences in the Italian peninsula (excluding Sardinia) permits a tentative location of the demographic increase between 8000 and 20,500 years ago. These dates are consistent with archaeological estimates of two distinct expansion processes, occurring, respectively, in the Neolithic and after the last glacial maximum in the Paleolithic. Conversely, there is no genetic evidence that such processes have had a major impact on the Sardinian population.

Most studies of mtDNA variation in humans have inferred evolutionary processes by reconstructing history, i.e., genealogies of haplotypes (1-4). With one exception (5), geographical information has been disregarded, or it has been used simply to classify populations, as an alternative to the subjective criteria of racial classification (6, 7). There is no doubt, however, that spatial patterns of genetic diversity also contain useful information for evolutionary inferences. One reason for this omission may lie in the relative paucity of statistical tools suited for spatial analysis of molecular data.

Genetic variation in space can be summarized by spatial autocorrelation measures (see, e.g., ref. 8). These statistics have also been used to test hypotheses on past demographic processes (9, 10). Here we apply them to a data base of mtDNA restriction fragment length polymorphism data. We reconstruct patterns of genetic variation from molecular information, and we try to draw inferences about the underlying microevolutionary phenomena.

In the first part of this paper, we analyze the frequencies of 12 common restriction morphs, treating them as if they were allele frequencies. In the second part, we use a recently developed autocorrelation statistic, AIDA (autocorrelation index for DNA analysis) (11), which compares DNA sequences (and not only their frequencies) at several spatial lags. The unit of analysis, then, is the individual haplotype and not the population, which has two main consequences: (i) a measure of intrapopulation genetic relatedness is estimated by pairwise comparing all individuals of the same sample; (ii) the sample size increases, giving the test higher statistical power. For instance, in a traditional study of 20 samples, autocorrelation statistics are evaluated on the basis of 20 × 19/2 = 190 comparisons; but, if the average sample size is 20 individuals, the AIDAs are based on 400 × 399/2 = 79,800 comparisons.

The results obtained using the two approaches on the same data set are not identical. The frequencies of some mtDNA morphs probably reflect recent processes of drift and gene flow, whereas geographical structuring at the sequence level, strictly depending on the appearance of new mutations, seems to be much more related to differentiation events occurring in a remote past. To interpret these findings, in the final part of this study we calculate some descriptors of genetic heterogeneity. Their distributions suggest that different demographic phenomena affected the peninsular Italian versus the Sardinian populations, with the former, but not the latter, showing evidence of a demographic expansion. Based on mtDNA diversity, such an expansion seems to have occurred either in the early Neolithic or in the late Paleolithic.

SPATIAL AUTOCORRELATION STATISTICS

Spatial autocorrelation is defined as the dependence of one variable upon its values at other localities (12). Patterns of allele frequencies may be summarized by autocorrelation statistics, generally Moran's I, calculated in discrete distance classes between all possible pairs of populations. For large samples, Moran's I values range from -1 (negative autocorrelation, indicating genetic dissimilarity in a distance class) to +1 (positive autocorrelation, or genetic similarity). The expected value is very close to 0 under a randomization hypothesis (12).

AIDAs are two autocorrelation statistics developed for the study of molecular data (11). They measure whether, and to what extent, pairs of haplotypes (rather than pairs of haplotype frequencies) resemble each other at various distances in space. The AIDA used in this study, called II by analogy with Moran's I, varies between -1 and +1, with its expected value being close to 0; it can be interpreted in the same way as Moran's I. If haplotypes are coded as strings of binary digits (as described in ref. 5), II can be calculated in arbitrary distance classes as

$$
II = \frac{n \sum_{i=1}^{n-1} \sum_{j>i}^{n} w_{ij} \sum_{k=1}^{S} (p_{ik} - \bar{p}_k)(p_{jk} - \bar{p}_k)}{W \sum_{i=1}^{n} \sum_{k=1}^{S} (p_{ik} - \bar{p}_k)^2}
$$

where $n$ is the sample size; $W$ is the number of pairwise comparisons in the distance class of interest; $p_{ik}$ and $p_{jk}$ represent the bases observed at the $k$th site in the haplotypes of the $i$th and the $j$th individuals, respectively; $\bar{p}_k$ is the average of the $p$ values at the $k$th site across all individuals; and the weights $w_{ij}$ are 1 if individuals $i$ and $j$ fall in the distance class of interest, otherwise they are 0. Summation is over the $S$ polymorphic sites for all individuals in the sample. The error of II is calculated by repeatedly randomizing haplotypes with respect to their spatial location, each time calculating a pseudovalue of II; an empirical null distribution of pseudovalues is thus constructed, and the significance of the observed value is assessed by comparison with it.

The overall significance of the entire set of autocorrelation coefficients at different distances (correlogram) was evaluated by the Bonferroni criterion (13). Under the null hypothesis of isolation by distance, a correlogram is expected to show a decrease of genetic relatedness, from significantly positive values at short distances to nonsignificant values at larger distances (14).

THE DATA

We collected mtDNA restriction fragment length polymorphism data from six studies in the literature (15-20) (Fig. 1). All these samples had been typed using BamHI, Hae II, Msp I, Ava II, and, with one exception, HincII restriction endonucleases. Because for HincII there is no polymorphism in Italy (17), we focused on the other four enzymes. The frequencies of the 12 most common restriction morphs were analyzed by traditional spatial autocorrelation.

Haplotype sequences were inferred from the restriction patterns following the scheme proposed by Excoffier (2, 5). Each individual haplotype (or mtDNA type) was represented by an array of 0s or 1s indicating, respectively, that each DNA site was identical to, or different from, the corresponding site in the sequence published by Anderson et al. (21). The choice of that particular sequence does not affect any of the statistics we evaluated. The data thus transformed were then used to calculate AIDAs. Geographical distances between localities were great-circle distances.

RESULTS

Twenty-six sites appeared polymorphic among the 1072 individuals studied, defining 42 different mtDNA types (Table 1). Twelve morphs showed substantial variation in frequency among populations.

FIG. 1. Samples considered. Numbers in boldface are sizes, and numbers in italics are numbers of different mtDNA types in the samples.

Table 1. Relative frequency of mtDNA types in the 12 samples (the first 5 samples are Sardinian)
Type Desulo Tonara Orosei Galt. Cagl. Vogh. Berg. Roma Enna Bari Cosen.
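
As an illustration of the II statistic defined under SPATIAL AUTOCORRELATION STATISTICS, the following is a minimal sketch of how II and its randomization test might be computed for binary-coded haplotypes, with great-circle distances between individuals. It is not the authors' program. The function names (`great_circle_km`, `aida_ii`, `permutation_p`), the half-open distance class `lo_km <= d < hi_km`, the haversine formula with a mean Earth radius of 6371 km, and the two-sided permutation p-value are all assumptions made for this example. The toy coordinates and haplotypes are invented and are not data from the samples studied.

```python
import math
import random


def great_circle_km(a, b, radius_km=6371.0):
    """Haversine great-circle distance between two (lat, lon) points in degrees.

    The radius is an assumed mean Earth radius, not a value from the paper.
    """
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def aida_ii(haplotypes, coords, lo_km, hi_km):
    """II for all pairs of individuals whose distance d has lo_km <= d < hi_km.

    haplotypes: one list of 0/1 site codes per individual.
    coords: one (lat, lon) tuple per individual.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Per-site mean across all individuals (p-bar_k in the formula).
    means = [sum(h[k] for h in haplotypes) / n for k in range(n_sites)]
    dev = [[h[k] - means[k] for k in range(n_sites)] for h in haplotypes]
    # Denominator sum of squares; monomorphic sites contribute zero.
    total_ss = sum(d * d for row in dev for d in row)
    cross, n_pairs = 0.0, 0  # n_pairs plays the role of W
    for i in range(n - 1):
        for j in range(i + 1, n):
            if lo_km <= great_circle_km(coords[i], coords[j]) < hi_km:
                n_pairs += 1  # w_ij = 1 for this pair
                cross += sum(x * y for x, y in zip(dev[i], dev[j]))
    if n_pairs == 0 or total_ss == 0:
        raise ValueError("no pairs in this distance class, or no variation")
    return n * cross / (n_pairs * total_ss)


def permutation_p(haplotypes, coords, lo_km, hi_km, n_perm=999, seed=0):
    """Observed II and a two-sided permutation p-value (assumed convention).

    Haplotypes are shuffled across locations; coordinates stay fixed.
    """
    rng = random.Random(seed)
    observed = aida_ii(haplotypes, coords, lo_km, hi_km)
    shuffled = list(haplotypes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        pseudo = aida_ii(shuffled, coords, lo_km, hi_km)
        if abs(pseudo) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented toy data: two localities, three individuals each, two sites.
    haps = [[1, 0]] * 3 + [[0, 1]] * 3
    coords = [(39.2, 9.1)] * 3 + [(45.7, 9.7)] * 3
    print(aida_ii(haps, coords, 0, 100))       # within-locality pairs
    print(aida_ii(haps, coords, 100, 2000))    # between-locality pairs
    print(permutation_p(haps, coords, 0, 100, n_perm=199))
```

In this toy case all within-locality pairs are identical and all between-locality pairs differ at both sites, so II is +1 in the short distance class and -1 in the long one. The double loop visits every pair of individuals, which is the n(n - 1)/2 comparisons discussed in the introduction; a sample of about 1000 individuals would need a vectorized or pre-binned distance computation to be practical.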