Am J Hum Genet 36:1121-1134, 1984

Beta-Thalassemia in the Po Delta: Selection, Geography, and Population Structure

I. BARRAI,1,2 A. ROSITO,1 G. CAPPELLOZZA,3 G. CRISTOFORI,4 C. VULLO,4 C. SCAPOLI,1 AND G. BARBUJANI1

SUMMARY

The allele frequencies for beta-thalassemia were studied in 51 localities in the province of Rovigo and in 25 localities in the province of Ferrara. In Ferrara there is a significant cline of frequencies; these decrease from the Adriatic coast toward the west. No such gradient was visible in Rovigo. It is suggested, also on the basis of geography documented by ancient maps, that in the province of Rovigo there were multiple foci of selection for the thalassemia gene, and that in the province of Ferrara selection was stronger in the eastern part of the area. Examination of the isolation by distance model with these data showed that the Malécot-Morton model fits the Ferrara data and geography, whereas it does not fit those of Rovigo.

Received December 20, 1983; revised April 4, 1984.
This study was done with the assistance of the MPI 60% and 40% funds.
1 Institute of Zoology, University of Ferrara, Ferrara, Italy.
2 Interdisciplinary Center of the National Academy of the Lincei, Rome, Italy.
3 Microcitemia Center, USL 30, Rovigo, Italy.
4 Pediatric Division, USL 31, Ferrara, Italy.
© 1984 by the American Society of Human Genetics. All rights reserved. 0002-9297/84/3605-0017$02.00

INTRODUCTION

The frequencies of beta-thalassemia in the Po Delta in Northern Italy have been studied by several investigators in the past 30 years (see [1] for references); in the same area, particularly in the part of the Delta belonging to the province of Ferrara, the frequencies of other genetic markers have been described [2, 3]. The effect of genetic counseling and of medical genetics procedures on the incidence of marriage between beta-thalassemia heterozygotes [4, 5] and on the incidence of Cooley anemia was also investigated [6, 7]. Models of genetic structure of the population were tested using markers of the ABO and Rh systems and markers of the GLO, PGP, DIA, ESD, GPT, and 6PGD enzyme systems [8-10].

We now have available the data on the frequencies of microcythemia in the whole delta of the Po, in the provinces of Ferrara and Rovigo (fig. 1). We can now study the geographical variation of frequencies inside the delta area and compare the population residing north of the river (Rovigo) with the population to the south (Ferrara). Since it was observed that a gradient of frequencies exists for Ferrara with distance from the sea [1, 9], we want to study here whether such a gradient is confirmed by the present set of data and also whether it exists for the area north of the river. We shall further study the population structure for the two provinces separately, using the thalassemia gene only; for Ferrara, we shall be able to compare the structure obtained with markers that seem to be neutral at present with the structure of a gene that is still lethal in the homozygous state.

MATERIALS AND METHODS

In the Rovigo province, blood samples were collected from 14,806 children in elementary schools in 51 residential areas or Comunes in the years 1979, 1980, and 1981; the heterozygous state for beta-thalassemia was diagnosed by determination of the hemoglobin A2 level. Of the 14,806 individuals, 1,159, or 7.82%, were found to be beta-thalassemia heterozygotes by the hemoglobin A2 test (table 1).

FIG. 1.-The provinces of Ferrara and Rovigo in the Po Delta in Italy. Rovigo is a narrow belt limited by the Adige River at the north, and by the Po at the south.

TABLE 1
LOCALITIES, SAMPLE SIZES, AND NO. THALASSEMIC HETEROZYGOTES IN THE PROVINCE OF ROVIGO

Locality                  Sample size  Heterozygotes    DS     X     Y

Adria                             938             61    22    86    80
Ariano                            265             18    22    93    68
Arqua Petrarca                    188             14    48    62    75
......                            754             49    66    44    85
Bagnolo Po                         88              6    66    42    75
......                            190              9    85    24    80
......                             81              6    46    65    74
......                             58              5    73    32    73
Canaro                            188             18    54    57    67
......                             73              4    65    45    76
Castel Guglielmo                  143             10    63    46    77
......                            348             25    80    28    76
......                            170             12    83    26    77
......                            180             21    76    33    75
......                            237             26    38    72    79
Contarina                         371             30    11   100    77
......                             84             10    22    89    74
......                            199              7    50    58    67
......                            157              6    38    74    72
Donada                            318             37    11    99    78
......                            191             19    72    38    68
Fiesso                            295             23    60    51    70
Frassinelle                       129              8    51    56    72
......                            218             20    55    55    77
......                             72              6    69    42    68
......                             92              9    33    76    76
......                            151             12    69    39    81
......                             74             12    43    67    72
......                            941             80    58    51    83
Loreo                             196             19    12    98    81
Lusia                             270             25    52    56    85
......                            135              2    91    19    81
......                            642             49    62    50    65
......                             55              5    26    86    72
Pettorazza                         86              2    27    82    89
......                            118             12    57    52    73
......                            219             24    48    63    70
Pontecchio                        127              9    42    67    76
Portotolle                        521             64    10   109    69
......                            294             27     7   102    82
Rovigo                          2,985            213    43    66    82
......                             79              8    72    37    73
S. Bellino                         81              8    58    50    68
S. Martino Venezze                286             22    35    72    88
......                            198             17    65    47    67
......                            429             22    15    99    75
......                            211             13    70    40    77
......                            412             36    35    74    82
......                             89              6    51    58    75
Villanova del Ghebbio             123              8    55    54    80
......                             57              5    31    80    75

NOTE: DS is the distance of the locality from the coast, in kilometers; X and Y, the linear coordinates in centimeters on a map of the area.

In the Ferrara province, screening for microcythemia was initiated many years before that in Rovigo; here, we consider the data from 61 screening campaigns in elementary schools of 24 Comunes and 1 sub-Comune (Renazzo). The screenings refer to the period from 1957 to 1975 (tables 2 and 3). The screening method was the Simmel test, which detects increased osmotic resistance and which is an indicator, in this area, of the heterozygous state for beta-thalassemia. Of 72,983 individuals tested in the 61 screenings, 5,895 were positive for the test, giving a frequency of microcythemia of 8.08%.

TABLE 2
LOCALITIES, SAMPLE SIZES, AND NO. MICROCYTHEMICS IN THE PROVINCE OF FERRARA

Locality                  Sample size  Microcythemics    DS     X     Y

Argenta                         3,140             304    33    70    32
......                          1,193             128    30    80    74
......                          1,874              85    66    36    61
......                          2,097              51    75    27    44
......                          3,246             387    13    91    55
......                          4,527             317     5    98    39
......                          7,072             645    38    69    62
Ferrara                        25,709           1,838    50    52    56
......                            730              81    33    71    56
Iolanda di Savoia               1,701             159    25    81    61
......                          1,686             170     9    94    48
......                            383              47    36    65    51
Massa                           1,718             203    20    83    53
-Goro                           3,109             346     5   105    60
-                               2,239             234    25    79    50
Mirabello                         180               8    62    40    55
......                          3,019             306    24    77    46
......                          1,018              53    60    42    48
Porto Maggiore                  2,795             233    35    67    40
Renazzo                         1,582              43    76    28    48
Ro Ferrarese                      397              49    43    63    68
S. Agostino                       389               9    68    34    51
......                            925              63    28    74    54
......                          1,600              91    60    43    56
......                            654              45    40    62    48

NOTE: DS is the distance of the locality from the coast, in kilometers; X and Y, the linear coordinates in centimeters on a map of the area.

TABLE 3
NO. SCREENINGS BY COMUNE IN THE FERRARA PROVINCE, 1957-1975

No. screenings    No. Comunes

1                       4
2                      10
3                       7
4                       4
Total                  25

TABLE 4
CORRELATION BETWEEN THE SIMMEL TEST AND THE LEVEL OF HEMOGLOBIN A2 IN 1,102 INDIVIDUALS FROM THE FERRARA POPULATION

                     HEMOGLOBIN A2
SIMMEL TEST         +        -     TOTAL

+                 182       20       202
-                   3      897       900
Total             185      917     1,102

We have discussed elsewhere the characteristics of the Simmel test as an indicator of the heterozygous state in this area [11]; here, in table 4, we give the data of a run of 1,102 tests on apparently normal individuals tested in 1978 [12]. Assuming that the diagnosis based on the level of hemoglobin A2 is the true indicator of the heterozygous state for beta-thalassemia, the Simmel test gives a high rate of false positives (about 10%) and a rate of false negatives of about 1.7%. Out of 100 microcythemics so detected, 90 are probably heterozygous for beta-thalassemia; the others may be sideropenic individuals, alpha-thalassemia heterozygotes, and double heterozygotes for alpha- and beta-thalassemia. Furthermore, in the population at large, there may be as many as three heterozygotes per thousand individuals who go undetected as false negatives. The estimates of heterozygote frequencies through the Simmel test may then be inflated by an 8% bias in this population; the overall estimate of 8.08% from these data may correspond to a true value as low as 7.30%. With these limitations, we shall consider here the rates of individuals with raised hemoglobin A2 in Rovigo, and the rates of individuals with increased osmotic resistance in Ferrara, as estimates of rates of heterozygosis for beta-thalassemia.
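The error rates quoted for the Simmel test can be checked against the counts of table 4. The short Python sketch below is an illustration only: it assumes that the rows of table 4 are the Simmel result and the columns the hemoglobin A2 result, that the false-positive rate is taken over Simmel-positive individuals, and that the false-negative rate is taken over hemoglobin A2-positive individuals. These denominators are our reading of the text, not a statement of the original computation, and all variable names are hypothetical.

```python
# Counts from table 4 (assumed layout: rows = Simmel test, columns = hemoglobin A2).
simmel_pos_a2_pos = 182
simmel_pos_a2_neg = 20
simmel_neg_a2_pos = 3
simmel_neg_a2_neg = 897

simmel_pos = simmel_pos_a2_pos + simmel_pos_a2_neg  # 202 Simmel-positive individuals
a2_pos = simmel_pos_a2_pos + simmel_neg_a2_pos      # 185 hemoglobin A2-positive individuals

# Share of Simmel positives not confirmed by hemoglobin A2 (assumed denominator).
false_positive_share = simmel_pos_a2_neg / simmel_pos
# Share of hemoglobin A2 positives missed by the Simmel test (assumed denominator).
false_negative_share = simmel_neg_a2_pos / a2_pos
# Share of Simmel positives confirmed as heterozygotes.
confirmed_share = simmel_pos_a2_pos / simmel_pos

# Overall microcythemia frequency in the Ferrara screenings, and the same
# frequency scaled by the confirmed share (a simple adjustment, assumed here).
observed_rate = 5895 / 72983
adjusted_rate = observed_rate * confirmed_share

print(f"false positives among Simmel positives: {false_positive_share:.3f}")
print(f"false negatives among A2 positives:     {false_negative_share:.3f}")
print(f"observed rate {observed_rate:.4f}, adjusted rate {adjusted_rate:.4f}")
```

Under these assumptions about 10% of Simmel positives are unconfirmed and about 90% are confirmed, in line with the figures in the text; scaling the observed 8.08% by the confirmed share gives a value close to the lower estimate quoted above.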

RESULTS

Distribution of Heterozygote Frequencies by Comune

The parameters of the distributions of heterozygote frequencies per Comune in the provinces of Ferrara and Rovigo are given in table 5. The mean and the variance are weighted by sample size; the indicators of skewness and kurtosis are unweighted. The ratio of the variances of the two distributions is F = 1.25, far from significance. Had the ratio been significant, a differential selective environment might have been invoked for the two provinces. Skewness and kurtosis were calculated on the unweighted values of heterozygote frequencies; they are not significant, and no peculiarity is apparent in the two distributions. Under the assumption of equilibrium, the overall selection intensities and the fitness of the heterozygotes seem to have been similar in the areas north and south of the river.

TABLE 5
PARAMETERS OF THE DISTRIBUTION BY COMUNE OF THE RATES OF HETEROZYGOSIS FOR BETA-THALASSEMIA IN FERRARA AND ROVIGO

                            WEIGHTED               UNWEIGHTED
AREA       NO. SAMPLES   Mean     Variance     Skewness   Kurtosis

Ferrara        25        .0805    .0005228      -.465      -.903
Rovigo         51        .0782    .0004168       .118      1.468

Geographic Variation of Frequencies

The provinces of Ferrara and Rovigo are elongated latitudinally along the river toward the Adriatic; Rovigo is a narrower belt than Ferrara and is limited at the north border by another river, the Adige (fig. 1). A fraction of both provinces is delta area and still marshy; centuries of land reclamation (bonifica) have now reduced the marshy area to the eastern portion of the delta. The marshy area in Ferrara was, also in the known past, limited to the eastern part of the province; north of the river, the marshes extended farther inland (fig. 2). Since the marshes, and with them the malarial area, have been left behind by the river in its progress through the plain into the sea, the older marshes, those farther west, have disappeared, being filled by detritus or reclaimed. Assuming that malaria was endemic in the area after human settlement, it seems opportune to test for clinal frequencies of beta-thalassemia as a function of the present distances of localities from the sea.

The scatter diagram of the frequencies of heterozygotes per Comune is given in figure 3 as a function of the straight-line distance of the locality from the coast. There is an evident association between frequencies of microcythemia and distance from the sea for Ferrara, whereas no trend is visible for Rovigo. The parameters of the regression of the frequencies on distances, weighted by sample size, are given in table 6. Regression of frequencies on distance is significant for Ferrara; for Rovigo, the clinal variation is much weaker, and of borderline significance.

In these populations, migration is mostly toward the outside of the area; differential migration, which could explain the finding of association with distance, does not seem the simplest way to explain the observed difference. In 1981, the average immigration rate in Comunes belonging to the province of Ferrara was 1.58%. About 80% of the movement is made by residents of the province who change Comune; thus, the average exchange per Comune per year is estimated to be 1.26%.
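The arithmetic behind this exchange rate, and the time scale it implies, can be sketched in a few lines of Python. The sketch rests on two assumptions that are ours and not stated in the text: that the deviation of a Comune from the provincial allele frequency shrinks by a factor (1 - m) per year under exchange at rate m, and that a generation lasts about 27 years. The function and constant names are hypothetical.

```python
import math

immigration_rate = 0.0158     # average yearly immigration rate per Comune, 1981
within_province_share = 0.80  # share of moves made between Comunes of the province

# Yearly exchange between Comunes of the province (about 1.26%).
exchange_rate = immigration_rate * within_province_share

def years_to_halve(m):
    """Years needed to halve the deviation from the common frequency,
    assuming the deviation shrinks by a factor (1 - m) each year."""
    return math.log(0.5) / math.log(1.0 - m)

GENERATION_YEARS = 27.0  # assumed generation length, not given in the text

half_time_years = years_to_halve(exchange_rate)
half_time_generations = half_time_years / GENERATION_YEARS

print(f"exchange rate per year: {exchange_rate:.4f}")
print(f"half-way time: {half_time_years:.1f} years, "
      f"about {half_time_generations:.1f} generations")
```

With these assumptions the half-way time comes out at roughly 55 years, that is, close to two generations at a yearly rate of 1.26%.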
At this yearly rate, half-way to equilibrium in allele frequencies would be expected in about 2 generations. Even assuming that in past generations the migration rate was one order of magnitude lower than the present value, considerable uniformity over the province would be expected. Therefore, it is possible that selection has been a stronger force than migration in determining the spatial distribution observed in Ferrara. The cline might reasonably be attributed to a decreasing intensity of selection in a westward direction from the Adriatic coast.

The Selective Environment in Ferrara and Rovigo

Maps of the mid-1500s and earlier (fig. 2) suggest that a large part of the present Rovigo province was covered by marshes [13]. There is an indication of three large areas from west to east that were covered by shallow waters: the Grandi Valli Veronesi, the Valli di Santa Justina, and the Valli of the Delta proper. Nowadays, only the Valli of the Delta survive, and the area west of the Delta is agricultural land. In the Ferrara province, the marshes were and in part still are at the eastern limit of the province; they are the Valli di Comacchio, bordering the Adriatic Sea. It would seem that there is a close correlation between the marshy area and heterozygote frequencies in both provinces. Selection intensity in Ferrara was maximal at the eastern limit of the area, possibly diminishing because of lesser malarial intensity toward the west; this might have resulted in the observed clinal frequencies. On the other hand, the presence of marshes all along the Rovigo belt might have resulted in a more uniform selective environment, with a consequent uniformity in the present geographical variation of heterozygote frequencies.

FIG. 3.-Variation of the frequency of heterozygotes in Rovigo and Ferrara provinces as a function of the distance of a locality from the Adriatic coast. [Two scatter panels, PROVINCE OF ROVIGO and PROVINCE OF FERRARA; horizontal axis: DISTANCE FROM THE COAST (Km).]

Models of Population Structure and Geographic Distance

We have, then, although for a single biallelic system, two limited areas, one of which is visibly structured and the other not. It seems appropriate to test whether the observed difference in the visible structure is also reflected in models of genetic structure with distance. Here, we shall use only kinship as an indicator [14], since the problems raised by Euclidean distance [15] at the origin may not be solved with the simple approximations we use here and have previously used [9, 10]. Here, we prefer kinship, which is defined at distance zero. For the purpose of testing for the presence of genetic structure, we regress kinship on distance between Comunes. If the linear component, obtained including the origin, is highly significant (namely, if the value of t is 3 or more), we test for a Malécot-Morton structure, namely, whether at distance d kinship varies according to the function

f(d) = (1 - L) a e^{-bd} + L,

where b is the rate of exponential decay of kinship with distance, a is the kinship at the origin, and L is the kinship relative to the hypothetical founder population.

In computing kinship, no correction for sample size was applied to the average allele frequency. For Ferrara, the weighted average is .0403, and the unweighted, .0407; for Rovigo, the values are .0391 and .0403, respectively. Since the difference is minimal, we used the unweighted averages, which maintain the useful property that total unweighted sample kinship is zero.
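The zero-sum property of unweighted sample kinship can be illustrated with the Ferrara counts of table 2. The Python sketch below is not the original computation: it assumes that the allele frequency of a locality is half its microcythemic frequency, and that kinship between localities i and j takes the conventional form (q_i - Q)(q_j - Q) / [Q(1 - Q)], with Q the unweighted mean allele frequency. The text does not spell out the estimator, so this form is an assumption, and the names used are hypothetical.

```python
# (sample size, microcythemics) for the 25 Ferrara localities, in the order of table 2.
FERRARA_COUNTS = [
    (3140, 304), (1193, 128), (1874, 85), (2097, 51), (3246, 387),
    (4527, 317), (7072, 645), (25709, 1838), (730, 81), (1701, 159),
    (1686, 170), (383, 47), (1718, 203), (3109, 346), (2239, 234),
    (180, 8), (3019, 306), (1018, 53), (2795, 233), (1582, 43),
    (397, 49), (389, 9), (925, 63), (1600, 91), (654, 45),
]

# Assumed: each microcythemic carries one thalassemia allele out of two.
q = [k / (2 * n) for n, k in FERRARA_COUNTS]

q_unweighted = sum(q) / len(q)
q_weighted = sum(k for _, k in FERRARA_COUNTS) / (2 * sum(n for n, _ in FERRARA_COUNTS))

def kinship(qi, qj, mean):
    """Assumed kinship estimator for a biallelic locus, relative to `mean`."""
    return (qi - mean) * (qj - mean) / (mean * (1.0 - mean))

# Unordered pairs of localities, each locality paired with itself included
# (the "distance 0 included" case).
pairs = [(i, j) for i in range(len(q)) for j in range(i, len(q))]
phi = {(i, j): kinship(q[i], q[j], q_unweighted) for i, j in pairs}

# Sum over the full (ordered) kinship matrix: zero when the unweighted mean is used.
total_kinship = sum(kinship(qi, qj, q_unweighted) for qi in q for qj in q)

print(f"unweighted mean {q_unweighted:.4f}, weighted mean {q_weighted:.4f}")
print(f"{len(pairs)} pairs, total kinship {total_kinship:.2e}")
```

Under these assumptions the two averages agree with the Ferrara values quoted above to within rounding, the number of pairs with distance zero included is 325, and the total kinship vanishes up to floating-point error.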
The Malécot-Morton model applies when there is no selection, or, if there is, when it is constant over the area. When there are clines due to selection, the slope of kinship vs. distance may be determined by the cline, which, in the case of Ferrara, may be due to the westward decrease of the selection intensity. The presence of clinal frequencies for Ferrara predicts negative kinship at larger distances; for Rovigo, no strong prediction can be advanced a priori. Plots of the observed unweighted sample kinship against distance are given in figures 4 and 5. The results of the analysis of variance for the fit of the linear regression and of the model are given in table 7, both with distance zero included and excluded. In tables 1 and 2, we give the Rovigo and Ferrara data for the fitting of indicators of genetic similarity to indicators of geographic proximity. With these data, any other architecture may be constructed and tested.
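As one illustration of reusing the tabulated data, the regression of heterozygote frequency on distance from the coast, weighted by sample size, can be recomputed from the Ferrara rows of table 2. The Python sketch below uses ordinary weighted least squares with the sample sizes as weights; whether this matches the original procedure in every detail is an assumption, and the function name is hypothetical.

```python
# (sample size, microcythemics, DS in km) for the 25 Ferrara localities of table 2.
FERRARA = [
    (3140, 304, 33), (1193, 128, 30), (1874, 85, 66), (2097, 51, 75),
    (3246, 387, 13), (4527, 317, 5), (7072, 645, 38), (25709, 1838, 50),
    (730, 81, 33), (1701, 159, 25), (1686, 170, 9), (383, 47, 36),
    (1718, 203, 20), (3109, 346, 5), (2239, 234, 25), (180, 8, 62),
    (3019, 306, 24), (1018, 53, 60), (2795, 233, 35), (1582, 43, 76),
    (397, 49, 43), (389, 9, 68), (925, 63, 28), (1600, 91, 60),
    (654, 45, 40),
]

def weighted_regression(rows):
    """Slope and intercept of frequency on distance, weighted by sample size."""
    total_n = sum(n for n, _, _ in rows)
    mean_x = sum(n * ds for n, _, ds in rows) / total_n
    mean_p = sum(k for _, k, _ in rows) / total_n
    sxy = sum(n * (ds - mean_x) * (k / n - mean_p) for n, k, ds in rows)
    sxx = sum(n * (ds - mean_x) ** 2 for n, _, ds in rows)
    slope = sxy / sxx
    return slope, mean_p - slope * mean_x

slope, intercept = weighted_regression(FERRARA)
print(f"slope per km: {slope:.6f}, intercept at the coast: {intercept:.4f}")
```

The slope obtained in this way is negative and of the same order as the value of b reported for Ferrara in table 6; the same function can be applied to the Rovigo rows of table 1.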

DISCUSSION

In testing for the presence of population structure associated with distance, it is desirable to use genetic indicators based on as many loci and alleles as possible. Therefore, testing for structure with a single biallelic locus is hardly justified unless there is an indication that, for such a locus, structure exists and is visible. Such is the case for Ferrara; further, we have the same allele and its frequencies for a neighboring area, which has possibly experienced the same selective history. In this second area, no structure is openly visible. It seems then of some interest to study the behavior of the isolation by distance model in this simple case.

TABLE 6
PARAMETERS OF THE REGRESSION OF BETA-THALASSEMIA FREQUENCIES ON LINEAR DISTANCES (km) FROM THE ADRIATIC SEA IN THE PROVINCES OF FERRARA AND ROVIGO

Area            b           s_b         t

Ferrara     -.000921     .000165     5.596
Rovigo      -.000228     .000138     1.655

NOTE: Parameters were weighted on sample size.

FIG. 4.-Scatter diagram of kinship as a function of distance in the Rovigo province. No association seems visible; the fit of the Malécot-Morton model is not significant. [Vertical axis: KINSHIP x 10^4; horizontal axis: Km.]

FIG. 5.-Scatter diagram of kinship as a function of distance in the Ferrara province. Association seems visible, and the isolation by distance signal is revealed by a significant fit of the Malécot-Morton model. [Vertical axis: KINSHIP x 10^4; horizontal axis: Km.]

TABLE 7
COEFFICIENT OF DETERMINATION AND OF CORRELATION BETWEEN KINSHIP AND GEOGRAPHIC DISTANCE

                                 LINEAR                 MALÉCOT-MORTON*
AREA / MODEL                R2      r       F        R2       r       F       NO.

Ferrara
  Distance 0 included      .33     .58    160.9     .34      .58     54.5     325
  Distance 0 excluded      .31     .56    129.3     ...      ...     ...      300
Rovigo
  Distance 0 included      .005    .07      6.68    .0045    .067     2.006  1,326
  Distance 0 excluded      .0003   .02       .35    ...      ...     ...     1,275

* Ferrara: f(d) = (1.0135)(.0206) e^{-.0167d} - .0135. Rovigo: f(d) = (1.1)(.0913) e^{-.00018d} - .1.

The strong signal visible for Ferrara is confirmed also under the Malécot-Morton model. For Rovigo, we fitted the model with great difficulty. The initial values obtained from the logarithmic regression, once transformed to a, b, and L, deviated from the observed data more than the mean. Then, we iterated L and found that at L = -.1, a = .0913, and b = .0001789 the model removed a sum of squares that resulted in an F of 2.006 and in a correlation coefficient between model and data of .067; no hint of structure is indicated, however, and the fit is not significant. For Ferrara, the fit of the model is significant; the values of the parameters of the exponential are a = .0228, b = .0173, and L = -.016; feeding these values into the numerical Newton-Raphson method improves the fit only slightly, at a = .0206, b = .0167, and L = -.0135. Since in Ferrara a cline was observed, the variation of kinship with distance is expected. It is not obvious why the model does not fit at all in Rovigo; possibly, the absence of fit may be confirmatory of the presumed multifocal centers of selection of the thalassemia gene north of the river.

The coefficients of the Malécot-Morton function permit estimation of the three most important parameters of isolation by distance, namely, evolutionary size, N, systematic pressure, m, and standard deviation of the breeding area, σ [16]. Taking the values estimated for Rovigo at face value, we obtain:

Area          N       m       σ

Ferrara      191    .059      21
Rovigo        70    .038     567

It is not likely that the pressures that have produced equal average frequencies of thalassemia in the two areas at the present time were different. However, the comparison between Ferrara and Rovigo probably cannot be made with these results, since there is no signal of significant structure for Rovigo. For Ferrara, we have the possibility of comparing the present structure with previous analyses of isolation by distance [9, 10]:

System                     N       m       σ

ABO-Rh                    446    .087      15
Six red cell markers      246    .066      11

In both cases, the model removed a significant fraction of the sum of squares of total kinship, and these values may be based on a real signal (fig. 6). It would seem that the model gives conflicting results when strong selection is involved in the genetic system, so that the estimates of demographic parameters obtained from it, in such cases, must be considered with caution.
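To make the fitted functions more concrete, the Malécot-Morton form f(d) = (1 - L) a e^{-bd} + L can be evaluated with the parameter values reported for Ferrara and Rovigo. The Python sketch below takes L = -.0135 for Ferrara, as in the footnote to table 7, and L = -.1 for Rovigo; the helper names are hypothetical, and the zero-crossing distance is a derived quantity computed here for illustration, not a figure given in the text.

```python
import math

def malecot_morton(d, a, b, L):
    """Predicted kinship at distance d (km): f(d) = (1 - L) * a * exp(-b * d) + L."""
    return (1.0 - L) * a * math.exp(-b * d) + L

def zero_crossing(a, b, L):
    """Distance at which predicted kinship reaches zero; requires L < 0 < a."""
    return math.log((1.0 - L) * a / -L) / b

# Parameter values as reported for the two provinces.
FERRARA_FIT = {"a": 0.0206, "b": 0.0167, "L": -0.0135}
ROVIGO_FIT = {"a": 0.0913, "b": 0.0001789, "L": -0.1}

for name, fit in (("Ferrara", FERRARA_FIT), ("Rovigo", ROVIGO_FIT)):
    at_origin = malecot_morton(0.0, **fit)
    d_zero = zero_crossing(**fit)
    print(f"{name}: f(0) = {at_origin:.5f}, f(d) = 0 near d = {d_zero:.1f} km")
    for d in (10, 30, 60):
        print(f"  f({d}) = {malecot_morton(d, **fit):+.5f}")
```

For Ferrara the predicted kinship is positive at short distances and becomes negative beyond a few tens of kilometers, which agrees with the expectation of negative kinship at larger distances under a cline; for Rovigo the predicted values stay very close to zero at all distances.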

FIG. 6.-Isolation by distance in Ferrara using different genetic markers. [Vertical axis: Kinship x 10^4; horizontal axis: Km. Curve labels: ABO+Rh (R2 = 0.055)*; Rovigo; 6 enzymes (R2 = 0.38); Thalassemia (R2 = 0.34).] *R. R. Sokal (personal communication, 1983) observed that this structure, tested with indicators accounting for autocorrelation, may barely reach borderline significance.

In conclusion, we observe that isolation by distance in Ferrara seems similar to isolation in other groups and other geographies. The value of a results in estimates of evolutionary size ranging from those of Papago localities to those of northeastern Brazil localities; the average σ is similar to that of other European structures observed in Swiss and Belgian Comunes [16]. However, it is likely that under strong selection the model of isolation by distance fits the geographical variation of gene frequencies better at distances larger than those existing in the area studied here. Therefore, it seems desirable to test the model using thalassemia frequencies in the whole peninsula.

REFERENCES

1. SILVESTRONI E, BIANCO I: Screening for microcythemia in Italy: analysis of data collected in the past 30 years. Am J Hum Genet 27:198-212, 1975
2. GANDINI E, MENINI C, DE FILIPPIS A, DELL'ACQUA G: Erythrocytary glucose-six-phosphate deficiency. Acta Genet Med Gemellol (Roma) 3:271-284, 1969
3. LUCARELLI P, CORBO RM, SCACCHI R, ET AL.: A study of nine polymorphic systems in the population of the Po Delta. Am J Phys Anthropol 45:211-216, 1976
4. BARRAI I, VULLO C: Assessment of prospective genetic counselling in the Ferrara area. Am J Med Genet 6:195-204, 1980
5. BARRAI I, VULLO C: Genetic counselling in beta thalassemia in Ferrara. J Genet Hum 28:97-104, 1980
6. AGUZZI S, VULLO C, BARRAI I: Reproductive compensation in families segregating for Cooley's anemia in Ferrara. Ann Hum Genet 42:153-160, 1978
7. BARRAI I, VULLO C: Screening for beta-thalassemia heterozygotes. Lancet ii:1257, 1980
8. BERETTA M, MAZZETTI P, BARRAI I, RAVANI-SACCHI A, SALSINI G: Red cell isoenzymes in the population of the Po Delta. J Hum Evol 10:517-521, 1981
9. ZANARDI P, DELL'ACQUA G, MENINI C, BARRAI I: Population genetics in the province of Ferrara. Genetic distances and geographic distances. Am J Hum Genet 29:169-177, 1977
10. BARRAI I, BARBUJANI G, BERETTA M, VULLO C: Heterozygosity and geographic distances in a limited area. J Hum Evol 12:403-408, 1983
11. VULLO C, CRISTOFORI G, SALSINI G, BARRAI I: Considerazioni sul consultorio genetico nella talassemia. Prospettive Pediatr 7:203-210, 1977
12. CADORE M: Utilizzazione di una funzione discriminante nella diagnosi di microcitemia. Doctoral thesis in the Faculty of Sciences, Ferrara, Italy, Univ. of Ferrara, 1978-1979
13. MAZZETTI A, ROMANATO G: Il Polesine dalla guerra di Ferrara al taglio di (1482-1604). Carte geografiche, Mappe, disegni. Rovigo, Italy, Accad. dei Concordi, 1976-1977
14. MORTON NE: Genetic Structure of Populations. Honolulu, Univ. of Hawaii Press, 1973
15. CAVALLI-SFORZA LL, EDWARDS AWF: Phylogenetic analysis models and estimation procedures. Am J Hum Genet 19:233-257, 1967
16. MORTON NE: Estimation of demographic parameters from isolation by distance. Hum Hered 32:37-41, 1982