Current events

Randomization procedures and sexual dimorphism in Australopithecus afarensis

Charles A. Lockwood & Brian G. Richmond
Doctoral Program in Anthropological Sciences, State University of New York, Stony Brook, New York 11794-4364, U.S.A.

William L. Jungers
Department of Anatomical Sciences, State University of New York, Stony Brook, New York 11794-8081, U.S.A.

William H. Kimbel
Institute of Human Origins, 1288 Ninth St, Berkeley, California 94710, U.S.A.

Journal of Human Evolution (1996) 31, 537–548

Randomization procedures have recently been used in a variety of applications in the field of paleoanthropology, centered on the assessment of taxonomic composition and sexual dimorphism of fossil samples (e.g., Richmond & Jungers, 1995; Kramer et al., 1995; Cope & Lacy, 1995; Grine et al., 1996). In particular, Richmond & Jungers (1995) used the exact randomization method to assess the probability that the range of size and shape variation in specimens attributed to Australopithecus afarensis could be found in comparative samples of extant hominoids. To compare size, the exact randomization method involves computing all possible pairwise ratios of size variables in the comparative sample and determining what percentage of these ratios exceed the maximum ratio found in the fossil sample. Based on the available sample of mandibles, proximal femora and humeri attributed to A. afarensis, Richmond & Jungers (1995) concluded that the size range of variation for each element was so rarely exceeded by ratios between same-species pairs of extant hominoids that, if the Hadar remains are accepted to represent a single species, a degree of sexual dimorphism at least as great as that of the most dimorphic living apes (gorillas and orang-utans) is probable. While exact randomization is an appropriate procedure to determine the relative affinities between two fossil specimens, there are potential problems with focusing exclusively on the extremes of a sample without taking into account the presence of intermediate specimens. One alternative is to compare the total distribution of all pairwise comparisons among fossils with those among reference samples. However, because fossil specimens are differentially preserved, there is clearly value in finding a method with which the extremes of size in a given sample can be used to test the null hypothesis of a single species. 
Bootstrapping is a technique for estimating standard errors and confidence limits that involves resampling with replacement from the population (Efron & Tibshirani, 1993). Resampling from reference samples at sample sizes equal to that of the fossil sample is a form of bootstrapping that may be suitable for comparing extremes or, in some cases, statistics of variation such as the coefficient of variation (CV). Given the assumptions of independent and random sampling in the fossil record, the random samples created using this method approximate potential ‘‘fossil samples’’ for each extant hominoid species.
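The pairwise comparison behind exact randomization, as described above, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names and the size values are hypothetical, and each ratio is formed as larger/smaller for every unordered pair in the reference sample.

```python
from itertools import combinations


def pairwise_ratios(sizes):
    """Return the larger/smaller ratio for every unordered pair of individuals."""
    return [max(a, b) / min(a, b) for a, b in combinations(sizes, 2)]


def exact_randomization_percent(reference_sizes, fossil_sizes):
    """Percentage of reference pairs whose ratio exceeds the fossil max/min ratio."""
    fossil_ratio = max(fossil_sizes) / min(fossil_sizes)
    ratios = pairwise_ratios(reference_sizes)
    exceeding = sum(r > fossil_ratio for r in ratios)
    return 100.0 * exceeding / len(ratios)


# Hypothetical size values (mm), invented purely for illustration.
reference = [20.0, 22.0, 25.0, 30.0]
fossils = [22.0, 26.0, 31.3]
print(exact_randomization_percent(reference, fossils))
```

Because only the two most extreme fossils enter the comparison, the result does not change with the number of intermediate fossil specimens, which is the limitation the resampling approach is meant to address.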

0047–2484/96/120537+12 $25.00/0 © 1996 Academic Press Limited

Using this more conservative, and probably more appropriate, resampling procedure, the present study compares size variation in A. afarensis with that of the same reference samples used by Richmond & Jungers (1995). We ask the question, what is the probability of sampling a set of n individuals from an extant hominoid species whose size variation is greater than that present in a sample of n individuals of A. afarensis? In other words, we acknowledge that more than two individuals contribute to the pattern of variation in the fossil assemblage (cf. Kramer, 1993; Cope & Lacy, 1995). The corresponding null hypotheses are that for each hominoid reference taxon and for each skeletal element, the size variation in A. afarensis does not exceed that of the reference group. Rejection of the null hypothesis for each group would support an alternative hypothesis that multiple species are included in the Hadar sample. Rejection of the null hypothesis for some but not all groups would support the hypothesis of a high degree of sexual dimorphism in A. afarensis relative to those groups whose size variation is less.

Methods

The mandibular corpus, the proximal femur and the humerus are used here to examine sexual dimorphism. The measurements of these bones and the reference samples of modern hominoids are described in Table 1 of Richmond & Jungers (1995). In brief, the reference samples are from multiple populations or subspecies of Homo sapiens, Gorilla gorilla, Pan troglodytes, and Pongo pygmaeus. Total sample sizes for each element in each hominoid species range from 48 to 50 for the mandible and humerus, and from 31 to 41 for the proximal femur.

Corpus breadth and height at P4/M1 are measured for mandibles. Eight measurements are used to describe the proximal femur. The size variable for each individual mandible or proximal femur is the geometric mean, which equals the nth root of the product of n measurements. The geometric mean, therefore, condenses the size information from a number of variables into a single dimension (Mosimann, 1970; Jungers et al., 1995). For comparisons of humeri, the size variable is simply total humeral length. Before we address questions of probability, it is useful and appropriate to establish some estimate of average sexual dimorphism in A. afarensis for the elements being considered. This is done herein only for the mandibles of A. afarensis, as the sample size and attendant accuracy are higher than for the other elements. To estimate the average degree of sexual dimorphism in the mandible, indices of sexual dimorphism (ISDs) are first calculated for the extant hominoids. The ISD is the ratio of male to female mean values for geometric means. If the CVs of geometric means for the mandibular corpora are correlated with the ISDs for our reference samples of extant hominoids, it may be postulated that the CV for the Hadar sample can be directly compared with those of extant hominoids in order to ascertain the degree of sexual dimorphism in A. afarensis. In addition, a mean index of sexual dimorphism may be estimated from the CV by a reduced major axis regression analysis analogous to those used by Fleagle et al. (1980), Kay (1982) and Leutenegger & Shell (1987) to estimate degrees of sexual dimorphism in the canines of extinct anthropoid species without a priori knowledge of specimen gender. Plavcan (1994) suggests that this method may overestimate the degree of sexual dimorphism present in a species with low sexual dimorphism. 
This is a fault common to most estimators of sexual dimorphism (see also Josephson et al., 1996), and we regard the resampling procedures described below as another opportunity to test the validity of the CV-based estimate of sexual dimorphism.
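The two size summaries defined above, the geometric mean of a specimen's measurements and the index of sexual dimorphism (ISD), might be computed as in the following sketch. The function names and all numerical values are hypothetical and serve only to illustrate the definitions given in the text.

```python
import math


def geometric_mean(measurements):
    """nth root of the product of n positive measurements (computed via logs)."""
    n = len(measurements)
    return math.exp(sum(math.log(m) for m in measurements) / n)


def index_of_sexual_dimorphism(male_sizes, female_sizes):
    """ISD = male mean / female mean of the size variable."""
    male_mean = sum(male_sizes) / len(male_sizes)
    female_mean = sum(female_sizes) / len(female_sizes)
    return male_mean / female_mean


# Hypothetical mandibular corpus breadth and height (mm) for one specimen.
size = geometric_mean([20.0, 45.0])

# Hypothetical geometric means (mm) for known-sex individuals.
isd = index_of_sexual_dimorphism([30.0, 32.0], [25.0, 27.0])
print(size, isd)
```

Working on the log scale avoids overflow when many measurements are multiplied, as with the eight proximal femoral variables.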

The method of hypothesis testing used here is a resampling method commonly referred to as bootstrapping. Bootstrapping was developed to calculate standard errors for unconventional statistics, although the term is also applied to general methods of resampling to determine the validity of a result from a single sample (Efron, 1979; Efron & Tibshirani, 1993; Manly, 1991; Sokal & Rohlf, 1995). We use bootstrapping to simulate random samples of extant taxa comparable in sample size with those of the fossil record. By assessing the probability of finding a certain degree of size variation in a modern taxon, it is possible to determine the validity of directly comparing statistics of size variation.

The assumption that these random and independent samples represent a legitimate set of comparisons for the fossil record may be criticized on various grounds. For reasons related to the socioecological structure of the group being sampled and the type of fossil deposition, a fossil sample may be biased toward one sex or the other or toward one end of the body size range. However, because there is no a priori reason to suspect a specific bias at Hadar, we have chosen not to explore these alternatives here. Another consideration is that the time depth represented in some fossil assemblages could artificially increase the apparent range of variation. As suggested by Richmond & Jungers (1995), the inclusion of multiple subspecies or populations in the extant reference samples may address this problem in part.

For each skeletal element, 1000 random samples of geometric means are selected with replacement from each of the extant hominoid groups. This is a sufficient number of replications to detect significance at the P=0·05 level for all but borderline cases (Manly, 1991; Efron & Tibshirani, 1993). Two statistics of variation are used: the max/min ratio and the CV.
The bootstrap procedure is performed for pairs of size variables (for max/min ratios) and for sample sizes equal to the number of specimens in the hypodigm of A. afarensis for each element (for max/min ratios and CVs). Pairwise sampling of max/min ratios should produce results similar to the exact randomization procedure used by Richmond & Jungers (1995). Using larger sample sizes in the bootstrapping analysis should lead to larger ranges of variation in the extant hominoids (Cope & Lacy, 1995).

The maximum sample size for fossil specimens is 17 for the mandible, five for the proximal femur, and three for the humerus. The sample of 17 adult mandibles whose corpus height and breadth can be measured at P4/M1 includes eight undescribed specimens. The measurements for two of these, A.L. 444-2b and A.L. 417-1a, were also used by Richmond & Jungers (1995), but six other undescribed specimens have been added for the current study. The largest mandibular corpus is that of A.L. 438-1g (undescribed), and the smallest is that of A.L. 207-13.

For the proximal femur, the sample size of five contains those specimens of A. afarensis for which absolute size in the dimensions used here can be compared, although the full suite of measurements may not be available. These are A.L. 211-1, 288-1ap, 333-3, 333w-40 (Johanson et al., 1982; Lovejoy et al., 1982), and A.L. 333-123 (undescribed). A recently discovered proximal femur from Hadar, A.L. 600-1, is larger than all of the others (Johanson et al., 1994), and Richmond & Jungers (1995) discussed the possible implications of this specimen for sexual dimorphism in A. afarensis. However, A.L. 600-1 is now recognized as belonging to a large felid of the genus Homotherium. Therefore, of hominid specimens, A.L. 288-1ap and A.L. 333-3 represent the smallest and largest specimens, respectively, based on rank order determinations of overall size using original fossils or casts.
These two specimens are also the most complete and can be measured for each variable described by Richmond & Jungers (1995). Therefore, the ratio of the geometric means between these specimens is used in comparison with random samples of two and five from extant hominoids.
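The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names, the random seed and the reference values are hypothetical, and the only elements taken from the text are sampling with replacement, 1000 replicates, resample sizes matched to the fossil sample, and counting how often the statistic exceeds the fossil value.

```python
import random


def max_min_ratio(sample):
    """Ratio of the largest to the smallest size value in a sample."""
    return max(sample) / min(sample)


def bootstrap_exceedance(reference, n, observed, statistic, reps=1000, seed=0):
    """Percentage of resamples of size n (drawn with replacement) whose
    statistic exceeds the value observed in the fossil sample."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    exceed = 0
    for _ in range(reps):
        resample = rng.choices(reference, k=n)
        if statistic(resample) > observed:
            exceed += 1
    return 100.0 * exceed / reps


# Hypothetical reference geometric means (mm), invented for illustration.
reference = [21.3, 23.0, 24.8, 26.1, 27.5, 29.0, 31.2, 33.4, 36.0]
for n in (2, 17):
    print(n, bootstrap_exceedance(reference, n, 1.42, max_min_ratio))
```

Any other statistic of variation, such as a CV function, can be passed in place of `max_min_ratio`. With 1000 replicates, the binomial standard error of an estimated proportion near 0·05 is about sqrt(0·05 × 0·95 / 1000) ≈ 0·007, which is why only borderline cases are sensitive to the number of replicates.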

Table 1 Summary statistics for mandibular geometric means

                        Humans   Gorillas   Orang-utans   Chimpanzees   Australopithecus afarensis
Mean (mm)               21·9     27·9       25·9          20·5          26·0
n                       50       50         48            50            17
Min (mm)                18·7     21·3       20·5          15·6          22·0
Max (mm)                25·4     36·0       32·1          24·1          31·3
Standard deviation      1·45     3·41       3·05          1·59          2·99
CV                      6·67     12·3       11·8          7·78          11·7
Standard error of CV    0·68     1·27       1·24          0·79          2·06
Male mean               22·3     30·1       28·0          21·2          N/A
Female mean             21·5     25·7       23·9          19·9          N/A
ISD*                    1·038    1·174      1·179         1·065         1·167

CVs are calculated using Sokal & Braumann’s (1980) correction. ISD=male mean/female mean.
*ISDs are reported to three decimal places for correspondence with Figure 1; that for A. afarensis is an estimate based on CV comparisons.

For the humerus, three specimens are preserved well enough to assess overall size in the sense of total humeral length, and of these A.L. 288-1m and MAK-VP-1/3 are the smallest and largest, respectively (Richmond & Jungers, 1995). A second method used to assess the degree of size variation is the coefficient of variation (CV), which is less affected by sample size than is the max/min ratio. Cope & Lacy (1992, 1995) have suggested with simulated single- and multiple-species samples that CV comparisons are more powerful than range-based statistics for assessing the taxonomic diversity within a sample. On the other hand, the primary assumption made in calculating a CV—sampling from a normal distribution—holds less and less in samples with increasing levels of sexual dimorphism (Sokal and Braumann, 1980). The chief logistical problem with using the CV is that to assess size variation using the geometric mean, data for each measurement are required for each specimen in the hypodigm. For the present purposes, the CV is only deemed applicable for the mandibles and perhaps the humeri. Making inferences from a CV based on a sample size of three humeri may be questioned. In an absolute sense, inferences would probably be unwarranted; relatively speaking, the humeri serve as a good test for the applicability of the bootstrapping procedures. CVs are calculated for each random sample of extant hominoids using Sokal & Braumann’s (1980) correction for small sample sizes, and the frequencies with which these exceeded the CV of the geometric means for specimens of A. afarensis are determined.
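The text does not spell out the small-sample correction. The sketch below assumes it is the familiar multiplier (1 + 1/(4n)) applied to the ordinary CV; that assumption reproduces the CVs of 11·7 and 12·3 reported in Table 1 for A. afarensis and gorillas from their published means, standard deviations and sample sizes. Function names are hypothetical.

```python
import statistics


def corrected_cv(sample):
    """CV (%) of a sample with the assumed small-sample correction (1 + 1/(4n))."""
    n = len(sample)
    cv = 100.0 * statistics.stdev(sample) / statistics.mean(sample)
    return (1.0 + 1.0 / (4.0 * n)) * cv


def corrected_cv_from_summary(mean, sd, n):
    """Same correction, computed from a published mean, SD and sample size."""
    return (1.0 + 1.0 / (4.0 * n)) * 100.0 * sd / mean


# Table 1 summary values for the A. afarensis mandibles (mean, SD, n).
print(round(corrected_cv_from_summary(26.0, 2.99, 17), 1))
```

For n=17 the correction raises the CV by about 1·5%, whereas for n=3 (the humeri) it raises it by about 8%, so the adjustment matters most for the smallest fossil samples.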

Results

Estimating average sexual dimorphism in the mandible

Summary statistics for the mandibular samples are presented in Table 1. For extant hominoids, the species ISD values are highly correlated with the species CVs (r=0·996). The CV of A. afarensis mandibles is 11·7, which is similar to those of orang-utans and gorillas. A reduced major axis regression of ISD values on CVs yields an ISD estimate of 1·167 in mandibular size (Figure 1). Note that this estimate is reported as 1·167 for correspondence to Figure 1; it is 1·17 to the correct number of significant figures.

[Figure 1: scatter plot of ISD (y-axis, 1 to 1·2) against CV (x-axis, 6 to 13) for humans, chimpanzees, gorillas and orang-utans, with the fitted line ISD = (0·0258 × CV) + 0·865, r = 0·996, and A. afarensis plotted at its estimated ISD of 1·167.]

Figure 1. Reduced major axis regression of ISD on CV for extant hominoid mandibular geometric means. ISD=male mean/female mean.
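The regression in Figure 1 can be approximately reproduced from the rounded CV and ISD values in Table 1. The sketch below uses the standard definition of a reduced major axis line (slope equal to the ratio of the standard deviations, carrying the sign of the correlation, and passing through the bivariate mean). The function name is hypothetical, and small differences from the published coefficients are to be expected because the inputs are rounded.

```python
import math
import statistics

# Rounded values from Table 1 (extant hominoids only).
cv = [6.67, 12.3, 11.8, 7.78]        # humans, gorillas, orang-utans, chimpanzees
isd = [1.038, 1.174, 1.179, 1.065]


def reduced_major_axis(x, y):
    """Return (slope, intercept, r) for the reduced major axis of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / ((len(x) - 1) * sx * sy)
    slope = math.copysign(sy / sx, r)
    intercept = my - slope * mx
    return slope, intercept, r


slope, intercept, r = reduced_major_axis(cv, isd)
estimated_isd = intercept + slope * 11.7   # CV of the A. afarensis mandibles
print(slope, intercept, r, estimated_isd)
```

Unlike ordinary least squares, the reduced major axis slope does not shrink as the correlation weakens, which is the usual reason for preferring it when both variables are measured with error.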

Max/min ratios

It is clear from Table 2 that the bootstrap method using n=2 produces similar results to the exact randomization procedure for max/min ratios. However, for some reference groups, the results from the bootstrap diverge dramatically from those of the exact randomization when sample sizes larger than two are considered. The probability of finding a max/min ratio of geometric means greater than 1·42 (the ratio for A. afarensis) in random samples of 17 gorilla mandibles is 83·3%, and for orang-utans 73·7%. These proportions are expressed graphically in Figure 2. At the other extreme, a change in the sample size used to randomly sample human mandibles does not affect the results, because no two humans in the comparative sample of 50 individuals display a size difference as great as that of A. afarensis. Chimpanzees are intermediate, as there is a 25·6% probability of sampling a max/min ratio of mandibular size greater than that of A. afarensis.

The range of size among five proximal femora of A. afarensis is relatively great no matter what comparative group is used (Table 2; Figure 3). Again, the results for max/min ratios using 1000 samples of two individuals closely approximate those from exact randomization. Those for samples of five individuals suggest that the probabilities of randomly obtaining max/min ratios greater than 1·37 in gorillas and orang-utans are 35·6% and 14·7%, respectively. Some caution should be applied to the result for orang-utans, as the sample size of the reference sample is n=31, the smallest for any reference sample in the current study. As the max/min ratios for the entire reference samples of either chimpanzees or humans are exceeded by that of five specimens of A. afarensis, the probability of duplicating the size range of A. afarensis from samples of these extant hominoids remains zero.

Although only three humeri of A. afarensis are preserved well enough to assess their overall length, it is important to recognize the differences between repeated sampling of two individuals from reference samples (comparable with exact randomization) and repeated sampling of three individuals (Table 2). The probabilities of sampling a ratio greater than 1·24 more than double for each reference sample of extant hominoids when three individuals are sampled, and the results in this case are similar to those obtained for the proximal femur.
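A short arithmetic check, using only the minimum and maximum mandibular geometric means reported in Table 1, shows why the human probability stays at zero whatever the resample size: the ratio across the whole human reference sample is smaller than the A. afarensis ratio, so no resample can exceed it. The variable names are hypothetical.

```python
# (minimum, maximum) mandibular geometric means in mm, from Table 1.
table1_min_max = {
    "Humans": (18.7, 25.4),
    "Gorillas": (21.3, 36.0),
    "Orang-utans": (20.5, 32.1),
    "Chimpanzees": (15.6, 24.1),
    "A. afarensis": (22.0, 31.3),
}

# Max/min ratio across each complete sample.
full_sample_ratio = {
    taxon: largest / smallest
    for taxon, (smallest, largest) in table1_min_max.items()
}

for taxon, ratio in full_sample_ratio.items():
    print(f"{taxon}: {ratio:.2f}")
```

The full-sample ratios for gorillas, orang-utans and chimpanzees all exceed that of A. afarensis, so for those taxa the probability depends on how often a resample of 17 happens to capture sufficiently extreme individuals.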

[Figure 2: four stacked frequency histograms (panels: Gorillas, Orang-utans, Chimpanzees, Humans); x-axis, max/min ratio (1 to 1·7); y-axis, percentage of samples; vertical line at 1·42.]

Figure 2. Frequency histograms of geometric mean max/min ratios calculated from 1000 random samples of 17 extant hominoid mandibles. The vertical line marks the max/min ratio of the Hadar sample (n=17).

Coefficient of variation

There is a 59·5% probability of obtaining a CV as high as that of A. afarensis (11·7) in 1000 samples of 17 gorilla mandibles (Table 2). For orang-utans, this probability is 49·8%, while in chimpanzees and humans, the same result is highly improbable (1·4% and 0%, respectively). Figure 4 illustrates the distinction between chimpanzees and humans on the one hand, and gorillas and orang-utans on the other, in the distribution of randomly sampled CVs. The value for A. afarensis falls in the more dimorphic group. Whether A. afarensis is compared with chimpanzees, orang-utans or gorillas, there is less likelihood of sampling the CV of mandibular size than of sampling the max/min ratio (Table 2). In addition, a comparison between Figures 2 and 4 suggests that the CVs of 1000 samples from each hominoid species are more normally distributed than are the max/min ratios.

For total humeral length, the probabilities of obtaining a CV greater than that of A. afarensis (12·9) are consistently slightly less than when comparing max/min ratios. Nonetheless, the results from the two techniques are compatible with each other and with those from max/min comparisons using the proximal femur. Size differences comparable with those of A. afarensis in the proximal femur and the humerus are less frequently found in extant hominoid samples than are those in the mandible.

Table 2 Probabilities (%) of sampling statistics of size variation in extant hominoid samples greater than those that describe Australopithecus afarensis

                               Sample size   Gorillas   Orang-utans   Chimpanzees   Humans
Mandibles
  Max/min                      17            83·3       73·7          25·6          0·0**
  CV                           17            59·5       49·8          1·4*          0·0**
  Max/min (pairs)              2             3·2*       2·4*          0·8**         0·0**
  Exact randomization (pairs)  2             3·8*       2·7*          0·7**         0·0**
Proximal femora
  Max/min                      5             35·6       14·7          0·0**         0·0**
  Max/min (pairs)              2             7·0        2·3*          0·0**         0·0**
  Exact randomization (pairs)  2             6·5        2·6*          0·0**         0·0**
Humeri
  Max/min                      3             33·7       19·5          5·3           2·7*
  CV                           3             29·2       15·4          3·2*          1·5*
  Max/min (pairs)              2             15·5       8·6           2·0*          0·5**
  Exact randomization (pairs)  2             17·5       7·6           1·9*          0·9**

Based on 1000 random samples. All probabilities (%) for mandibles and proximal femora are reported for comparisons of geometric means; those for humeri are based on total humeral length. Exact randomization results for proximal femora and humeri are from Richmond & Jungers (1995); those for mandibles have been recalculated using identical methods for the expanded sample. CVs are reported using Sokal & Braumann’s (1980) correction for small samples. See text for explanation of “Sample size”.
*<5·0% **<1·0%.

Discussion

The estimate of average sexual dimorphism for mandibular size in A. afarensis, based on the CV of geometric means, gives a result (ISD=1·167) that is compatible with a single species that possesses a relatively high degree of sexual dimorphism. This result naturally leads to the testing of several hypotheses. The primary concern when we initiated this study was that pairwise, exact randomization could lead to an unacceptably high rate of type I error (that is, rejecting a null hypothesis that actually is true) in cases where the fossil sample size is greater than two. A method of repeated sampling from the comparative samples, setting n equal to that of the same element for A. afarensis, may be a more relevant comparison of like with like. This application of bootstrapping effectively documents the previously recognized dichotomy between humans and chimpanzees on the one hand, and the strongly sexually dimorphic orang-utans and gorillas on the other (e.g., O’Higgins et al., 1990; Wood et al., 1991).

[Figure 3: four stacked frequency histograms (panels: Gorillas, Orang-utans, Chimpanzees, Humans); x-axis, max/min ratio (1 to 1·6); y-axis, percentage of samples; vertical line at 1·37.]

Figure 3. Frequency histograms of geometric mean max/min ratios calculated from 1000 random samples of five proximal femora for each extant species. The vertical line marks the max/min ratio of the Hadar sample (n=5).

[Figure 4: four stacked frequency histograms (panels: Gorillas, Orang-utans, Chimpanzees, Humans); x-axis, CV (2·5 to 17·5); y-axis, percentage of samples; vertical line at 11·7.]

Figure 4. Frequency histograms of CVs calculated for geometric means of the mandibular corpus. One thousand random samples of 17 extant hominoid mandibles are included. The CV of the Hadar sample (n=17) is indicated by the vertical line.

While the probabilities using the most sexually dimorphic reference samples are much higher than those from exact randomization, the probabilities using less dimorphic taxa are more similar to the previous results (Richmond & Jungers, 1995). This is especially clear using the coefficient of variation for mandibular size (Figure 4). The results from the bootstrapping analyses further indicate that the null hypothesis can be rejected for chimpanzees and humans; the size range of A. afarensis is not equivalent to that displayed by either of these taxa. The max/min results for the mandible are somewhat more conservative than those for CV-based comparisons. In all cases, however, the probabilities indicate that a similar range of size can be sampled from gorillas or orang-utans with frequencies that do not approach the probability level of 0·05. For comparisons of mandibles (the element with the greatest sample size), the probabilities achieved with CVs or max/min ratios are about 50% or greater for gorillas and orang-utans (49·8–83·3%; Table 2). This implies that the size variation in

A. afarensis is consistent with that of a single species if extant hominoids are used as comparative baselines (cf. Kimbel et al., 1994; Aiello, 1994). That the relative range of size present in A. afarensis is greater than in chimpanzees and humans warrants the interpretation that sexual dimorphism in A. afarensis is almost certainly greater than in these taxa (cf. McHenry, 1991). The use of multiple hominoid species for comparative purposes enables us to bracket the degree of sexual dimorphism present in A. afarensis between that of strongly dimorphic hominoids (gorillas and orang-utans) and that of hominoids that show reduced levels of sexual dimorphism (humans and chimpanzees). This may be an overly conservative view, for if A. afarensis shows a tendency towards one group or the other, it is to the more highly dimorphic hominoids.

The estimate of average dimorphism in mandibular size based on the correlation between the ISD and the CV appears to be a legitimate estimate based on the other lines of evidence presented here. However, Plavcan (1994) points out that the CV-based method of estimating the degree of sexual dimorphism may provide an overestimate of sexual dimorphism in fossil populations that exhibit low levels of sexual dimorphism. This caution is only applicable for samples whose ISDs range between 1·0 and 1·1, unless intrasexual variation is thought to be unusually high (Tables 2–6 in Plavcan, 1994). Because all of our results suggest a much higher level of sexual dimorphism in A. afarensis, it is doubtful that the CV furnishes an overestimate in this case.

Another issue raised here is that the results vary for different skeletal elements; that is, the levels of size variation for proximal femora and humeri of A. afarensis are relatively greater for their sample sizes than is that for the mandible.
Although there is no a priori reason to expect that dimorphism will be expressed identically throughout the skeleton (see, for instance, McHenry, 1986, 1991; Lovejoy et al., 1989; O’Higgins et al., 1990; Wood et al., 1991), it is pertinent to establish whether or not the differences convey any biological information. There are several methodological explanations that may account for the apparent contrast between mandible size and postcranial size, all based on the understanding that the A. afarensis sample for each element is only one out of many possible outcomes. For example, the degree of sexual dimorphism in A. afarensis may be greater than the mandible would suggest, the sampling of postcranial elements that are highly divergent in size may represent an unusual sampling event, or the CV could be a more appropriate measure of size variation than the max/min ratio. The last explanation gains support from work by Cope & Lacy (1992, 1995) and suggests greater consistency between the mandibular and postcranial results. The range may be low relative to the CV in highly platykurtic (or bimodal) populations where the largest or smallest individuals are not being sampled.

On the other hand, there are several biological explanations for why the mandible may be less sexually dimorphic than the proximal femur or humerus relative to the standards of extremely dimorphic modern hominoids. The first question is whether the results for postcrania correspond to body weight dimorphism. The results for the proximal femur and humerus are correlated because the bones of A.L. 288-1 serve as the lower bound in each case (A.L. 288-1 is also a member of the mandibular sample but is not the smallest specimen). The probabilities of sampling gorilla or orang-utan ranges as great as those of A. afarensis are somewhat less than 50% for either the humerus or proximal femur, which indicates a reasonable probability that sexual dimorphism in A. afarensis for these elements could be slightly higher than in any of the extant species. McHenry (1991: p. 30) has proposed that “strong sexual dimorphism may have characterized forelimb size in A. afarensis, but hindlimb joint size indicates only moderate levels of body size dimorphism in this species.”

He cautioned that his earlier estimates of a high level of sexual dimorphism in the distal femur (McHenry, 1986) were overestimates of body size dimorphism, because femoral shaft size in early hominids is large relative to body weight (Ruff, 1988). Thus, although our results indicate a similar degree of bone size dimorphism in the humerus and femur, these may not correspond to body weight dimorphism if McHenry’s hypothesis is correct. Alternatively, if dimorphism in bone size alone can be taken as a surrogate for body size dimorphism, it is clear that body size dimorphism in A. afarensis was high. With regard to the mandible, sexual dimorphism in mandibular size is probably related to sexual dimorphism in canine size. As canine dimorphism in hominids is reduced relative to body size dimorphism (Johanson & White, 1979; Kay, 1982; Leutenegger & Shell, 1987; Fleagle, 1988; Plavcan & van Schaik, 1994), it may be that reduced sexual dimorphism in mandibular size is a correlate of this reduced canine dimorphism. Distinguishing between this possibility and the methodological arguments presented above can be facilitated by studying the pattern of sexual dimorphism in other fossil hominids.

Conclusions

Our results demonstrate that taking into account the full samples available for fossil taxa provides a more conservative view than is suggested by the results of pairwise comparisons using exact randomization, even when the fossil sample numbers only three to five specimens. We caution against the use of pairwise comparisons for hypothesis testing with fossil samples, although such methods retain utility to assess the phenetic affinities between two specimens. To account for sample size, bootstrapping offers a simple method of comparing statistics of size variation from fossil samples to simulated “fossil samples” of modern hominoids. An application of this procedure to range-based statistics and CVs for A. afarensis suggests with high probability that A. afarensis was more variable in size than are modern humans and chimpanzees, and that its degree of sexual dimorphism approached, but did not exceed, that of the most dimorphic modern hominoids. There is little justification for dividing the Hadar sample into two species based on size alone, assuming that the sexual dimorphism displayed by gorillas and orang-utans provides an appropriate extreme in living by which to judge early hominids.

From a phylogenetic standpoint, A. afarensis represents the earliest hominid for which sexual dimorphism has been examined in detail, and it apparently shares with the gorilla a high degree of sexual dimorphism. For the two most likely interpretations of the evolutionary relationships among African apes and humans (a human/chimp clade and a chimp/gorilla clade), the common ancestor of all three living species is most parsimoniously interpreted to have exhibited a high level of sexual dimorphism. It follows that a reduced level of sexual dimorphism has been derived independently in modern humans and chimpanzees.

Acknowledgements

We extend our thanks to the Center for Research and Conservation of Cultural Heritage (Ethiopian Ministry of Information and Culture) and the National Museum of Ethiopia for permitting two of us (W.H.K., C.A.L.) to study the Hadar remains, and for their cooperation in that endeavor. Dana Cope, John Fleagle and an anonymous reviewer provided helpful comments on the manuscript. We are also grateful to Fred Grine and Ozzie Pearson for data on Zulu humeri, and to the curators at the various institutions whose material is included in the reference sample of modern taxa.

References

Aiello, L. C. (1994). Variable but singular. Nature 368, 399–400.
Cope, D. A. & Lacy, M. G. (1992). Falsification of a single species hypothesis using the coefficient of variation: a simulation approach. Am. J. phys. Anthrop. 89, 359–378.
Cope, D. A. & Lacy, M. G. (1995). Comparative application of the coefficient of variation and range-based statistics for assessing the taxonomic composition of fossil samples. J. hum. Evol. 29, 549–576.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B. & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.
Fleagle, J. G. (1988). Primate Adaptation and Evolution. New York: Academic Press.
Fleagle, J. G., Kay, R. F. & Simons, E. L. (1980). Sexual dimorphism in early anthropoids. Nature 287, 328–330.
Grine, F. E., Jungers, W. L. & Schultz, J. (1996). Phenetic affinities among early Homo crania from East and South Africa. J. hum. Evol. 30, 189–225.
Johanson, D. C. & White, T. D. (1979). A systematic assessment of early African hominids. Science 202, 312–330.
Johanson, D. C., Lovejoy, C. O., Kimbel, W. H., White, T. D., Ward, S. C., Bush, M. E., Latimer, B. M. & Coppens, Y. (1982). Morphology of the partial hominid skeleton (AL288-1) from the Hadar Formation, Ethiopia. Am. J. phys. Anthrop. 57, 403–451.
Johanson, D. C., Kimbel, W. H. & Rak, Y. (1994). New Pliocene hominids from the Hadar Formation, Ethiopia. Am. J. phys. Anthrop. 18 (Suppl.), 116–117.
Josephson, S. C., Juell, K. E. & Rogers, A. R. (1996). Estimating sexual dimorphism by Method-of-Moments. Am. J. phys. Anthrop. 100, 191–206.
Jungers, W. L., Falsetti, A. B. & Wall, C. E. (1995). Shape, relative size, and size-adjustments in morphometrics. Yearb. phys. Anthrop. 38, 137–161.
Kay, R. F. (1982). Sexual dimorphism in Ramapithecinae. Proc. Nat. Acad. Sci. U.S.A. 79, 209–212.
Kimbel, W. H., Johanson, D. C. & Rak, Y. (1994). The first skull and other new discoveries of Australopithecus afarensis at Hadar, Ethiopia. Nature 368, 449–451.
Kramer, A. (1993). Human taxonomic diversity in the Pleistocene: Does Homo erectus represent multiple hominid species? Am. J. phys. Anthrop. 91, 161–171.
Leutenegger, W. & Shell, B. (1987). Variability and sexual dimorphism in canine size of Australopithecus and extant hominoids. J. hum. Evol. 16, 359–367.
Lovejoy, C. O., Johanson, D. C. & Coppens, Y. (1982). Hominid lower limb bones recovered from the Hadar Formation: 1974–1977 collections. Am. J. phys. Anthrop. 57, 679–700.
Lovejoy, C. O., Kern, K. F., Simpson, S. W. & Meindl, R. S. (1989). A new method for estimation of skeletal dimorphism in fossil samples with an application to Australopithecus afarensis. In (G. Giacobini, Ed.) Hominidae, pp. 103–108. Milano: Jaka Book.
Manly, B. F. J. (1991). Randomization and Monte Carlo Methods in Biology. New York: Chapman & Hall.
McHenry, H. M. (1986). Size variation in the postcranium of Australopithecus afarensis and extant species of Hominoidea. J. hum. Evol. 15, 149–156.
McHenry, H. M. (1991). Sexual dimorphism in Australopithecus afarensis. J. hum. Evol. 20, 21–32.
Mosimann, J. E. (1970). Size allometry: size and shape variables with characteristics of the log normal and generalized gamma distributions. J. Am. Stat. Assoc. 65, 930–945.
O’Higgins, P., Moore, W. J., Johnson, D. R., McAndrew, T. J. & Flinn, R. M. (1990). Patterns of cranial sexual dimorphism in certain groups of extant hominoids. J. Zool. Lond. 222, 399–420.
Plavcan, J. M. (1994). Comparison of four simple methods of estimating sexual dimorphism in fossils. Am. J. phys. Anthrop. 94, 465–476.
Plavcan, J. M. & van Schaik, C. P. (1994). Canine dimorphism. Evol. Anthropol. 2, 208–214.
Richmond, B. G. & Jungers, W. L. (1995). Size variation and sexual dimorphism in Australopithecus afarensis and living hominoids. J. hum. Evol. 29, 229–245.
Ruff, C. (1988). Hindlimb articular surface allometry in Hominoidea and Macaca, with comparisons to diaphyseal scaling. J. hum. Evol. 17, 687–714.
Sokal, R. R. & Braumann, C. A. (1980). Significance tests for coefficients of variation and variability profiles. Syst. Zool. 34, 449–456.
Sokal, R. R. & Rohlf, F. J. (1995). Biometry. New York: W. H. Freeman and Co.
Wood, B. A., Li, Y. & Willoughby, C. (1991). Intraspecific variation and sexual dimorphism in cranial and dental variables among higher primates and their bearing on the hominid fossil record. J. Anat. 174, 185–205.