The expression of sexual dimorphism in human skeletal remains from ancient Egypt: regional and temporal differences and the impact on modern and population-specific metric sex estimation methods

A thesis submitted to the University of Manchester for the degree of PhD in the Faculty of Life Sciences

2014

Emily Jane Marlow

Table of Contents

List of Tables ...... 8

List of Figures ...... 15

List of Equations ...... 17

Abbreviations ...... 18

Abstract ...... 20

Declaration ...... 21

Copyright Statement ...... 21

Acknowledgements ...... 22

Preface ...... 23

Map of ...... 24

Chronology of Ancient Egypt ...... 25

1 INTRODUCTION ...... 28

1.1 General introduction and project overview ...... 28

1.1.1 Research questions ...... 29

1.1.2 Thesis outline ...... 30

1.2 Human remains in ...... 31

1.2.1 Biological information ...... 31

1.2.1.1 The ancient Egyptian population ...... 31

1.2.1.2 Palaeopathology ...... 39

1.2.1.3 Diet ...... 45

1.2.2 Cultural information ...... 46

1.3 Developmental biology ...... 49

1.3.1 Sex determination and differentiation ...... 49

1.3.1.1 Sex differences in early development ...... 50

1.3.2 Skeletal biology ...... 52

1.3.2.1 Skeletal development and growth ...... 52

1.3.2.2 Temporal changes in human growth ...... 61


1.4 Evolutionary and environmental biology ...... 67

1.4.1 Sexual selection and genetic influences ...... 67

1.4.2 Environmental influences ...... 73

1.4.2.1 Diet and nutritional stresses ...... 74

1.4.2.2 Climate and altitude ...... 79

1.4.2.3 Ecological influences ...... 87

1.4.2.4 Division of labour ...... 88

1.4.3 Ontogeny ...... 94

1.5 Fundamentals of palaeodemography: the estimation of sex ...... 97

1.5.1 Morphological sex estimation ...... 97

1.5.1.1 Reliability, scoring systems and weighting of traits ...... 99

1.5.1.2 Geometric morphometrics and ‘virtual’ osteology ...... 108

1.5.1.3 Population differences in morphology ...... 114

1.5.2 Metric sex estimation ...... 116

1.5.2.1 Computer program-based methods ...... 123

1.5.2.2 Population differences in body size and proportions ...... 125

1.5.2.3 The role of reference collections ...... 127

1.5.3 Molecular sex estimation ...... 129

1.6 Aims, objectives, and hypotheses ...... 131

1.6.1 Hypotheses ...... 132

2 MATERIALS AND METHODS ...... 134

2.1 Materials ...... 134

2.1.1 Identification and selection of skeletal collections ...... 134

2.1.2 Selection of skeletons ...... 138

2.1.2.1 Data collection ...... 140

2.1.3 Cemetery sites and excavations ...... 141

2.1.3.1 Keneh (Qena) ...... 141

2.1.3.2 Sheikh Farag ...... 143


2.1.3.3 Giza ...... 147

2.1.3.4 Thebes ...... 154

2.2 Sex estimation ...... 159

2.2.1 Morphological sex estimation ...... 159

2.2.1.1 Issues and considerations ...... 160

2.2.1.2 The Phenice characteristics ...... 163

2.2.1.3 Morphological assessment of the bony pelvis...... 165

2.2.1.4 Morphological assessment of the skull ...... 167

2.2.1.5 Weighting and ranking of morphological methods and traits ...... 169

2.2.2 Metric sex estimation ...... 177

2.2.2.1 Selection of metric methods and skeletal dimensions ...... 177

2.2.2.2 Measuring procedures and application of discriminant functions ...... 188

2.3 Intra-observer and inter-observer error ...... 204

2.3.1 Test of intra-observer error ...... 205

2.3.2 Test of inter-observer error ...... 207

2.3.3 Other calculations and equations ...... 208

2.4 Age at death estimation ...... 210

2.5 Statistical analyses ...... 211

2.5.1 Missing data ...... 213

2.5.2 Independent and paired samples t-tests ...... 216

2.5.3 Analysis of Variance (ANOVA) ...... 216

2.5.4 Projection pursuit and PCA ...... 219

2.5.5 Discriminant function analysis and logistic regression ...... 222

2.5.5.1 Overview of discriminant function analysis ...... 222

2.5.5.2 Overview of logistic regression ...... 225

2.5.5.3 Discriminant function analysis versus logistic regression ...... 227

2.6 Test sample and methods ...... 231

2.6.1 Test sample ...... 232


2.6.2 Test methods ...... 234

3 RESULTS ...... 236

3.1 Descriptive demographics of sample ...... 236

3.1.1 Sample size and sex distribution ...... 236

3.1.2 Age at death estimation ...... 237

3.1.3 Exploration of data ...... 238

3.1.3.1 Outliers and extreme scores ...... 238

3.1.3.2 Distribution of data ...... 240

3.1.3.3 Descriptive statistics of skeletal dimensions ...... 244

3.1.3.4 Variability statistics of skeletal dimensions ...... 247

3.1.3.5 Comparison of means ...... 250

3.1.3.6 Z-Scores ...... 254

3.2 Intra- and inter-observer error ...... 254

3.2.1 Intra-observer error ...... 254

3.2.2 Inter-observer error ...... 258

3.2.2.1 Morphological sex estimation error ...... 260

3.3 Principal components analysis ...... 261

3.3.1 Principal components analysis of cranial variables ...... 262

3.3.1.1 Raw data ...... 262

3.3.2 Principal components analysis with expectation maximisation imputation ...... 266

3.3.2.1 Raw data ...... 266

3.3.2.2 Z-Scores ...... 267

3.4 Sex estimation ...... 271

3.4.1 Accuracy of previously developed metric sex estimation methods ...... 271

3.4.1.1 Accuracy of “modern methods” ...... 271

3.4.1.2 Accuracy of “population-specific” methods ...... 279

3.4.1.3 Accuracy of “living Egyptian” methods ...... 281

3.4.2 Discriminant function analysis ...... 283


3.4.2.1 Complete sample ...... 285

3.4.2.2 Old Kingdom Giza ...... 296

3.4.2.3 Late Period Giza ...... 296

3.4.3 Logistic regression analysis ...... 298

3.4.3.1 Complete sample ...... 300

3.4.3.2 Late Period sample ...... 312

3.4.4 Comparison of samples: two-factor ANOVA ...... 315

3.4.4.1 Effect of sex and time period on skeletal dimensions ...... 315

3.4.5 Degree of sexual dimorphism...... 323

3.4.5.1 Adjustment for size effects ...... 327

3.5 Test of discriminant functions and logistic equations on the Saqqara-West sample ...... 338

3.5.1 Test of discriminant functions ...... 338

3.5.2 Accuracy of discriminant functions ...... 339

3.5.3 Test of logistic regression equations ...... 340

3.5.4 Accuracy of logistic regression equations ...... 341

3.5.5 Comparison of discriminant functions vs. logistic regression equations ...... 342

3.5.6 Analysis of metric data from the Saqqara-West sample ...... 343

4 DISCUSSION ...... 344

4.1 Limitations of study ...... 365

5 CONCLUSIONS ...... 370

6 REFERENCES ...... 374

7 APPENDICES ...... 428

7.1 Sex estimation ...... 428

7.1.1 Phenice characteristics recording form ...... 428

7.1.2 Pelvic morphology recording form ...... 429

7.1.3 Cranial morphology recording form ...... 430

7.1.4 Osteometric recording form ...... 431

7.2 Age at death estimation ...... 433


7.2.1 Age at death recording form ...... 433

7.3 Variability statistics ...... 435

7.3.1 Variability statistics broken down by sex ...... 435

7.3.1.1 Males ...... 435

7.3.1.2 Females ...... 438

7.3.2 Variability statistics broken down by time period ...... 441

7.3.2.1 Pre-dynastic Period ...... 441

7.3.2.2 Old Kingdom ...... 444

7.3.2.3 Late Period ...... 447

7.3.3 Variability statistics broken down by cemetery site ...... 447

7.3.3.1 Keneh ...... 447

7.3.3.2 Sheikh Farag ...... 448

7.3.3.3 Giza ...... 451

7.3.3.4 Thebes ...... 453

7.4 Results of intra- and inter-observer error tests ...... 454

7.4.1 Intra-observer error ...... 454

7.4.2 Inter-observer error ...... 455

7.4.2.1 EJM vs. IK-O ...... 455

7.4.2.2 EJM vs. MR ...... 456

7.5 Per cent sexual dimorphism by time period ...... 456

7.6 Analysis of metric data from Saqqara-West sample ...... 458

7.6.1 Outliers and extreme scores ...... 458

7.6.2 Descriptive statistics ...... 458

7.6.2.1 Comparison of means ...... 459

Word count: 79,914


List of Tables

Table 1.2A: Summary of studies exploring palaeopathology in the Nile Valley 39

Table 1.2B: Summary of selected studies exploring sex differences in skeletal indicators of palaeopathology 41

Table 1.5A: Accuracy rates in studies testing the Phenice method 98

Table 1.5B: Inter-observer reliability for measures of the index of sexualisation 107

Table 1.5C: Summary of studies examining the utility of “virtual osteology” methods 112

Table 1.5D: Summary of studies examining odontometric methods of sex estimation in juveniles 117

Table 1.5E: Summary of studies examining odontometric methods of sex estimation in adults 118

Table 1.5F: Studies creating population-specific sex estimation methods for ancient Egyptians 121

Table 1.5G: Summary of studies presenting metric sex estimation methods for use in the living Egyptian population 122

Table 1.6A: Study hypotheses 132

Table 2.1A: Collections of ancient Egyptian skeletal remains previously sampled in published studies 134

Table 2.1B: The Egyptian series from the Peabody Museum collection (Source: Herschensohn, Olivia; Personal Communication, 2011) 136

Table 2.1C: Previous studies using the Gizeh ‘E’ series 153

Table 2.2A: The three Phenice characteristics for sexing the pubic bones 164

Table 2.2B: Features of the bony pelvis used for the estimation of sex 166

Table 2.2C: Features of the skull used for the estimation of sex 167

Table 2.2D: Accuracy, precision, and reliability of the Phenice characteristics in the estimation of sex 171

Table 2.2E: Accuracy, precision, and reliability of the pelvic morphology method in the estimation of sex (Source: Rogers & Saunders, 1994; with additions) 172

Table 2.2F: Accuracy, precision, and rank of individual sex indicators used in the cranial morphology method in three different skeletal populations 174


Table 2.2G: Overall rank and accuracy, precision, and reliability of individual cranial sex indicators used to estimate sex in the present study 175

Table 2.2H: Summary of “Modern” metric methods tested, including original study populations and published accuracy rates 178

Table 2.2I: Summary of “Population-Specific” methods tested, including original study populations and published accuracy rates 180

Table 2.2J: Summary of “Living Egyptian” methods tested, including original study populations and published accuracy rates 181

Table 2.2K: Definitions of skeletal dimensions recorded for metric analysis 182

Table 2.2L: Cranial functions for pooled White and Black populations (Giles & Elliot, 1963) 190

Table 2.2M: Second cervical vertebra (C2) functions (Wescott, 2000) 192

Table 2.2N: Maximum diameter of femoral head sectioning points (Krogman & İşcan, 1986) 193

Table 2.2O: Supero-inferior femoral neck diameter function (Seidemann et al, 1998) 194

Table 2.2P: Femoral shaft circumference method (İşcan & Miller-Shaivitz, 1984a) 194

Table 2.2Q: Univariate sectioning points for tibial measurements (İşcan & Miller-Shaivitz, 1984b) 195

Table 2.2R: Multivariate discriminant functions for tibial measurements (İşcan & Miller-Shaivitz, 1984b) 195

Table 2.2S: Humeral head sectioning points (Spradley & Jantz, 2011) 197

Table 2.2T: Humerus, radius and ulna functions (Holman & Bennett, 1991) 198

Table 2.2U: Radial head sectioning points (Berrizbeitia, 1989) 199

Table 2.2V: First metacarpal function (Scheuer & Elkington, 1993) 199

Table 2.2W: First metatarsal function (Robling & Ubelaker, 1997) 200

Table 2.2X: Functions for the multiple bones method (Stewart, 1979) 201

Table 2.2Y: Sectioning points of the population-specific long bones method (Raxter, 2007) 202


Table 2.2Z: Scapula functions (Dabbs, 2010) 203

Table 2.2AA: MC1 sectioning point for “Living Egyptian” Method 1 (Eshak et al, 2011) 203

Table 2.2BB: MC1 sectioning point for “Living Egyptian” Method 2 (El Morsi & Al Hawary, 2013) 204

Table 2.5A: Summary of studies using discriminant function analysis or logistic regression in metric sex estimation methods 228

Table 2.6A: Summary of the Saqqara-West necropolis skeletal sample 234

Table 3.1A: The sex distribution of the Egyptian samples included in the study 236

Table 3.1B: Frequency of skeletons with pelvic or cranial material in the study sample 237

Table 3.1C: Age distribution of study sample 238

Table 3.1D: Results of the Shapiro-Wilk normality test, and skewness and kurtosis values for each variable 241

Table 3.1E: Descriptive statistics of skeletal dimensions 244

Table 3.1F: Variability statistics of the skeletal dimensions included in the study 247

Table 3.1G: Results of an independent samples t-test comparing male and female means for all 63 skeletal dimensions included in the study 251

Table 3.2A: Results of a paired samples t-test showing the dimensions exhibiting statistically significant differences between the original and retaken measurements 257

Table 3.2B: Results of the inter-observer error test between EJM and MR using skeletal remains from the Peabody Museum collection 259

Table 3.2C: Kappa values and level of agreement for assessment of inter-observer morphological sex estimation 261

Table 3.3A: Communalities for the initial and extraction principal components solution 262

Table 3.3B: Total variance accounted for by each of the components after the initial, extraction, and rotation phases of the analysis (only the first four components are shown) 263

Table 3.3C: Unrotated component matrix showing component loadings 263

Table 3.3D: Rotated component matrix showing component loadings 264


Table 3.3E: Rotated component matrix showing component loadings in males and females separately 265

Table 3.3F: Rotated component matrix showing component loadings 267

Table 3.3G: Rotated component matrix showing component loadings in males 268

Table 3.3H: Rotated component matrix showing component loadings in females 269

Table 3.4A: Correct sex estimates and accuracy rates associated with 12 “modern” metric sex estimation methods when tested on ancient Egyptian skeletal remains 271

Table 3.4B: Correct sex estimates and accuracy rates associated with two “population-specific” metric sex estimation methods when tested on the study sample 279

Table 3.4C: Correct sex estimates and accuracy rates associated with two “living Egyptian” metric sex estimation methods when tested on the study sample 282

Table 3.4D(i): Cranial discriminant function 1 (complete sample) 285

Table 3.4D(ii): Classification results associated with cranial function 1 (complete sample) 287

Table 3.4E(i): Cranial discriminant function 2 (complete sample) 288

Table 3.4E(ii): Classification results associated with cranial function 2 (complete sample) 288

Table 3.4F(i): Femoral discriminant function (complete sample) 289

Table 3.4F(ii): Classification results associated with the femoral function (complete sample) 289

Table 3.4G(i): Tibial discriminant function (complete sample) 290

Table 3.4G(ii): Classification results associated with the femoral function (complete sample) 290

Table 3.4H(i): Upper limb discriminant function (complete sample) 291

Table 3.4H(ii): Classification results associated with the upper limb function (complete sample) 291

Table 3.4I(i): MC1 discriminant function (complete sample) 292

Table 3.4I(ii): Classification results associated with the MC1 function (complete sample) 292


Table 3.4J(i): Lower limb discriminant function 1 (complete sample) 293

Table 3.4J(ii): Classification results associated with the lower limb function 1 (complete sample) 293

Table 3.4K(i): Lower limb discriminant function 2 (complete sample) 294

Table 3.4K(ii): Classification results associated with the lower limb function 2 (complete sample) 294

Table 3.4L(i): PC1 variables discriminant function (complete sample) 295

Table 3.4L(ii): Classification results associated with the first principal component variables with the highest loadings (complete sample) 295

Table 3.4M(i): Cranial discriminant function 1 (Late Period Giza sample) 297

Table 3.4M(ii): Classification results associated with cranial function 1 (Late Period Giza sample) 297

Table 3.4N(i): Cranial discriminant function 2 (Late Period Giza sample) 298

Table 3.4N(ii): Classification results associated with cranial function 2 (Late Period Giza sample) 298

Table 3.4O(i): Intercept model statistics for cranial equation 1 (complete sample) 300

Table 3.4O(ii): Complete model statistics for cranial equation 1 (complete sample) 301

Table 3.4O(iii): Classification table for cranial equation 1 (complete sample) 302

Table 3.4O(iv): Variables in cranial equation 1 (complete sample) 302

Table 3.4O(v): Variables in cranial equation 1 (complete sample); backward likelihood ratio method 304

Table 3.4P(i): Intercept model statistics for cranial equation 2 (complete sample) 305

Table 3.4P(ii): Complete model statistics for cranial equation 2 (complete sample) 305

Table 3.4P(iii): Classification table for cranial equation 2 (complete sample) 306

Table 3.4P(iv): Variables in cranial equation 2 (complete sample) 306

Table 3.4Q(i): Intercept model statistics for femoral equation (complete sample) 306

Table 3.4Q(ii): Complete model statistics for the femoral equation (complete sample) 307


Table 3.4Q(iii): Classification table for the femoral equation (complete sample) 307

Table 3.4Q(iv): Variables in the femoral equation (complete sample) 307

Table 3.4R(i): Intercept model statistics for tibial equation (complete sample) 308

Table 3.4R(ii): Complete model statistics for the tibial equation (complete sample) 308

Table 3.4R(iii): Classification table for the tibial equation (complete sample) 308

Table 3.4R(iv): Variables in the tibial equation (complete sample) 309

Table 3.4S(i): Intercept model statistics for the MC1 equation (complete sample) 309

Table 3.4S(ii): Complete model statistics for the MC1 equation (complete sample) 309

Table 3.4S(iii): Classification table for the MC1 equation (complete sample) 310

Table 3.4S(iv): Variables in the MC1 equation (complete sample) 310

Table 3.4T(i): Intercept model statistics for the lower limb equation (complete sample) 310

Table 3.4T(ii): Complete model statistics for the lower limb equation (complete sample) 311

Table 3.4T(iii): Classification table for the lower limb equation (complete sample) 311

Table 3.4T(iv): Variables in the lower limb equation (complete sample) 311

Table 3.4U(i): Intercept model statistics for cranial function 1 (Late Period sample) 312

Table 3.4U(ii): Complete model statistics for cranial equation 1 (Late Period sample) 312

Table 3.4U(iii): Classification table for cranial equation 1 (Late Period sample) 312

Table 3.4U(iv): Variables in cranial equation 1 (Late Period sample) 313

Table 3.4V(i): Intercept model statistics for cranial equation 2 (Late Period sample) 313

Table 3.4V(ii): Complete model statistics for cranial equation 2 (Late Period sample) 314

Table 3.4V(iii): Classification table for cranial equation 2 (Late Period sample) 314

Table 3.4V(iv): Variables in cranial equation 2 (Late Period sample) 314

Table 3.4W: Results of the two-factor ANOVA tests examining the effect of sex and time period on skeletal dimensions 315


Table 3.4X: Per cent dimorphism and sexual dimorphism index scores for each of the 63 skeletal dimensions included in the study (complete study sample) 323

Table 3.4Y: Descriptive statistics of skeletal measurements after adjustment for size 328

Table 3.4Z: Variability statistics of skeletal dimensions after adjustment for size 330

Table 3.4AA: Results of the independent samples t-test comparing male and female size-adjusted means for all 63 skeletal dimensions used in the study 332

Table 3.4BB: Mean values and significance of skeletal dimensions after adjustment for size effects, broken down by principal time period 335

Table 3.5A: Discriminant functions that were blind tested on skeletons from the Saqqara-West necropolis 338

Table 3.5B: Accuracy rates, broken down by sex and time period, associated with the application of discriminant functions to the complete sample of skeletons from the Saqqara-West necropolis 339

Table 3.5C: Logistic regression equations that were blind tested on skeletons from the Saqqara-West necropolis 341

Table 3.5D: Accuracy rates, broken down by sex and time period, associated with the application of logistic regression equations to the complete sample of skeletons from the Saqqara-West necropolis 342

Table 3.5E: Comparison of accuracy rates associated with the discriminant functions and logistic regression equations when tested on the Saqqara-West sample 343

Table 4.1A: Summary of previous studies reporting metric sex estimation methods for the tibia and femur 350


List of Figures

Figure 1.2A: Types of peoples (‘The Medley of Invaders’) who, according to Petrie (1939), invaded and settled in ancient Egypt 32

Figure 1.2B: Mural illustrating the Book of Gates from the tomb of , showing (from left to right) four Libyans, a Nubian, a Semite, and an Egyptian 39

Figure 1.3A: The process of endochondral bone formation 53

Figure 1.5A: Trait expression and ordinal scores for the subpubic concavity/contour (top), the medial aspect of the ischio-pubic ramus (middle), and the ventral arc (VA). Figure 2 from Klales et al, 2012 104

Figure 1.5B: Lateral rendered images and midsagittal transformation grids depicting mean female (A) and male (B) cranial shape 109

Figure 2.1A: Map of ancient Egypt showing the location of Keneh (Qena) 142

Figure 2.1B: Map of ancient Egypt showing the location of Naga-ed-Dêr 144

Figure 2.1C: Map of ancient Egypt showing the location of Giza 148

Figure 2.1D: Plan of the Giza Necropolis showing the Eastern and Western cemeteries ( Fields) of the pyramid of 149

Figure 2.1E: Overview plan of the Giza necropolis with the American, German–Austrian, and Egyptian concessions indicated. Figure 2.1 from Der Manuelian, 2009: 24 150

Figure 2.1F: Map of ancient Egypt showing the location of Thebes 154

Figure 2.1G: Plan of Thebes. (Source: Redford, 2001: 385) 155

Figure 2.1H: Plan of the central area of the Theban necropolis, western Thebes. (Source: Redford, 2001: 382) 156

Figure 2.1I: The site of Deir el-Medina 157

Figure 2.1J: Scenes from the tomb of Sennedjem. The North wall 158

Figure 2.2A: The Phenice characteristics 164

Figure 2.2B: Sex differences in the greater sciatic notch. Figure 2 from Buikstra & Ubelaker, 1994: 18. Drawing by P. Walker 166


Figure 2.2C: Scoring system for sexually dimorphic cranial features. Figure 4 from Buikstra & Ubelaker (1994: 20) 168

Figure 2.2D: The human cranium showing location of the “forehead landmark” (red arrow). Figure from Buikstra & Ubelaker, 1994: 72, with additions 187

Figure 2.2E: Line drawing of the second cervical vertebra from a superior view (A) and lateral view (B) illustrating measurements used. Figure 1 from Wescott, 2000 191

Figure 2.2F: The supero-inferior femoral neck diameter measurement (SID). Figure 1 from Seidemann et al, 1998 193

Figure 2.5A: The relationship between the alpha (α) level and P-value of the test statistic. The bell-shaped curve represents the probability of every possible outcome under the null hypothesis 212

Figure 2.6A: Map of the Memphite region showing the locations of Giza, Saqqara, and Memphis 232

Figure 3.1A: Boxplot for transverse breadth of tibia (TB) showing three measurements considered to be outliers (two male and one female) and one male measurement considered to be an extreme score 239

Figure 3.2A: Dimensions included in the study demonstrating mean per cent error >1% 255

Figure 3.2B: Skeletal dimensions that exhibited scores >1.00 mm 256

Figure 3.2C: Skeletal variables demonstrating values of R <0.95 in the inter-observer error test between EJM and IK-O 259

Figure 3.4A: Total, male, and female accuracy rates associated with 12 modern metric sex estimation methods, ranked in descending order according to total accuracy 276

Figure 3.4B: Total, male, and female accuracy rates associated with two population-specific metric sex estimation methods, ranked in descending order according to total accuracy 280

Figure 3.4C: Total, male, and female accuracy rates associated with two “living Egyptian” metric sex estimation methods, ranked in descending order according to total accuracy 282

Figure 3.4D: Skeletal dimensions showing per cent sexual dimorphism >15% 326


Figure 3.5A: Accuracy rates associated with Function 1 when applied to the test sample from the Saqqara-West necropolis 340

List of Equations

Equation 2.3A $[(M_1 - M_2) \div M_1] \times 100 = \text{per cent error}$ 205

Equation 2.3B $\sqrt{(\sum D^2) \div (2N)} = \text{TEM (mm)}$ 206

Equation 2.3C $1 - [(\text{TEM}^2) \div (\text{SD}^2)] = R$ 206

Equation 2.3D $[(\text{male mean} - \text{female mean}) \div \text{female mean}] \times 100 = \%D$ 208

Equation 2.3E $t = (b_1 - b_2) \div \sqrt{AB \div df}$, where $df = (M_1 + F_1 + M_2 + F_2) - 4$ 210

Equation 2.3F $A = [(M_1 + F_1) \div (M_1 \times F_1)] + [(M_2 + F_2) \div (M_2 \times F_2)]$ 210

Equation 2.3G $B = (M_1 - 1) \times s_{m1}^2 + (F_1 - 1) \times s_{f1}^2 + (M_2 - 1) \times s_{m2}^2 + (F_2 - 1) \times s_{f2}^2$ 210

Equation 2.3H $(b_1 - b_2) = (\bar{Y}_{m1} - \bar{Y}_{f1}) - (\bar{Y}_{m2} - \bar{Y}_{f2})$ 210
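As a reading aid, Equations 2.3A to 2.3H can be sketched in Python roughly as follows. This is a minimal illustration rather than the procedure used in the thesis. The function names and the `SampleSummary` container are hypothetical. The sketch also assumes that, in Equations 2.3E to 2.3H, M and F are the numbers of males and females in samples 1 and 2, s is the corresponding standard deviation, and Ȳ is the corresponding mean; these readings are inferred from the symbols and are not stated in this list.

```python
import math
from typing import NamedTuple, Sequence


class SampleSummary(NamedTuple):
    """Hypothetical container: summary statistics for one sample (assumed meanings)."""
    n_male: int         # M in Equations 2.3E-2.3G (assumed: number of males)
    mean_male: float    # Y-bar for males (assumed)
    sd_male: float      # s for males (assumed: standard deviation)
    n_female: int       # F (assumed: number of females)
    mean_female: float  # Y-bar for females (assumed)
    sd_female: float    # s for females (assumed)


def per_cent_error(m1: float, m2: float) -> float:
    """Equation 2.3A: [(M1 - M2) / M1] x 100, for an original and a repeated measurement."""
    return (m1 - m2) / m1 * 100


def technical_error(differences: Sequence[float]) -> float:
    """Equation 2.3B: TEM = sqrt(sum(D^2) / 2N), with D the paired differences."""
    n = len(differences)
    return math.sqrt(sum(d ** 2 for d in differences) / (2 * n))


def reliability(tem: float, sd: float) -> float:
    """Equation 2.3C: R = 1 - TEM^2 / SD^2."""
    return 1 - (tem ** 2) / (sd ** 2)


def per_cent_dimorphism(male_mean: float, female_mean: float) -> float:
    """Equation 2.3D: %D = [(male mean - female mean) / female mean] x 100."""
    return (male_mean - female_mean) / female_mean * 100


def dimorphism_t(s1: SampleSummary, s2: SampleSummary) -> float:
    """Equations 2.3E-2.3H: t statistic comparing sexual dimorphism in two samples."""
    # Equation 2.3H: difference between the two male-female mean differences.
    b_diff = (s1.mean_male - s1.mean_female) - (s2.mean_male - s2.mean_female)
    # Equation 2.3F: term A, built from the sample sizes only.
    a = ((s1.n_male + s1.n_female) / (s1.n_male * s1.n_female)
         + (s2.n_male + s2.n_female) / (s2.n_male * s2.n_female))
    # Equation 2.3G: term B, the sum of (n - 1) x variance over all four groups.
    b = ((s1.n_male - 1) * s1.sd_male ** 2 + (s1.n_female - 1) * s1.sd_female ** 2
         + (s2.n_male - 1) * s2.sd_male ** 2 + (s2.n_female - 1) * s2.sd_female ** 2)
    # Degrees of freedom as given alongside Equation 2.3E.
    df = (s1.n_male + s1.n_female + s2.n_male + s2.n_female) - 4
    # Equation 2.3E.
    return b_diff / math.sqrt(a * b / df)
```

Each function takes plain numbers, so it can be applied to one skeletal dimension at a time; `per_cent_error` returns a signed value exactly as Equation 2.3A is written.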


Abbreviations

In line with the Guidelines for Human Gene Nomenclature (Wain et al, 2002), gene symbols are presented in uppercase, italicised text. Protein designations are the same as the gene symbol but are not italicised.

AAR Aspartic acid racemisation
AD Anno Domini
aDNA Ancient deoxyribonucleic acid
AMTL Antemortem tooth loss
ANOVA Analysis of variance
BC Before Christ
BL Buccolingual
BMD Bone mineral density
BMI Body mass index
c. Circa (“around”)
C Carbon
C2 Second cervical vertebra
CT Computed tomography
DF Degrees of freedom
DFA Discriminant function analysis
DNA Deoxyribonucleic acid
EJM Emily Jane Marlow
ELISA Enzyme-linked immunosorbent assay
EM Expectation maximisation
ER Oestrogen receptor
FCG Four core genotypes
FSH Follicle-stimulating hormone
GH Growth hormone
GM Geometric morphometrics
GnRH Gonadotropin-releasing hormone
HES Health Examination Survey
IGF-1 Insulin-like growth factor 1
IgM Immunoglobulin M
IK-O Iwona Kozieradzka-Ogunmakin
JJL Jago James Livingstone
LEH Linear enamel hypoplasia
LH Luteinising hormone
LP Late Period
LRA Linear regression analysis
MANOVA Multiple analysis of variance
MC1 Metacarpal 1
MD Mesiodistal
MDCT Multidetector computed tomography
mtDNA Mitochondrial deoxyribonucleic acid
MR Michelle Raxter
mRNA Messenger ribonucleic acid
MT1 Metatarsal 1
N Nitrogen
NHM Natural History Museum
OK Old Kingdom
PCA Principal components analysis
PCR Polymerase chain reaction
PDYN Prodynorphin
PP Ptolemaic Period
RSH Relative sitting height
RT-PCR Reverse transcriptase-polymerase chain reaction
SD Standard deviation
SE Standard error
SEE Sum of squares for error
SNP Single nucleotide polymorphism
SRY Sex-determining region on Y chromosome gene
TAC1 Tachykinin precursor 1
TDF Testis-determining factor
TEM Technical error of measurement
WEA Workshop of European Anthropologists


Abstract

When considered in the context of other sources of information, the analysis of human remains can provide important insights into the population of ancient Egypt. Sex is an important component both of the biological profile of an individual, and the demographic profile of a population. Metric sex estimation methods are usually preferred in instances where skeletal remains are fragmentary or damaged; however, metric techniques are prone to error as a result of several biases, notably population differences in body size and skeletal proportions. In addition, many commonly used metric equations have not been validated for use with ancient Egyptians, and few population-specific equations exist. The study sample consists of 318 adult individuals, represented by a complete skeleton (n=162) or isolated cranium (n=156). The majority of individuals date to Old Kingdom (n=106) or Late Period (n=154) Giza. In addition, 43 individuals date to Pre-dynastic Period Keneh, 13 individuals to Middle Kingdom Sheikh Farag, and two individuals to New Kingdom Thebes. The sex of each individual was estimated using standard morphological methods. A total of 63 dimensions of the skeleton, or as many as it was possible to obtain, were measured for each individual in the study sample. Tests of intra- and inter-observer error revealed that the majority of measurements used in the study can be reliably measured. Testing of 12 modern methods of metric sex estimation revealed total weighted accuracy rates as low as 30–40%; many of the methods were additionally exceptionally poor at correctly estimating the sex of males. Previously created population-specific metric equations produced total accuracy rates ranging from 79.2–100% when tested on the study sample. New population-specific equations created using data collected from the study sample are associated with high rates of correct sex classification, often in excess of 95%. Three discriminant functions and three logistic regression equations created as part of this research were tested on a sample of 119 adult skeletons from late Old Kingdom and Ptolemaic Period Saqqara. Two functions and one equation produced acceptable accuracy rates in males and females separately in the Old Kingdom subsample. Results of principal components analysis and adjusting for body size suggest that some skeletal dimensions are sexually dimorphic with regard to both size and shape. All results arising from this research are discussed in the context of known factors affecting skeletal size and proportion differences between different populations and the expression of sexual dimorphism.


Declaration

No portion of the work referred to in the thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Copyright Statement

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and she has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://documents.manchester.ac.uk/DocuInfo.aspx?DocID=487), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on Presentation of Theses.


Acknowledgements

There are so many people to thank. None of the things I have ever done or achieved would have been possible without the love, support, and no-nonsense attitude of my wonderful mother, Marianne, and sister, Abi. I do not know what I would do without you. To my former and current supervisory team, Professors Andrew Chamberlain and Rosalie David, and to John Denton and Drs. Ryan Metcalfe and Ian Burney, thank you for all your help and support. I hope you have enjoyed the past three years as much as I have. I am also particularly indebted to Professor David and the KNH Centre for Biomedical Egyptology for funding or part-funding two of my overseas research trips and for contributing to my third year tuition fees. To my PhD colleagues both past and present, and in particular Conni, Iwona, Steph, Lidjia, and Roger, thank you for making our days in the office so much fun. Iwona – thanks for all your help in testing inter-observer error. I look forward to our future work together! I also wish to extend my thanks to Joanne Robinson for proofreading my Introduction, to Dr Janet Fletcher for being a friend and unofficial mentor, and for letting me gatecrash her dig year upon year, to Angela Thomas for proofreading my entire thesis, and to Dr Michelle Raxter for providing data for my inter-observer error test.

I am particularly grateful to the curators and staff at the museums and institutions I visited for allowing me to examine and collect data from their collections of ancient Egyptian human skeletons. To Dr Michèle Morgan, Olivia Herschensohn and colleagues at the Peabody Museum of Archaeology and Ethnology at Harvard University in Boston, and to Dr Marta Mirazon Lahr, Dr Ronika Power, and Ms Maggie Bellatti at the Leverhulme Centre for Human Evolutionary Biology at the University of Cambridge, thank you for all your help before, during and after my trips. To Dr Maria Teschler-Nicola, Ronald Mühl, August Walch and colleagues at the Natural History Museum of Vienna, danke schön! I also wish to extend special thanks to Peter Der Manuelian, Philip J King Professor of Egyptology at Harvard University, for his help and advice, for inviting me to lectures and demonstrations, and for sending me home from Boston with a complimentary copy of his wonderful book, Mastabas of Nucleus Cemetery G 2100, Part I. Last but not least, to all my friends, family and extended family, thank you for taking an interest in what I do and for giving me the time and space to write my thesis. To those of this group who are no longer with us – I wish you could have seen this.


Preface

I first visited Egypt in May/June 2002 at the age of 19, having spent the previous nine months working in a care home for the elderly in a small Arabic village in Israel for my gap year project, and finally fulfilled a life-long ambition. Prior to this, I had completed Sixth Form College, receiving an A and two B grades in A-Level Chemistry, Biology and Mathematics, respectively, in the summer of 2001. My career in higher education began at the University of Leeds from where I received an upper second class honours degree (BSc) in Medical Sciences (with an emphasis on gross and regional anatomy) in 2005. During the second and third years of this degree I was granted special permission to study modules in human evolution and the archaeology of death and burial that were taught outside of the School of Biomedical Sciences.

In September 2005 I began an MSc degree in Forensic Anthropology at the University of

Bradford. I received distinctions for all of my modules, which included musculo-skeletal anatomy, mathematics and quantitative methods, archaeology of human remains, taphonomy and chemistry of human remains, and advanced forensic anthropology, and I graduated with a

Distinction in September 2006. I subsequently accepted a position as a trainee Medical Writer for a small medical communications agency in November 2006, and was proud to be promoted to Senior Medical Writer in June 2009. I continued to work for this company on a part-time basis throughout the duration of my PhD, which I began in September 2010. During this four year project, I was lucky enough to travel to some beautiful cities – Boston, Vienna, and Cambridge – to study museum or institutional collections of ancient Egyptian skeletal remains. I have met some wonderful people and learned some important skills. Ultimately, this PhD project has been my dream, my goal, and my ambition for as long as I can remember. I hope I have done it justice.


Map of Ancient Egypt

Map of ancient Egypt showing key cemetery sites and important cities and settlement sites. (Adapted by Jago James Livingstone (JJL; Graphic Designer) from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).


Chronology of Ancient Egypt

Adapted from Shaw, 2000: 481–489.

Period Dates (BC) Key rulers

PREHISTORY

. Palaeolithic    c. 700,000–7000
. Epipalaeolithic    c. 10,000–7000
. Saharan Neolithic    c. 8800–4700

PREDYNASTIC PERIOD

. Maadi Cultural Complex (North only)    c. 4000–3200
. Badarian Period    c. 4400–4000
. Amratian (Naqada I) Period    c. 4000–3500
. Gerzean (Naqada II) Period    c. 3500–3200
. Naqada III/‘Dynasty 0’    c. 3200–3000

PHARAONIC/DYNASTIC PERIOD (3000–332 BC)

Early Dynastic Period    3000–2686
. 1st Dynasty    3000–2890    Aha, , , , Queen Merneith, , , Qa’a
. 2nd Dynasty    2890–2686    Hetepsekhemwy, Raneb, , , Sened, Peribsen,

Old Kingdom    2686–2160
. 3rd Dynasty    2686–2613    , , , , ?,
. 4th Dynasty    2613–2494    , Khufu, Djedefra, Khafra, Menkaura,
. 5th Dynasty    2494–2345    , Sahura, Neferirkara, Shepseskara, Raneferef, Nyuserra, Menkauhor, Djedkara,
. 6th Dynasty    2345–2181    , Userkara, Pepy I, Merenra, Pepy II (Neferkara)
. 7th and 8th Dynasties    2181–2160    Numerous kings called Neferkara


First Intermediate Period    2160–2055
. 9th and 10th Dynasties    2160–2025    Herakleopolitan, Khety (Meryibra), Khety (Nebkaura), Khety (Wahkara), Merykara
. 11th Dynasty (Thebes only)    2125–2055    , Intef II, Intef III

Middle Kingdom    2055–1650
. 11th Dynasty (all Egypt)    2055–1985    Mentuhotep II, Mentuhotep III, Mentuhotep IV
. 12th Dynasty    1985–1773    , , Amenemhat II, Senusret II, Senusret III, Amenemhat III, Amenemhat IV, Queen Sobeknefru
. 13th Dynasty    1773–after 1650    Wegaf, Sobekhotep II, Iykhernefert Neferhotep, Ameny-intef-amenemhat, , , Sobekhotep III, Sahathor, Sobekhotep IV, Sobekhotep V,
. 14th Dynasty    1773–1650    Minor rulers

Second Intermediate Period    1650–1550
. 15th Dynasty ()    1650–1550    , , ,
. 16th Dynasty (minor Hyksos)    1650–1550    Theban early rulers
. 17th Dynasty (Thebes)    1650–1550    , Sobekemsaf I, Intef VI, Intef VII, Intef VIII, Sobekemsaf II, ?, Taa,

New Kingdom    1550–1069
. 18th Dynasty    1550–1295    Ahmose, , , Thutmose II, Thutmose III, Queen , Thutmose IV, Amenhotep III, Amenhotep IV/, , , Ay,

Ramesside period    1295–1069
. 19th Dynasty    1295–1186    Rameses I, Sety I, Rameses II, Merenptah, Amenmessu, Sety II, , Queen Tausret


. 20th Dynasty    1186–1069    Sethnakht, Rameses III–XI

Third Intermediate Period    1069–664
. 21st Dynasty    1069–945    , Amenemnisu, , , Osorkon the Elder, Siamun, Psusennes II
. 22nd Dynasty    944–715    Sheshonq I, , , Osorkon II, Takelot II, Sheshonq III, Pimay, Sheshonq V, Osorkon IV
. 23rd Dynasty    818–715    Kings in various centres
. 24th Dynasty    727–715    Bakenrenef
. 25th Dynasty (Kushite)    747–656    Piy, Shabaqo, Shabitqo, Taharqo, Tanutamani

Late Period    664–332
. 26th Dynasty (Saite)    664–525    Psamtek I, Nekau II, Psamtek II, , Ahmose, Psamtek III
. 27th Dynasty (1st Persian period)    525–404    Cambyses, Darius I, , , Darius II, Artaxerxes II
. 28th Dynasty    404–399    Amyrtaios
. 29th Dynasty    399–380    , Nepherites II
. 30th Dynasty    380–343    , Teos, Nectanebo II
. 2nd Persian period    343–332    Artaxerxes III Ochus, Arses, Darius III Codoman

PTOLEMAIC PERIOD (332–30 BC)

. Macedonian Dynasty    332–305    , Philip Arrhidaeus, Alexander IV
. 305–30    Ptolemy I–XII, VII Philopator, Ptolemy XIII–XV

ROMAN PERIOD (30 BC–AD 395)


1 INTRODUCTION

1.1 General introduction and project overview

The civilisation of ancient Egypt is one of the oldest, best known, and most instantly recognisable of all ancient cultures. Sources of information about the people of ancient Egypt have traditionally focused on the things they built, drew, made, and recorded, with less emphasis on the physical remains of the individuals themselves. Human remains, however, are able to provide important information about past peoples that cannot be understood from other kinds of data. For example, population dynamics, environmental settings, and socioeconomic conditions may all be related to the distribution of individual health and wellbeing observable within the skeleton and how these parameters changed across time and space (Milner et al, 2000: 471). Numerous methods have thus been developed or adapted to enable biomedical, anthropological, and molecular analysis of human remains. However, as technology progresses and the ability to extract more and more information from skeletal materials increases, the importance of basic demographic parameters such as the sex of an individual should not be forgotten or overlooked.

Sex is most accurately assessed by morphological examination of the bony pelvis; however, this is not always possible when skeletons are badly damaged or fragmented. The rationale for this project therefore rests on one key observation: when presented with skeletal remains from archaeological contexts that might have suffered some degree of post-depositional damage or disintegration, the ability to estimate sex is often reliant on population-specific metric size differences between males and females. This is best illustrated by the current practices of osteologists who are contributing to ongoing excavation projects in Egypt.

For example, Jessica Kaiser (University of California, Berkeley), osteologist for the

Mapping Project, discussed the necessity of using metric methods to estimate the sex of skeletons excavated from Khentkawes’ Town, located slightly north of the valley temple of

Menkaure (c. 2532–2503 BC), in the 2008 season Preliminary Report. The metric methods used involved measurement of the femur, humerus, and sternum, and were not population-specific.

However, their use was considered essential to estimate sex because of the poor level of preservation of the remains (Lehner et al, 2008: 51). Similarly, metric methods of sex estimation are commonly used by the osteologists for the Amarna Project, an ongoing project to excavate


the city of Tell El-Amarna in (Rose, 2006; Zabecki & Dabbs, 2010; Zabecki et al,

2012).

Unfortunately, few population-specific metric methods of sex estimation are available for the ancient Egyptians, and those that are available have not been tested for accuracy on independent and/or dissimilar population samples. It is not surprising, therefore, that well-established metric sex estimation methods created using modern population samples have been applied to ancient Egyptian skeletal remains, despite a lack of studies validating this practice. It is hoped that the results of this project will help further to highlight the importance of population-specific metric sex estimation methods in osteological research, and will contribute to our understanding of ancient Egyptian culture and civilisation by providing a means to reconstruct demographic profiles of the population with increased accuracy and known precision.

1.1.1 Research questions

The principal question that this project was initially designed to address was:

‘How accurate are metric methods of sex estimation, created using modern population

samples, when applied to ancient Egyptian human skeletal remains?’

In seeking to address this question, a number of other lines of evidence and enquiry became relevant, including the developmental basis of sexual dimorphism of the skeleton and its expression in ancient Egyptian samples, and factors affecting skeletal size and proportions and the extent to which these impacted on the ancient Egyptians compared with other human populations. The impact of social, political and economic changes in ancient Egypt on skeletal size and the subsequent ability of metric sex estimation equations to accurately separate the sexes in different population samples were additionally explored. Thus, the research questions for this project may be more appropriately phrased as:

• Did sexual dimorphism exist in ancient Egyptian populations, and if so, how, if at all, did patterns change over time and by geographic location?

• Do patterns of sexual dimorphism correlate with socioeconomic and political changes occurring in ancient Egypt?

• To what extent do the size and proportions of ancient Egyptian skeletons affect the ability of modern methods of metric sex estimation to accurately separate the sexes?

• Can population-specific methods of metric sex estimation be created, and if so, how accurate are the equations?

• How accurate are population-specific methods of metric sex estimation when tested on a temporally or geographically distinct ancient Egyptian sample?
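To make concrete what is meant by testing the accuracy of a metric method on a dissimilar sample, the following is a minimal sketch only. It assumes a simple univariate sectioning point (the midpoint between the male and female means of a reference sample), which is one common form of metric sex estimation but is not necessarily the form used by any method discussed in this thesis. All function names and all measurements are hypothetical and invented purely for illustration.

```python
from statistics import mean


def sectioning_point(male_values, female_values):
    """Midpoint between the male and female means of a reference sample."""
    return (mean(male_values) + mean(female_values)) / 2


def estimate_sex(value, cutoff):
    """Classify a measurement as male ('M') if above the cutoff, else female ('F')."""
    return "M" if value > cutoff else "F"


def accuracy(values, known_sexes, cutoff):
    """Proportion of individuals whose estimated sex matches the independently assessed sex."""
    correct = sum(estimate_sex(v, cutoff) == s for v, s in zip(values, known_sexes))
    return correct / len(values)


# Hypothetical reference sample (e.g. a bone dimension in mm); invented values.
reference_males = [48.0, 49.0, 50.0, 47.0]
reference_females = [42.0, 43.0, 41.0, 44.0]
cutoff = sectioning_point(reference_males, reference_females)

# Hypothetical smaller-bodied target sample, with sex assumed to be known
# from an independent assessment such as pelvic morphology; invented values.
target_values = [45.0, 44.0, 46.5, 41.0, 40.0, 42.5]
target_sexes = ["M", "M", "M", "F", "F", "F"]

print(cutoff, accuracy(target_values, target_sexes, cutoff))
```

In this invented example the target males are smaller than the reference males, so most of them fall below the reference cutoff and are misclassified as female, while every female is classified correctly. This sex-biased pattern of error is the kind of effect that the third research question is concerned with.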

1.1.2 Thesis outline

Following the traditional format, this thesis consists of five principal sections: Introduction, Materials and Methods, Results, Discussion, and Conclusions. The purpose of the Introduction is to provide the context and background to the project, to establish why the present research was conducted and why it is important. It begins with an examination of the contribution that studying human remains can make to Egyptology, with particular emphasis on the importance of knowing the sex distribution of skeletal samples to conclusions about past societies. It then explores how it is possible to estimate sex from the skeleton by exploring the processes of sex determination and differentiation, as well as the presence of sex differences in embryonic growth and development, and growth of the skeleton during infancy, childhood and adolescence. The contribution of these differences to the attainment of adult sexual size dimorphism is additionally addressed, along with consideration of other factors that account for the causation or maintenance of sexual dimorphism. Finally, current methods of sex estimation are reviewed and critiqued. Chapter 2, Materials and Methods, provides a detailed description of the collections from which the skeletons analysed in this project were derived, including the geographical location of the cemetery sites, historical context, and excavations, as well as presenting the study methodology and statistical analyses used, with relevant justification provided where necessary. The results arising from this piece of research are presented in Chapter 3, primarily in tabular and graphical form to aid clarity, and these are discussed in depth within the context of previous research and historical and scientific evidence in Chapter 4. Chapter 5 presents the conclusions and is followed by the list of references and appendices.


1.2 Human remains in Egyptology

When addressing the question ‘why study human remains?’ two overlapping categories of evidence are usually presented – the biological evidence that can be derived about an individual or population, and cultural information. For both of these categories, the fundamental importance of an accurate sex distribution should not be underestimated. The composition of mortuary samples in terms of sex, age at death, and social status is a critical source of information about past societies, and provides a more complete understanding of the potential effects of social processes on cemetery composition (Milner et al, 2000: 467–497). Furthermore, studies of past mortuary practices and causes of mortality, important for their presumed relationship to social organisation, behaviour and functioning, would be severely limited were it not possible to estimate the sex (and age at death) of individuals with an acceptable degree of accuracy and reliability (Milner et al, 2000: 467–497).

1.2.1 Biological information

Biological information derived from human remains may relate to a number of different life history and population-level factors, including patterns of health and disease, diet, mortality, migration, and population affinity, which are all of importance when studying past populations.

1.2.1.1 The ancient Egyptian population

1.2.1.1.1 The origin of the ancient Egyptians

Skeletal evidence for population movements has been used to address one of the most long-standing and controversial questions in Egyptology and biological anthropology: who were the ancient Egyptians and what was their geographical origin? In answer to this question, two conflicting hypotheses have been presented: that the ancient Egyptian state was formed via invasion and large-scale population replacement by a ‘Dynastic race’, or via an in situ development of indigenous Predynastic groups who presumably descended from regional

Neolithic people (Schillaci et al, 2009; Zakrzewski, 2007). Evidence in support of the former theory was predominantly presented by several early researchers including W. M. Flinders

Petrie (1853–1942), who suggested that the process of State formation was caused by new populations entering the Nile Valley during the ‘Semainean Period’ (most commonly viewed as a branch of the Gerzean or Naqada II culture), a time he described as the “tumultuous age of


dynastic invasion” (Petrie, 1939: 65). In his 1939 volume ‘The Making of Egypt’, Petrie takes as evidence for this theory the ivory handle of a flint blade showing a combat between short-haired men with bullet heads (the ‘Egyptian type’) and long-haired men (the ‘foreign type’ or ‘invaders’;

Figure 1.2A). According to Petrie, the images depicted in this ivory relief demonstrate large-scale invasion, something Petrie described as “…one of the greatest events in the history of

Egypt…” (Petrie, 1939: 65). Yet Petrie fails to acknowledge that the evidence presented by the ivory-handled blade is limited, being only one small example, and that the images could alternatively be a depiction of warfare between Egypt and her neighbours.

Figure 1.2A: Types of peoples (‘The Medley of Invaders’) who, according to Petrie (1939), invaded and settled in ancient Egypt. Number 1: the Prehistoric Egyptian, 2: representing a people who came from the

East, 3 & 4: peoples from the Eastern Mountains or the North, 5: a curly-headed, bearded people of whom nothing is known, 6: chief of the Fayum lake, and 7 & 8: types of the ‘Dynastic race’, king and follower, who are entirely different from all the other types (Source: image and legend text from Petrie, 1939: 67, 68, 73).

Even as late as 1972, Emery was continuing to postulate the theory that: “…the rapid advance of civilization in the Nile Valley immediately prior to the Unification was due to the advent of the ‘dynastic race’…” (Emery, 1972: 30). Additional evidence in support of this theory takes the form of cultural (such as cultural markers) or material (as in pottery from

Palestine and shells from the Near East) features within Egyptian Predynastic Period sites

(Zakrzewski, 2007). Interestingly, however, this theory is not supported by any skeletal evidence, although craniometric data have previously been used in an attempt to identify an

Egyptian ‘racial type’ (Zakrzewski, 2007). In comparison, more recent studies have used


skeletal remains from numerous time-successive sites in Upper and to demonstrate that State formation was primarily an indigenous process, albeit with prolonged small-scale migration. Zakrzewski (2007) used several statistical approaches, including principal component analysis and Mahalanobis D² matrix computation, to assess 16 cranial measurements of individuals from six time-successive Egyptian populations (Badarian, the Early

Predynastic, the Late Predynastic, the Early Dynastic, the Old Kingdom, and the Middle

Kingdom) from Middle and . The results of the study demonstrated that genetic continuity occurred over the Egyptian Predynastic and Early Dynastic periods, but that a high level of genetic differentiation was sustained over this period of time. This suggests that the process of State formation itself may have been a mainly indigenous process, but that it likely occurred in association with in-migration to the Abydos region of the Nile Valley. This is easy to understand given that the Nile Valley probably acted as a ‘corridor’ of sorts through the hostile desert environment (Zakrzewski, 2007). These findings are generally supported by Schillaci and colleagues (2009), who also suggested that the Egyptian State was formed as a result of both in situ development and migration and gene flow; however, this group of researchers additionally demonstrated that temporal and geographic distributions of biological diversity were present in their samples, and that outside influence and admixture with other regional groups primarily occurred in Lower Egypt (Schillaci et al, 2009). These findings were a continuation and refinement of a previous study conducted by Irish (2006) that examined 36 dental morphological variants in 15 time-successive Neolithic to Roman Period Egyptian samples. Irish (2006) demonstrated that: 1) there may be a connection between Neolithic and subsequent

Predynastic Egyptians; 2) Predynastic Badarian and Naqada peoples may be closely related; 3) the Dynastic Period is likely an indigenous continuation of the Naqada culture; 4) there is support for overall biological uniformity through the Dynastic period; and 5) this uniformity may continue into postdynastic times (Irish, 2006). Further support for the theory of population continuity and the formation of the Egyptian State primarily by the indigenous people is provided by Johnson and Lovell (1994), who used nonmetric dental morphological data to further dispel the ‘Dynastic race’ theory by providing evidence that Cemetery T at Naqada was populated by a ruling or elite lineage of the local Naqada population, rather than a ruling or elite immigrant population (Johnson & Lovell, 1994). Thus, it can be seen that consideration of data arising


from skeletal analysis in questions of cultural and historical origins has allowed previous theories to be largely supplanted by others based on more compelling and scientifically rigorous evidence.
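As an aid to readers unfamiliar with the biodistance statistics mentioned above, the following is a minimal sketch of how a Mahalanobis D² value between two samples may be computed from a set of measurements, assuming a pooled within-group covariance matrix. It is not a reproduction of the procedure used by Zakrzewski (2007) or any other cited author; the function names are hypothetical and the two-measurement samples are invented solely to keep the arithmetic easy to follow.

```python
def column_means(rows):
    """Mean of each measurement (column) across individuals (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows):
    """Sums of squares and cross-products of deviations from the group mean."""
    means = column_means(rows)
    p = len(means)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [list(map(float, matrix[i])) + [float(vector[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def mahalanobis_d2(group_a, group_b):
    """Squared Mahalanobis distance between two group centroids."""
    diff = [a - b for a, b in zip(column_means(group_a), column_means(group_b))]
    sa, sb = scatter_matrix(group_a), scatter_matrix(group_b)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance: combined scatter over pooled degrees of freedom.
    pooled = [[(x + y) / dof for x, y in zip(ra, rb)] for ra, rb in zip(sa, sb)]
    weighted = solve(pooled, diff)
    return sum(d * w for d, w in zip(diff, weighted))


# Invented samples: four individuals each, two measurements per individual.
sample_a = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
sample_b = [[2.0, 0.0], [4.0, 0.0], [2.0, 2.0], [4.0, 2.0]]
print(mahalanobis_d2(sample_a, sample_b))  # approximately 3.0 for these made-up values
```

Because the distance is scaled by the within-group variation, two samples whose means differ by a fixed amount appear closer when individuals within each sample are more variable. A full D² matrix of the kind referred to above would repeat this calculation for every pair of samples.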

1.2.1.1.2 Population affinity

The concept of population affinity or “race” has a long and controversial history in the physical anthropological literature (Cartmill, 1998). Terms such as ‘White’ and ‘Black’ are commonly used to describe population or biological affinities, despite the fact that skeletal analysis may provide a reasonable estimate of original geographic origins but not direct assessment of skin colour. The terms ‘White’ and ‘Black’ were introduced by a German professor of medicine,

Johann Friedrich Blumenbach (1752–1840), in his work ‘On the Natural Variety of Mankind’, which was first published in 1775. In the second edition Blumenbach changed his original geographically based four-race arrangement to a five-group one that emphasised physical morphology. Blumenbach’s five categories were: Caucasian, the ‘white race’; Mongolian, the

‘yellow race’; Malayan, the ‘brown race’; Ethiopian, the ‘black race’; and American, the ‘red race’

(American Anthropological Association, 2007). In the contemporary physical anthropological literature, the terms ‘White’ and ‘Black’ are commonly used to denote individuals of European or

African origin, respectively, who are distinct in terms of skeletal morphology (Patriquin et al,

2003; see Section 1.5.1.3).

Early researchers such as Morton (1844) classified the ancient Egyptians as Caucasian of European descent, in no small part because of the prevailing misconception of the time that

“Black people have developed no important civilizations, nor have they made any significant contributions to world culture” (Foster, 1974: 175). Later researchers employing more rigorous anthropological techniques suggested that the ancient Egyptians were a hybrid of “Negroid” and

“non-Negroid” (Semitic) elements (Thomson, 1905), a theory that was additionally supported by metric evidence (Keita, 1993). A more generally accepted theory of ancient Egyptian biological affiliation was first suggested as early as 1946 by Batrawi, and subsequently gained support during the 1970s with the publication of numerous anthropological, osteological, and dental based studies. Batrawi (1946) suggested that the ancient Egyptian population consisted of two distinct but closely related types: northerners and southerners. Hillson (1978; cited in Keita,

1993) extended this by suggesting that there were two trends in the Nile Valley series of skeletal


remains: a Lower Egyptian/northern tendency, and an Upper Egyptian/southern tendency, the latter of which overlapped with more southern African groups. This is supported by the results of more recent studies, which found broad clinal variation from north to south (Keita, 1992), and a greater degree of similarity between a Somali population and a Predynastic Period sample from

Upper Egypt than between the Somali population and a late Dynastic sample from Lower Egypt.

The latter in turn was also found to show ties to the peoples of the circum-Mediterranean basin, both past and present (Brace et al, 1993). Despite this, the Predynastic of Upper Egypt and the

Late Dynastic of Lower Egypt were found to be more closely related to each other than they were to any other population (Brace et al, 1993).

Other more recent studies have demonstrated the close biological association between the ancient Egyptians and more southern African groups. For example, Godde (2009) demonstrated the presence of a close relationship with Nubian populations, and suggested that the homogeneity between the two groups may be accounted for by gene flow, which is easy to understand given the Egyptian occupation of Upper at various points in history (Godde,

2009). In addition, Irish and Konigsberg (2007) noted both similarities and differences between the inhabitants of Jebel Moya in the Sudan and the peoples from North Africa and sub-Saharan regions after examining up to 36 dental traits in 19 African population samples, lending support to the work of earlier authors.

Evidence from genetic studies of modern Egyptian populations may also add to our understanding of their population affinity. For example, a relatively early study by Mahmoud and colleagues (1987) using samples from the Dakhleh Province analysed by eight different blood group systems found that the Egyptian population has a mixture of African, Asian, and Arabian characteristics. Evidence from Y-chromosome haplotypes (allelic combinations) or haplogroups

(groups of similar haplotypes that share a common ancestor having the same single nucleotide polymorphism, SNP, mutation) demonstrated that the modern Egyptian population reflects a mixture of European, Middle Eastern and African characteristics, which highlights the importance of ancient and recent migration waves and gene flow (Manni et al, 2002). Using principal components analysis of haplogroup frequencies, Capelli et al. (2006) suggested that populations of the Mediterranean basin could be divided into four main clusters: North African,

Arab, Central-East Mediterranean, and West Mediterranean. Other groups of researchers such


as Bosch and colleagues (1997) were more specific, suggesting that the main feature of the genetic landscape in North Africa was a distinction between the populations of Egypt and Libya and the Berber and Arab population groups of Morocco, Algeria, Tunisia, and the Western

Sahara. They further found that Egypt and Libya demonstrated the smallest genetic distance to

Europe, a finding that has been supported by other researchers (Terreros et al, 2005).

Historical migrations between Africa and Europe and/or the Middle East via the

Levantine corridor are supported by a number of different studies (Cruciani et al, 2007; Luis et al, 2004; Rowland et al, 2007; Terreros et al, 2005). According to Arredi et al. (2004) the North

African pattern of Y-chromosomal variation is largely of Neolithic origin and shows an east-west cline that extends into the Middle East. The bi-directional nature of the Levantine corridor was suggested by Luis and colleagues (2004), who found that frequency distributions of specific Y-chromosome alleles in extant Egyptians were indicative of a greater similarity to Middle East populations than to any sub-Saharan African population. Salem et al. (2014) similarly found that

Egyptians living in the Asian part of Egypt (Sinai) were more closely related to Middle East populations than to populations from North Africa, while the opposite was true for Egyptians living in the North African part of Egypt (Ismailia). Examining mitochondrial DNA (mtDNA) haplotypes, Musilová et al. (2011) noted a close association between Egyptians and populations from the Arabian Peninsula, which highlights the importance of known trade routes across the Red Sea in the seventh to fourth millennia BC. Whole genome analysis of both mtDNA and Y-chromosome haplotype gene pools of a small Egyptian population from the

Western Desert oasis of el-Hayez conducted by Kujanová et al. (2009) revealed clear genetic evidence of a strong Near Eastern input. Furthermore, the whole genome sequencing strategy and molecular dating allowed the authors to detect the accumulation of local mtDNA diversity dating to 5,138 ± 3,633 years before the present (Kujanová et al, 2009).

The presence of trade routes passing into and out of Egypt through the Levant further helps to explain the findings of other studies which show distinct genetic clustering of Egyptians with Jordanians, Palestinians, and Saudi Arabians, as well as populations from Yemen (Badro et al, 2013). For example, known trade networks linking Egypt with Yemen included those for obsidian, and later through Aksum, spices, incense and other precious materials, in addition to slaves (Badro et al, 2013). Using principal components analysis, Boattini and colleagues (2013)


examined mtDNA variability in individuals from 30 East African populations and compared the data with those from 40 populations from Central and Northern Africa and the Levant. They identified four clusters, each with different geographic distribution and/or linguistic affiliation

(Boattini et al, 2013). One cluster was widespread in Ethiopia, where it was associated with different Afro–Asiatic-speaking populations, and showed shared ancestry with Semitic-speaking groups from Yemen and Egypt (Boattini et al, 2013). Additional studies have further highlighted links between Upper Egyptians and Ethiopians. For example, Stevanovitch and colleagues

(2004) demonstrated genetic similarities between a sedentary Egyptian population from Gurna and an Ethiopian population, suggesting conservation of the trace of an ancestral genetic structure from an East African population (Stevanovitch et al, 2004). Badro et al. (2013) suggested that the River Nile produced a natural genetic barrier and boundary of mtDNA differentiation within Africa, which accounts for the links seen between Egyptian, Ethiopian, and

Yemeni populations.

The unique location and geography of Egypt produced other genetic barriers, as well as gene-flow corridors, which have been identified through genetic studies of migration within

Egypt herself. Krings and colleagues (1999) analysed mtDNA variation in 224 individuals from various locations along the Nile and used data on restriction-site polymorphism frequency differences between Eurasian (northern) and sub-Saharan African (southern) population groups from the published literature to designate each mtDNA to the “northern” or “southern” type

(Krings, 1996). They found that proportions of northern and southern mtDNA differed significantly between Egypt, Nubia, and southern Sudan, with highest and lowest northern mtDNA diversity in Egypt and southern Sudan, respectively, and highest and lowest southern mtDNA diversity in southern Sudan and Egypt, respectively (Krings et al, 1999). The authors interpreted these results as evidence of bi-directional migration along the River Nile Valley, with no indication of barriers. They further found that Egypt and Nubia have low and similar amounts of divergence for both mtDNA types, which they suggested is consistent with historical evidence for long-term interactions between Egypt and Nubia (Krings et al, 1999). These findings are supported by those of Lucotte and Mercier (2003), who examined Y-chromosome haplotypes in

274 unrelated Egyptian males from Alexandria (the Delta), Upper Egypt, and Lower Nubia. They identified 15 different p49a,f TaqI haplotypes in the Egyptian sample, of which three (V, XI, and


IV) were the most common. Of these, they found that haplotype V is a characteristic Arab haplotype and had a northern geographic distribution in the Nile River Valley of Egypt, while haplotype IV was characteristic of sub-Saharan populations and had a southern geographic distribution in Egypt (Lucotte & Mercier, 2003). Similarly, Fox (1997) observed a south–north decreasing gradient of the frequency of the Hpa I mtDNA polymorphism along the east African continent, indicative of migration from sub-Saharan Africa to the Mediterranean coast in the

Meroitic period (c. 300 BC–AD 400). Henn and colleagues (2012) suggested that the sub-Saharan ancestries are a more recent introduction into North Africa, possibly occurring around 750 years ago in Egypt and coinciding with the trans-Saharan slave trade. These authors further suggested that two other important episodes contributed to the present-day ancestry in North Africa: ancient “back-to-Africa” gene flow prior to the Holocene, and more recent gene flow from the Near East (Henn et al, 2012).

In addition to a south–north genetic gradient, other authors have reported longitudinal gene flow patterns (Fadhlaoui-Zid et al, 2011; Henn et al, 2012). For example, Fadhlaoui-Zid et al. (2011) noted an east–west clinal distribution of mtDNA haplotypes as well as a genetic discontinuity between Egypt and Libya, suggesting that the north Sahara Desert may represent a geographical barrier to gene flow. Nikita and colleagues (2012a) found no evidence for uniform gene flow across North Africa, and suggested that whenever the Sahara Desert intervened between different population groups, their biodistances increased. By contrast, no such pattern was observed for populations that could follow routes along the Nile River and/or the Mediterranean coast. As such, the authors suggested that the Sahara Desert restricted gene flow among North African populations and that the trade networks evidenced by archaeological studies possibly involved only a subset of male merchants and local traders

(Dupras & Schwarcz, 2001; Nikita et al, 2012a). In summary, evidence from cranial, dental, and genetic studies highlights the unique biological affinity of the ancient Egyptians, fitting neither into a ‘European’ nor ‘African’ group, but rather demonstrating a combination of European, Near Eastern, and African characteristics. This reflects how the ancient Egyptians viewed themselves, as intermediate (in skin colour at least) between populations from the north and south, east and west (Figure 1.2B). Such findings emphasise the importance of Egypt as a central entrepôt along trade and migratory routes that connected the three continents of Africa, Europe and Asia.

Figure 1.2B: Mural illustrating the Book of Gates from the tomb of Seti I, showing (from left to right) four Libyans, a Nubian, a Semite, and an Egyptian. (Source: http://genetiker.wordpress.com/2013/03/31/k-5-craniometric-admixture-analysis/. Accessed August 2013).

1.2.1.2 Palaeopathology

Palaeopathology is the study of disease that affected living organisms in the past (Ortner, 2011). According to some authors, the research and publications of Sir Marc Armand Ruffer (1859–1917), largely on Egyptian mummies, marked the beginning of serious scientific interest in palaeopathology in general, as well as in Egyptology specifically (Baker & Judd, 2012: 212; Ortner, 2011). Numerous studies using a range of different analytical techniques have explored the presence and/or prevalence of disease in ancient Egypt and Nubia by examining pathological markers in skeletal and mummified tissues (Table 1.2A).

Table 1.2A: Summary of studies exploring palaeopathology in the Nile Valley.

Disease class | Diseases and conditions | Techniques | Studies
Infectious diseases | Malaria; tuberculosis; leprosy; schistosomiasis; periostitis; syphilis(?); leishmaniasis | Immunochromatography; immunohistochemistry; macroscopic inspection; radiography; PCR; aDNA analysis; ELISA; spoligotyping | Bianucci et al, 2008; Cave, 1939; Crubézy et al, 1998; Donoghue et al, 2010; Dzierzykray-Rogalski, 1980; Haas et al, 2000; Kloos & David, 2002; Lalremruata et al, 2013; Nerlich et al, 1997; Nerlich et al, 2008; Rose, 2006; Smith, 1907; Zink et al, 2001; Zink et al, 2003a; Zink et al, 2003b; Zink et al, 2006
Congenital diseases and malformations | Meningocele; campomelia dysplasia; osteogenesis imperfecta; hydrocephalus; spina bifida occulta | Helical CT scanning; macroscopic inspection | Boano et al, 2009; Cope & Dupras, 2011; Derry, 1913; Hussien et al, 2009
Tumours and neoplasms | Malignant tumours; benign tumours, e.g. neurilemmoma; osteochondroma | Macroscopic inspection; radiography | Buzon, 2005; Pahl, 1986; Ruffer & Willmore, 1913; Strouhal, 1976; Zink, 1999
Dental disease | Periodontitis; caries; attrition; abscesses | CT scanning; macroscopic inspection | Buzon & Bombak, 2010; Filce Leek, 1966; Gerloni et al, 2009; Lovell & Whyte, 1999; Ruffer, 1913; Ruffer, 1920
Trauma | Fractures | Macroscopic inspection; radiography | Alvrus, 1999; Buzon, 2006; Buzon & Richman, 2007; Judd, 2004; Judd, 2006; Rose, 2006
Endocrine & metabolic disorders | Anaemia; cribra orbitalia; porotic hyperostosis | Macroscopic inspection | Fairgreave & Molto, 2000; Keita, 2003; Keita & Boyce, 2006; Rose, 2006
Degenerative disorders | Ankylosing spondylitis; arthritis; osteoporosis | Radiography; macroscopic inspection | Dequeker et al, 1997; Feldtkeller et al, 2003; Rose, 2006; Ruffer, 1919; Zaki et al, 2009
Soft tissue diseases | Atherosclerosis; atheroma | CT scanning; scanning electron microscopy; radiography | Allam et al, 2011; Binder & Roberts, 2014; Ruffer, 1911; Thompson et al, 2013
Multiple pathologies (population studies/case series) | Dental disease; osteoporosis; trauma; degenerative disease | Radiography; macroscopic inspection; PCR | Armelagos, 1969; Bourke, 1971; Finch, 2011; Hussein et al, 2013
Other skeletal pathologies | Parietal perforation; hyperostosis frontalis interna; diabetes mellitus | Macroscopic inspection; radiography | Derry, 1914; Dupras et al, 2010; Shahin et al, 2014

PCR, polymerase chain reaction; aDNA, ancient DNA; ELISA, enzyme-linked immunosorbent assay; CT, computed tomography.


A selection of studies examining sex differences in palaeopathology is summarised in Table 1.2B.

Table 1.2B: Summary of selected studies exploring sex differences in skeletal indicators of palaeopathology.

Study | Sample | Key findings | Implications
Buzon & Bombak, 2010 | New Kingdom Tombos | Statistically significantly higher rates of AMTL, caries and severe wear in males vs. females | Possible consumption of wind-blown sand in males involved in quarrying; potential division of labour by sex necessitating different diets
Buzon, 2006 | New Kingdom Tombos | No differences between males and females in rates of enamel hypoplasia | Similar levels of stress affecting both sexes
Fairgreave & Molto, 2000 | Pre-Roman/Roman Period Dakhleh Oasis | No sex difference in prevalence of cribra orbitalia | Similar levels of stress affecting both sexes
Zaki et al, 2009 | Old Kingdom Giza | Osteoporosis was more frequent in females than males | Consistent with modern studies (Cawthorn, 2011; Holroyd et al, 2008; Kanis & Pitt, 1992); may be related to hormonal changes in females
Hussien et al, 2009 | Graeco-Roman Period Bahriyah Oasis | Spina bifida occulta affected males and females with similar frequency | Disagrees with findings of a higher prevalence in males in both historical (Masnicová & Beňuš, 2003) and modern populations (Eubanks & Cheruvu, 2009; Fidas et al, 1987)
Allam et al, 2011; Thompson et al, 2013 | Dynastic period | Equal prevalence of atherosclerosis in males and females | Similar access to foods high in saturated fat in males and females (David et al, 2010)
Alvrus et al, 1999 | Middle Kingdom Semna South, Nubia | More healed fractures in males than females; significantly higher prevalence of cranial trauma in males than females; age-related fracture frequency pattern in males but not females | Possibly related to: walking over the harsh, rocky terrain; military activities; domestic violence; feuding over resources or against a central political authority
Buzon & Richman, 2007; Judd, 2004 | New Kingdom Tombos and Middle Kingdom Kerma | No sex differences in fracture frequency at Tombos; significantly higher rate of long bone fractures in Kerma males vs. females | Possible parry fractures associated with defending a blow to the head (inter-personal violence); decrease in traumatic injuries over time appears to coincide with Egyptian policies in Nubia
Schrader, 2012 | New Kingdom Tombos | Entheseal and osteoarthritic changes tended to be more severe in males than females; only osteoarthritic lipping of the ankle occurred statistically significantly more frequently in males vs. females (P≤0.05) | Females may have been engaged in similar or slightly less strenuous activities than males (body size was not controlled for in this study, making it difficult to interpret sexual division of labour)

AMTL, ante-mortem tooth loss.

1.2.1.2.1 Palaeopathology and the origin of agriculture in Egypt

It is widely believed that the adoption of an agricultural subsistence strategy provided the economic basis for the rise of states, development of hierarchical civilisation, and increase in social complexity (Dumond, 1961; Ehret, 2002; Ehrlich & Ehrlich, 2013; Larsen, 2006; Savory, 1994; Starling & Stock, 2007). In Egypt, state formation occurred much more rapidly after the adoption of farming than in many other parts of the world (Allen, 1997). It is thought that domesticated crops and animals were not used in Egypt until after 6000 BC. Between 6000 and 5000 BC they were simply adjuncts to the dominant system of foraging. It was not until


around 5000 BC that purely agricultural villages pursuing a system of mixed husbandry appeared in the Delta and not until around 4000 BC that they appeared in Upper Egypt. It is likely that there was a long transitional phase lasting into the Third Millennium when purely agricultural villages coexisted with settlements pursuing combinations of foraging and farming

(Allen, 1997). According to traditional theory, the Egyptian state was established around 3000 BC when (?), King of Upper Egypt, conquered the Delta and unified the country (Allen, 1997), although to some authors, this is more a “political legend” than explanation (Savage, 2001).

It is intuitive to believe that the adoption of agriculture and the associated development of civilisation and complex society resulted in an increase in quality of life. However, the paradox that continues to attract researchers is how little the switch from a hunter–gatherer to an agricultural subsistence strategy appeared to benefit the earliest food producers (Larsen,

2006; Starling & Stock, 2007). As Starling and Stock (2007) pointed out, the archaeological record can be used to answer some important questions about the transition to agriculture, for example, whether such a change emerged gradually as an adaptation over a previously successful strategy, or as a result of environmental or demographic stress. In this context, the analysis of skeletal remains can provide further evidence about the relationship between agriculture and inferred quality of life of the earliest farmers, and whether this relationship is consistent through time and space (Starling & Stock, 2007). To investigate the patterns of disease and dietary health over the course of the Neolithic in the Nile valley, Starling and Stock

(2007) examined linear enamel hypoplasia (LEH), an indicator of general or episodic stress, in 242 dentitions from five ancient Egyptian and Nubian populations: 38 individuals from Jebel

Sahaba (Upper Palaeolithic), 56 from Badari (Predynastic), 54 from Naqada (Predynastic), 47 from Tarkhan (Dynastic), and 47 from Kerma (Dynastic). These populations were chosen because they span the early period of agricultural intensification along the Nile valley. The prevalence of LEH was found to be highest in the “proto-agricultural” (pastoralist) Badari population (69.6%), with a gradual decline throughout the late Predynastic and early Dynastic periods of state formation, with the lowest frequency found among the Kerma (Dynastic) population. According to the authors, this suggests that the period surrounding the emergence of early agriculture in the Nile valley was associated with high stress and poor health, but that


the health of agriculturalists improved substantially with the increasing urbanisation and trade that accompanied the formation of the Egyptian State. Fluctuations in dental health status that appear to coincide with changes in subsistence pattern and social complexity have additionally been reported by other authors (Beckett & Lovell, 1994; Larsen, 1982; Lukacs, 1996; Eshed et al, 2006; Keita & Boyce, 2001; Mahoney, 2006; Smith et al, 1984).

Stock et al. (2011: 347–367) reported a decrease in stature and body size from Late Pleistocene hunter–gatherer populations from Jebel Sahaba in Nubia (c. 13,000–9000 BC) to the earliest agriculturalists of the Neolithic/Predynastic Period from Badari (c. 5000–4000 BC). An increase in stature and body size was subsequently observed by the Middle Kingdom. Measures of robusticity further suggested that males experienced a high level of mechanical loading during the Late Pleistocene; humeral strength and femoral midshaft rigidity subsequently decreased during the transition to agriculture. In females, robusticity was found to be similar between the Late Pleistocene hunter–gatherer and Badarian agriculturalist populations, but decreased in the Dynastic period Kerma sample (c. 2100–1500 BC), suggesting relatively constant levels of mechanical loading during the transition to agriculture followed by a decrease in the intensity of habitual loading with the development of the Egyptian Empire (Stock et al, 2011: 347–367).

Changes in stature over time associated with the adoption of agriculture and state formation in ancient Egypt have additionally been investigated by other authors. For example, Zakrzewski (2003) sampled 150 adult individuals (78 male and 72 female) from six time-successive sites dating to the Badarian Period, early Predynastic Period, late Predynastic Period, early Dynastic Period, Old Kingdom, and the Middle Kingdom. A total of 997 long bones were measured and stature was calculated using the Egyptian-specific equations developed by Robins and Shute (1986). The smallest stature in both sexes was observed in the Badarian Period samples, with a rapid increase in height occurring through the early Predynastic Period. Male stature subsequently peaked during the early Dynastic Period, while females were found to continue increasing in height until the Old Kingdom. Both sexes then exhibited a decline in calculated stature between the early Dynastic Period and the Middle Kingdom (Zakrzewski, 2003). All long bone lengths were found to exhibit significant sexual dimorphism, and several additionally exhibited significant changes in length between the various time periods studied.


The greatest degree of sexual dimorphism in stature was observed in the skeletal samples dating to the late Predynastic Period, with a high level also found in the Middle Kingdom samples. Considering these observations in relation to the subsistence strategies practised by individuals from the different time periods, Zakrzewski (2003) noted that stature increased over the period of change in subsistence strategy, from pastoralism with gathering and cultivation during the Badarian Period to agriculture in the early Dynastic Period, while stature declined over the phase of agricultural intensification and formation of social hierarchies that occurred up to the Middle Kingdom (Zakrzewski, 2003). More specifically, the increase in stature with intensification of agriculture during the Predynastic Period was likely a reflection of the greater reliability of food production and the formation of a social hierarchy. The later decrease in stature coincided with even greater social complexity and differential access to resources according to social class. Furthermore, this latter change in stature was more prominent in males than females, suggesting that males may be more responsive to the effects of socioeconomic change than females (Zakrzewski, 2003). This is supported by the results of Martin et al. (1984: 193–214), who noted a statistically significant decrease in male but not female stature from the A-group (non-intensive agricultural phase) to the X-group (intensive agriculture phase) of Nubia.

1.2.1.3 Diet

The diet of an individual or a population is another key piece of biological information that can be obtained from human skeletal remains, notably the dentition (Gamza & Irish, 2012; Greene et al, 2005; Hillson, 1979; Keita & Boyce, 2001, amongst others). A further commonly used approach is to sample carbon (C) and nitrogen (N) stable isotope ratios of bone collagen and/or dentine (Thompson et al, 2008), a procedure that can provide direct information about the average diet of an individual over the last 25–30 years of his or her life (Iacumin et al, 1998).

Carbon isotope ratios can give information about the types of plants in the foodweb, in terms of C3 plants (typically temperate plants, grasses, shrubs and trees, including economically important crops such as wheat and barley), C4 plants (arid-adapted plants and grasses, such as sorghum and millet), or dietary protein from marine organisms. Nitrogen isotope ratios, on the other hand, provide information about other dietary sources of protein (Thompson et al, 2005), notably the trophic level of the organism from which the protein was derived. Using this


approach, it has been found that populations living at Kerma in Nubia during the Ancient (c. 4450–4000 BP), Middle (c. 4000–3700 BP), and Classic (c. 3700–3450 BP) Periods had a mixed dietary regimen including C3 and C4 plants (C4 plants being more important during the Ancient Kerma Period), proteins from bovid sources (cattle being more important during the Ancient Kerma Period), and freshwater fish (Iacumin et al, 1998), compared with a predominantly C3 plant-based diet among ancient Egyptian populations ranging from the

Predynastic Period to the Thirtieth Dynasty and from numerous geographically diverse sites

(Thompson et al, 2005; Touzeau et al, 2014). Iacumin and colleagues (1998) found no apparent differences in diet between males and females; however, the number of samples was not large enough to determine the statistical significance of this finding. Similarly, White et al. (1999) found no dietary differences between males and females from Roman–Byzantine Period (400–

700 AD) individuals from the . Dietary differences by sex were not assessed by

Thompson et al. (2005) or Touzeau et al. (2014).

Analysis of stable carbon and nitrogen isotopes from human remains has additionally been used to supplement the limited amount of information about infant feeding and weaning practices in Roman Period Egypt available from documentary and iconographic sources (Dupras et al, 2001), and to explore migration and population movements within Egypt (Dupras & Schwarcz, 2001).

1.2.2 Cultural information

The study of human skeletal remains may also make a significant contribution to our understanding of how past societies functioned by providing information about mortuary practices and beliefs, social organisation, and hierarchies. The sex of an individual is an integral component of these types of studies, as differences in body treatment, grave form, and cemetery or burial location can provide important information about the roles of males and females within a society, as well as the overall cultural implications of differential treatment of the sexes (Knudson & Stojanowski, 2008). According to some authors, the analysis of mortuary populations, practices, and material culture provides more information about the social and hierarchical organisation of a society than any other set of behaviours (Savage, 1997; Tainter,

1978). Carr (1995), on the other hand, suggested that in addition to social organisation, a wide array of factors affect mortuary practices and remains, including philosophical–religious beliefs,


world views, physical constraints, circumstances of death, and ecological relations. Other authors have proposed that there is a direct relationship between the energy expended on grave construction and provisioning, and the social status that the individual held in life (Binford, 1971; Stevenson, 2009; Tainter, 1978), and that the presence of badges of office or prestige goods are indicative of an individual at the top end of the social hierarchy (Frood, 2010: 477; Meskell, 1999; Stevenson, 2009).

The development of social stratification or internal differentiation within a society is intimately linked to the formation of the State. Some authors have proposed that the formation of the Egyptian State was, in turn, closely tied to the development of agriculture and a settlement-centric subsistence economy (Hassan, 1988; Marcus, 2008; Wenke, 1989, 1991). In Upper Egypt, the earliest attestation of both agriculture and hierarchies dates to the Badarian Period (c. 4400–4000 BC) (Hendrickx & Vermeersch, 2000: 36; Wilkinson, 1999: 29). Prior to this, the prehistoric inhabitants of the Nile Valley are thought to have operated a Palaeolithic style of subsistence, consisting of hunting, fishing, and gathering (Hendrickx & Vermeersch, 2000: 31).

The reason for the adoption of agriculture in Egypt has been attributed to a number of factors, including the geography of the country and circumscribed nature of the habitable areas (Stein,

1998), climatic changes, which forced the indigenous desert inhabitants into the Nile Valley

(Bard, 1994; Brookfield, 2011: 91–108; Hassan, 1997), and growth of the population (Dumond,

1975; Kemp, 2006: 74), although the latter theory has largely been discounted by some authors

(Allen, 1997; Hassan, 1988). How the adoption and development of agriculture led to social inequality and ultimately to the formation of the State has further been explained in several different ways; however, a popular theory proposes that the key to social differentiation lies in the growth and storage of surplus (Bard, 1992; Castillos, 2006, 2007; Frood, 2010). Once agricultural surplus had been accumulated by a particular individual it could be transformed into other forms of wealth, for example craft or prestige goods and increasingly elaborate burials, first as symbols of status and later as symbols of power (Bard, 1992).

Egyptian mortuary remains provide abundant evidence for social classes and inequality related to wealth, power, occupation, and access to resources. Badarian graves show variation in their size and wealth suggesting that different levels of status were accorded to the deceased

(Wilkinson, 1999: 29). For example, examining 262 burials from seven cemeteries at Badari,


Anderson (1992) identified an association between the number of burial goods recovered from a tomb and the size and elaborateness of the grave, as well as the age at death (but not sex) of the individual interred within it. By comparison, Meskell (1999) noted divisions in the spatial organisation of New Kingdom period cemeteries according to wealth (and presumably social status), age at death, and sex. At Deir el-Medina, two cemeteries, the Eastern and Western

Necropoleis, were constructed to serve as burial grounds for the poorer (including many females and children) and wealthier members of the Eighteenth Dynasty, respectively. Within the Western Necropolis there was additional and considerable individual variation in overall tomb expenditure, with males receiving the greatest burial wealth, females receiving considerably less, and children relegated to comparatively meagre burials (Meskell, 1999).

Interestingly, in the less affluent Eastern Cemetery, differences in burial wealth appeared to be based on age rather than sex. These findings suggest two things: 1) that within the Deir el-Medina population as a whole, the primary social divide was wealth, and 2) among the higher social classes, males were viewed as far more important members of society than females, whereas in the poorer classes, males and females were viewed as having more equal status (Meskell, 1999).

The presence of local elites is even more apparent in the mortuary record of the subsequent Naqada Period (c. 4000–3000 BC), with an increase through time in both mean grave size and mean number of grave goods, as well as the presence of luxury or prestige goods (Bard, 1988; Wilkinson, 1999: 29). A further important stage in socio-economic development that may be inferred from Naqada Period burials is the switch from status or power that was acquired, as through the control of agricultural surplus, to status or power that was inherited (Bard, 1988; Wilkinson, 1999: 29). For example, a large proportion of children’s burials with extremely rich grave goods in the Gerzean culture burials at Badari was proposed by

Murray (1956) to reflect increasing social complexity with the burial of children from particularly important families.

As the elite families of Upper Egypt grew increasingly powerful during the fourth millennium BC, prestige and luxury goods were required to reinforce and legitimate their exalted position (Wilkinson, 1999: 43). Long-distance trade was able to supply a number of high-status goods. According to some authors, the desire to control trade with the southern Levant and


Mesopotamia was the driving force behind the northward expansion of Naqadian culture and the eventual unification of the two lands, out of which the State of Egypt emerged (Hassan, 1988; Savage, 1997; Savage, 2001; Trigger et al, 1983: 49).

The unification of Egypt had a considerable impact on mortuary practices and beliefs.

Prior to this event the late Predynastic was characterised by increasing centralisation and coalescence of polities (Hoffman et al, 1986; McGuire, 1983; Savage, 1997, 2001). As a result, the tombs of the elite, reflecting the concentration of power, declined in number. Similarly, provincial elite cemeteries also declined, and by the end of the Predynastic Period, elite burials in large mud brick mastabas occur only at Abydos and Saqqara (McGuire, 1983). The distribution of wealth and power also changed during this period. The process of unification involved subduing and incorporating smaller societies under a single ruler, thus creating a single hierarchy with the king at the summit (McGuire, 1983). Such a process resulted in a significant increase in inequality, both in terms of wealth and power, and culminated in some of the most prominent examples of mortuary architecture in the world: the pyramids (McGuire, 1983).

1.3 Developmental biology

1.3.1 Sex determination and differentiation

The biological or genetic sex of an individual is determined at fertilisation, the process by which the male gamete (sperm) and female gamete (oocyte) unite to form a zygote (Moore & Persaud, 2008; Sadler, 2004). The classical theory of human sex determination states that male and female morphological characteristics do not begin to appear until the seventh week of development. The early genital systems in the two sexes are therefore identical during the initial periods of embryogenesis, a phase that is commonly referred to as the indifferent or undifferentiated stage of sexual development (Moore & Persaud, 2008).

The first sex-specific event leading to divergent development of the gonads in the two sexes is the expression of the sex-determining region on Y (SRY) gene in the undifferentiated gonadal ridge of the male. The protein product of SRY, known as the testis-determining factor (TDF), is a transcription factor that initiates a cascade of downstream genes that promote testicular development (Sadler, 2004; Arnold, 2012). The absence of a Y chromosome results in the formation of an ovary. Once the gonads have differentiated they secrete gonadal hormones that cause sex-specific patterns of development in many other tissues such as the external genitalia, internal genitalia (Wolffian and Müllerian duct structures), and brain (Arnold, 2012), as well as sex-specific differences in morphology, function, and behaviour (Arnold et al, 2012: 67–88).

This classical or ‘gonad-centred’ theory of mammalian sex determination provides a convenient explanation of sexual development that incorporates a number of important observations made in the early twentieth century (Lillie, 1917; Zhao et al, 2010). However, it suffers from several inaccuracies. Notably, this theory suggests that gonadal determination and differentiation precede all other sex differences, and does not explain a growing list of sex differences in phenotype that are not controlled by the gonads and therefore are not downstream of differences initiated by SRY. Furthermore, it misses the importance of other X and Y genes that have an equal position to SRY in primary sex differentiation (Arnold, 2012).

1.3.1.1 Sex differences in early development

In considering sex differences in phenotype that are not downstream of gonadal differentiation, two types of evidence are relevant. The first concerns sex differences that occur prior to the development of the gonads at around five weeks after conception, while the second is based on experimental designs in which the effects of the sex chromosome complement of an embryo are dissociated from the effects of the gonads (Arnold, 2012). One of the earliest studies to present data that conflicted with the gonad-centric theory of mammalian sex determination was conducted by Renfree and colleagues (1988) using the tammar wallaby as their experimental model. The results of this study demonstrated that extensive somatic sexual dimorphism precedes the first morphological evidence of testicular formation by several days. The authors suggested that some sexually dimorphic characteristics appeared to develop autonomously depending on genotype rather than under the influence of sex hormones (Renfree et al, 1988).

Other studies have noted sex-related differences in the developmental rate or growth of precleavage zygotes and blastocysts. In a study conducted by Yadev and colleagues (1993), a total of 1,084 bovine zygotes matured and fertilised in vitro were separated according to the time it took to complete their first cleavage (mitotic cell division). At five days post-fertilisation, the proportions of each group that had progressed to the eight-cell stage or beyond were determined and examined cytogenetically (Yadev et al, 1993). A comparison of early (twenty-four and thirty hours post-fertilisation) and late (forty, forty-eight and sixty hours post-fertilisation) cleaving embryos revealed that the former were statistically significantly more likely to be male (P<0.05), and to have a higher mean cell number (P<0.001; Yadev et al, 1993).

Similar results were obtained by Burgoyne et al. (1995), who found that sex-chromosomally variant murine embryos carrying a Y chromosome demonstrated accelerated pre-implantation development that was ‘carried over’ into the post-implantation period compared with those that did not carry a Y chromosome. Furthermore, this effect was not found to be dependent on the presence of the SRY gene (Burgoyne et al, 1995). Xu and colleagues (1992) used 229 bovine embryos, produced in vitro and fixed on day eight after fertilisation, to investigate sex-related differences in developmental rate. The results demonstrated a clear pattern of sexually dimorphic development, with the sex ratios of embryos in both early and late developmental stages differing significantly from 1:1 (P<0.01). According to the authors, these results suggest that sex-related gene expression affects the development of embryos soon after activation of the embryonic genome and well before gonadal differentiation (Xu et al, 1992).
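
As an illustration of how a departure of an observed sex ratio from 1:1 may be assessed, a minimal sketch in Python is given here. It uses a chi-square goodness-of-fit test with one degree of freedom. This is a generic test chosen for illustration only: the statistical procedure actually applied by Xu et al. (1992) is not described above, and the function name and embryo counts are hypothetical and do not reproduce data from any cited study.

```python
import math


def sex_ratio_chi2(n_male, n_female):
    """Chi-square goodness-of-fit test of an observed sex ratio against 1:1.

    Returns the chi-square statistic and its P value (one degree of freedom).
    Illustrative helper only; not taken from any of the cited studies.
    """
    total = n_male + n_female
    if total == 0:
        raise ValueError("at least one embryo is required")
    expected = total / 2.0  # expected count per sex under a 1:1 ratio
    chi2 = ((n_male - expected) ** 2 + (n_female - expected) ** 2) / expected
    # For one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one developmental stage (not real data).
chi2, p = sex_ratio_chi2(n_male=70, n_female=30)
print(f"chi-square = {chi2:.2f}, P = {p:.5f}")
```

With small samples an exact binomial test would generally be preferred to the chi-square approximation.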

More recently, Bermejo-Álvarez and colleagues (2008) investigated epigenetic differences between male and female bovine blastocysts produced in vitro. Analyses included sex determination, mitochondrial DNA (mtDNA) content, telomere lengths, methylation analyses, and quantification of mRNA transcripts of DNA methyltransferases. Statistically significant differences between male and female blastocysts were found for mean mtDNA copy number (greater in males than females; P<0.05), telomere length (shorter in males than females; P<0.01), and the level of methylation in a sequence near a variable number of tandem repeats mini-satellite region (higher in males than females; P<0.05). The authors concluded that the results provide evidence of epigenetic differences between male and female bovine embryos produced in vitro and suggested that before initiation of gonadal differentiation, epigenetic events may modulate the differences between speed of development, metabolism, and transcription observed during the pre-implantation developmental period in male and female embryos (Bermejo-Álvarez et al, 2008).

Other studies investigating effects caused by the sex chromosome complement that are dissociated from gonadal effects have used the ‘four core genotypes’ (FCG) model. The four genotypes are XX gonadal males and females, and XY gonadal males and females. This model allows researchers to investigate the differences in phenotypes caused by sex chromosome complement (XX versus XY), the differential effects of ovarian and testicular secretions, and the interactive effects of the two (Arnold & Chen, 2009). Using this model, Chen and colleagues (2009) investigated sex chromosome effects on the expression of three messenger ribonucleic acids (mRNAs), encoding neurotransmitters or their receptors (the dynorphin precursor PDYN [prodynorphin], the substance P precursor TAC1 and the dopamine D2 receptor), in the striatum and nucleus accumbens of adult mice (Chen et al, 2009). These regions of the brain are involved in disorders including Parkinson’s disease, tardive dyskinesia, Huntington’s disease, attention deficit hyperactivity disorder and Tourette’s syndrome, most of which show some sex bias in incidence or severity. It was demonstrated in the study that gonadectomised XX mice had statistically significantly higher expression of PDYN than XY mice of the same gonadal sex in each brain region (P<0.0002). For TAC1, expression was higher in XX than XY mice when comparing wild-type mice and male FCG mice (P<0.0001), but the difference was not statistically significant in the FCG female comparison group (P=0.13). The sex chromosome effect for PDYN mRNA was further confirmed using quantitative RT-PCR on striata from a second set of gonadectomised FCG mice. Further quantitative RT-PCR demonstrated that PDYN mRNA was more highly expressed in mice with two X chromosomes (gonadectomised XX females) than in gonadectomised female mice with one X chromosome (XO) or in gonadectomised sibling males with one X chromosome (XY). These results suggest that the number of X chromosomes was responsible for the greater expression of PDYN in the XX mice (Chen et al, 2009). Taken as a whole, these various studies provide evidence for the presence of sex differences in development that precede the development of the gonads and therefore contradict the gonad-centred theory of sex determination.

1.3.2 Skeletal biology

1.3.2.1 Skeletal development and growth

1.3.2.1.1 Embryonic development of the skeleton

Bones first appear as condensations of mesoderm-derived mesenchymal cells (cells from one of the three germ or ‘embryonic tissue precursor’ layers) that form bone models. Most flat bones develop from condensations whose cells have differentiated directly into bone-forming osteoblasts. These cells lay down a matrix that is particularly rich in type I collagen in a process known as intramembranous bone formation (Kronenberg, 2003). Most limb bones, however, develop from mesenchymal models that are transformed into cartilage bone models through the differentiation of mesenchymal cells into chondrocytes, the primary cell type of cartilage (Kronenberg, 2003; Moore & Persaud, 2008). This type of bone formation is known as endochondral ossification. The cartilage model enlarges through chondrocyte proliferation and matrix production, as shown in Figure 1.3A.

Figure 1.3A: The process of endochondral bone formation. a) Mesenchymal cells condense, b) Cells of condensations become chondrocytes (c), c) Chondrocytes at the centre of model stop proliferating and become hypertrophic (h), d) Perichondrial cells adjacent to hypertrophic chondrocytes become osteoblasts, forming bone collar (bc), e) Osteoblasts of primary spongiosa accompany vascular invasion, forming the primary spongiosa (ps), f) Chondrocytes continue to proliferate, lengthening the bone, g) At the end of the bone, the secondary ossification centre (soc) forms. The growth plate below the soc forms orderly columns of proliferating chondrocytes (col). Haematopoietic marrow (hm) expands in marrow space. (Source: Kronenberg, 2003).

These processes generally occur during the fifth to eighth weeks of development. From the beginning of the ninth week, the embryo has generally developed a recognisable human form and is henceforth termed a foetus (Moore & Persaud, 2008; Sadler, 2004). By the end of the twelfth week, primary ossification centres have developed in almost all bones of the limbs. While some secondary ossification centres appear in utero, the majority develop after birth (Moore & Persaud, 2008).

Like the gonad-centred theory of sex determination, the process of embryonic skeletal development outlined above assumes that there are no differences in the formation and ossification of bone models between the sexes. However, several early studies demonstrated sex differences in the timing of the appearance of ossification centres during prenatal development. In a study published in 1923, Pryor observed that centres of ossification, notably in the calcaneus, talus, femur and tibia, appeared earlier, and tended to be larger, in female foetuses compared with male foetuses. He further noted that these differences were also present after birth, and that fusion of the epiphysis with the diaphysis tended to occur three to four years earlier in females than males (Pryor, 1923). Garn and colleagues (1974) investigated the degree of histological hand development in 66 human embryos (40 male and 26 female) in the 15–75 mm crown-rump length range. Embryos were analysed as a total sample, and separately according to crown-rump length groups (15–30 mm and 30–75 mm). In the total sample, 27 of the 40 (67%) male embryos demonstrated statistically significantly advanced development of hand bones over size-comparable female embryos. Separate analyses of crown-rump length groups confirmed male advancement of embryonic development of the hand skeleton, particularly in the 15–30 mm group (P≤0.05), and to a lesser and non-significant extent in the 30–75 mm group. In the former group, hand development was approximately 50% greater per millimetre of crown-rump length in male compared with female embryos (Garn et al, 1974).

Sex differences in skeletal development occurring early in embryogenesis were additionally identified by Brook and colleagues (1994) using the curly tail mutant mouse strain. They found statistically significant differences between male and female embryos from all 20 litters for mean somite (paired blocks of mesoderm) number (P<0.001) and mean morphological score (P<0.05), two indicators of progression of development, both of which were greater in males than females. Similarly, the mean protein content of the male embryos was statistically significantly greater than that of the female embryos across all 20 litters (P<0.002). By contrast, the rates of increase of somite number and morphological score were not statistically significantly different between males and females (P>0.05).

1.3.2.1.2 Sex differences in skeletal growth

Despite the presence of sex differences in the early stages of embryonic skeletal development, it has generally been accepted that sexual dimorphism in the immature postnatal skeleton is minimal prior to puberty (Loth & Henneberg, 2001), after which differential growth rates during the post-pubescent adolescent period result in the attainment of differing adult sizes between males and females (Rogol et al, 2002). The term puberty refers to the activation of the hypothalamic–pituitary–gonadal axis that culminates in gonadal maturation (Sisk & Foster, 2004). From a neurobiological perspective, a hallmark of the onset of puberty is the heightening of the neurosecretory activity of gonadotropin-releasing hormone (GnRH) neurons in the basal forebrain leading to an increase in the amplitude of GnRH pulses (Mauras et al, 1996; Tena-Sempere, 2012). This in turn triggers a cascade of events including increases in the amplitude of follicle-stimulating hormone (FSH) and luteinising hormone (LH) pulses, followed by marked increases in output of gonadal sex steroids, which in turn increase growth hormone (GH) and insulin-like growth factor-1 (IGF-1) production (Mauras et al, 1996).

The production of sex steroids at the start of puberty is clearly linked with an increase of bone mineral acquisition during this period (Saggese et al, 1997; Venken et al, 2008). The discovery of sex steroid receptors for oestrogen, androgens and progesterone on the surface of bone-forming osteoblasts and bone-resorbing osteoclasts further highlighted the importance of these hormones in the processes of bone formation, growth, and remodelling (Colvard et al, 1989; Eriksen et al, 1988; Pensler et al, 1990). In a landmark study conducted by Turner and colleagues (1990), a reduction in periosteal perimeter was observed in orchidectomised growing male rats compared with an increase in periosteal circumference in ovariectomised female rats. This led to two key assumptions: that sex hormones are the primary mediators of sexual dimorphism in bone size and strength, and that in terms of radial bone growth, androgens are stimulatory and oestrogens inhibitory in males and females, respectively (Callewaert et al, 2010a). This latter theory was supported by observations in males with hypogonadotropic hypogonadism (Finkelstein et al, 1987, 1989) and in pubertal mouse models (Callewaert et al, 2010a). However, more recent findings have served to redefine and challenge this concept, with a body of evidence now indicating that oestrogen also plays a role in males. For example, in male mice, oestrogen deficiency in addition to androgen withdrawal was found to further reduce radial bone expansion versus androgen withdrawal alone, at least during the early stages of puberty (Callewaert et al, 2010b), suggesting that aromatisation of androgens into oestrogens also contributes to skeletal sex differences (Callewaert et al, 2010a). Evidence from human mutational studies has further challenged this concept. For instance, unfused epiphyses were reported in men presenting with either a homozygous inactivating mutation in the oestrogen receptor (ER)-α gene (Smith et al, 1994) or mutations in the aromatase gene, which encodes the enzyme that converts androgens to oestrogens (Carani et al, 1997). This suggests that oestrogen deficiency or resistance (as a result of the absence of ERα) leads to incomplete epiphyseal fusion even in males (Vico & Vanacker, 2010).

The levels of other hormones, including GH and IGF-1, also increase during puberty and are therefore additionally regarded as critical regulators of pubertal bone growth (Callewaert et al, 2010a). The GH/IGF-1 axis is an important regulator of bone formation during growth. Secretion of GH stimulates IGF-1 production in the liver and regulates serum as well as tissue IGF-1 levels (Courtland et al, 2011). Murine models have shown that ablation of the IGF-1 gene results in substantially reduced body weight, femur length, and femoral bone mineral density (BMD), growth retardation, delayed onset of peripubertal growth, and a significantly slower growth rate (He et al, 2006; Liu & LeRoith, 1999). Similar findings were also reported in a human patient with a homozygous partial deletion of the IGF-1 gene (Woods et al, 1996). Other authors have found evidence that mice deficient in IGF-1 exhibit a greater impairment in bone accretion than mice deficient in IGF-2 or GH, that GH/IGF-1, but not IGF-2, is critical for puberty-induced bone growth during the postpubertal period, and that the effects of IGF-1 on bone accretion during prepuberty are mediated predominantly via mechanisms that are independent of GH, whereas during puberty they are mediated via both GH-dependent and GH-independent mechanisms (Mohan et al, 2003). Additional studies using murine models have presented evidence that sexual dimorphism of the skeleton is established early in puberty and is reliant on independent and time-specific effects of sex steroids and IGF-1 (Callewaert et al, 2010b).

Despite the obvious contribution of sex steroids, GH/IGF-1, and the pubertal growth spurt to the attainment of adult sexual size dimorphism in the skeleton, a number of studies have identified skeletal size and proportion differences in prepubescent children, suggesting that sex-related differences in embryonic development and prenatal growth may be ‘carried over’ into the postnatal period and have a greater impact on the morphology of immature male and female skeletons than had previously been considered possible. Using a sample of lateral cephalograms obtained for 32 individuals (16 male and 16 female) at ages six, nine, 12, 14, 16, and 18 years, Ursi and colleagues (1993) observed statistically significant dimorphism in the anterior cranial base length (sella–nasion, S–N) for males and females in all age groups, beginning at six years (S–N for males and females: 67.4 mm vs. 65.0 mm, respectively; P<0.01). This result was obtained in spite of similar growth rates between the sexes (Ursi et al, 1993). These results are supported by those obtained by Bulygina and colleagues (2006), who used geometric morphometric techniques to digitise and analyse 18 landmarks and semilandmarks in X-rays of the anterior neurocranium, face, and basicranium taken as part of the Denver Growth Study. A total of 500 lateral radiographs from 28 individuals (14 male and 14 female) taken at one, three, nine, and 12 months of age, then approximately every year up to age 21 years or later, were included in the study. The authors found statistically significant sexual dimorphism in shape and form from the earliest age stage included in the study until around three years of age. They further suggested that four factors contribute to cranial sexual dimorphism in human postnatal development: 1) initial, possibly prenatal, differences in shape; 2) differences in the association of size and shape; 3) male hypermorphosis; and 4) some degree of difference in the direction of male and female growth trajectories (Bulygina et al, 2006).

To further evaluate the age at which sex differences first appear in the skeleton, Humphrey (1998) used a sample of 192 (94 juvenile and 98 adult) known sex and age at death individuals from the church crypts of St Bride’s Church on Fleet Street, St Barnabas Church, and Christ Church, Spitalfields in London, UK, to investigate cross-sectional growth patterns in 59 skeletal variables. The results of this study demonstrated that there is considerable variation in the age at which skeletal variables attain adult size and functional capability. For example, at age one year, the frontal bone was shown to have grown to more than 80% of adult breadth, compared with the long bones which had only reached around 30% of adult length. After division of the skeletal variables into growth groups based on the time to reach adult size, statistically significant differences in mean sexual dimorphism were found to occur between the early–late (attain 70% of adult size by six years of age and 90% of adult size between ages 12 and 18 years) and intermediate–late (attain 70% of adult size between ages six and 12 years and 90% of adult size between 12 and 18 years) groups (P<0.01) and the intermediate–late and intermediate–very late (attain 70% of adult size between ages six and 12 years and 90% of adult size between 18 and 24 years) groups (P<0.05). A highly significant relationship was also observed between the age at attainment of 90% of adult size and sexual dimorphism (P<0.001), which suggested that around half of the variation in sexual dimorphism is related to the age of attainment of 90% of adult size (Humphrey, 1998). The developmental basis of sexual dimorphism was also found to vary within the postcranial skeleton, being caused either by sex differences in growth rate or a combination of sex differences in growth rate and duration. For a number of skeletal variables, including maximum diameter of the clavicle, humerus, radius, ulna and femur, and minimum diameter of the radius, ulna and femur, sexual dimorphism caused by growth rate differences was already apparent at birth. Overall, these results demonstrate that there is a significant relationship between the pattern of growth and the development of sexual dimorphism in the skeleton, with early growing bones generally showing less sexual dimorphism than later growing elements. The author suggested that this may be the result of similar growth trajectories between males and females prior to puberty, as well as differences in the time available for sex differences to accumulate. However, sexual dimorphism was also shown to occur in many parts of the skeleton that complete their growth prior to puberty, suggesting that for some skeletal elements, male and female growth patterns have already diverged prior to the adolescent growth spurt (Humphrey, 1998). The development of sexual dimorphism in the skeleton should not, therefore, be viewed as a uniform phenomenon, but rather the result of a complex pattern of interacting factors (Humphrey, 1998).
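The growth groups described above can be expressed as a simple classification rule. The sketch below is illustrative only: the function name is hypothetical, the treatment of ages falling exactly on a boundary (six, 12, 18 and 24 years) is an assumption, and only the three groups defined in the text are recognised, so any other combination is returned as unclassified.

```python
def growth_group(age_70, age_90):
    """Assign a skeletal variable to a growth group.

    age_70 and age_90 are the ages (in years) at which the variable attains
    70% and 90% of adult size. Boundary handling is an assumption.
    """
    # First label: timing of attainment of 70% of adult size.
    if age_70 <= 6:
        first = "early"
    elif age_70 <= 12:
        first = "intermediate"
    else:
        first = None

    # Second label: timing of attainment of 90% of adult size.
    if 12 <= age_90 <= 18:
        second = "late"
    elif 18 < age_90 <= 24:
        second = "very late"
    else:
        second = None

    # Only the three groups defined in the text are recognised here.
    described = {
        ("early", "late"),
        ("intermediate", "late"),
        ("intermediate", "very late"),
    }
    if (first, second) in described:
        return f"{first}-{second}"
    return "unclassified"


# Hypothetical ages for illustration only.
print(growth_group(age_70=4, age_90=15))   # early-late
print(growth_group(age_70=9, age_90=21))   # intermediate-very late
```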

Additional evidence for the presence of morphological differences in the immature skeletons of males and females, suggesting sex-specific patterns of prepubertal growth, is provided by studies exploiting such differences to produce sex estimation methods for use in physical and forensic anthropology. For example, Schutkowski (1993) found that a prominent and angular mental eminence and a wide anterior dental arcade could be significantly attributed to boys in 94.1% and 82.6% of cases, respectively (P=0.006 and P=0.030 for the occurrence of these features in males compared with their occurrence in females, respectively). A greater sciatic notch angle of around 90 degrees was additionally found to be statistically significantly more common in boys compared with girls (95.0% vs. 71.4%, P<0.001). When the results were broken down by age category, it was further revealed that for the majority of features, sexual dimorphism appeared to develop in the first year of life (Schutkowski, 1993). Loth and Henneberg (2001) also identified features of the mandible that in a series of blind-tests were able to correctly classify sex in individuals aged 0–7 years with an average of 81% accuracy. More recently, Gonzalez (2012) used lateral cephalometric radiographs from the Michigan Craniofacial Growth Study of individuals aged 5–16 years to create discriminant sex estimation equations based on 20 craniofacial measurements. Testing of the equations in different age categories resulted in cross-validated accuracy rates of between 72% and 85% (Gonzalez, 2012).

Given the arguments and evidence presented above, it is appropriate to conclude that the attainment of adult skeletal size differences between males and females involves a complex and interactive process of differing patterns of sex-specific embryonic development, and pre- and post-pubertal growth, as well as the effects of GH/IGF-1 and sex steroids, particularly the combined action of androgens and oestrogen in males, which enhance periosteal bone formation resulting in greater bone size and strength in males compared with females. In other words, human growth may be viewed as a complex process resulting from the integrated actions of genetic, metabolic, and endocrine mechanisms (Martorell et al, 1979). Such a process additionally allows considerable scope for modification by external factors, with nutritional and environmental stresses having particular prominence among them.

1.3.2.1.3 Factors affecting growth

Growth and development of the skeleton require an adequate supply of many different nutrients, including calcium, phosphate, magnesium, zinc, carbonate, and vitamins C, D and K (Prentice et al, 2006). Deficiency of these nutrients is associated with well-known skeletal disorders, including rickets (vitamin D deficiency) and linear growth retardation (protein-calorie and zinc deficiency) (Prentice et al, 2006). The importance of adequate protein-calorie intake during the neonatal and adolescent periods of accelerated growth has been demonstrated by a number of studies. Sampling Guatemalan children aged up to seven years involved in a food supplementation programme, Martorell and colleagues (1979) found that protein-calorie intake was strongly related to body size, as measured by growth in supine length and increases in body weight. Skeletal maturation was also found to be affected by malnutrition, but to a lesser extent than body size. The authors suggested that poor diet may therefore hinder the possibility of catch-up growth, accounting to a large extent for the smaller body size characteristics of adults of malnourished populations for whom sufficient dietary sources of protein and calories never become available (Martorell et al, 1979).

A further study using human subjects noted prolongation of the pre-pubescent growth period in response to malnutrition. Dreizen and colleagues (1967) included a sample of 30 undernourished and 30 well-nourished girls yet to undergo menarche in their study, which aimed to explore the effect of nutrition on skeletal maturation during the transition from childhood to early adulthood. Progress was evaluated every three and six months using height measurements and radiographs of the left hand, respectively (Dreizen et al, 1967). The authors found that compared with the well-nourished group, chronic malnourishment statistically significantly delayed the onset of menarche (P<0.001), slowed the rate of pre-menarcheal skeletal maturation (P<0.001), and prolonged the growth period. At the mean menarcheal age of the well-nourished subjects (12 years, 5 months), the difference in mean standing heights between the two groups was statistically significant (9.2 cm difference; P<0.001). By age 17 years, the difference in height between the undernourished and well-nourished groups had reduced to 1.1 cm (not significant), most likely as a result of the extended growth period experienced by the undernourished girls (Dreizen et al, 1967).

Periods of fasting and plenty are not uncommon among both past and contemporary human populations, particularly those of the latter group living in hostile or remote rural environments. A number of studies have documented changes in stature in response to changing subsistence strategies (Cardoso & Gomes, 2009; Larsen, 1982; Mummert et al, 2011; Taylor, 2010; Tobias, 1962), with decreases in stature often being reported coincidentally with the introduction of agriculture (Cohen & Armelagos, 1984; Larsen, 1995; Stini, 1971; Zakrzewski, 2003). Sex differences in growth in response to nutritional stresses have also been reported (Stini, 1969; Stinson, 1985). These and other issues related to the evolution of growth and sexual dimorphism will be discussed in greater depth in subsequent sections.

The susceptibility of skeletal growth to environmental insults such as inadequate protein-calorie intake has been further demonstrated in studies examining the relationship between skeletal development, dental development, and maturation. A basic assumption in physical anthropology is that dental development is less influenced by environmental insults than skeletal development, with the added implication that tooth development stages provide a better estimation of chronological age than skeletal growth in both modern and archaeological populations (Cardoso, 2007a; Cardoso, 2007b). This assumption is supported by a number of different studies. For example, Lewis and Garn (1960), using the coefficient of variation, found that tooth formation was less variable than tooth eruption, which in turn was less variable than skeletal maturation at the hand–wrist, and the appearance of ossification centres. Similar results were obtained by Lewis (1991) and Conceição and Cardoso (2011), the latter demonstrating in a sample of known-age juvenile skeletons from Portugal that individuals of low socioeconomic status differ statistically significantly from individuals of high socioeconomic status in skeletal maturation, but not in dental maturation. These results are additionally supported by studies that have demonstrated only weak to moderate relationships between skeletal and dental development (Lacey et al, 1973; Šešelj, 2013; Smith, 2004). Thus, while studies of dental development may make an important contribution to our understanding of growth and maturation in general, and in estimating age at death of juveniles specifically, a detailed discussion of the literature is not relevant here, given that the focus of the present research is growth and development of the skeleton, with a particular emphasis on sex differences and the attainment of adult skeletal size.
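The coefficient of variation referred to above in connection with Lewis and Garn (1960) expresses the standard deviation as a percentage of the mean, which allows the variability of indicators measured on different scales to be compared. The following is a minimal sketch of that calculation; the function name and the example values are hypothetical and are not taken from any of the studies cited.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0


# Hypothetical ages (years) at which three children reach a given stage,
# for illustration only.
dental_stage_ages = [9.5, 10.0, 10.5]
skeletal_stage_ages = [8.0, 10.0, 12.0]

print(f"dental CV:   {coefficient_of_variation(dental_stage_ages):.1f}%")
print(f"skeletal CV: {coefficient_of_variation(skeletal_stage_ages):.1f}%")
```

In this illustrative case the two indicators share the same mean age, but the skeletal values are more widely dispersed and therefore return the larger coefficient.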

1.3.2.2 Temporal changes in human growth

A comprehensive discussion of the evolution of growth, comparing and contrasting the life and growth trajectories of humans and nonhuman anthropoid primates, is unfortunately beyond the scope of this project. What is relevant, however, is consideration of the factors that impacted growth and the attainment of adult size and proportions in past human populations and how these factors have altered over time. In most studies of human growth, long bone length as a proxy of stature is the most commonly used measure, despite some important limitations (Ruff et al, 2013). For example, while long bone length and stature are correlated, the ratio of limb length to stature changes dramatically during growth (Ruff et al, 2013), so that changes in long bone length do not exactly reflect changes in stature. Relative limb length to stature proportions also vary between populations (Cowgill et al, 2012). As a result, differences between populations in long bone lengths at a given age are only approximations of differences in stature (Ruff et al, 2013). Nevertheless, stature is used by the World Health Organization as a primary index of growth alteration in contemporary populations (World Health Organization, 1995); therefore, its use as such in archaeological populations is not without justification.

Several authors have noted that among living human populations, growth in body size during infancy and early childhood is relatively similar in populations growing under optimal environmental conditions, and therefore any deviations in growth during this period reflect environmental and socioeconomic rather than genetic differences (Graitcer & Gentry, 1981; Habicht et al, 1974). Using a sample of preschool children from both developed and developing countries whose height and weight were measured from birth until the age of seven years, Habicht and colleagues (1974) found that among the well-nourished children from developed countries, differences in height did not exceed 3% across the groups of children from different ethnic backgrounds. In comparison, differences between these children and those, often of similar ethnic and geographical background, who lived in the poor urban and rural regions of developing countries approximated 10% after 12 months of age; in the rural Indian and Guatemalan Indian samples the differences increased to between 12% and 17% after 12 months of age (Habicht et al, 1974). Similar results were obtained by Graitcer and Gentry (1981), who concluded that among privileged children aged six to 59 months from developing countries, growth was mainly influenced by socioeconomic status and not by race or ethnicity (Graitcer & Gentry, 1981). These results suggest that during infancy and early childhood at least, nutritional status is the single most important factor affecting growth, and that genetic influences are of lesser relevance in the early stages of growth than they are during other growth periods such as adolescence (Ruff et al, 2013).

That an environmental factor such as nutrition affects skeletal growth to such a large extent is an important finding, particularly when considered in the context of areas of research examining subsistence strategies in past human populations. In particular, studies examining the stature of samples of individuals from time-successive groups within the same population may provide evidence of when that population changed from a nomadic hunter–gatherer lifestyle to a settled agricultural existence, a switch that is often viewed as being coincidental with the establishment of a cultured and hierarchical civilisation (Dumond, 1961; Ehret, 2002; Savory, 1994). Meggers (1954) highlights this relationship particularly well, by suggesting that: “…the level to which a culture can develop is dependent upon the agricultural potentiality of the environment it occupies” (Meggers, 1954: 815).

To explore the biological adaptation of a prehistoric (c. 2200 BC–AD 1550) population living on St. Catherine’s Island off the coast of Georgia, USA, Larsen (1982) examined 593 skeletons dating to either the pre-agricultural period (2200 BC–AD 1150; n=269) or a later stage of the known cultural sequence in which a mixed agricultural and hunter–gatherer subsistence strategy was practised (AD 1150–1550; n=324). He found that the agriculturalists living on St. Catherine’s Island were statistically significantly shorter than the earlier pre-agricultural population group (P<0.01 for all stature estimates). Furthermore, the decrease in stature occurring with the introduction of agriculture was found to be greater in females than males; female stature decreased by an average of 3.0%, while male stature decreased by an average of 1.1% (Larsen, 1982). Coupled with this was an observed increase in the degree of stature sexual dimorphism between the pre-agricultural and agricultural societies, from an average of 4.1% to 6.2%. In other words, the pre-agricultural females and males were more similar in skeletal size than the agricultural females and males. The subsequent increase in size disparity between males and females was predominantly the result of the proportionally greater degree of female size reduction that occurred over time (Larsen, 1982). These observations may be explained by two factors, both of which act to decrease bone size and robusticity: a decrease in mechanical stress with the adoption of agriculture, as evidenced by a statistically significant decrease over time in the prevalence of stress-related pathologies such as degenerative joint disease, and a decrease in the quantity of dietary protein as a result of the introduction of corn as a dietary staple, as shown by a statistically significant increase in the prevalence of dental caries over time (Larsen, 1982).
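The link between the sex-specific reductions in stature and the rise in dimorphism reported above can be checked with simple arithmetic. The sketch below assumes that dimorphism is expressed as the male excess over female stature, as a percentage of female stature; the index actually used by Larsen (1982) is not stated here, so this definition, like the function names, is an assumption made for illustration.

```python
def dimorphism_percent(male, female):
    """Male excess over female size, as a percentage of female size.

    This definition of the index is an assumption, not Larsen's stated method.
    """
    return (male - female) / female * 100.0


def dimorphism_after_change(initial_percent, male_decrease_percent,
                            female_decrease_percent):
    """Dimorphism after each sex decreases in size by a given percentage."""
    ratio = 1.0 + initial_percent / 100.0  # initial male-to-female ratio
    new_ratio = (ratio * (1.0 - male_decrease_percent / 100.0)
                 / (1.0 - female_decrease_percent / 100.0))
    return (new_ratio - 1.0) * 100.0


# Averages quoted in the text: 4.1% initial dimorphism, a 1.1% decrease in
# male stature and a 3.0% decrease in female stature.
result = dimorphism_after_change(4.1, 1.1, 3.0)
print(f"expected dimorphism after the transition: {result:.1f}%")
```

Under this assumed definition the calculation returns approximately 6.1%, close to the reported 6.2%. The small gap may plausibly reflect rounding or the averaging of several stature estimates, which cannot be confirmed from the figures given here.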

In a similar piece of research, Nickens (1976) aimed to test the hypothesis that: “…a reduction in body size follows the development of intensive agriculture and is a reflection of nutritional stress” (Nickens, 1976: 33). Evaluating stature data for Mesoamerican skeletal populations available in the literature (Genovés, 1970: 35–49), Nickens (1976) found that


nutritional consequences of food production played a more important role in the decline of stature in the central and southern portions of Mexico than the natural environment. He further suggested that the evidence presented in his study supports the hypothesis that human body size is reduced as a result of a shift from a hunting and gathering lifestyle to an intensive agricultural way of life (Nickens, 1976).

The evidence that adoption of agriculture causes reductions in stature is compelling. In addition to the skeletal studies discussed above, others have suggested that agricultural populations consumed a narrower range of foods than did hunter–gatherers. In essence, their diet was characterised by a reduced availability of animal protein and a greater reliance on a limited number of domesticated plants, some of which have only marginal or poor nutritional value (Larsen, 1995). Maize, for example, is deficient in several amino acids that cannot be synthesised by the body, including lysine, isoleucine, and tryptophan, and additionally contains nutrients of low bioavailability (Larsen, 1995). However, there are two important limitations to the agriculture–stature argument that should be noted. Firstly, the observation that the development of an agricultural subsistence pattern results in reduced stature is not universal; some studies have shown no change or even increases in stature in some geographic regions

(Larsen, 1995; Mummert et al, 2011). For example, no change in stature accompanied the adoption of agriculture among Peruvians, although stature was found to decline following the Spanish colonisation of the region, nor was any change observed in population groups from Ecuador (Allison, 1984: 515–529; Mummert et al, 2011; Ubelaker, 1984: 491–513).

Secondly, other lines of evidence discussed previously suggest that nutritionally-deficient girls undergo a longer period of prepubertal growth (as a result of the delayed onset of menarche) allowing them to attain a similar height compared to well-nourished girls (Dreizen et al, 1967). According to Hansen and colleagues (1971; cited in Martorell et al, 1979: 387), “…protein and calorie depletion can thus be said to be strong stimuli for growth hormone secretion and this ensures efficient nitrogen retention and catch-up growth when proteins and calories become available”. Accordingly, self-limiting, above-normal growth rates are usually observed in children during the early weeks of recovery from severe malnutrition (Martorell et al, 1979). Thus, it may not be a reasonable conclusion that growth-retarded children will be short-statured adults, and


probably there are several other factors in addition to nutritional status that affect changes in stature, including climatic adaptation and genetics (Ruff, 2002).

The preceding discussions concerning the relationship between adequate dietary intake and stature are based on an underlying assumption that small body size equates to poor nutritional status and insufficient resources. However, this may not necessarily be the case. In the early 1980s, David Seckler, an economist, challenged the prevailing view of nutritionists by suggesting that people who are short in stature as a result of moderate malnutrition in childhood may still be “healthy”, that is, they demonstrate a normal weight for height ratio and apparently suffer no functional impairment. These people, he argues, are therefore simply adapted to the circumstances of reduced food availability (Messer, 1986; Pelto & Pelto, 1989). This “small but healthy” hypothesis has attracted considerable attention and debate, primarily because of its far-reaching implications in a number of different contexts and fields of research, including those related to anthropology, socioeconomics, politics, and international food/nutrition policy.

Seckler’s argument (1980, 1982; cited in Pelto & Pelto, 1989) is composed of the following elements: 1) in childhood, the human body reduces its rate of growth to adjust or adapt to low nutrient intake, and thereby maintains an equilibrium of physiological functioning; 2) individuals who experience mild to moderate malnutrition suffer no impairment other than lowered growth rate and shortened stature; 3) mild to moderate malnutrition is likely to be a life-long phenomenon; therefore, smaller-bodied adults are better adapted than large persons to situations of frequent food scarcity; 4) in developing countries there is a large proportion of

“small but healthy” people – people who are not properly nourished but who are not functionally impaired, and therefore should not be counted among the world’s malnourished population

(Pelto & Pelto, 1989). In support of this final point, Sukhatme and Margen (1978; 1982) demonstrated that individual protein and calorie requirements are controlled in a homeostatic manner and autoregulated over a range of intakes, suggesting that an individual can adapt to a range of protein-calorie intakes and still remain healthy. In other words, they suffer no functional impairment. Indeed, a number of different metabolic adaptive mechanisms have been described previously (Waterlow, 1986). However, in considering this evidence, it is important to clarify the term “no functional impairment”, given that it implies a very broad definition of health. For example, the World Health Organization defines health as “…a state of complete physical,


mental, and social well-being and not merely the absence of disease or infirmity” (World Health Organization, 2003). Therefore, to accept the “small but healthy” hypothesis, it would be necessary to conclude that individuals who possess a small body size as a result of insufficient nutritional intake are unimpaired in all of the following ways: 1) resistance to disease; 2) reproductive capacity; 3) work capacity, including endurance; 4) cognitive capabilities; 5) social skills or competence; and 6) general socioeconomic success (Pelto & Pelto, 1989). With this definition in mind, it is possible to find numerous examples in the literature in which protein-calorie malnutrition has a negative impact on a particular aspect of “health”, including, but not limited to, susceptibility to disease (including impairment of the immune system; Rodríguez et al, 2011; Smythe et al, 1971; Taylor et al, 2013), impairment of wound healing (Schäffer et al, 1997), cognitive development and functioning (Fernanda Laus et al, 2011; Rausch, 2013; Waber et al, 2014), and work capacity (Behrman, 1993; Satyanarayana et al, 1977; Strauss, 1986). Other researchers have argued against the nature of the “small but healthy” hypothesis itself, suggesting that the insistence that some populations have lower nutritional needs than others because they are surviving on average nutritional intakes well below international recommendations perpetuates nutritional and socioeconomic underdevelopment and has a negative impact on international views of malnutrition and subsequent policy decisions (Messer, 1986). For example, Gopalan (1983) argues that there is a big difference between the nutritional requirements needed to “exist” and those needed to “live”, and that the “small but healthy” hypothesis fails to account for the additional energy requirements of activities that go beyond mere survival.

While the socio-economic and political implications of the “small but healthy” hypothesis are an interesting area of debate, they fall beyond the scope of this project. What is important to consider here are the implications to physical anthropology and the study of past human populations. For example, if one accepts the “small but healthy” hypothesis, studies in which a reduction in stature is viewed as a negative side effect of changing subsistence strategies, such as the switch to agriculture, may have to be re-evaluated as presenting evidence not of a decrease in food quality and quantity, but of the occurrence of physiological adaptation that enables individuals to survive better in an environment of changing resources. On the other hand, rejecting the hypothesis implies that small body size should be viewed as an outward


manifestation of general poor health and debilitation as a result of poor nutrition, which itself could be the result of a number of different factors such as subsistence patterns, or ecological, environmental, or social changes. To look at the “small but healthy” hypothesis another way, the underlying argument is actually the same, regardless of whether one accepts or rejects it: that lower than “normal” protein-calorie intake in childhood results in reduced growth rate and small adult body size. The pertinent question is therefore whether small body size is a positive outcome of insufficient childhood nutrition which allows adults to survive better in their environment, or a negative outcome which indicates poor health. It is likely that the answer will vary for different populations, given that the “small but healthy” hypothesis considers only the role of nutrition in growth, and does not take into account other factors such as genetics, climate, or the prevalence of disease.

1.4 Evolutionary and environmental biology

Sexual dimorphism refers to the phenotypic differences between males and females of the same species. Though sex differences that are directly related to mating and reproduction (the primary sex differences) may be described as being sexually dimorphic, the term is usually reserved for secondary sex differences such as body size and skeletal proportions, or indeed any sex differences, including genetic, biochemical or behavioural differences (Plavcan, 2001). Historically, the phenomenon of sexual dimorphism was viewed as the differences in secondary sexual characteristics that arose as a result of sexual selection (Frayer & Wolpoff, 1985). However, it has become increasingly apparent that many other factors contribute to the expression of sexual dimorphism, including some that produce short-term fluctuations in this parameter and which cannot therefore be the result of a genetic component.

1.4.1 Sexual selection and genetic influences

Since Charles Darwin published his seminal works ‘On the Origin of Species’ in 1859 and ‘The Descent of Man’ in 1871, the primary cause of sexual dimorphism has been attributed to sexual selection. According to Darwin, sexual selection arises due to: “…the advantage which certain individuals have over others of the same sex and species solely in respect of reproduction” (Darwin, 1871: 216). In other words, sexual selection can be thought of as intra-specific reproductive competition (Hosken & House, 2011). The theory of sexual selection may be


broadly divided into two categories: mate competition and mate choice. Individuals may increase their reproductive fitness relative to rivals by either excluding rivals from mating (mate competition) or selectively choosing mates (mate selection) (Plavcan, 2001; 2012a). Both methods emphasise the importance of physical characteristics such as a large male body size and the production of offspring with a genetic predisposition for the same desirable phenotype.

While neither mechanism is limited to one or the other sex, in primates mate competition is largely associated with males, whereas mate choice is associated with females (Plavcan, 2001; Puts, 2010).

It is important to note that sexual selection can only occur if there is a disparity in mating success that is associated with phenotypic variation among individuals (Isaac, 2005). Selection will thus favour phenotypic adaptations which enhance the ability of a male to prevail in a contest with another male (Isaac, 2005), or the ability of females to bear and nurture more offspring. Historically, the primary mechanism favouring the development of sexual dimorphism was considered to be mate competition, with mate selection commonly being viewed as either a consequence of or alternative strategy to the former mechanism (Plavcan, 2001; 2012a).

Classic sexual selection theory dictates that males should compete vigorously with one another for access to potential mates. As such, reproductive competition among males is thought to have played a fundamental role in the evolution of traits such as morphological weaponry, sexual dimorphism in body size, and the highly conspicuous courtship displays demonstrated by many species (Díaz-Muñoz et al, 2014). It follows, therefore, that the degree of sexual dimorphism should be greater in species in which males fight with each other for direct access to receptive females than in species that exhibit less male–male competition (Kelaita et al,

2011), or correspondingly in populations that demonstrate a polygamous rather than monogamous mating system (Moorad, 2013; Plavcan, 2011). Comparative studies have demonstrated that there are consistent relationships between the size of female groups and the development of secondary sexual characters in males, including increases in relative body size, the size and elaboration of male weaponry and the extent of male ornaments (Clutton-Brock &

Huchard, 2013; Harvey et al, 1978). Among easily observed species, such as baboons, males are largely intolerant of one another, often engaging in spectacular fights and threat displays

(Plavcan, 2001). Such displays and fighting seem to determine access to females. Importantly,


species like baboons are intensely dimorphic. Males have enormous canine teeth and are much larger than females, providing evidence for the concept that sexual selection has favoured these traits because of the advantages they confer in winning fights (Plavcan, 2001). In contrast, monomorphic gibbons are very obviously monogamous, showing no apparent differential male reproductive success that is associated with male–male competition (Plavcan, 2001).

However, sexual selection is not the only explanation for sexual dimorphism in primates. Other factors have been hypothesised to impact on male size and weaponry, including predation defence, substrate constraints, and dietary factors, including niche dimorphism (Plavcan, 2012a). Despite this, male agonistic contest competition is the only factor that has consistently received support from comparative analyses in explaining why males are larger and have larger canine teeth than females in nonhuman anthropoid primates (Gaulin & Sailer, 1984; Gordon, 2006; Plavcan, 2012a). However, male competition does not explain all inter-specific variation in dimorphism (Plavcan, 2012a). For example, the strongest behavioural correlate of size dimorphism in anthropoid primates, competition levels (Plavcan & van Schaik, 1992), was found to explain only 48% of the variation in dimorphism in a sample of 128 species (Plavcan, 2012a). Indeed, this is probably partly because classic sexual selection theory does not account for female size and behaviour, and the role of mate selection in the determination of sexual size dimorphism.

Although Darwin’s original definition of sexual selection was broad and did not exclude sexual selection in females, a large proportion of the research that followed became almost entirely focused on males and the importance of male competition. However, this view has recently been re-evaluated and it is now widely accepted that sexual selection can also occur in females (Fritzsche & Arnqvist, 2013; Rosvall, 2011). The contribution of changes in female body size to size dimorphism is more complex and can be broken down into factors that favour an increase or a decrease in female size. For example, increased female body size might be favoured either through the advantages that large female size might confer in resource or mate competition (Gordon, 2006; Lindenfors, 2002; Plavcan, 2012a), or in protecting offspring

(Plavcan, 2012a). In contrast, selection might favour smaller female size if early maturation results in an increase in reproductive rates through earlier production of offspring (Plavcan,

2012a).


One of the major problems in assessing the contribution of females to sexual size dimorphism is whether the use of phenotypic traits such as female ornamentation, vocalisation, and weaponry in same-sex competition constitutes sexual selection (Rosvall, 2011). Rosvall (2011), for example, poses some interesting questions. Are these traits non-functional by-products of a genetic correlation with males – a ‘shared genetic architecture’ (Tobias et al, 2012)? Are they primarily shaped by fecundity or survival selection (natural selection that excludes competition for mates)? Or do females use these exaggerated traits and behaviours to compete for mates in a context similar to sexually selected male–male competition? (Rosvall,

2011). Geary and colleagues (2014: 394) suggest that the traditional definition of sexual selection, in which the sex with the slower rate of reproduction and higher investment in offspring (typically females) is a resource over which the lower investing sex (typically males) competes, is too narrow in scope. Drawing from the arguments presented by other researchers, they suggest that within-species social dynamics may be partitioned into sexual selection, the competition for mates, and social selection, a sub-division of sexual selection, which is itself a sub-division of natural selection (Geary et al, 2014: 394). Under this proposal, ‘social selection’ refers to “competition for access to resources other than mates that can affect reproductive success” (Lyon & Montgomerie, 2012; Tobias et al, 2012). For example, parental investment often entails considerable costs. When the resources needed to support these costs are in short supply, females must compete intensely for priority access to them (Geary et al, 2014: 395;

Tobias et al, 2012). Studies in various animal species have demonstrated an increase in female–female aggression in instances of scarce food sources, during pregnancy and lactation, in groups with high numbers of females, and in the protection of offspring (Murray et al, 2006;

Rosvall, 2011; Wolff & Peterson, 1998). This type of competition results in female status hierarchies and the evolutionary elaboration of behavioural and other traits that signal relative status and that enable its establishment and maintenance (Geary et al, 2014: 395). Thus, even in the absence of Darwin’s (1871) traditional intra-sexual competition for mates, intra-sexual competition for ecological resources can be a potent selection pressure that contributes to the evolution of sex differences (Geary et al, 2014: 395).

That is not to say, however, that females are not also subject to traditional sexual selection. Several lines of evidence suggest that females choose mates and compete to mate


with them in a similar manner to males, although such behaviour is highly dependent on the size and structure of the social group. For example, females are more likely to compete for access to mates in populations where the number of males is limited (Schuster, 1983), where the operational sex ratio, a term used to describe the ratio of sexually competing males to sexually competing females, is female biased, or where the sex roles are reversed (Rosvall, 2011).

Females may also compete for high-quality males; those with a particular phenotypic trait which indicates that they would be able to provide direct benefits such as food, defended space

(territories), or parental care, or indirect (genetic) benefits, such as viability advantage to offspring (Plavcan, 2001; Rosvall, 2011). Female–female competition may therefore lead to a number of different fitness benefits for the winning female, which suggests that phenotypic traits conferring a competitive edge in females are unlikely to exist merely as a by-product of sexual selection acting on males (Rosvall, 2011). Unfortunately, it is difficult to quantify the effect that competitive adaptations have on female body size. Lindenfors (2002), for example, found no association between increasing female body size and increased intensity of resource competition in primate species. Furthermore, it is not yet known how much size dimorphism varies as a function of changes only in female body size, given that changes in body size are difficult to partition precisely into male and female components (Plavcan, 2012a).

Another important question relates to whether and how modern humans, particularly those living in the agricultural era, are still affected by Darwinian sexual selection for body size. Some authors have suggested that human size and stature dimorphism may be a product of sexual selection from the recent past that persists in modern humans (Plavcan, 2012b; Wade & Shuster, 2004). It has been suggested that hominids and other extinct taxa demonstrating strong size dimorphism lived in highly polygynous groups with intense male competition (Gordon et al, 2008a; Plavcan, 2012b), given that a strong relationship has been demonstrated between sexual dimorphism in size and the intensity of male mating competition among polygynously mating primates (Mitani et al, 1996). However, among modern human societies, serial monogamy is the most common mating pattern (Marlowe, 2000). Under traditional Darwinian theory, monogamous mating systems which allow all males to mate do not foster male–male competition, and populations practising them should therefore be less sexually dimorphic. However, such a theory does not take into account indirect mate competition, for example competition between


males for social status and resources. If higher social status is associated with dominance, and dominance is associated with larger body size, sexual selection may favour large body size through female preference for dominant males (Plavcan, 2012a). Evidence of a female preference for taller men has further been presented, with the additional finding that this is associated with greater reproductive success (Mueller & Mazur, 2001; Nettle, 2002; Pawlowski et al, 2000). Nettle (2002) also found that men prefer shorter females and suggested that this reflects different reproductive strategies between the sexes. For females, a preference for tall men is suggested to be a remnant of an earlier association between male competitive ability and height, while for males, female stature is associated with fertility cues (Plavcan, 2012a).

In contrast to the evidence discussed previously, neither mate competition nor mate choice, the fundamental aspects of Darwinian sexual selection theory, are able to account for body size in all human populations. For example, among the Baka pygmies a tall mate is often favoured (Becker et al, 2012). This study thus raises the question, if sexual selection is not responsible for body size in pygmy populations, then what is? Several hypotheses have been proposed including morphological adaptation to tropical rainforest environments (Bernstein,

2010; Diamond, 1991), and ontogenetic scaling caused by shifts in circulating levels of hormones such as IGF-1 and GH (Shea & Bailey, 1996). Hypotheses that do not include sexual selection have additionally been considered by other authors to explain sexual size dimorphism in human populations and mammalian species. According to Isaac (2005), it is unlikely that sexual size dimorphism in mammals is the result of a single selection factor. For example, body size and mass show plasticity with respect to adapting to changes in the short-term, immediate environment, and relationships between somatic growth and other life parameters are likely to constrain the evolution of both (Isaac, 2005). As such, short-term changes in sexual size dimorphism may be the result of differing growth patterns that reflect competition for food and are not related to selective pressures (Isaac, 2005). Other authors have suggested that sexual selection can no longer be viewed as the major cause of variability in sexual dimorphism after they found, using a quantitative genetic model, that body size was the most important contributor to sexual dimorphism (Leutenegger & Cheverud, 1982). However, after disregarding sexual selection as the primary mechanism, they do somewhat confusingly state that: “…if there is selection for size increase, whatever its cause, directional selection in both males and


females will lead to an increase in sexual dimorphism based on differences in genetic variance between the sexes” (Leutenegger & Cheverud, 1982: 387). This paper subsequently received some harsh criticism, with other groups of researchers pointing out, for example, the use of a biologically unrealistic measure of sexual dimorphism (Gaulin & Sailer, 1984), and a lack of correlation between body size and sexual dimorphism in some primate species (Kappeler,

1990). Importantly, this model only holds true if body size continually increases in a lineage over time. However, in the human lineage, dimorphism has decreased over time while body size has increased (Plavcan, 2001). Furthermore, according to Gaulin and Sailer (1984), there is currently no convincing description of a mechanism whereby allometric considerations alone, in the absence of sexual selection, could produce the patterns of dimorphism apparent in the available data sets. Overall, it may be concluded that while sexual selection has made and continues to make a significant contribution to the evolution of sexual dimorphism, it is one of many combined factors that contribute to the sexual differences in adult phenotype. These other factors that may additionally have an influence on the development and/or maintenance of sexual dimorphism are discussed in the following sections.

1.4.2 Environmental influences

In addition to or in combination with Darwinian selection, sexual size and shape dimorphism may also develop as a result of adaptation to environmental stresses. Following Baker (1984),

‘stresses’ are defined here as the natural or cultural environmental forces which potentially reduce the ability of an individual or population to function in a given situation. The term

‘adaptation’, on the other hand, may have several different meanings depending on the context in which it is used. For example, ‘adaptation’ may refer to the adjustment of the pupil and retina of the eye to varying degrees of illumination, or to the dynamic process in which the behaviour and physiological mechanisms of an individual continually change to adjust to variations in living conditions. This indicates that the term has both simple, specific physiological meanings, as well as broader and more all-encompassing definitions. In the context of the present study, the most appropriate definition of ‘adaptation’ is the process of change in an organism to conform better with (new) environmental conditions, whereby the organism acquires characteristics, notably changes in morphology, that improve their survival and reproductive success in the particular


environment (Bijlsma & Loeschcke, 2005). Humans generally adapt to their environment in four different ways:

1) Genetic adaptation: adaptation that occurs through changes in allele frequencies as a result of the selection pressure exerted by the environment over several generations. This type of adaptation is also known as evolutionary adaptation (Bijlsma & Loeschcke, 2005).

2) Developmental adjustment: changes in childhood growth patterns and development in response to environmental stresses that result in irreversible anatomical and/or physiological changes in adulthood (O’Neil, 2014).

3) Acclimatisation: reversible anatomical and/or physiological adjustments to environmental stresses that may occur in childhood or adulthood (O’Neil, 2014).

4) Cultural practices and technology: adaptations that allow humans to occupy new environments without first having to evolve biological adaptations to them (O’Neil, 2014).

While there is a huge body of literature pertaining to human morphological, physiological and behavioural adaptation to a wide range of different environmental conditions, the focus of this section is morphological adaptation of the skeleton, and in particular, sex differences in the development, adjustment and morphological adaptation of the skeleton to specific environmental stresses related to diet, climate, altitude, ecology, and activity patterns.

1.4.2.1 Diet and nutritional stresses

Considering the evidence presented in previous sections, it is fair to suggest that nutritional status plays an important role in growth and development, with well-nourished children tending to demonstrate increased stature compared with their under-nourished counterparts (see

Sections 1.3.2.1.3 and 1.3.2.2). Other studies have additionally noted that this pattern is not uniform across the sexes, and that males may be more susceptible to fluctuations in nutritional quality than females. According to this theory, females are less affected by nutritional shortages as a result of reproductive demands, the storage of more subcutaneous body fat, and their overall smaller body size (Frayer & Wolpoff, 1985). As early as 1951, Greulich observed an apparent retardation in the growth of Guamanian boys, but not Guamanian girls, in comparison to a White American population. In this study, a relatively long period was observed, from nine


to 15 years, during which the stature of the Guamanian boys was less than that of their female counterparts. In comparison, the same pattern in the White American population was only observed for the ages 10.5 to 13.5 years (Greulich, 1951). While the author suggested that this might be a ‘racial peculiarity’ of the Guamanians, he thought it was more likely to represent the greater vulnerability of the Guamanian boys, as compared with the girls, to the unfavourable environmental conditions they encountered on the island during and after the Second World

War (Greulich, 1951). In stark contrast to these findings, Gavan (1952) examined the same data and came to conclusions that were almost completely opposite to those reached by Greulich. Gavan (1952) examined the relationship between mean weight and mean stature using regression analysis. He found that the relationship was the same in both groups (White American children and Guamanian children), although the former were absolutely larger than the latter. Assuming that part of this absolute size difference was due to the difficult environmental conditions under which the Guam children lived, these conditions must have affected weight and stature in a dimensionally equivalent manner. That is, the Guam children were no more underweight for their stature than the White American children and the Guam boys were no more underweight for their stature than the Guam girls (Gavan, 1952).

Since these findings were published, several other authors have provided evidence in support of the theory that males are more readily affected by nutritional shortages than females (Bogin et al, 1992; Gray & Wolfe, 1980; Malina et al, 1985; Nikitovic & Bogin, 2014; Stini, 1969; Stini, 1972; Wolánski & Kasprzak, 1976; Zakrzewski, 2003). For example, Malina and colleagues (1985) demonstrated statistically significant differences in the growth status of boys according to different economic backgrounds, but not of girls. Boys from more affluent households were found to be statistically significantly heavier (P<0.01), with longer sitting heights (P<0.05), larger arm circumferences (P<0.001), and thicker triceps skinfolds (P<0.001) than those from the poorer households. Bogin and colleagues (1992) examined the rate of growth and timing of adolescent growth events in two samples of Guatemalan children living under different environmental conditions. One sample consisted of Mayan children (n=45) between the ages of five and 18 years living under poor conditions for growth and development; the other sample included Ladino children (n=163) of the same age range living under favourable conditions for growth. The authors found significant sex and ethnicity (socioeconomic) effects on both the timing and rate of growth. Mayan and Ladino boys were found to differ significantly in age at peak height velocity while Mayan and Ladina girls did not.

However, the mean height of Mayan girls was found to be significantly less (a difference of 6.5 cm) than that of Ladina girls at the age of take-off growth; this difference increased to an estimated 11.1 cm at adulthood (growth cessation). Mayan boys were 6.6 cm shorter than Ladinos at the age of take-off growth and were estimated to be 7.7 cm shorter than the Ladinos at adulthood (Bogin et al, 1992). The authors explained these findings in terms of the relative, weighted contributions of the environment and genetics at different stages of growth. For example, growth during childhood may be more influenced by environmental than genetic factors, while the opposite may be true during adolescence and after the pubertal growth spurt (Bogin et al, 1992). The delayed age at which Mayan boys reach peak height velocity relative to Ladinos, a trend that was not observed in the female cohorts, was further taken as evidence that males are less buffered than females against environmental determinants of growth such as poor nutrition (Bogin et al, 1992). Wolánski and Kasprzak (1976) summarise this trend particularly well by stating that:

“…the difference in stature between men and women in a given population is a measure of the intensity of environmental influences and of differences between the sexes in sensitivity to environmental stimuli” (Wolánski & Kasprzak, 1976).

However, the majority of these studies were not designed to specifically assess sex differences in environmental sensitivity. Rather, this theory was and continues to be postulated as a way of explaining observed trends in male as opposed to female growth and stature in nutritionally or economically stressed groups compared with non- or less-stressed groups. For example, Cardoso and Garcia (2009) examined the effect of relative living standards on growth in two distinct populations from Portugal: medieval Leiria and early twentieth century Lisbon.

They found that growth in femur length of medieval children did not differ significantly from that of early twentieth century children, but after puberty medieval adolescents seem to have recovered, as they had significantly longer femora as adults than individuals from the Lisbon sample (Cardoso & Garcia, 2009). Under the assumption that the growth of girls is more buffered against environmental stresses than the growth of boys, the authors suggested that the results may be explained by the composition of the samples. That is, “…the Lisbon subadults can appear similar in size to the Leiria subadults if they comprise more girls than Leiria and hence show an overall pattern of growth which is less affected by poor living conditions” (Cardoso & Garcia, 2009). Unfortunately, this hypothesis is difficult to verify given that the undocumented, prepubertal skeletons of the Leiria sample could not be accurately sexed.

Similarly, Vercellotti and colleagues (2011) examined patterns of biological variation in body size and shape attributable to sex and status in a medieval Italian population and found significant differences in trunk height and long bone lengths between high and low status males but not females. The authors suggested that the lack of significant differences between female sub-samples could be interpreted as the result of higher “environmental buffering” of the female body, regardless of social status. As the authors state, “If this were indeed the case, similar body size in the female subsamples would be due to the fact that low status females did not experience a decrease in body size even though they experienced inferior life conditions. In contrast, low status males would have suffered major, significant reduction in size in response to inferior environmental quality” (Vercellotti et al, 2011).

In addition to the largely untested nature of the hypothesis that male growth is more retarded by environmental stress than is female growth, if one assumes that this is true it follows that the level of sexual dimorphism in nutritionally stressed populations should be lower than in non-nutritionally stressed populations (Stinson, 1985). However, this is not necessarily the case.

For example, nutritional patterns and the degree of sexual dimorphism were not correlated in two Venezuelan populations (Lopez-Contreras et al, 1983: 277–281), and an increase in the level of sexual dimorphism was observed in Bushmen populations during the transition from a nomadic, hunter–gatherer subsistence strategy to a more settled, pastoral, food-storage way of life, as a result of increases in male stature (Tobias, 1962). This is in contrast to other studies which have demonstrated a decrease in stature during the transition to an agricultural subsistence pattern, possibly because agriculture is initially a less-reliable source of food than is hunting and gathering (Larsen, 1995; Stini, 1971; Zakrzewski, 2003).

Other groups of researchers have reported results that contradict the hypothesis that males are more susceptible to nutritional stresses than females. Leonard and colleagues (2002) examined anthropometric indices of growth status of children aged less than six years in three indigenous Evenki communities of Central Siberia during the early period of post-Soviet transition, between 1991 and 1995. They found that children of the 1995 cohort had significantly lower height-for-age measures than those of the 1991/2 cohort. Furthermore, the magnitude of the between-cohort difference was significantly greater in girls than boys (P<0.05). The proportion of children with severe stunting almost doubled between 1991/2 and 1995, increasing from 35% to 61% over this time period (P<0.001). The increased prevalence of stunting was evident in both sexes; however, the magnitude of the increase was greater in girls (64% in 1995 vs. 29% in 1991/2; P<0.01) than in boys (57% in 1995 vs. 39% in 1991/2; P=0.17). The authors explained these results in terms of the socioeconomic changes occurring in Russia during the study period. Children under the age of around three years from the 1995 cohort would have lived for their entire postnatal growth period in poorer, post-Soviet conditions. In contrast, children who were over the age of three in 1995 spent at least a portion of their earliest childhood growing up under what appear to have been better economic and nutritional circumstances (Leonard et al, 2002). The authors further suggested that the greater magnitude of growth deficits in girls compared with boys, which contradicts the findings of other studies, may reflect cultural issues such as sex-related bias in parental attention, resource allocation, or access to health care, as has been demonstrated in parts of Asia (Behrman, 1988; Chen et al, 1981; Das Gupta, 1987; Khera et al, 2013) and Africa (Hadley et al, 2008). According to Meadows-Jantz and Jantz (1999), the driving force for secular changes in stature is considered to be changes in the nutritional and disease environment. Malnutrition and infection have further been postulated to be “synergistic”, given that through a series of biological mechanisms, infections can cause loss of appetite, gastrointestinal malabsorption of nutrients, and metabolic wastage of available nutrients in the body (Chen et al, 1981; Gupta et al, 2009; Katona & Katona-Apte, 2008).

In addition to studies of growth and development, investigations of morbidity and mortality may therefore also provide information on sex differentials in environmental and nutritional sensitivity. Studies sampling modern populations consistently report that morbidity and mortality are higher in males than females, particularly in early life (McMillen, 1979; Waldron, 1976; Waldron, 1983; Wells, 2000). Some studies in both animals and humans have suggested that males are more susceptible to nutritional insults early in life, particularly with respect to the development of brain structure and function (Katz, 1980; Lucas et al, 1998), while others have suggested that males are less resistant to disease than females, in part because the X chromosome carries quantitative genes for the production of immunoglobulin M (IgM), the first type of antibody produced in response to an infection, which results in higher serum levels in females (Waldron, 1976; Waldron, 1983). However, one problem with this latter theory is that serum levels of IgM do not appear to correlate with the ages at which female mortality from infectious diseases is lower than in males (Waldron, 1983). Using the burial and baptismal records for Christ Church Spitalfields, London, Humphrey et al. (2012) examined the estimated contribution of endogenous (birth-related) and exogenous (postnatal environment-related) factors to neonatal and infant mortality between 1750 and 1839. They found an excess of male deaths at a time when infant mortality was already high. In 1750–59, males were significantly more likely to die in infancy than females. Neonatal mortality was also higher in males than females in the 1750s compared with later cohorts (Humphrey et al, 2012).

Interestingly, a decline in excess male mortality was observed during the 1790s, a period in which there was a reduction in both endogenous and exogenous mortality; however, no further reduction in male disadvantage occurred in the early nineteenth century when reductions in infant mortality can be almost entirely attributed to exogenous causes such as infection or poor nutrition. Sex differences in mortality are due to a number of biological, social, and environmental factors (Drevenstedt et al, 2008), including excess risk of premature birth (Ingemarsson, 2003) and respiratory distress syndrome (Khoury et al, 1985) in males. Other studies have demonstrated that improved infant mortality after 1800 is strongly linked to increased adult height and life expectancy (Crimmins & Finch, 2006), further supporting the relationship between environmental factors and skeletal growth.

1.4.2.2 Climate and altitude

In addition to the proposed sex differential effect of poor nutrition on the development of adult size dimorphism, other authors have suggested that more specific environmental stresses such as high altitude and temperature may have sex-specific effects on growth. A vast body of literature has examined the effect of high altitude on growth and development, including morphological adaptation of the skeleton to hypoxic environments (Beall et al, 1977; Bejarano et al, 2009; Clegg et al, 1972; Cowgill et al, 2012; De Meer et al, 1993; Frisancho, 1969; Frisancho, 1970; Gonzales et al, 1982; Gonzales et al, 1984; Greksa et al, 1984; Haas et al, 1980; Haas et al, 1982; Julian et al, 2009; Leatherman et al, 1995; Leonard et al, 1995; Little et al, 2013; López Camelo et al, 2006; Majumder et al, 1986; Malik & Singh, 1978; Moore et al, 1998; Moore, 2001; Palomino et al, 1979; Pawson, 1977; Rothhammer & Spielman, 1972; Stinson, 1980; Stinson, 1982; Weinstein, 2005; Weinstein, 2007; Weitz et al, 2000). In general, these studies have demonstrated consistent patterns in developmental responses to high altitude. Compared with low altitude natives, populations that are native to high altitude environments tend to demonstrate lower birth weight, which is thought to reduce oxygen requirements; reduced stature, which is thought to result from slow prenatal and postnatal growth and a delayed or absent adolescent growth spurt; and increased dimensions of the thoracic skeleton (chest width and depth), which allow for increased lung capacity (Beall et al, 1977; Cowgill et al, 2012; Frisancho, 1970; Malik & Singh, 1978; Stinson, 1980; Weinstein, 2007). However, these findings are by no means universal; nor is the more general hypothesis that high altitude affects growth and development to a significant extent. For example, Clegg and colleagues (1972) found the highlands of Ethiopia to be more favourable to growth than the lowlands, as demonstrated by increased stature, weight, and physical dimensions of highland children, notably boys, relative to their lowland counterparts. Retardation of skeletal maturation was further found to be more marked in the lowland compared with the highland children.

Similarly, Malik and Singh (1978) found that male Bods from Ladakh, a highland region of India between the Kunlun mountain range and the Himalayas, exhibited faster growth than plains dwelling Indians and were taller and heavier than them at age 19 years, while Pawson (1977) found that Sherpas, a people inhabiting Khumbu in northeastern Nepal at altitudes of between 3,475 and 4,050 metres, did not demonstrate increases in chest circumference relative to Tibetan children living in Kathmandu.

The findings of other studies have suggested that high altitude hypoxia plays only a small role in reducing physical growth. Greksa and colleagues (1984) found considerable variation in stature among high altitude populations; average differences between the tallest (Puno, Peru; 3,825 m) and shortest (Nuñoa, Peru; 4,000 m) samples were around 10 cm in males and 8 cm in females. This suggests that altitude had only a small effect on statural variation, given that the maximum difference in average altitude between the samples was only around 300 metres (Greksa et al, 1984). According to Greksa et al. (1984), the primary cause of the differences in achieved stature between the samples is likely to be variation in general living conditions as reflected by factors such as nutritional status and disease experience. Leonard et al. (1995) found that children aged less than 60 months from highland (c. 3,000–3,500 m) agricultural communities of Ecuador were significantly shorter than their lowland, coastal community counterparts (<300 m), but were not significantly lighter and had similar linear growth rates. Furthermore, highland girls were found to have significantly greater weight gains relative to their coastal peers. The authors suggested that the similarity in growth rates between the high and low altitude samples indicates that high altitude hypoxia plays a relatively small role in shaping growth during the first five years after birth, and that most of the disparity in height between the samples can be attributed to differences established by six months of age (Leonard et al, 1995). Like Greksa and colleagues (1984), Leonard et al. (1995) suggest that growth retardation in high altitude populations is likely associated with the influence of nutritional and disease stressors.

The confounding effect of other factors associated with growth has additionally been considered by other authors attempting to define the impact that high altitude has on development. Such factors include socioeconomic (Little et al, 2013; López Camelo et al, 2006; Stinson, 1982) and nutritional status (De Meer et al, 1993; Leatherman et al, 1995; Weitz et al, 2000), discussed in previous sections, as well as genetics and population differences (Frisancho, 1970; Moore et al, 1998; Moore, 2000; Wiley, 1994). Given the difficulty in extracting the effect of high altitude from other confounding factors affecting growth and development, it is not surprising that relatively few studies have examined the differential effect of high altitude habitation on males and females, or on sexual dimorphism. Although not directly investigating sex-specific growth responses to hypoxic stress, Stinson (1980) noted that while high altitude populations tend to exhibit limited sexual dimorphism in general, the reduced sexual dimorphism observed in children from Ancoraimes, Bolivia, examined in her study may be the result of several factors including greater male growth reduction under hypoxic conditions, and nutritional differences between males and females, or it may be a genetic characteristic of the population.

Frisancho and Baker (1970) noted that sexual dimorphism in stature of natives of Nuñoa, Peru, located at an altitude of c. 4,000–5,500 metres, was not well defined until the age of around 16 years. Within this population sample, growth rates demonstrated a late and poorly defined spurt in both males and females, as well as a very prolonged body growth period to the age of 20 years in females and 22 years in males (Frisancho & Baker, 1970).

To adjust for the confounding effect of nutrition on growth, Schutte and colleagues (1983) included only well-nourished, middle-class children in their study to examine whether there was a statistically observable alteration in the growth of children of European ancestry during sojourn at high altitude. Semilongitudinal height/weight measurements were gathered retrospectively from medical records of 18 boys and 17 girls who were born at low altitude and were clinically normal, and who had lived at Achoma, Peru (altitude 3,200 metres), continuously for 9–36 months (Schutte et al, 1983). The results of the study demonstrated that boys experienced statistically significant decreases in weight in all sojourn duration cohorts (3.0–6.0 months; 6.1–12.0 months; 12.1–18.0 months; and 18.1–24.0 months), whereas only girls who resided at high altitude for between 6.1 and 12.0 months demonstrated significant decreases in weight. Boys also demonstrated a consistent decline in height gain, although this decline was slower than for weight and did not reach statistical significance until 18–24 months at altitude (Schutte et al, 1983). In comparison, girls demonstrated a significant decline in height gain for the first 3–6 months at high altitude only. However, at least half of the girls in the sample appeared to have already experienced their pre-adolescent growth spurt prior to coming to high altitude, which may have accounted for the observed differential effects in girls and boys (Schutte et al, 1983). Haas and colleagues (1980) also controlled for nutritional effects in their study which compared birth weight and crown-heel length of full-term babies born to mothers from two different populations: the Quechua from La Paz, Bolivia, located at an altitude of 3,600 metres and who typically are of Amerindian descent (n=105) and the Aymara from Santa Cruz, Bolivia, located at an altitude of 400 metres, and who are typically of Spanish descent (n=77).

The results of the study demonstrated a clear trend towards smaller infants being born to mothers living at high altitude, particularly mothers of non-Indian descent who are likely to be less well adapted to such an environment. Male babies born at high altitude were found to be 11.9% lighter and 2.3% shorter compared with male babies born at low altitude, while female babies born at high altitude were only 4.6% lighter and 0.4% shorter compared with female babies born at low altitude (Haas et al, 1980). This suggests that high altitude affects males more than females and provides further evidence that female foetuses and infants may possess a greater buffering capacity or canalisation to a growth-disrupting stress such as high altitude hypoxia than males (Haas et al, 1980).

A further confounding factor to the study of high altitude effects on growth and development is climate, and specifically temperature, given that such environments tend to be colder than low altitude environments. According to some authors, thermal stress is one of the most variable constraints to which humans have had to adapt following our emergence from Africa and subsequent colonisation of virtually every ecosystem on earth (Katzmarzyk & Leonard, 1998). Effective thermoregulation requires both physiological and morphological adaptive strategies that minimise or maximise heat loss depending on the ambient temperature.

For example, wider trunks help reduce heat loss in cold environments by decreasing the surface-to-volume ratio of the body (Betti, 2014). In mammals, the relationship between anthropometric variation and climatic stress may be summarised by two so-called ecological rules (Katzmarzyk & Leonard, 1998). The Bergmann Rule states that “within a polytypic warm-blooded species, the body size of the subspecies usually increases with decreasing mean temperature of its habitat” (Bergmann, 1847; cited in Katzmarzyk & Leonard, 1998), while Allen’s Rule states that, “in warm-blooded species, the relative size of exposed portions of the body decreases with decrease of mean temperature” (Allen, 1877; cited in Katzmarzyk & Leonard, 1998). The association between body size and proportions with temperature, climate and/or latitude has been widely explored in the literature (Betti, 2014; Bogin & Rios, 2003; Bogin, 1978; Cowgill et al, 2012; Endo et al, 1993; Evteev et al, 2014; Fukase et al, 2012; Holliday, 1997; Hubbe et al, 2009; Newman & Munro, 1955; Noback et al, 2011; Nowaczewska et al, 2011; Pomeroy et al, 2014; Roberts, 1953; Rothhammer & Silva, 1990; Ruff, 1991, 1993, 1994, 2002; Schreider, 1975; Smith et al, 2007; Steegmann et al, 2002; Stock, 2006; Temple et al, 2008; Weaver, 2009; Weaver & Steudel-Numbers, 2005; Weinstein, 2005, 2008). To follow the Bergmann and Allen Rules, one would expect populations living in cold climates to exhibit a wide trunk and relatively short limbs to reduce surface-to-volume ratio, while those living in hot climates are expected to demonstrate elongated body shape and relatively long limbs to facilitate sweat evaporation via a high surface-to-volume ratio (Betti, 2014). A number of different studies have presented findings in support of these rules and expectations, in both modern and fossil hominids (Endo et al, 1993; Ruff, 1991, 1993, 1994). For example, Ruff (1991) demonstrated a strong correlation between bi-iliac breadth and latitude, but not between bi-iliac breadth and stature (except in the sub-Saharan African group). In other words, despite variation in stature (and body mass), modern populations living under the same general climatic conditions appear to maintain similar body surface area/body mass ratios by limiting variation in body breadth (Ruff, 1991). Furthermore, populations living in different temperature zones were found to exhibit large systematic differences in absolute bi-iliac breadth: those in progressively colder climates had progressively wider bodies than those in warmer climates, regardless of stature (Ruff, 1991). Katzmarzyk and Leonard (1998) re-examined published data on stature, body mass, the body mass index (BMI), surface area/body mass ratio, and relative sitting height (RSH) for 223 males and 195 females from 10 distinct geographic/climatic areas. In both sexes, they found a statistically significant inverse relationship between body mass and mean annual temperature and between BMI and mean annual temperature, a significant positive relationship between surface area/body mass ratio and temperature, and a negative relationship between RSH and temperature. These results indicate that populations inhabiting hotter regions have physiques that maximise surface area per unit body mass, as well as a linear body build that is characterised by relatively long leg lengths, both of which conform to the ecological rules of Bergmann (1847) and Allen (1877), both cited in Katzmarzyk and Leonard (1998).
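The geometric reasoning behind the observation that body breadth, rather than stature, governs the surface area/body mass ratio can be illustrated with a simple sketch. It assumes, purely for illustration, that the body is approximated by a cylinder of uniform density and that only the lateral surface is counted (end caps are ignored). The function name, the density of 1 g/cm³, and the example dimensions are hypothetical and are not taken from any of the studies cited above.

```python
import math


def cylinder_sa_to_mass(breadth_cm, height_cm, density_g_per_cm3=1.0):
    """Lateral surface area divided by mass for a cylindrical body model.

    Assumption: the body is a uniform cylinder and end caps are ignored,
    so the ratio simplifies algebraically to 4 / (density * breadth).
    """
    radius = breadth_cm / 2.0
    lateral_area = 2.0 * math.pi * radius * height_cm              # cm^2
    mass = density_g_per_cm3 * math.pi * radius ** 2 * height_cm   # g
    return lateral_area / mass                                     # cm^2 per g


# Hypothetical bodies: same breadth with different heights, then a wider body.
short_narrow = cylinder_sa_to_mass(breadth_cm=25.0, height_cm=150.0)
tall_narrow = cylinder_sa_to_mass(breadth_cm=25.0, height_cm=180.0)
tall_wide = cylinder_sa_to_mass(breadth_cm=30.0, height_cm=180.0)

print(short_narrow, tall_narrow, tall_wide)
```

Under these assumptions, changing height alone leaves the ratio unchanged, whereas increasing breadth lowers it, which is the pattern expected of wider-bodied populations in colder climates.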

More recently, Temple and colleagues (2008) examined variation in limb proportions of two distinct prehistoric populations from Japan, the Jomon and the Yayoi, to test, among others, the hypothesis that relative limb proportions will reflect the latitudinal origins of the different groups of people which colonised Japan. Jomon Period foragers occupied modern-day Japan between 13,000 and 2,500 BP; cranio-dental evidence suggests these people were nomads who migrated to Japan from southeast or North/Central Asia around 30,000 BP. In contrast, Yayoi Period (2,500–1,700 BP) agriculturalists were thought to be the descendants of people from modern-day Korea or northern China who migrated to Japan and interbred with Jomon foragers (Temple et al, 2008). The results of the study demonstrated that Jomon limb shape is similar to groups from tropical environments at lower latitude, including samples from Africa, Australia, the Philippines, and Thailand, whereas the Yayoi demonstrated similarities with individuals from higher latitudes and colder environments such as Alaska, Germany, England, and Kyoto. These findings therefore support one of the key hypotheses of the study, which predicted greater similarity in limb shape between Jomon people and groups from warm, low latitude environments, and between Yayoi people and groups from colder, high latitude locations (Temple et al, 2008). In contrast, Fukase and colleagues (2012) found no north-south climatic cline in intralimb proportion ratios, nor any significant differences between the limb proportions of five regional Jomon Period skeletal populations ranging from subarctic Hokkaido to subtropical insular Okinawa. They did, however, find a gradual increase in body size, as measured by femoral head breadth, and stature, calculated from maximum limb lengths, with latitude, which coincides with Bergmann’s Rule (Fukase et al, 2012).

As with the studies examining the effect of high altitude hypoxia on growth, the potential confounding effect of other factors should be considered when examining the impact of thermal stress on growth and development. For example, isotopic analyses of Jomon skeletal remains have indicated that there were region-specific dietary habits among coastal sites in Japan (Fukase et al, 2012; Yoneda et al, 2002), and these showed that Jomon people in Hokkaido and Okinawa particularly consumed larger marine mammals and reef fishes/shells, respectively, compared with those in Honshu (Yoneda et al, 2011). As such, regional differences in protein-calorie intake during growth and development cannot be ruled out. The potentially confounding effect of nutrition, and in particular the suggestion that better living conditions can lead to increases in relative limb length through differential growth, was additionally considered by Ruff (1994). He argued that malnutrition and lower living standards tend to be more prevalent in modern equatorial regions than in higher latitudes. As such, if nutrition had a greater impact on body proportions than climate, the opposite geographic trend in relative limb length to that actually observed should be apparent; in other words, tropical populations in general should have relatively shorter, not longer limbs (Ruff, 1994).

According to some authors, temperature stress has a greater effect on male growth than on female growth (Stinson, 1985). For example, mean male height, but not mean female height, has been found to be significantly lower in populations living in colder climates than in those living in warmer climates (Gray & Wolfe, 1980). By contrast, Ruff (2002) found no statistically significant differences in sexual dimorphism of body mass between extant human populations living at higher and lower mean latitudes (per cent dimorphism, 15.5% and 14.7%, respectively). Similarly, among primate species from Madagascar, Dunham and colleagues (2013) found no significant relationship between sexual size dimorphism and any climatic variables including temperature, rainfall, and seasonality. A relationship between male body mass and seasonality of temperature was observed; however, this relationship was not found to be statistically significant after a Bonferroni adjustment for multiple testing (Dunham et al, 2013). Johnston and colleagues (1982) examined growth patterns of body size, proportion, and composition in male (n=57) and female (n=56) Eskimos from St. Lawrence Island in the Bering Sea. Overall, the results of this study indicated that the Eskimo sample possessed a physique characterised by a large body mass relative to height, and by relatively short extremities, which conforms to the Bergmann and Allen Rules. The authors further indicated that the body shape and size patterns observed were more pronounced in males than in females, suggesting a “…greater resistance of females to environmental factors” (Johnston et al, 1982). Wells (2012) collected sex-specific anthropometric data (stature, weight, and skinfold thickness) for 96 non-industrialised populations and explored patterns of sexual dimorphism in relation to mean annual temperature. He found that the magnitude of dimorphism was not randomly distributed across global regions; rather, the lowest magnitudes for stature, lean mass and fat mass were found in African populations, and the highest magnitudes for adiposity in Arctic populations (Wells, 2012).
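As an aid to interpreting figures such as the per cent dimorphism values quoted above, the following minimal sketch shows one common way of expressing dimorphism as a percentage of the female mean. The formula is an assumption adopted here for illustration and is not necessarily the index used in any of the cited studies; the function name and the example means are hypothetical.

```python
def percent_dimorphism(male_mean, female_mean):
    """Per cent dimorphism relative to the female mean.

    Assumed convention (one of several in the literature):
        (male mean - female mean) / female mean * 100
    """
    if female_mean <= 0:
        raise ValueError("female_mean must be positive")
    return (male_mean - female_mean) / female_mean * 100.0


# Hypothetical body mass means in kilograms, not taken from any cited study.
example = percent_dimorphism(male_mean=60.0, female_mean=52.0)
print(round(example, 1))
```

Other conventions, such as a simple male/female ratio or the difference of log-transformed means, give different numerical values for the same data, so indices should only be compared across studies when they share a definition.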

Other studies have investigated the relationship between Bergmann’s Rule, which states that organisms are larger at higher latitudes (or in colder climates), and Rensch’s Rule, which states that male body size varies (or evolutionarily diverges) more than female body size among species (Blanckenhorn et al, 2006; Gustafsson & Lindenfors, 2009). Blanckenhorn and colleagues (2006) used published studies of sex-specific latitudinal body size clines in 98 vertebrate and invertebrate species to examine these rules. They found that Bergmann’s Rule was obeyed by 58 of the 98 species studied, while 40 species demonstrated a converse Bergmann body size cline (larger at lower latitudes). Ignoring latitude, male size was observed to be more variable than female size in only 55 of 98 species, suggesting that intra-specific variation in sexual size dimorphism does not generally conform to Rensch’s Rule. By contrast, in a significant majority of species (66 of 98) male latitudinal body size clines were steeper than those of females, a finding that is consistent with a latitudinal version of Rensch’s Rule. The authors additionally found no evidence to support the proposed hypothesis that the larger sex tends to have the stronger body size–latitude association (Blanckenhorn et al, 2006). More recently, Gustafsson and Lindenfors (2009) conducted a study with similar aims in a cross-cultural sample of 124 human populations. The results of the study indicated that male and female stature are weakly associated with latitude, with increases in stature correlating with increasing distance from the equator, but did not support the hypothesis that the degree of sexual stature dimorphism is greater at higher latitudes. However, more generally, the results obtained by Gustafsson and Lindenfors (2009) are in line with those of other studies (for example, Katzmarzyk & Leonard, 1998; Ruff, 1994; Ruff, 2002) in showing that human morphology to some extent conforms to Bergmann’s Rule.
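To give a sense of how far the species counts reported by Blanckenhorn and colleagues depart from an even split, the sketch below applies an exact two-sided sign (binomial) test against a 50:50 null. This is an illustrative check only and is not claimed to be the analysis used by the original authors; it also assumes that species can be treated as independent observations, which ignores phylogenetic relatedness. The function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(successes, n):
    """Exact two-sided binomial p-value against a null proportion of 0.5."""
    k = max(successes, n - successes)
    # Probability of a split at least as extreme in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2.0 * upper_tail)


p_bergmann = two_sided_sign_test(58, 98)   # species obeying Bergmann's Rule
p_rensch = two_sided_sign_test(55, 98)     # males more variable than females
p_steeper = two_sided_sign_test(66, 98)    # male clines steeper than female

print(p_bergmann, p_rensch, p_steeper)
```

Under these assumptions, only the 66-of-98 split falls clearly below the conventional 0.05 threshold, which is consistent with that comparison alone being described as a significant majority.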

1.4.2.3 Ecological influences

Several of the lines of evidence presented above are also relevant to arguments in favour of ecological causes of sexual dimorphism. A number of animal studies have suggested that there are differences in the way males and females interact with their environment (Selander, 1966; Slatkin, 1984); however, a particular problem with the ecological theory in general lies in distinguishing cause and effect. In other words, are ecological factors responsible for the origin of sexual dimorphism, or do sex-related differences in ecological factors exist because of sexual dimorphism? The most commonly cited theory appears to be that feeding competition between the sexes results in niche divergence. As such, niche divergence would serve as an amplifier of sexual size differences originating for other reasons (Lindenfors & Tullberg, 1998). This phenomenon has previously been observed in some bird species (Plavcan, 2001); however, the hypothesis of niche divergence has not received much attention in primates, mostly because there are only small niche differences between the sexes in most primate species (Lindenfors & Tullberg, 1998). Therefore, in lieu of more robust data, it seems most appropriate to suggest that it is unlikely that inter-sexual niche divergence plays more than a subsidiary role in the evolution and maintenance of common patterns of sexual dimorphism (Fairbairn, 1997; Shine, 1989).

1.4.2.4 Division of labour

A number of researchers have suggested that sex differences in economic or subsistence strategy roles may be an important factor in determining and maintaining the degree of sexual dimorphism expressed within a population (Frayer, 1980). If this theory were true, it would be expected that populations operating a hunter–gatherer subsistence pattern would demonstrate a high degree of sexual dimorphism because males and females would have clearly defined and physically distinct roles. Conversely, a decrease in sexual dimorphism would be expected during the transition to an agricultural subsistence pattern as the roles occupied by males and females would be less distinct, more overlapping, and possibly less physically vigorous
(Armelagos & Van Gerven, 1980; Frayer, 1980). However, this raises the question: are the biological consequences of differential role performance and the exclusivity of jobs by sex the cause of sexual dimorphism, or did males and females adopt these differing roles because of their pre-existing physical and size-related differences? Thus, is sexual division of labour simply a convenient theory to explain observed changes in sexual dimorphism in the hominid lineage, such as a gradual gracilisation of human form over time?

According to Frayer (1980), the level of sexual dimorphism within a population is roughly proportional to the exclusivity of the division of labour by sex. In other words, sexual dimorphism decreases in societies where males and females share similar subsistence responsibilities. Reviewing dental, cranial, and body size data for European Upper Palaeolithic, Mesolithic, and Neolithic populations, he noted a substantial decrease in the level of sexual dimorphism expressed over the three time periods. An analysis of trends suggested that the major cause of this decrease was gracilisation of males between the Upper Palaeolithic and
Mesolithic, which coincides with shifting technological patterns associated with hunting and changes in the types of animals hunted (Frayer, 1980). Additional reductions in sexual dimorphism from the Mesolithic to the Neolithic, and from the Neolithic to modern European populations were found to be more closely related to changes occurring in females, such as increases in stature, which coincided with a greater degree of sharing economic and subsistence chores.

These findings are supported by those of Ruff (1987), who compared cross-sectional geometric properties of the femur and tibia in males and females from populations spanning the
Middle Palaeolithic to recent times. He found a consistent decline in sexual dimorphism from hunter–gatherer to agricultural to industrial subsistence strategy levels in properties which measure relative antero-posterior bending strength of the femur and tibia in the region around the knee (Ruff, 1987). He further suggested that this trend parallels and is indicative of reductions in the sexual division of labour, in particular differences in the relative mobility of males and females. In pre-industrial societies, tasks that require a high degree of mobility and running are almost exclusively assigned to males, whereas sedentary roles are usually the province of females, as they are more readily suited to the demands of childcare. Thus, sex differences in relative mobility will always be greatest in hunter–gatherer societies and, according to Ruff (1987), will likely decline with the adoption of agriculture. Such changes will undoubtedly be reflected in the skeleton given its known response to mechanical stimulation and stress (Frost, 1990; Huiskes et al, 2000; Turner, 1998).

However, the findings of other studies suggest some important limitations to the theories put forward by Frayer (1980) and Ruff (1987), which are outlined above. Firstly, not all hunter–gatherer groups exhibit high levels of sexual dimorphism. For example, Carlson et al. (2007) reported a low level of sexual dimorphism in the lower limbs of pre- and post-colonial Australian Aborigine hunter–gatherer groups, which is consistent with ethnographic accounts of equivalently high mobility among females and males. As such, it cannot be concluded that elevated postcranial robusticity and sexually dimorphic mobility are universal characteristics of hunter–gatherers (Carlson et al, 2007). In some respects, these findings do provide indirect support for the suggestion that sexual dimorphism will be greatest in populations where males and females perform physically distinct tasks and activities. However, the suggestion by Frayer (1980) that such a population is likely to operate a hunting/foraging subsistence pattern may also be challenged because males and females often have overlapping roles within hunter–gatherer societies. For example, among the Mbuti of the African Congo, the Tiwi of Australia, the Matses of the Peruvian Amazon rainforest, and the Pinatubo Negritos and Agta of the Philippines, females are involved in hunting, which often includes the use of weapons such as machetes or a bow and arrows (Goodman et al, 1985). Furthermore, males do not always make the biggest contribution to the provisioning of dietary resources. Johnson (2014) reported that subsistence procurement among ethnologically documented hunter–gatherers may be as low as
a 30% male contribution to the diet. This may, however, increase to as much as 99%, usually in societies that have a high dependence on sea-based or terrestrial mammals (Johnson, 2014).

Other lines of evidence relate to the nature of hunter–gatherer groups, which are often, although not always, egalitarian (Cashdan, 1980; Johnson, 2014; Testart et al, 1982). Among such groups, cultural practices select against domineering, arrogant, or authoritative behaviour in males (Cashdan, 1980), which could be a potent force for reducing sexual size dimorphism (Gray, 2013). Furthermore, such societies are typified by abundant sharing of food and resources (Hurtado et al, 1985). That is not to say, however, that males and females within egalitarian societies do not still have distinct roles. For example, among the Ache hunter–gatherers of Eastern Paraguay, males are primarily engaged in hunting and are the principal providers of food, while female roles include harvesting crops, food processing, domestic chores, manufacturing goods to sell, and foraging (Hurtado et al, 1985). While females are sometimes involved in hunting activities, it is usually only to accompany the males of the group and is something they do while simultaneously caring for their children and taking numerous rest stops. As such, male food acquisition techniques are very different to female techniques: they walk faster, run, climb trees, and use a bow and arrows. For females, the primary driving force in choosing behaviours, which often are sedentary, is the enhancement of offspring survival (Hurtado et al, 1985).

Secondly, agricultural groups and hunter–gatherers may show similar levels of sexual dimorphism. For example, Wolfe and Gray (1982) used data from 73 societies contained within the Standard Cross-Cultural Sample, a collective series of data based on a sample of 186 ethnographically well-described societies from relatively independent cultural and geographic provinces (Murdock & White, 1969), to demonstrate that agricultural groups are no less dimorphic than hunter–gatherers, and that a more equal division of labour is not necessarily associated with a decrease in sexual dimorphism. Thirdly, some sources of data on overall physical activity levels do not support the notion that sex differences in workload were reduced with the transition to agriculture (Panter-Brick, 2002). This is supported by the findings of Holden and Mace (1999), who suggested that sexual dimorphism of stature is related to the overall sexual division of labour but not the nature of the subsistence activity. In other words, it is the amount rather than the type of work that affects the degree of sexual dimorphism
exhibited. For example, females tend to be taller relative to males in societies where females make a large contribution to food production (Holden & Mace, 1999). The authors suggested that this may be because female nutritional status is better in these societies than in those where females contribute little to subsistence and therefore undertake only a limited amount of work. However, an alternative explanation could be that the large contribution to subsistence practices is associated with a high level of activity that involves repeated mechanical loading of the skeleton. Studies have demonstrated that, prior to epiphyseal fusion, compression or tension forces applied across the epiphyseal plate may stimulate longitudinal bone growth by increasing the rate of proliferation and enlargement of chondrocytes and production of extracellular matrix (LeVeau & Bernhardt, 1984; Villemure & Stokes, 2009). However, the type and magnitude of mechanical load is very important. For example, while intermittent compression can cause bone growth, constant compression of great magnitude can result in bone atrophy and a subsequent reduction in longitudinal bone length (LeVeau & Bernhardt, 1984).

Lastly, the transition from a hunting and gathering subsistence pattern to agriculture is not always associated with a decrease in sexual size dimorphism. For example, Marchi and colleagues (2006) used diaphyseal robusticity measures obtained from cross-sectional geometric properties of the humerus and femur to explore the skeletal effects of the transition to a Neolithic subsistence strategy, which focused on terrestrial resources and pastoralism, in western Liguria, Italy. In osteological contexts, the term ‘robusticity’ refers to the strength of a skeletal element relative to some mechanically relevant measure of body size (Center for Academic Research & Training in Anthropogeny, 2014). For certain bones of the post-cranial skeleton such as the femur or humerus, robusticity is influenced by the combined effects of body mass and physical activity (Marchi et al, 2006; Ruff et al, 1991; 1993). Measures of body mass may include long bone length (Mummert et al, 2011; Ruff et al, 1993; Shackelford, 2007), which is additionally used to estimate stature, or femoral head diameter, which has been shown to correlate strongly with body weight (Lieberman et al, 2001; Ruff et al, 1991; Wescott, 2006). Marchi et al. (2006) used stature itself, calculated from femoral length, and bi-iliac breadth as measures of body mass to standardise femoral cross-sectional properties. This allowed the authors to isolate the effects of activity, as opposed to the combined effects of activity and body
mass, on bone robusticity. The findings of the study revealed that the transition from hunter–gatherer to Neolithic agricultural economies in western Liguria did not reduce functional requirements in males, nor did it decrease the level of sexual dimorphism exhibited within the population. The results did, however, demonstrate increased robusticity in male humeri, which was interpreted as evidence that males engaged in repetitive tasks using bimanual-use axes, which was likely to be associated with pastoral activities such as the procurement of fodder (Marchi et al, 2006).

These results are consistent with other lines of evidence which suggest that sex-related differences in limb robusticity may be more greatly influenced by environmental or population-specific factors such as the physical terrain of the habitable land, use of tools or weapons, and cultural practices rather than by broad subsistence patterns (Nikita et al, 2011; Pomeroy & Zakrzewski, 2009; Sparacello & Marchi, 2008; Sparacello et al, 2011; Weiss, 2009). For example, Nikita and colleagues (2011) used long bone cross-sectional geometric properties to examine sexual dimorphism of shape and rigidity within the Garamantian civilisation (c. 900 BC–AD 500), who lived in what is now Fezzan, Libya, and between other populations from ancient Egypt and Nubia. They found that among the Garamantes, only the male lower limbs were significantly more robust than the female ones, while the comparative sample from el-Badari and Jebel Moya demonstrated very low levels of dimorphism (Nikita et al, 2011). The authors suggested that the lack of a consistent pattern among North African population groups is indicative of their different lifestyles. For the Garamantes, the low level of sexual dimorphism in the upper limbs is thought to conform to the pattern found in agricultural populations in general, while the fact that the males were stronger than females in lower limb total subperiosteal area most likely relates to their involvement in short and long distance mobility due to herding on the uneven Saharan terrain (Nikita et al, 2011). Sparacello and Marchi (2008) similarly suggested that femoral robusticity does not correlate directly with the level of logistical mobility in males, but is instead due to the summation of several diverse factors that place biomechanical loads on the hindlimb, particularly unevenness of the terrain (Sparacello & Marchi, 2008). However, the gracility often seen in female femora was taken to indicate that below a certain “threshold” of mobility, for example, movement over the natural terrain, terrain conformation no longer acts as the main contributing factor to femoral robusticity (Sparacello &
Marchi, 2008). Cultural, behaviour-related practices were proposed by Pomeroy and Zakrzewski (2009) to explain a high (within the hunter–gatherer range for some measurements) and statistically significant degree of sexual dimorphism in lower limb diaphyseal cross-sectional shape of skeletons excavated from the medieval Muslim cemetery of Écija, south-western Spain. Religious textual evidence may provide an explanation for these findings given that within Muslim societies, the roles and activities (and hence mobility) of females are often strictly limited compared with those of males (Pomeroy & Zakrzewski, 2009).

Other researchers have explored the possible correlation between division of labour and sexual dimorphism by examining sex-related differences in activity or mobility-related stress markers in the skeleton (Derevenski, 2000; Eshed et al, 2004; Hawkey & Merbs, 1995; Molnar, 2006). For example, Hawkey and Merbs (1995) examined robusticity markers, stress lesions, and ossification exostosis at muscle and ligament insertion sites in the skeletal remains of 318 individuals recovered by the Northwest Hudson Bay Thule Eskimo Project. The authors found that activity-induced stress patterns exhibited a statistically significant dichotomy of labour between the sexes, which was consistent with ethnographic information obtained from historic Central Canadian Inuit and modern Labrador Inuit. Similar results were obtained by Molnar (2006) in a Stone Age population from Gotland in the Baltic Sea.

However, some authors have suggested that a cautious approach should be taken to inferring cultural, socioeconomic, or activity-related patterns from the skeleton, given that sex-related differences in skeletal morphology may simply reflect intrinsic sexual dimorphism within the human species, or have a cause other than sexual division of labour (Meyer et al, 2011; Thomas, 2014). For example, entheseal remodelling is often used in studies as an “osteological marker of activity”, with differences between males and females being interpreted as evidence for sex-differentiated activity patterns (Havelková et al, 2011; Schrader, 2012; Villotte et al, 2010). Villotte and colleagues (2010), examining skeletal remains from European Upper Palaeolithic and Mesolithic hunter–gatherer societies, used the absence of enthesopathies of the medial epicondyle of the humerus in females, a condition that was observed in 7.3% of males, to suggest that females did not engage in subsistence activities that involved the use of projectile hunting weapons. In a similar way, the high rate of enthesopathies observed in females living in Mikulčice Castle in ninth century Moravia (eastern Czech Republic) compared
with both castle-dwelling males and female workers in the castle hinterland, was used as evidence that they did not represent a privileged social class who were kept from physically strenuous tasks (Havelková et al, 2011). However, in studies using skeletal samples for which age, sex, and occupation were documented, entheseal remodelling has been found to correlate strongly with age but only moderately, or not at all, with lifetime activity (Cardoso & Henderson, 2010; Milella et al, 2012). Furthermore, other studies have suggested that the higher rates of entheseal remodelling often seen in males compared with females may be the result of factors other than differences in activity patterns, such as hormone levels and body size (Schlecht, 2012; Weiss, 2007; Weiss et al, 2012; Wilczak, 1998).

Thus, returning to the questions posed at the beginning of this section, it is difficult to propose sexual division of labour as the primary cause of sexual size dimorphism given the available skeletal evidence. According to Meyer and colleagues (2011), a major problem in this area of research is the tendency to superimpose known or presumed sex-linked cultural divergence onto observed biological processes affecting the skeleton. As these authors state “If differences between the sexes are apparent from activity studies, these are almost universally and for the most part uncritically explained as resulting from sex-specific division of labour in the respective societies” (Meyer et al, 2011). An alternative hypothesis which should therefore be considered is that any differences in skeletal morphology between the sexes are simply an outward manifestation of the intrinsic phenotypical sexual dimorphism of Homo sapiens (Meyer et al, 2011).

1.4.3 Ontogeny

Ontogeny, or ontogenesis, refers to the development of an organism, or an anatomical or behavioural feature of that organism, from the earliest stage of origin to maturity. Most studies of sexual dimorphism focus on size and proportional differences of adult individuals. However, sexual dimorphism in adults is necessarily a product of sex differences in development and growth (Plavcan, 2001). Males can become larger than females by two methods: 1) by extending a common growth trajectory and maturing later than females (known as ‘time hypermorphosis’ or ‘bimaturism’), or 2) by growing faster than females in a given period of time (‘rate hypermorphosis’). Likewise, dimorphism can be produced if females mature earlier than males, or grow more slowly than them (Plavcan, 2001). It was initially suggested by Shea
(1983) that time and rate hypermorphosis were two distinct processes that resulted in adult sexual size differences within and between primate species. His comparisons indicated that sex differences in the duration of growth were mainly responsible for sexual size dimorphism within species, while differences in rates of growth accounted for variation in adult size between species (Shea, 1983). While more recent evidence does not completely refute this proposal, it does suggest that sexual size dimorphism cannot be as neatly compartmentalised as Shea had hoped. According to Leigh (1992), who documented and quantified patterns of growth in males and females of 38 primate species, sexual dimorphism within species arises as a result of both bimaturism and sex differences in growth rate. Cobb and O’Higgins (2007) examined ontogenetic patterns in the facial skeleton of African apes using geometric morphometric techniques. They found no significant difference between the ontogenetic shape trajectories of juvenile males and females in any of the species, indicating that they share a parallel ontogenetic shape trajectory in the postnatal period up to around the time of the eruption of the second permanent molar. The males and females within each species were also found to be ontogenetically scaled during the same ontogenetic period. After this period, males and females were found to diverge both from each other and from the common juvenile ontogenetic shape and scaling trajectories within each species (Cobb & O’Higgins, 2007). In other words, within and between species, male and female adult body sizes represent different endpoints on a single ontogenetic trajectory – they demonstrate bimaturism. After the eruption of the second permanent molar, body size differences between males and females of the same species and between species result from differences in the rate of growth. 
Other studies have demonstrated that the processes culminating in the attainment of adult body size can be highly variable, with similar levels of sexual size dimorphism resulting from substantially different ontogenetic pathways (Leigh, 1992; Shea, 1986; Taylor, 1997). For example, in chimpanzees, adult males reach a bigger body size than females primarily as a result of increased growth rate rather than by increased growth duration, whereas the opposite is true for gorillas (Berge & Penin, 2004; Leigh & Shea, 1996).

While studies of this nature provide important insights into how males and females attain different adult body sizes, a further question of interest is why does the optimal body size differ between the sexes? The answer to this question may be related to how reproductive success is affected by body size (Blanckenhorn, 2005; Fairbairn, 1997). According to Shea (1983), intra-specific differences in growth rates and durations that influence heterochronic processes (processes that lead to a change in the timing of ontogenetic events) could be linked to sexual selection. For example, extended male growth occurs in primates when male reproductive success is concentrated in a period of dominance resulting from intense male–male competition (Lockwood et al, 2007). Assuming the top position on the dominance hierarchy typically requires a large body size, bimaturism may be viewed as part of a male strategy to delay competition with high-ranking males until the likelihood of success is greatest
(Lockwood et al, 2007). Leigh (1995) further suggested that males can follow alternative developmental pathways in response to selection pressures faced as juveniles, such as male competition, after he found that species living in single-male groups tend to achieve adult dimorphism through rate hypermorphosis, while those living in multi-male groups achieve adult dimorphism through time hypermorphosis. According to Rensch (1950; cited in Blanckenhorn et al, 2007), sexual size dimorphism increases with body size in species where males are larger and decreases with body size in species where females are larger (known as ‘Rensch’s Rule’). In other words, male body size varies or diverges more over evolutionary time than female body size, irrespective of which sex is larger (Blanckenhorn et al, 2007). In females, growth patterns are thought to be strongly influenced by female competition over resources (Plavcan, 2001), although the contribution that this makes to adult sexual size dimorphism tends to vary between species. For example, early female growth cessation in Pan paniscus and Gorilla gorilla has been shown to measurably enhance dimorphism in these species relative to Pan troglodytes, implying that variation in ape dimorphism has a strong “female component” (Leigh & Shea, 1995). As discussed in Section 1.4.1, this ‘social selection’ in P. paniscus and G. gorilla may reflect lower levels of both inter-female competition and ecological risk relative to P. troglodytes for which there is no intra-sexual competitive advantage to large female body size (Leigh & Shea, 1995). Although the final magnitude of body size differences between the sexes is undoubtedly a sum of the selective pressures on males and females, overall it is thought that sexual selection acting on males has the greatest influence on ontogenetic processes and therefore makes the biggest contribution to adult sexual size dimorphism (Leigh & Shea, 1995; Plavcan, 2001).

In summary, consideration of the available evidence discussed within this and the preceding sections suggests that while the primary cause of sexual dimorphism is genetic, whether or not males and females are likely to reach their genetic growth potential is reliant on a complex interplay of numerous other factors affecting individuals throughout their lives, including environmental factors, such as nutritional status, ecology, and climate; and socioeconomic factors such as division of labour, societal roles, and levels of activity. Duren and colleagues (2013) summarise these relationships particularly well:

“…skeletal phenotypes result from a balance between genetic and environmental influences. Components of the skeleton are not wholly controlled by one or the other, but rather represent environmental adaptations that play against a backdrop of genetic constraints. The shape and content of bone will change throughout the life span with fluctuating contributions of these overarching influences. Integrating the study of genetic and environmental components of skeletal form provides insight into the driving forces underlying the observed morphology that serves as the basis for many core inquiries within physical anthropology” (Duren et al, 2013).

1.5 Fundamentals of palaeodemography: the estimation of sex

Sex is one of the most biologically basic and important pieces of information about a human being, and is usually the first parameter to be assessed during the analysis of both contemporary and archaeological human skeletal remains. Sex may be estimated by morphological and metric methods, both of which have advantages and disadvantages. Often, the methods available for use will be dictated by the state of preservation of the human remains.

In instances of good preservation, morphological examination of the bony pelvis is widely considered to be the most accurate technique, as the form of these bones is directly related to biological function. However, archaeological skeletons in particular are often damaged, fragmented, or represented by a few isolated bones only, necessitating that sex be estimated by other means. Metric methods of sex estimation therefore have considerable value in the analysis of the remains of individuals from past human populations.

1.5.1 Morphological sex estimation

Sex estimation methods of this type require visual inspection of the shape and form of discrete features of the bony pelvis and skull. Morphological sexually dimorphic features of the bony
pelvis result from differences between male and female growth patterns during childhood and adolescence (see Section 1.3.2.1.2), and ultimately reflect biological function. Depending on the bones available and their state of preservation, sex may be determined by examining the overall shape, size and architecture of the bony pelvis as a whole, or by examining specific features of individual bones including the subpubic angle, width of the sciatic notch, and shape of the obturator foramen and sacrum (Ferembach et al, 1980). In instances where the ossa coxae are fragmented, the os pubis is almost universally considered to be the most useful single bone for sex estimation. A morphological sexing method that provides accuracy in differentiation of males and females in excess of 95% was developed by Phenice (1969). The so-called ‘Phenice method’ requires examination of three characteristics of the pubic bone: the ventral arc, subpubic concavity, and medial aspect of the ischiopubic ramus, and was reported as being objective and easy to apply irrespective of observer experience (Phenice, 1969).

The high level of accuracy associated with the Phenice method has been confirmed by other groups of researchers (Kelley, 1978; Lovell, 1989; Sutherland & Suchey, 1991), although the findings of some studies suggested that observer experience and population specificity have a greater impact on the accuracy of the technique than was originally proposed (Table 1.5A).

Table 1.5A: Accuracy rates obtained in studies testing the Phenice method.

| Study | Population | Accuracy, % | Comments |
|---|---|---|---|
| Ubelaker & Volk, 2002 | Terry Collection (n=198); known-sex | 88.0 | Discrepancy in accuracy rate vs. original study most likely the result of observer experience |
| MacLaughlin & Bruce, 1990 | European | 83.0 (English); 68.0 (Dutch); 59.0 (Scottish) | Results could be the result of variation in population levels of sexual dimorphism or observer inexperience |
| Bruzek, 2002 | French & Portuguese (n=402); known-sex | 95.0 | High rate of accuracy requires observation of entire os coxa, rather than just the os pubis |

After the bony pelvis, the skull contains the highest concentration of sexually dimorphic morphological features (Ferembach et al, 1990). Despite their popularity, few qualitative methods of cranial sex estimation have been subjected to tests of accuracy and precision. In an attempt to correct this deficit, Rogers (2005) examined the reliability of 17 morphological traits in estimating the sex of 46 documented skulls from the nineteenth century St. Thomas’ Anglican Church Cemetery in Belleville, Canada. When all 17 features were considered in combination and each trait given equal weight, sex was accurately assessed in 89% of cases, with nasal aperture, zygomatic extension, malar size/rugosity, and supraorbital ridge proving to be the most useful features (Rogers, 2005). Updating these results, Williams and Rogers (2006) tested the accuracy and precision of the 17 cranial traits evaluated by Rogers (2005) on a collection of 50 modern crania (25 male and 25 female) from the William M. Bass Donated Skeletal Collection, curated in the Department of Anthropology at the University of Tennessee, Knoxville.

Rogers’ suite of 17 morphological characteristics of the skull was used to estimate the sex of the specimens in the analysis with minor modifications. The features of the mandible were separated into four traits in order to evaluate each individual component and its contribution as a sex indicator. Orbital margins were additionally evaluated (Williams & Rogers, 2006). The authors identified six high-quality morphological traits, defined by intra-observer error ≤10% and accuracy ≥80%: general size and architecture, mastoid size, supraorbital ridge size, rugosity of the zygomatic extension, size and shape of the nasal aperture, and gonial angle. Ninety-six per cent accuracy and 92% precision were achieved using all traits in combination (Williams & Rogers, 2006). Sex-related bias in accuracy was found for ramus symphysis height (P=0.009), zygomatic extension (P=0.0016), and occipital markings (P=0.0013), which were all statistically significantly more likely to be assessed as male than female (Williams & Rogers, 2006). Other features of the skull that have been objectively and independently tested include the supraorbital margin, which when used alone was found to have a classification accuracy of 70% (Graw et al, 1999), and the external occipital protuberance, which was found not to be a definitive criterion in the estimation of sex (Gülekon & Turgut, 2003).
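The two thresholds used to define a high-quality trait (intra-observer error ≤10% and accuracy ≥80%) can be illustrated with a brief sketch. The Python example below is illustrative only: the trait scores, the ten documented sexes, and the function names are all invented for the purpose of demonstration and are not data or procedures from Williams and Rogers (2006). It also assumes, for simplicity, that accuracy is calculated from the first round of scoring.

```python
# Illustrative sketch only: all scores below are invented and are NOT data
# from Williams & Rogers (2006). Function names are hypothetical.

def accuracy(assigned, known):
    """Percentage of specimens whose assigned sex matches the documented sex."""
    correct = sum(a == k for a, k in zip(assigned, known))
    return 100.0 * correct / len(known)

def intra_observer_error(round_one, round_two):
    """Percentage of specimens scored differently in two rounds by one observer."""
    conflicts = sum(a != b for a, b in zip(round_one, round_two))
    return 100.0 * conflicts / len(round_one)

def high_quality_traits(scores, known, max_error=10.0, min_accuracy=80.0):
    """Return the traits that satisfy both thresholds.

    Assumption: accuracy is taken from the first scoring round.
    """
    selected = []
    for trait, (round_one, round_two) in scores.items():
        acc = accuracy(round_one, known)
        err = intra_observer_error(round_one, round_two)
        if err <= max_error and acc >= min_accuracy:
            selected.append(trait)
    return selected

# Hypothetical documented sexes for ten crania (M = male, F = female).
known_sex = list("MMMMMFFFFF")

# Hypothetical scores per trait: (first round, second round).
trait_scores = {
    "mastoid size": (list("MMMMMFFFFM"), list("MMMMMFFFFM")),
    "occipital markings": (list("MMMMMMMMFF"), list("MMMMMMMFFF")),
}

print(high_quality_traits(trait_scores, known_sex))
```

In this invented example, the second trait is scored consistently between rounds but misclassifies several females as male, so it fails the accuracy threshold despite acceptable precision. This mirrors the distinction drawn in the text between the repeatability of a trait and its correctness.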

1.5.1.1 Reliability, scoring systems and weighting of traits

The term ‘reliability’ is generally defined as the extent to which a test or measuring procedure yields the same results in repeated trials. However, unlike the definitions of ‘accuracy’ and

‘precision’, which are well-established in the scientific literature and listed in the International

Vocabulary of Metrology guidelines (Joint Committee for Guides in Metrology Working Group,

2008: 21–22), the definition of ‘reliability’ is not consistent between different disciplines. In psychometrics, for example, ‘reliability’ means consistency – the ability to provide reproducible


scores – and is therefore often used synonymously with ‘precision’ (Swanson, 2014). In the

International Vocabulary of Metrology guidelines, the term ‘reliability’ is not defined as a measurement parameter; ‘reproducibility’ is used instead (Joint Committee for Guides in

Metrology Working Group, 2008: 22). According to these guidelines, measurement precision, the “closeness of agreement between indications of measured quantity values obtained by replicate measurements on the same or similar objects under specified conditions” is used to define measurement reproducibility (Joint Committee for Guides in Metrology Working Group,

2008: 22).

In physical anthropology, the terms ‘accuracy’, ‘precision’, and ‘reliability’ have been not only well defined but also differentiated by the way they are measured. In the context of this discipline, ‘accuracy’ refers to the percentage of skeletons whose sex was correctly assigned in the sample upon which the method was created; ‘precision’ relates to the level of intra-observer error (the proportion of cases in which two separate rounds of assessment conflicted); and

‘reliability’ relates to the effectiveness or repeatability of a particular methodology and may therefore be assessed in two different ways: by testing methods on different populations

(Bruzek & Murail, 2006: 229; Walrath et al, 2004; Williams & Rogers, 2006), or by testing agreement in observed or measured traits made by two different observers (inter-observer error;

Klales et al, 2012). However, this raises a particular issue in that the term ‘reliable’ may be used to indicate the success of a particular methodology in several highly distinct contexts, for example, when a method is tested on different samples from the same population or on one or more samples from a different population. In the latter context, more appropriate terms might be

‘applicability’ or ‘transferability’. Currently, however, ‘reliable’ is the standard terminology in physical anthropology to refer to a method that has been successfully applied to an independent population sample (Bruzek & Murail, 2006: 229; Walrath et al, 2004) and is therefore used in this manner in the context of this project.

The discussion of independent testing of established morphological sex estimation techniques in Section 1.5.1 above demonstrates that there is a high degree of variation in the reliability of different methods. For example, reliability rates for the Phenice method (1969) range from 59% using a modern Scottish population as the test sample (MacLaughlin & Bruce,

1990) to 88% using a sample from the Terry Collection (Ubelaker & Volk, 2002), or as high as


94.5% when the ventral arc alone was observed in a sample of Californian Indians (Kelley,

1978). For individual traits of the bony pelvis, reliability ranged from 65% among White South

Africans using the greater sciatic notch (Patriquin et al, 2003) to 100% for Balkan males using the subpubic angle (Đurić et al, 2005), and for individual traits of the skull reliability ranged from 29% for Balkan males using the supraorbital margin (Đurić et al, 2005) to 76% using the palate (Suazo et al, 2008). Tables 2.2D, 2.2E, and 2.2G in Section 2.2.1.5 provide a summary of the reliability rates associated with the morphological sex indicators used in this study.

A number of explanations have been put forward to account for the low level of reliability exhibited by some sex estimation traits. These include population differences in morphology (discussed in Section 1.5.1.3), observer inexperience (Ubelaker & Volk, 2002) and/or subjectivity (Bruzek, 2002), and lack of standardisation in the way morphological or discrete sex indicators are visually assessed (Bruzek, 2002). To reduce the subjectivity of this type of sex assessment method, a number of researchers have suggested the use of scoring systems and associated statistical analyses to standardise trait expression and create reliable methods for the estimation of sex from nonmetric skeletal indicators (Klales et al, 2012; Walker,

2008). The scoring system presented within Standards for Data Collection from Human Skeletal

Remains is perhaps the simplest and best-known system of this type, and has served as the basis for more complicated statistical procedures. It is a five-point scale, where (Buikstra &

Ubelaker, 1994: 21):

1 = female: there is little doubt that the structures represent a female

2 = probable female: the structures are more likely to be female than male

3 = ambiguous sex: sexually diagnostic features are ambiguous

4 = probable male: the structures are more likely to be male than female

5 = male: there is little doubt that the structures represent a male.

Buikstra and Ubelaker (1994: 21) additionally recommend that a score of zero be used to indicate ‘undetermined sex’, where there is insufficient data for sex estimation. This scoring system is in effect a ‘summary’ of the systems developed by Paul Walker, which are presented in Standards for Data Collection and were part of a collaboration to create a set of recommended methods for documenting adult sexually dimorphic skeletal features (Walker,


2008). A set of drawings was created showing the five stages of expression of the greater sciatic notch, nuchal crest, mastoid process, supra-orbital margin, supra-orbital ridge/glabella, and mental eminence (Buikstra & Ubelaker, 1994: 18 & 20; see Figures 2.2B and 2.2C in

Chapter 2), where a score of “1” represents minimal expression, and a score of “5” represents maximal expression. Verbal descriptions that accompany the diagrams were additionally developed in response to the questions of volunteers who tested the scoring systems (Walker,

2008).

The use of an ordinal scale to describe skeletal trait expression was first proposed by

Acsádi and Nemeskéri (1970) as a way of reducing the subjectivity of visual assessments of morphological sex indicators. Osteologists vary widely in their experience with known-sex skeletal collections, and this undoubtedly influences the amount of weight they give to their sex estimations. Such inconsistencies make demographic comparisons of collections studied by different investigators difficult (Walker, 2006). The Acsádi and Nemeskéri scale was adopted in a slightly modified form by the Workshop of European Anthropologists (Ferembach et al, 1980) as part of an attempt to standardise osteological techniques. Traits are scored on a scale that runs from –2 to +2, where –2 is “hyperfeminine”, –1 is “feminine”, 0 is “indifferent”, +1 is

“masculine”, and +2 is “hypermasculine” (Ferembach et al, 1980). This scale was subsequently taken as the starting point by Walker for the development of the new system, although it was modified considerably to remove some of its inherent limitations. For example, the Acsádi and

Nemeskéri (1970) system was developed specifically for sexing individuals of European ancestry and did not encompass the full range of human variation (Walker, 2008). In addition, the substitution of a 1–5 scale for the –2 to +2 scale was considered methodologically important because it generalises the system, allows comparable results to be produced by different osteologists, and removes the implicit assumption that the morphological condition assigned a zero value represents the optimal cut-point for separating males from females (Walker, 2008).

To test the reliability of the scoring system for the greater sciatic notch developed by

Walker and presented in Standards for Data Collection from Human Skeletal Remains, Walker

(2006) and 22 volunteers scored the same series of 10 ossa coxae from the known-sex

Hamann-Todd (Cleveland Museum of Natural History), Terry (Smithsonian Institution), and St.

Bride’s Church (London) collections. The kappa statistic was used to examine the degree of


inter-observer error (and hence reliability) of the scoring system. This statistic provides a measure of agreement among observers over the scores assigned to a specimen (Walker,

2006). Kappa values are scaled from 0 to 1, with 0 indicating the amount of agreement expected if scores were assigned randomly to specimens, and 1 indicating perfect agreement.

Inter-observer agreement was found to be highly statistically significant for all sciatic notch scores (P=0.00001), although some morphologies were found to be more difficult to score consistently than others. The kappa values for each sciatic notch score demonstrated that ossa coxae with extreme morphologies (scores of 1 and 5) could be rated more consistently than ossa coxae with intermediate morphologies. Agreement (even among observers who had no previous osteological experience) was almost perfect for the assignment of scores of 1. The lowest kappa values were found for scores of 3 and 4 (Walker, 2006).
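The way a kappa statistic summarises agreement among many observers can be illustrated with a minimal sketch. The source does not state which multi-observer variant of kappa was used, so the Fleiss formulation below is an assumption, and the score counts are hypothetical rather than data from Walker (2006).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for several observers (assumed variant, see lead-in).

    counts[i][j] is the number of observers who gave specimen i the
    score category j; every specimen must have the same number of observers.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each specimen needs the same number of observers")
    n_cats = len(counts[0])
    # Share of all score assignments that fell into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement for each specimen, then averaged.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_obs = sum(p_item) / n_items
    # Agreement expected if scores were assigned at random.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # every observer used one single category throughout
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical example: 4 specimens, 3 observers, scores 1 to 5 as columns.
hypothetical_counts = [
    [3, 0, 0, 0, 0],  # all observers scored 1
    [0, 1, 2, 0, 0],  # split between scores 2 and 3
    [0, 0, 1, 2, 0],  # split between scores 3 and 4
    [0, 0, 0, 0, 3],  # all observers scored 5
]
print(round(fleiss_kappa(hypothetical_counts), 3))
```

In this hypothetical table the extreme morphologies (scores of 1 and 5) are scored unanimously while the intermediate ones are not, which mirrors the pattern described above.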

A similar study was conducted by Walker to assess the reliability of the cranial trait scoring system he developed, which is presented in Standards for Data Collection (Buikstra

& Ubelaker, 1994: 20). The system was tested on a sample of 304 adult individuals from the

Hamann-Todd, Terry, and St. Bride’s Church collections. The consistency with which the scoring system could be used by different observers was tested with 20 volunteers using 10 skulls, which were selected to represent the range of variation in each trait. Of the volunteers, six were professional physical anthropologists with years of osteological experience. The rest were mostly undergraduate students with little or no previous osteological training (Walker,

2008). For most traits, the Kruskal-Wallis test indicated no statistically significant difference between observers in the distribution of the scores they assigned to each cranial trait. In other words, no observers systematically assigned scores that were either higher or lower than those that other observers assigned to the same series of test specimens. The exception to this was the mastoid process (P=0.003), suggesting that different observers interpret the diagrams and instructions for scoring the mastoid process in different ways (Walker, 2008). The inter-observer error tests indicate that for most traits there is considerable scoring agreement. Even students with no previous osteological experience were able consistently to assign scores that agreed with those of very experienced osteologists. Overall, 96% of the scores assigned fell within one score of the modal value assigned by all observers (Walker, 2008).
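The summary figure quoted above (the share of scores falling within one score of the modal value) can be computed as in the following sketch. The function name, the tie-breaking rule for multiple modes, and the example scores are assumptions made for illustration; they are not taken from Walker (2008).

```python
from collections import Counter


def share_within_one_of_mode(scores_by_specimen):
    """Proportion of all scores lying within one point of each specimen's mode.

    scores_by_specimen is a list of lists; each inner list holds the ordinal
    scores (1 to 5) that different observers gave to one specimen.
    """
    close = 0
    total = 0
    for scores in scores_by_specimen:
        counts = Counter(scores)
        top = max(counts.values())
        # Assumption: if several scores tie for the mode, use the lowest one.
        mode = min(s for s, c in counts.items() if c == top)
        close += sum(1 for s in scores if abs(s - mode) <= 1)
        total += len(scores)
    return close / total


# Hypothetical scores for two skulls.
example = [
    [1, 1, 2, 1],     # mode 1, all four scores within one point
    [3, 4, 5, 3, 3],  # mode 3, the score of 5 lies outside
]
print(round(share_within_one_of_mode(example), 3))
```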


The five-point ordinal scale scoring system developed by Walker was used as the basis for a new scoring system for the Phenice characteristics. As discussed previously, the reliability rate associated with the Phenice method varies considerably between studies and populations, most likely as a result of the level of observer experience, and lack of consistency in the evaluation of the traits. According to Klales and colleagues (2012), the Phenice method is subject to a number of important limitations. For example, simply scoring the extremes of a trait

(presence/absence) fails to encompass the range of variation found in the pubis. In addition, the technique weighs all traits equally and fails to objectively assign greater weight to more informative traits, and the estimated sex is not associated with a posterior probability to quantify uncertainty (Klales et al, 2012). To address these issues, Klales et al. (2012) devised an ordinal scoring system for the Phenice characteristics, analysed the scores using statistical classification, and compared scores to quantify intra- and inter-observer agreement. In this way, the authors were able to provide a method of sex estimation with estimates of reliability and validity (Klales et al, 2012). The skeletal material used in the study was derived from the

Hamann-Todd Collection (Cleveland Museum of Natural History) and the W.M. Bass Donated

Skeletal Collection (University of Tennessee, Knoxville). Following the methods of Walker

(2006; 2008), five grades of trait expression were described and illustrated (Figure 1.5A).

Figure 1.5A: Trait expression and ordinal scores for the subpubic concavity/contour (top), the medial aspect of the ischio-pubic ramus (middle), and the ventral arc (VA). Figure 2 from Klales et al, 2012.


Each trait was scored on an ordinal scale from one to five without any assumption of

“maleness” (masculinity) or “femaleness” (femininity). The Hamann-Todd Collection sample was scored by two of the study authors, as well as two additional individuals with limited osteological experience. Reliability was assessed using the kappa statistic (K), where K=0.0 shows no agreement, K=0.01 to 0.20 is slight agreement, K=0.21 to 0.40 is fair agreement,

K=0.41 to 0.60 is moderate agreement, K=0.61 to 0.80 is substantial agreement, and K=0.81 to

1.0 is almost perfect to perfect agreement (Landis & Koch, 1977). Inter-observer agreement, assessed using the intraclass correlation coefficient (ICC), which measures the proportion of the variance that is attributable to the objects being measured, was high across all four observers of differing experience and for the entire sample: 0.9 for the ventral arc, 0.8 for the subpubic concavity, and 0.8 for the medial aspect of the ischio-pubic ramus. This indicates high reliability of the method for scoring the features (Klales et al, 2012).
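The agreement bands quoted above from Landis and Koch (1977) can be expressed as a simple lookup, sketched below. The function name is hypothetical, and two choices are assumptions rather than part of the source: values at or below zero are all labelled as no agreement, and values falling between two quoted bands (for example 0.205) are assigned to the higher band.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the verbal agreement band quoted in the text."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.0:
        return "no agreement"  # assumption: negative values treated the same
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


for value in (0.15, 0.35, 0.55, 0.75, 0.95):
    print(value, landis_koch_label(value))
```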

The previous discussions have demonstrated that ordinal scoring of morphological trait expression is a reliable way of estimating sex that removes some of the subjectivity associated with this type of method. However, subjectivity is not completely eliminated because, on their own, these scoring systems provide no simple means of combining scores to produce an overall sex estimate based on several different indicators (such as the cranial traits). The use of weighted scores was therefore proposed to address this issue (Acsádi & Nemeskéri, 1970;

Ferembach et al, 1980). Numerical values are assigned to each of the diagnostic sex indicators according to the five-point scale proposed by Acsádi & Nemeskéri (1970) and modified by the

Workshop of European Anthropologists, which ranges from –2 to +2. Individual feature scores are multiplied by one, two, or three, based on their significance and importance for the estimation of sex (Walrath et al, 2004). In other words, this technique ‘ranks’ the importance of certain traits.

A weight of three indicates the greatest importance and a weight of one the least. An ‘index of sexualisation (IS)’ is then calculated (see Section 2.2.1.5). This procedure is recommended by the Workshop of European Anthropologists (Ferembach et al, 1980) and has been used by a number of researchers working with human remains from archaeological contexts (Hincak et al,

2007; Kjellström, 2004; et al, 1997; Nagy, 2008; Walrath et al, 2004).
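As an illustration of the weighting procedure, the sketch below computes a weighted index from trait scores on the –2 to +2 scale. It assumes that the index is the sum of the weighted scores divided by the sum of the weights, which is the form usually attributed to the Workshop of European Anthropologists; the details used in this project are given in Section 2.2.1.5 and may differ. The trait names, scores, and weights are hypothetical, and the ±0.2 indeterminate interval follows the convention reported for Walrath et al (2004) later in this section.

```python
def index_of_sexualisation(scored_traits):
    """Weighted mean of trait scores (assumed formula, see lead-in).

    scored_traits maps a trait name to (score, weight), with score on the
    -2 to +2 scale and weight equal to 1, 2, or 3.
    """
    total_weight = sum(weight for _, weight in scored_traits.values())
    if total_weight == 0:
        raise ValueError("at least one weighted trait is required")
    weighted_sum = sum(score * weight for score, weight in scored_traits.values())
    return weighted_sum / total_weight


def classify_is(is_value, indeterminate_band=0.2):
    """Positive values read as male, negative as female, near zero as indeterminate."""
    if abs(is_value) <= indeterminate_band:
        return "indeterminate"
    return "male" if is_value > 0 else "female"


# Hypothetical cranium: trait names, scores, and weights are illustrative only.
cranium = {
    "trait with weight three": (2, 3),
    "trait with weight two": (1, 2),
    "trait with weight one": (-1, 1),
}
value = index_of_sexualisation(cranium)
print(round(value, 3), classify_is(value))
```

Because each score is multiplied by its weight, a single heavily weighted trait can outweigh several lightly weighted ones, which is why disagreement over the weights matters.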

This technique is, however, associated with several limitations. Notably, there is disagreement regarding which indicators should be considered most important for assessment


of sex. As Kjellström (2004) points out, the shape of the orbit is considered to be a fairly reliable trait (weight two) by Acsádi and Nemeskéri (1970) but less reliable (weight one) by the

Workshop of European Anthropologists (Ferembach et al, 1980). In addition, the external occipital protuberance, which has been assigned a weight of two, was not found to be a definitive criterion for sex estimation in an independent test (Gülekon & Turgut, 2003). To overcome this issue, some researchers have adapted the original weighting method to their own purposes (Kjellström, 2004); however, this reduces the objectivity and consistency of the technique among different researchers, which is one of its primary aims.

To test the reliability of sex estimates based on 10 visually assessed traits of the skull,

Walrath and colleagues (2004) used an ordinal scoring system with trait weighting and calculation of the sexualisation index in 42 well-preserved, complete, or nearly complete Inuit crania, curated at the University of Pennsylvania Museum. For the purposes of the study, a positive IS score was indicative of a male individual, a negative IS score was indicative of a female individual, and a score in the interval ±0.2 was indicative of indeterminate sex (Walrath et al, 2004). Two of the study authors, both trained and experienced osteologists, scored features independently for each cranium in the sample. The non-parametric gamma statistic was used to assess inter-observer reliability for the two observers on the 10 cranial traits individually, while the kappa statistic was used to assess inter-observer reliability for the two observers on the estimation of sex based on the weighted index of sexualisation (IS) score. Five separate kappa scores were generated to assess inter-observer reliability: 1) the overall IS reliability based on all 10 cranial traits, 2) the IS score for the four cranial traits providing visual cues, 3) the IS score for the remaining six cranial traits without visual cues, 4) the IS score for the traits with a gamma value of 0.85 and greater (glabella, mastoid process, superciliary arches, external occipital protuberance, and zygomatics), and 5) the IS for the traits with a gamma of less than 0.85 (nuchal plane, zygomatic process of temporal, frontal and parietal eminences, frontal profile, and orbital form). The kappa scores for these five inter-observer reliability IS tests are shown in Table 1.5B.


Table 1.5B: Inter-observer reliability for measures of the index of sexualisation (Source: Walrath et al, 2004).

IS measure | Kappa | Strength of agreement1 | Mean (median) IS score, Observer 1 | Mean (median) IS score, Observer 2 | P-value*
--- | --- | --- | --- | --- | ---
All 10 traits | 0.610 | Substantial | 0.19 (0.410) | –0.11 (–0.045) | 0.001
Visual items | 0.620 | Substantial | 0.07 (0.280) | –0.11 (0.170) | 0.027
Non-visual items | 0.495 | Moderate | 0.280 (0.460) | –0.110 (–0.040) | 0.001
Items with gamma ≥0.85 | 0.785 | Substantial | 0.130 (0.380) | –0.090 (0.250) | 0.006
Items with gamma <0.85 | 0.300 | Fair | 0.270 (0.400) | –0.150 (–0.100) | 0.002

1Landis & Koch, 1977. *Wilcoxon signed ranks test.

The results of this study highlighted the subjective nature of IS scores, which differed significantly between the two observers, as shown in Table 1.5B. Observer 1 tended to “masculinise” the sample in all five simulations, with sample mean scores ranging from 0.070 to 0.280. Observer 2 tended to “feminise” the sample, with sample mean scores ranging from –0.090 to –0.150 (Walrath et al, 2004). The results further suggested that clarity of definition, rather than number of character traits, was critical for effective estimation of sex by the visual assessment method, given that most inter-observer discordance derived from the specific character traits with subjective definitions and no accompanying diagrams. The authors suggested that the subjective nature of IS values limits inter-observer comparisons of results, and that population comparisons of mean IS scores should be interpreted with caution when more than one investigator has performed the assessment (Walrath et al, 2004).

Weighting of traits is not the only method of assigning greater importance to specific sex estimation indicators. An alternative method involves the ranking of traits based on tests of accuracy and precision. In such instances, accuracy is determined by comparing the blind assessment of sex using a particular morphological sex indicator with the known (documented) sex of the individual, while precision is evaluated in terms of the percentage of cases in which two separate rounds of assessment made by the same observer conflicted (intra-observer


error). “High quality” sex estimation traits were typically defined as those with ≥80% accuracy and ≤10% intra-observer error (Williams & Rogers, 2006). The overall value of each trait was calculated by the sum of the precision rank (based on the lowest to the highest percentage of

intra-observer error) and accuracy rank (based on the highest to the lowest percentage) to produce a combined score (Rogers, 2005; Suazo et al, 2009; Williams & Rogers, 2006).

Following these procedures, the value of morphological traits of the bony pelvis in the estimation of sex has been evaluated in a nineteenth century Canadian cemetery population (Rogers &

Saunders, 1994), and cranial traits have been evaluated in a nineteenth century Canadian population (Rogers, 2005), a modern European population from the W.M. Bass Donated

Skeletal Collection (Williams & Rogers, 2006), and a Brazilian population from the Universidade

Federal de São Paulo (UNIFESP), Brazil Skulls Museum (Suazo et al, 2009). Each study provided a list of ranked traits (see Tables 2.2E and 2.2F in Section 2.2.1.5), which may help other researchers to decide which skeletal indicators should be given the greatest weight or emphasis when producing an overall morphological sex assignment. As with the development of ordinal scoring systems, these sorts of studies help to quantify subjective visual assessments of sexual dimorphism in the skeleton, as well as to standardise osteological practice and to ensure that morphological sex estimates are based on proven reliable and accurate traits rather than favourite traits based on anecdotal or experiential ‘evidence’.
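The ranking procedure described above (an accuracy rank added to a precision rank, plus the thresholds for a “high quality” trait) can be sketched as follows. The trait labels and percentages are hypothetical, and the handling of tied values (broken by input order) is an assumption, since the cited studies are not quoted here on that point.

```python
def rank_traits(traits):
    """Order traits by the sum of their accuracy rank and precision rank.

    traits maps a trait name to (accuracy_pct, intra_observer_error_pct).
    Rank 1 is best: highest accuracy, lowest intra-observer error.
    Returns (ordered_names, combined_scores).
    """
    by_accuracy = sorted(traits, key=lambda t: -traits[t][0])
    by_error = sorted(traits, key=lambda t: traits[t][1])
    accuracy_rank = {t: i + 1 for i, t in enumerate(by_accuracy)}
    precision_rank = {t: i + 1 for i, t in enumerate(by_error)}
    combined = {t: accuracy_rank[t] + precision_rank[t] for t in traits}
    ordered = sorted(traits, key=lambda t: (combined[t], t))
    return ordered, combined


def is_high_quality(accuracy_pct, error_pct):
    """Thresholds quoted in the text: accuracy >= 80% and error <= 10%."""
    return accuracy_pct >= 80 and error_pct <= 10


# Hypothetical traits: (accuracy %, intra-observer error %).
hypothetical_traits = {
    "trait A": (92, 4),
    "trait B": (85, 12),
    "trait C": (78, 2),
    "trait D": (60, 20),
}
ordered, combined = rank_traits(hypothetical_traits)
for name in ordered:
    acc, err = hypothetical_traits[name]
    print(name, combined[name], is_high_quality(acc, err))
```

A low combined score indicates a trait that is both accurate and repeatable, so the ordered list can guide which indicators receive the most emphasis.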

1.5.1.2 Geometric morphometrics and ‘virtual’ osteology

1.5.1.2.1 Geometric morphometric techniques

Geometric morphometrics (GM) is a relatively new method that provides a mechanism to quantify morphological characteristics. It involves the analysis of 3D Cartesian geometric co-ordinates of homologous morphological landmarks that describe the object being studied

(Bigoni et al, 2010; Lawing & Polly, 2010). When analysing the forms of biological objects, GM enables differentiation of variability due to both size and shape (Bigoni et al, 2010). In other words, GM methods involve the acquisition of landmark co-ordinates to visualise and quantify shape changes, extracting size so that subsequent analysis of morphology is performed in a manner in which shape is the only variable under consideration (Green & Curnoe, 2009; Figure

1.5B). This method of data collection and statistical analysis is considered to be advantageous relative to traditional morphometric methods because the physical integrity of the object being


studied is preserved, rather than collapsing the object into a series of linear and angular measures in which some aspect of ‘shape’ is generally lost (Green & Curnoe, 2009).
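The separation of size from shape can be illustrated with a minimal sketch. It assumes that size is measured as centroid size (the square root of the summed squared distances of the landmarks from their centroid), which is the usual convention in GM but is not named in the passage above. The landmark co-ordinates are hypothetical, and the sketch removes only position and size; a full Procrustes superimposition would also remove rotation.

```python
import math


def centroid(landmarks):
    """Mean position of a landmark configuration (list of equal-length tuples)."""
    n = len(landmarks)
    dims = len(landmarks[0])
    return tuple(sum(p[d] for p in landmarks) / n for d in range(dims))


def centroid_size(landmarks):
    """Square root of the summed squared distances from the centroid."""
    c = centroid(landmarks)
    return math.sqrt(sum((p[d] - c[d]) ** 2
                         for p in landmarks for d in range(len(c))))


def remove_position_and_size(landmarks):
    """Centre the configuration on its centroid and scale to unit centroid size."""
    c = centroid(landmarks)
    size = centroid_size(landmarks)
    if size == 0:
        raise ValueError("landmarks must not all coincide")
    return [tuple((p[d] - c[d]) / size for d in range(len(c))) for p in landmarks]


# Hypothetical 3D landmarks, and the same form three times larger and shifted.
small = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
large = [(3 * x + 10.0, 3 * y - 4.0, 3 * z + 2.0) for x, y, z in small]

print(centroid_size(large) / centroid_size(small))  # ratio of the two sizes
print(remove_position_and_size(small)[0])
print(remove_position_and_size(large)[0])
```

After this step the two hypothetical configurations have the same co-ordinates up to rounding, so any remaining difference between real specimens would reflect shape (and orientation) rather than size.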

Figure 1.5B: Lateral rendered images and midsagittal transformation grids depicting mean female (A) and male (B) cranial shape (s=superior; a=anterior). Transformation grids have been exaggerated ×10 to aid visualisation (Source: Green & Curnoe, 2009).

Geometric morphometrics represents an important approach in the evaluation of variability in the fields of bioarchaeology, evolution, and ecology (Bigoni et al, 2010). In forensic anthropology, the most frequent applications of GM relate to estimation of population affinity or ancestry, assessment of age at death, and estimation of sex. With regard to the latter parameter, it is important to note that GM methods are not intended to replace methods currently used for sex assessment. Rather, their aim is to quantify shape and characterise shape variability in order to evaluate objectively any differences in shape and compare them with other variables while preserving all of the geometric information corresponding to the original object (Bigoni et al, 2010). Nevertheless, an interesting trend appears to have developed in the literature with earlier studies suggesting that GM analysis is a valuable and reliable method to objectively confirm or refute the presence of sex differences in morphological characteristics of the skeleton observed with more traditional sex estimation methods (Oettlé et al, 2009; Steyn et al, 2004), while more recent studies suggest that GM methods could be used to discriminate between the sexes in place of the traditional morphological and metric methods

(Gonzalez et al, 2009), and that they may even prove to be more accurate and statistically robust than the standard osteological sex estimation procedures (Gómez-Valdés et al, 2012;

Kranioti et al, 2009).


Geometric morphometric methods have also been extensively employed in studies of sexual dimorphism, sampling, for example, the cranium and facial skeleton (Bigoni et al, 2010;

Franklin et al, 2012; Green & Curnoe, 2009; Kimmerle et al, 2008b), the ilium (Bilfeld et al,

2013), the humerus (Vance & Steyn, 2013), and the scapula (Scholtz et al, 2010). Of particular note among these studies, Kimmerle and colleagues (2008b) found that among a sample of 118

American White and Black males and females from the W.M. Bass Donated Collection and the

Forensic Data Bank (The University of Tennessee, Knoxville), sex had a significant influence on craniofacial shape for both American Whites (P=0.0024) and Blacks (P=0.0035), whereas size did not have a significant influence on shape in either Whites or Blacks (P>0.05). Thus, for each sex, individuals of different sizes were found to be statistically the same shape (Kimmerle et al,

2008b). By contrast, other studies have identified allometry, the statistical relationship between size and shape, in human crania using geometric morphometrics (Franklin et al, 2012;

Mitteroecker et al, 2013). Geometric morphometric studies have also made an important contribution to the field of ontogeny. For example, Bilfeld and colleagues (2013) used GM analysis of 10 osteometric landmarks of the ilium recorded by multislice computed tomography, from a sample of 188 children (95 male and 93 female) of mixed origins living in the area of

Toulouse, southern France, and ranging in age from one to 18 years, to investigate patterns of shape and size changes with age (Bilfeld et al, 2013). They found that the shape of the ilium became statistically significantly sexually dimorphic at 11 years of age, although visible shape differences were observed as early as one year of age. There was no statistically significant difference in size between sexes. Trajectories of shape (development) and size (growth) were additionally found to differ throughout ontogeny and between sexes, suggesting that to a certain extent, the development of shape differences was decoupled from the ontogenetic development of size differences (Bilfeld et al, 2013).

Particularly relevant to the current project is the contribution that GM techniques can make to the ongoing issue of assigning sex to individuals from archaeological populations for whom no known-sex reference sample is available. González and colleagues (2007) used GM to analyse and compare patterns of sexual dimorphism of the greater sciatic notch and ischiopubic region in two archaeological samples of unknown sex. K-means clustering analysis was used to identify two groups within each sample, which corresponded to male and female


morphologies. The samples were dated to the late Holocene (c. 3000 to 500 years BP) and were excavated from southeast Argentina (Chubut valley, Patagonia, Argentina; n=29) and northwest Argentina (Pampa Grande, Salta, Argentina; n=30). For the purposes of the study they were denoted Sample B1 and Sample B2, respectively (González et al, 2007). Two landmarks and 14 semilandmarks were digitised on the greater sciatic notch and two landmarks and 19 semilandmarks were digitised on the ischiopubic region. Relative warp analysis was used to describe major trends in shape variation within each sample (González et al, 2007). The first two relative warps calculated from the landmarks and semilandmarks of the greater sciatic notch of the B1 (Patagonian) sample accounted for 89.58% of the explained variance. The first relative warp explained 63.13% of the variance. The first two relative warps of the greater sciatic notch for the B2 (northwest) sample accounted for 91.5% of the explained variance, and the first relative warp explained 50.84% of all variation. The corresponding values for the ischiopubic region were 77.96% and 64.01% for the B1 sample, and 81.75% and 62.04% for the B2 sample

(González et al, 2007).

For each sample, a large percentage of the overall variation (approximately 60%) was explained by the first relative warps axis, which summarises the morphological changes due to sexual dimorphism. The main variation along this axis corresponded to the width, depth and asymmetry of the sciatic notch, and to the projection of the pubis, sub-pubic concavity, and length of the ischiopubic region. Within each sample, the first axis of relative warps analysis separated sciatic notches that were narrow, deep, and asymmetrical from those that were wide, less deep, and symmetrical (González et al, 2007). Given that the first RW axis summarises the shape variation due to sexual dimorphism, the clusters obtained within each sample using K-means clustering could be assigned as female or male according to their morphology. Thus, the combination of such techniques with multivariate analyses allows sex allocation based on within-sample variation in cases where a reference sample is not available. This approach does not allow calculation of the per cent accuracy associated with the technique. In fact, percentage of correct sex assignments realistically always remains unknown when the samples analysed are not derived from a known-sex reference collection (González et al, 2007). Although the variation found within each sample is characteristic of sexual dimorphism in humans, the relative warps analyses showed that the characteristic male and female shapes were different in


both samples analysed (B1 and B2), mainly with respect to the greater sciatic notch. These findings therefore support the hypothesis that patterns of pelvic sexual dimorphism vary among human populations (discussed further in Section 1.5.1.3); thus, criteria for sexing are population-specific (González et al, 2007).
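The clustering step can be illustrated with a minimal sketch of K-means with two clusters applied to scores on a single axis, such as the first relative warp. The scores are hypothetical, and the one-dimensional, two-cluster form with centres initialised at the minimum and maximum is a simplification assumed here; it is not the procedure reported by González et al (2007). The sketch only separates the groups: deciding which cluster is female and which is male still depends on inspecting the morphology each cluster represents.

```python
def two_means_1d(values, max_iter=100):
    """K-means with k=2 on one-dimensional scores.

    Returns (labels, centres); label 0 is the cluster with the lower centre.
    """
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("scores must not all be identical")
    centres = [low, high]  # assumption: start from the two extreme scores
    labels = []
    for _ in range(max_iter):
        # Assign each score to its nearest centre.
        labels = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
                  for v in values]
        # Recompute each centre as the mean of its members.
        new_centres = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centres.append(sum(members) / len(members))
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Hypothetical first relative warp scores for six ossa coxae of unknown sex.
rw1_scores = [-0.9, -1.1, -1.0, 0.8, 1.2, 1.0]
labels, centres = two_means_1d(rw1_scores)
print(labels)
print([round(c, 3) for c in centres])
```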

1.5.1.2.2 ‘Virtual’ osteology

Like GM, computed tomography (CT) also provides digital 3D data, which is an important source of information for morphometric analysis (Badawi-Fayad & Cabanis, 2007). In recent years, medical imaging modalities such as multi-detector CT (MDCT) and magnetic resonance imaging (MRI) have played an increasingly important role in forensic medicine, for example, in postmortem examinations of trauma analysis and victim identification (Grabherr et al, 2009;

Ramsthaler et al, 2010). This led to the suggestion by some authors that 3D reconstructions of

MDCT scans may be used to collect morphological or osteometric data for sex or age at death estimation purposes (Grabherr et al, 2009; Robinson et al, 2008; Verhoff et al, 2008). The results of some of these studies are summarised in Table 1.5C.

Table 1.5C: Summary of studies examining the utility of “virtual osteology” methods

Study | Design | Results | Comments
Grabherr et al, 2009 | 22 known-sex corpses, aged 17–92 years | 100% accuracy rate | Sex estimated using morphological indicators of the skull/bony pelvis, viewed as 3D reconstructions of MDCT scan data
Ramsthaler et al, 2010 | 50 known-sex corpses | 96% accuracy | Sex estimated using morphological analysis of cranial CT scan data; no statistically significant differences in accuracy rate between the sexes; rate of inter-observer error was very low (kappa statistic: 0.83)
Robinson et al, 2008 | MDCT scans of 15 lower limbs in varying states of decomposition | No statistically significant differences in measurements taken by CT vs. measurements of defleshed bones by direct osteometric methods | CT scan data can be used to estimate sex using metric techniques; may be of particular value for mummified remains that cannot be unwrapped and may have suffered damage to the bones of the skull or pelvis
Decker et al, 2011 | 100 patients (40 male and 60 female); had CT scans at the University of South Florida College of Medicine; aged 19–83 years | Mean per cent error: 2.2%; kappa statistic: 1.0 (almost perfect agreement); logistic regression sex estimation equation: 100% cross-validated accuracy | Investigate whether 3D volumetric virtual models can be used in the estimation of sex from the pelvis and whether “metricising” nonmetric sex estimation traits in the pelvis will reduce subjectivity and increase the accuracy and reliability; results suggest that current methods of sex estimation from the bony pelvis could be updated and improved by including in vivo data to increase the accuracy of sex assignment
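The kappa statistics reported in Table 1.5C measure agreement between observers after correcting for the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two observers, written for illustration only; the function name and the observer scores are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same specimens."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of specimens given the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over every label.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# HYPOTHETICAL sex assignments by two observers for ten specimens.
observer_1 = list("MMMMMFFFFF")
observer_2 = list("MMMMFFFFFF")
kappa = cohens_kappa(observer_1, observer_2)
```

In this invented example the observers disagree on one specimen in ten, so the raw agreement is 90%, but kappa is lower because half of the agreement would be expected by chance alone.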

In addition to the high rates of accuracy presented above, there are numerous other advantages of “digital osteology” with “virtual skeletons”, to use some of the popular terminology. For example, the CT scans are quick to perform (Decker et al, 2011; Grabherr et al, 2009), are non-invasive and negate the need for defleshing (Decker et al, 2011), allow remote analysis of data and images, as well as file sharing with other experts (Decker et al,

2011; Grabherr et al, 2009), and can be archived with other forensic-case related data (Decker et al, 2011). Furthermore, the use of imaging techniques allows the creation of modern forensic reference samples (Verhoff et al, 2008) that can be studied ad infinitum (Grabherr et al, 2009).

For these reasons, it is easy to see why the creation of procedures that allow a “virtual autopsy”, as is currently being explored by the Virtopsy Research Group (Grabherr et al, 2009; Thali et al, 2007), is so attractive in forensic contexts. However, the biggest limitation of the virtual approach is access to an MDCT machine. While many CT systems have already been installed in institutes of forensic medicine all over the world (Grabherr et al, 2009), archaeological departments are unlikely to have access to the sorts of resources required to obtain one. Furthermore, archaeological-context bone is rarely fleshed, unless the individual was mummified or preserved by some other means, so the requirement for a minimally invasive analytical approach is not as relevant as in forensic cases. That said, the continuing development of medical imaging procedures for forensic purposes may lead to improvements in existing sex estimation techniques which can be applied to direct osteological analysis of human remains.

1.5.1.3 Population differences in morphology

As pointed out by Saini and colleagues (2011), “…the population specificity of sexually dimorphic features is well known”. It is true that since the conception of physical anthropology as a scientific discipline subjective visual assessments of sexually dimorphic features of the skull and bony pelvis have been used as the basis of sex estimation (Walker, 2008), and to the present day this continues to be the most widely used sexing procedure with little refinement of the original list of morphological skeletal sex indicators. However, as early as the 1980s, researchers were publishing papers concerned with population variability in patterns of sexual dimorphism (MacLaughlin & Bruce, 1986) and more recent research has demonstrated that even within a restricted geographical region and historical period, patterns of sexual dimorphism vary, often to significant extents (Kemkes & Göbel, 2006; Walker, 2008).

To assess population differences in traditional morphological sex indicators of the bony pelvis, Patriquin and colleagues (2003) sampled 400 pairs of adult os coxa, evenly distributed between ‘Whites’ and ‘Blacks’ (individuals showing skeletal traits characteristic of European or

African ancestry, respectively), and males and females. All specimens included in the sample were obtained from individuals of documented age at death, sex and race, and were held within anatomical dissecting room collections housed in the Pretoria Collection (Department of

Anatomy, Faculty of Medicine, University of Pretoria) and Dart Collection (Department of

Anatomical Sciences, University of the Witwatersrand). The five morphological traits assessed on each os coxa were: pubic bone shape, subpubic concavity, ischiopubic ramus form, orientation of ischial tuberosity, and greater sciatic notch shape (Patriquin et al, 2003). For all five traits tested, the authors observed a disparity in diagnostic accuracy between both the sexes and races. For example, females of both racial phenotypes were most accurately classified by pubic bone shape, but White females had a higher rate of accuracy (96%) than

Black females (88%). Sciatic notch shape in females was the second best discriminator of sex

(96% accuracy in Whites, 84% in Blacks). This trait also performed well among Black males

(91% accuracy), but was a poor discriminator in White males (33% accuracy). Chi squared tests additionally revealed statistically significant differences in the effectiveness of all sex indicators between the racial phenotypes. The most diagnostic features in Whites (pubic bone shape and subpubic concavity form) both averaged 88% accuracy. In Blacks, the highest separation

(averaging 87.5%) was obtained from sciatic notch shape (P<0.001 for all analyses; Patriquin et al, 2003). Thus, while some traditionally applied morphological traits in the bony pelvis are relatively effective sex indicators, there is significant variation in their accuracy both in terms of sex and race, as well as by population (Patriquin et al, 2003).

Population variability of the greater sciatic notch has also been investigated by other researchers. Examining 262 pelvic bones obtained from male individuals of a contemporary

Balkan population excavated from two mass graves in Serbia between 2001 and 2002, Đurić and colleagues (2005) found that the application of the scoring system for greater sciatic notch morphology recommended by the Workshop of European Anthropologists (Ferembach et al,

1980) and in Standards for Data Collection from Human Skeletal Remains (Buikstra & Ubelaker,

1994) resulted in correct sex estimates in only 79.2% of cases, compared with an accuracy in excess of 99% using the subpubic angle and ventral arc. Traditional cranial sex indicators were additionally found to perform poorly in this modern Balkan population, further highlighting the impact of interpopulation variability in the expression of sexually dimorphic skeletal features

(Đurić et al, 2005).

Using the new ordinal scoring system and a set of diagrams he devised for estimating sex from the greater sciatic notch (discussed in Section 1.5.1.1), Walker (2006) investigated the accuracy of this skeletal feature when used to assign sex in individuals from different population groups. A total of 296 skeletons of documented age at death and sex were sampled from two distinct skeletal collections, the Hamann-Todd/Terry anatomical collections (Cleveland Museum of Natural History and Smithsonian Institution, respectively), which consist of the remains of

Americans of European and African ancestry, and the St. Bride’s sample, which consists of people who died in London and were buried in the crypt of St. Bride’s Church, Fleet Street

(Walker, 2006). The results of the study demonstrated that the distributions of sciatic notch scores of Americans of European and African ancestry do not differ significantly from each other

(P=0.60 for the pooled sex sample). However, statistically significant differences in the distribution of sciatic notch scores were observed between the American sample and the St.

Bride’s collection sample (P=0.04), with English females demonstrating extreme feminine morphology (scores of one) significantly more frequently than American females (P=0.06;

Walker, 2006). English males were also assigned a statistically significantly higher proportion of

sciatic notch scores of one and two than were their American counterparts (60% vs. 41%, respectively; P=0.02). The author concluded that population differences, such as the significantly wider sciatic notch observed in English individuals compared with Americans of

European or African descent, have serious implications for palaeodemographical research, as they can introduce systematic errors into reconstructed mortality profiles (Walker, 2006).

1.5.2 Metric sex estimation

Sex is estimated metrically using discriminant functions or logistic regression equations that are applied to a set of measured skeletal dimensions. In instances where recovered skeletal remains are incomplete or damaged, metric methods may be the only option available for estimating sex. In addition, this type of method is often preferred because the metric equations are easy to apply, result in fewer indeterminate cases, and more readily lend themselves to statistical testing and data manipulation. Metric sex estimation methods have been formulated using measurements of the skull (Giles & Elliot, 1963), including the temporal bone (Kalmey & Rathbun, 1996), palate (Burris & Harris, 1998), occipital condyle (Gapert et al, 2009) and mandible (Giles, 1964), vertebrae (Wescott, 2000; Zheng et al, 2012), the bony pelvis (Kelley, 1979; Schulter-Ellis et al, 1983), sacrum (Flander, 1978), and almost every post-cranial element, including, amongst others, long bones (Berrizbeitia, 1989; Black, 1978a; Holland, 1991; Holman & Bennett, 1991; İşcan & Miller-Shaivitz, 1984a, 1984b; Purkait, 2001; Seidemann et al, 1998), metacarpals (Scheuer & Elkington, 1993), metatarsals (Robling & Ubelaker, 1997), tarsals (Bidmos & Asala, 2003) and ribs (İşcan, 1985). Levels of accuracy and precision are well documented throughout the literature, with some authors demonstrating that their metric methods may produce cross-validated accuracy rates of 90% when tested on the original study population (Seidemann et al, 1998).
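To make the mechanics of a metric method concrete, the sketch below shows how a linear discriminant function is typically applied: each measurement is multiplied by its coefficient, a constant is added, and the resulting score is compared with a sectioning point. The coefficients, constant, sectioning point and measurement names are hypothetical placeholders chosen for illustration; they are not taken from any of the studies cited above.

```python
def discriminant_score(measurements, coefficients, constant):
    """Linear discriminant score: constant + sum(coefficient * measurement)."""
    missing = set(coefficients) - set(measurements)
    if missing:
        raise ValueError(f"missing measurements: {sorted(missing)}")
    return constant + sum(coefficients[k] * measurements[k] for k in coefficients)


def classify(score, sectioning_point, male_above=True):
    """Assign sex by comparing the score with the sectioning point."""
    if score == sectioning_point:
        return "indeterminate"
    above = score > sectioning_point
    return "male" if above == male_above else "female"


# HYPOTHETICAL coefficients (measurements in mm); for illustration only.
COEFFS = {"femoral_head_diameter": 0.25, "max_femoral_length": 0.01}
CONSTANT = -15.0
SECTIONING_POINT = 0.0

specimen = {"femoral_head_diameter": 46.0, "max_femoral_length": 450.0}
score = discriminant_score(specimen, COEFFS, CONSTANT)
sex = classify(score, SECTIONING_POINT)
```

Because the function raises an error when a required measurement is absent, the sketch also reflects a practical constraint of such equations: every dimension in the function must be measurable on the specimen.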

A large body of evidence exists pertaining to the utility of the dentition in the metric estimation of sex. Numerous studies have demonstrated the presence of statistically significant sexual dimorphism in the dimensions of permanent mandibular canines (Kaushal et al, 2003;

Pereira et al, 2010; Pettenati-Soubayroux et al, 2002), maxillary canines (Khangura et al, 2011;

Yuwanati et al, 2012), maxillary molars (Sonika et al, 2011), mandibular second premolars

(Vodanović et al, 2007), tooth root length (Zorba et al, 2013), and tooth weight (Schwartz &

Dean, 2005). Among a sample of modern Greeks, Zorba and colleagues (2011) noted

statistically significant differences between males and females in maximum mesiodistal crown diameter for the maxillary canine, maxillary first premolar, maxillary premolar, maxillary third molar, mandibular canine, mandibular first premolar, mandibular second molar, and mandibular third molar, and maximum buccolingual crown diameter for all teeth excluding the mandibular second premolar (Zorba et al, 2011). In addition to the permanent dentition, the level of sexual dimorphism in deciduous teeth has been quantified in a number of different studies (Harris &

Lease, 2005; Kuswandari & Nishino, 2004; Ling & Wong, 2007; Liversidge & Molleson, 1999).

Attempts to estimate the sex of juvenile remains using odontometry have produced mixed results, as summarised in Table 1.5D.

Table 1.5D: Summary of studies examining odontometric methods of sex estimation in juveniles.

Study | Sample | Key findings | Sex estimation accuracy, %
Black, 1978b | 133 White children (69 male and 64 female); University School Growth Study of the University of Michigan | Male mean measurement statistically significantly greater than the female mean measurement for MD length of all four incisors and the BL breadth of upper central incisor | 63.9–67.7%
Żądzińska et al, 2008 | 113 children from the 12th–16th centuries archaeological site Stary Brześć Kujawski in the Kujawy region of central Poland | Males found to be statistically significantly larger than females for MD diameter of the maxillary second molar, MD and BL diameters of the mandibular first molar, and BL diameter of the mandibular second molar | 69% in males, 88% in females
Cardoso, 2010 | Test of Black (1978b) and Żądzińska et al. (2008) DFs on 46 known-sex skeletons from Portugal | Concluded that deciduous tooth crown size does not show any significant value as a discriminator of sex | 46.2–60.0% (cross-validated)
Viciano et al, 2013 | Known-sex contemporary Spanish | Odontometric methods of sex estimation in juveniles may be useful as adjuncts to other accepted procedures | 78.1–93.1%

BL, buccolingual; MD, mesiodistal; DFs, discriminant functions.

According to recommendations developed by the Workshop of European Anthropologists, the deciduous teeth “represent the only factor useful for sex diagnosis” in children (Ferembach et al,

1980). By comparison, the large overlapping of male and female measurements of the

permanent teeth led to the suggestion that “sex diagnosis really cannot be based on teeth” in adults (Ferembach et al, 1980). However, the WEA recommendations were published in 1980 and therefore do not take into account the findings of more recent research.

Examination of the literature reveals numerous recent studies sampling various populations in which logistic regression equations or discriminant functions are presented for the estimation of sex from dental measurements (Acharya et al, 2011a; Angadi et al, 2013; Hassett,

2011; İşcan & Kedici, 2003; Khamis et al, 2014; Macaluso, 2010; Macaluso, 2011; Prabhu &

Acharya, 2009; Thapar et al, 2012; Viciano et al, 2013; Zorba et al, 2012, amongst others).

Several of these studies report very high rates of correct sex classification associated with the equations or functions; however, a large proportion report levels of accuracy that failed to meet the 80% cut-off point at which metric methods are considered useful (Rogers, 1999). Table 1.5E provides a summary of studies examining the utility of the dentition in sex estimation of adults.

Table 1.5E: Summary of studies examining odontometric methods of sex estimation in adults.

Study | Sample | Sex estimation accuracy, %*
Acharya et al, 2011a | 105 young adults from the College of Dental Science in Satur, India | 90.5% (after random deletion of tooth variables)
Hassett, 2011 | Known-sex sample of skeletons from St. Bride’s Church, London | 93.8%
Viciano et al, 2013 | Known-sex Spanish sample | 87.5%
Zorba et al, 2012 | 344 permanent molars in 107 individuals from the modern, known-sex Athens Collection | 86.0%
Angadi et al, 2013 | 600 dental casts of young adults (reference sample); 69 known-sex subjects (test sample) | 68.1% (maxillary teeth); 73.9% (mandibular teeth); 71% (both jaws)
Macaluso, 2010; 2011 | Black South Africans | 73.6% (maxillary molar cusp diameters); 74.5% (cusp areas)
Thapar et al, 2012 | 200 subjects of Indian origin | 76%
İşcan & Kedici, 2003 | 100 dental students from Ankara University | 73–77%
Khamis et al, 2014 | 400 young adults from Malaysia | 70.2–78.2%

*Cross-validated.
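The accuracy rates in Table 1.5E are cross-validated, meaning that each individual is classified by a rule built without that individual. As a rough illustration of why this matters, the sketch below compares resubstitution accuracy with leave-one-out cross-validated accuracy for a simple univariate sectioning point (the midpoint of the male and female means), and checks the result against the 80% cut-off cited from Rogers (1999). The crown diameters are invented values, and the midpoint rule is an assumption made for simplicity; it is not the procedure used in any of the tabulated studies.

```python
from statistics import mean


def sectioning_point(values, sexes):
    """Midpoint between the male and female sample means."""
    males = [v for v, s in zip(values, sexes) if s == "M"]
    females = [v for v, s in zip(values, sexes) if s == "F"]
    return (mean(males) + mean(females)) / 2


def resubstitution_accuracy(values, sexes):
    """Accuracy when the rule is tested on the data used to build it."""
    cut = sectioning_point(values, sexes)
    hits = sum(("M" if v > cut else "F") == s for v, s in zip(values, sexes))
    return hits / len(values)


def leave_one_out_accuracy(values, sexes):
    """Accuracy when each individual is classified by a rule built without it."""
    hits = 0
    for i in range(len(values)):
        cut = sectioning_point(values[:i] + values[i + 1:], sexes[:i] + sexes[i + 1:])
        hits += ("M" if values[i] > cut else "F") == sexes[i]
    return hits / len(values)


def meets_cutoff(accuracy, cutoff=0.80):
    """True if accuracy reaches the 80% usefulness threshold."""
    return accuracy >= cutoff


# HYPOTHETICAL canine crown diameters (mm) for eight known-sex individuals.
diameters = [7.9, 8.1, 8.3, 7.6, 7.2, 7.4, 7.5, 7.7]
known_sex = ["M", "M", "M", "M", "F", "F", "F", "F"]

resub = resubstitution_accuracy(diameters, known_sex)
loo = leave_one_out_accuracy(diameters, known_sex)
```

With these invented values the resubstitution estimate passes the 80% threshold while the cross-validated estimate does not, which is why uncross-validated accuracy rates should be read with caution.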

Given the results of the studies presented above, it is difficult to recommend odontometric sex estimation as the single method for estimating the sex of human remains.

Indeed, several authors suggest that the best use of the technique might be as an adjunct to other standard sex estimation methods (Khamis et al, 2014; Viciano et al, 2013; Yuwanati et al,

2012), or in instances where no other methods of sex estimation are available (Macaluso, 2010;

Macaluso, 2011). Dental methods of sex estimation are not recommended in Standards for Data

Collection from Human Skeletal Remains (Buikstra & Ubelaker, 1994) or in forensic anthropological handbooks (Byers, 2008) or guidance papers (Rösing et al, 2007), perhaps because of several issues associated with the measurement of teeth. For example, a higher level of precision is required than for bone measurements, given that teeth are relatively small and comparisons between different sexes or populations often involve mean differences of fractions of a millimetre (Hillson, 1996: 288). In addition, many of the methods discussed above require complete dentitions to be present in order to obtain the greatest rates of accuracy.

Unfortunately, this is not likely to be the case in archaeological contexts and among populations who suffered a considerable amount of tooth wear (Hillson, 1996: 289).

In addition to the issues associated with odontometric methods of sex estimation discussed above, metric techniques based on the skeleton are also subject to some important limitations. For example, long bone methods often require measurement of the intact length, as well as additional dimensions, which is often impossible if the bone is broken or parts are missing. Other issues relate to population differences in skeletal size and proportions, reduced accuracy observed in tests of the applicability of methods to dissimilar population samples, and the suitability of reference collections. Previous research has demonstrated that metric methods used to estimate sex in different population samples often result in lower accuracy rates than were reported in the original investigation (Cowal & Pastor, 2008; Marlow & Pastor, 2011). Male and female body size and skeletal proportions often overlap, and males from one population may actually be smaller than the females from another population (Migliano et al, 2007). Metric methods for estimating sex are therefore particularly prone to error because they are based on absolute differences in measured skeletal dimensions and the sectioning points used to separate the sexes are only reliably applied to the population used to create the technique

(Rogers, 2005).
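The effect of applying a sectioning point outside the population used to create it can be sketched numerically. The example below assumes normally distributed measurements, equal numbers of males and females, and a sectioning point placed midway between the sex means; all means and standard deviations are hypothetical and chosen only so that the males of the target population are, on average, smaller than the reference sectioning point.

```python
from statistics import NormalDist


def expected_accuracy(cut, male, female):
    """Expected proportion correctly sexed when values above `cut` are
    called male, assuming equal numbers of each sex."""
    males_correct = 1.0 - male.cdf(cut)
    females_correct = female.cdf(cut)
    return (males_correct + females_correct) / 2


# HYPOTHETICAL femoral head diameters (mm): mean and standard deviation.
ref_male, ref_female = NormalDist(48.0, 2.5), NormalDist(42.0, 2.5)
target_male, target_female = NormalDist(44.0, 2.5), NormalDist(39.0, 2.5)

# Sectioning points: midpoint of the sex means in each population.
ref_cut = (ref_male.mean + ref_female.mean) / 2
target_cut = (target_male.mean + target_female.mean) / 2

acc_reference = expected_accuracy(ref_cut, ref_male, ref_female)
acc_transferred = expected_accuracy(ref_cut, target_male, target_female)
acc_population_specific = expected_accuracy(target_cut, target_male, target_female)
```

Under these assumptions the reference sectioning point classifies most target-population males as female, so the transferred accuracy falls well below both the original accuracy and the accuracy achievable with a sectioning point derived from the target population itself.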

In seeking a solution to this problem, Albanese and colleagues (2008; 2013) used a reference collection that they stated exhibited a wide range of human variation in an attempt to create a reliable and accurate metric method of sex estimation that was not population-specific.

Measurements and angles of the proximal femur and ossa coxae were used to create two logistic regression equations, which when tested on an independent sample, resulted in an allocation accuracy of 95–97% (Albanese et al, 2008). Unfortunately, this study suffers from several limitations. Firstly, the authors state that a wide range of human variation was included in the reference sample. Yet, only around 300 individuals were sampled and these were derived from only one skeletal collection – the Terry Collection – which has been used by numerous other researchers to create metric sex estimation equations, none of whom claimed that their equations were free from population-specificity (Giles & Elliot, 1963; İşcan & Miller-Shaivitz,

1984b; Berrizbeitia, 1989; Holman & Bennett, 1991; Seidemann et al, 1998). Furthermore, the

Terry Collection represents a rather uniform sample of White and Black Americans of European or African descent, who lived and died in the nineteenth and twentieth centuries. Secondly, the method was tested on the Grant Collection, which, despite exhibiting a differing pattern of sexual dimorphism to the Terry Collection (Albanese et al, 2008), was amassed according to a very similar and strict protocol and also consists of individuals of primarily European descent who died in the twentieth century (Bedford et al, 1993). Thus, one may question whether the reference (Terry) and target (Grant) collections are suitably different to support the claims of the authors that the metric sex estimation method produced is not population-specific. Added to this is the complexity of the method created by Albanese and colleagues (2008); it is therefore not particularly surprising that the method has never been widely adopted into routine osteological practice.

The obvious conclusion from these several lines of evidence, therefore, is that metric methods of sex estimation should be population-specific, and the recognition of this fact by researchers has led to the creation of metric equations for numerous archaeological and modern populations, including prehistoric Scottish (MacLaughlin & Bruce, 1985), prehistoric Central Californian (Dittrick & Suchey, 1986), New Zealand Polynesian (Murphy, 2005), medieval Croatian (Šlaus & Tomičić, 2005), ancient (Jomon period) Japanese (Özer & Katayama, 2008), modern South African (Barrier & L’Abbé, 2008), Guatemalan (Ríos Frutos, 2005), and Greek (Steyn & İşcan, 2008). However, to date, only two studies have addressed the need to create population-specific metric sex estimation equations for ancient Egyptian populations. These studies are summarised in Table 1.5F.

Table 1.5F: Studies creating population-specific sex estimation methods for the ancient Egyptians.

Study | Population | Methods | Key findings
Raxter, 2007 | 71 adults; predominantly from Predynastic Period Keneh and Old Kingdom Giza | Measured stature, maximum vertical diameter of the femoral head (FHD), maximum femoral length (XFL), circumference of the tibia at the nutrient foramen (TC), maximum length of the tibia (XTL), maximum vertical diameter of the humeral head (HHD), and maximum humeral length (XHL) | Multivariate analysis of variance (MANOVA) results showed that sex had a significant effect on all the variables examined (P<0.001); histograms further revealed body breadth distributions to be more bimodal than body lengths, with FHD being the most sexually dimorphic; accuracy of metric sectioning points: 89%; testing of sectioning points developed on other populations resulted in reduced accuracy (56% using TC [Symes & Jantz, 1983; cited in Raxter, 2007]; 54% using FHD and 51% using HHD [Stewart, 1979]).
Dabbs, 2010 | 27 adults from New Kingdom Tell El-Amarna | Five dimensions of scapula | Equation with the highest predictive value used three of the five variables (maximum scapula length, maximum length of the scapular spine, and height of the glenoid fossa); overall accuracy of classification: 88%

Though Raxter and Dabbs both make an important contribution to this field of research, a limitation with the methods created in both of these studies is that neither has been tested using an independent and dissimilar population sample. In addition to these two studies which created metric sex estimation equations for ancient Egyptian populations, numerous methods have also been created for modern Egyptians using measurements of the talus (Abd-elaleem et al, 2012), maxillary sinus (Amin & Hassan, 2012), metacarpals and phalanges (El Morsi & Al Hawary,

2013; Eshak et al, 2011), temporal bone (El-Sherbeney et al, 2012), mandible (Kharoshah et al,

2010), and patella (Moneim et al, 2008). The purpose of all of these studies was to create population-specific sex estimation methods that could be used in forensic contexts, for example, to identify Egyptian victims of mass disasters or unnatural deaths that resulted in incineration,

dismemberment, mutilation, or other modification procedures that made the body and the individual difficult to identify. All studies sampled living or recently deceased Egyptian people, meaning that the methods were created using known sex reference individuals or specimens.

Table 1.5G provides a summary of the “modern” or “living Egyptian” sex estimation methods.

Table 1.5G: Summary of studies presenting metric sex estimation methods for use in the living Egyptian population.

Study | Bone/dimensions | Sample | n | Age range | Equations | Accuracy | Precision
Abd-elaleem et al, 2012 | Talus; 12 dimensions measured directly | Modern; cadavers | 110 | 20–60 yrs | Univariate sectioning points and discriminant functions | 83.5–90.9% | 0.9–0.99*
Amin & Hassan, 2012 | Maxillary sinus; 8 dimensions obtained from CT scans | Living | 96 | 20–70 yrs | Discriminant functions | 62.5% in females; 70.8% in males | NR
El Morsi & Al Hawary, 2013 | MCs & phalanges; length | Living | 100 | 17–65 yrs | Univariate sectioning points and logistic regression equations | 88–94% | NS
El-Sherbeney et al, 2012 | Temporal bone; lateral angle obtained from CT scans | Living | 120 | <1–70 yrs | Discriminant functions | 78–84% (adults only) | NR
Eshak et al, 2011 | MCs & phalanges; length obtained from CT scans | Living | 122 | 18–30 yrs | Univariate sectioning points and discriminant functions | 76.6–92.9% | NR
Kharoshah et al, 2010 | Mandible; 6 dimensions obtained from CT scans | Living | 500 | 6–60 yrs | Discriminant functions | 84.2% in females; 83.6% in males | NR
Moneim et al, 2008 | MTs & patella | Living | 160 | 25–65 yrs | Univariate sectioning points and discriminant functions | 70.0–71.1%† for patella; 95.0–97.2%† for MT length | NR

Yrs, years; NR, not reported; MCs, metacarpals; NS, not significant; MTs, metatarsals. *Correlation coefficient of reproducibility (Pc). † Cross-validated accuracy rate.

1.5.2.1 Computer program-based methods

To facilitate the estimation of sex using metric data, and to overcome some of the limitations associated with standard morphological or metric methods, a number of different computer software programs have been developed. Perhaps the two best-known programs are

Probabilistic Sex Diagnosis (Diagnose Sexuelle Probabiliste, DSP) and FORDISC. According to

Murail and colleagues (2005), morphological sex estimation methods, which involve visual inspection of discrete skeletal sex indicators, are subjective and require the observer to have considerable experience. In addition, morphological methods are often difficult to apply if the bones of the pelvis or skull are damaged or fragmented (Murail et al, 2005). Metric estimation of sex using discriminant function analysis requires comparison of a calculated discriminant score with a population-specific sectioning point, which may be problematic because of the existence of an overlapping area between the male and female distributions (Bruzek & Murail, 2006: 236;

Murail et al, 2005). Thus, the closer the discriminant score is to the sectioning point, the less reliable is the sex diagnosis. To overcome these issues, Murail and colleagues (2005) created the DSP program for sex estimation using the ossa coxae. For each specimen, the probability of its being male or female is calculated. In this respect, the statistical process resembles logistic regression (see Section 2.5.4.3).

The DSP program is based on a worldwide metrical database consisting of 2,040 adult known sex ossa coxae from 12 different reference populations from four geographical areas

(Europe, Africa, North America, and Asia). Using discriminant analysis, it was demonstrated that the different populations of modern humans share a common pattern of sexual dimorphism, suggesting that the program is not population-specific (Murail et al, 2005). A total of 10 pelvic dimensions based on previously published osteometric distances may be recorded in the specially constructed spreadsheet; at least four dimensions in any combination are required to

estimate sex, meaning that the method is applicable even to fragmented bones (Murail et al,

2005). Sex is allocated only if the posterior probability is equal to or greater than 0.95. Based on this threshold, sex was assigned in 40–91% of the sample, depending on the combination of dimensions used, and the accuracy rate was found to range from 98.7% to 100%.
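The logic of allocating sex only above a posterior probability threshold can be illustrated with a short sketch. This is not the DSP implementation: it assumes a single discriminant score, normally distributed scores in each sex, and equal prior probabilities, and the distribution parameters are hypothetical. It shows how scores close to the sectioning point yield posterior probabilities below 0.95 and are therefore left unallocated.

```python
from statistics import NormalDist


def posterior_male(score, male_dist, female_dist, prior_male=0.5):
    """Posterior probability of male given the score (Bayes' theorem)."""
    weighted_male = male_dist.pdf(score) * prior_male
    weighted_female = female_dist.pdf(score) * (1.0 - prior_male)
    return weighted_male / (weighted_male + weighted_female)


def allocate_sex(p_male, threshold=0.95):
    """Allocate sex only if one posterior reaches the threshold."""
    if p_male >= threshold:
        return "male"
    if 1.0 - p_male >= threshold:
        return "female"
    return "indeterminate"


# HYPOTHETICAL score distributions for each sex; for illustration only.
MALE_SCORES = NormalDist(mu=1.5, sigma=1.0)
FEMALE_SCORES = NormalDist(mu=-1.5, sigma=1.0)

results = {
    score: allocate_sex(posterior_male(score, MALE_SCORES, FEMALE_SCORES))
    for score in (-1.2, 0.5, 1.2)
}
```

A higher threshold trades coverage for reliability: fewer specimens receive a sex allocation, but those that do are allocated with greater confidence, which is consistent with the pattern of results reported for the DSP program.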

Independent testing of the DSP program found similarly high rates of accuracy. Using a sample of 58 individuals of known and unknown sex (the latter sexed using the Phenice characteristics) McMullan (2013) obtained an accuracy rate of 96% when all 10 dimensions were input into the DSP program. Chapman and colleagues (2014) used 49 dry ossa coxae of unknown sex from the Body Donation Program of the Université Libre de Bruxelles to compare the accuracy of the DSP tool with standard morphological assessments of sex and with measurements obtained both ‘manually’ using a sliding caliper and ‘virtually’ from CT scans of the bones. They obtained a 97% consistency rate between the morphological and DSP methods, as well as a 100% consistency rate between the ‘manual’ and ‘virtual’ DSP methods

(Chapman et al, 2014). Despite this, the DSP method of sex estimation has not been cited particularly well in the English language literature. It was used by Nielson (2011) to estimate sex in a sample of skeletons from two medieval Danish cemeteries, and was acknowledged by

Vacca and Di Vella (2012) as one of several methods that incorporated worldwide variability of sexual dimorphism of the ossa coxae. However, the limited number of citations of this method in the literature does not mean that the program is not widely used in practice. That said, it is possible that the greatest application of the technique is in laboratory situations, given that it is not a very convenient method for obtaining quick and simple estimates of sex in the field.

A second well-known computer program that can be used to estimate sex, ancestry, or stature is FORDISC. It was created by Jantz and Ousley in 1993 and is based on metric data from more than 1,200 cases from the American Forensic Data Bank and 1,700 nineteenth and twentieth century reference cases from the Terry and Hamann-Todd Collections. It additionally incorporates Howells’ worldwide cranial data set (Guyomarc’h & Bruzek, 2011; Howells,

1973; Ramsthaler et al, 2007). The program computes discriminant functions using cranial or standard postcranial measurements, and is a popular tool both in the US and around the world

(Leach et al, 2009; Márquez-Grant, 2005; Ramsthaler et al, 2007; Verhoff et al, 2008) because it allows standardisation of methodology and is fast, efficient, and straightforward to use

(Guyomarc’h & Bruzek, 2011; L’Abbé et al, 2013; Ramsthaler et al, 2007). Despite this, the software has recently been questioned and criticised, with most of the criticism focused on its ability to assess ancestry (Elliott & Collard, 2009; Hubbe & Neves, 2007; Leathers et al, 2002;

Ubelaker et al, 2002; Williams et al, 2005).

Independent tests of the ability of the FORDISC program to correctly assign sex have produced mixed results (Kimmerle et al, 2008a; McMullan, 2013). For example, McMullan reported an accuracy range of just 24–67%; Kimmerle and colleagues (2008a) found the

FORDISC program to commonly contradict morphological sex estimates when the North

American metric data was used to assign sex in Rwandan populations; and Decker and colleagues (2011) obtained an unacceptably low rate of correct sex classification in males

(68%) but a high rate in females (98%) when using FORDISC to estimate sex based on pelvic measurements obtained from virtual images created using CT scan data. Other authors have further found the program to be highly population-specific (Guyomarc’h & Bruzek, 2011; L’Abbé et al, 2013; Ramsthaler et al, 2007). Like the DSP program, the application of FORDISC is realistically limited to laboratory-based situations rather than fieldwork contexts. Furthermore, the population specificity of the discriminant functions that FORDISC generates suggests that at present, its use should be limited to North American populations.

1.5.2.2 Population differences in body size and proportions

The key factor affecting the accuracy of metric sex estimation methods when applied to dissimilar population samples is variation in human body size and proportions, which have been found to vary considerably among living human populations (Ruff, 2002). A worldwide sampling of populations found that mean intersex body breadths vary by around 25%, and height by 10%

(Ruff, 2002). Variation in the former additionally showed a clear latitudinal gradient, suggesting that the cause of this and other systematic body shape differences may involve basic physiological adaptive mechanisms (Ruff, 2002), in addition to other intrinsic and extrinsic factors. Within every population, the phenotypic expression of adult human body size and body shape results from synergistic interactions between hereditary factors and environmental conditions experienced during growth, the latter including climatic conditions, diet, subsistence strategy and access to resources, activity levels, and disease (Vercellotti et al, 2011). This is supported by observations of body size variations even within closely related groups likely to


share the same maximum genetic potential (Bogin et al, 2002). Despite this, the obvious contribution of genetic factors cannot be ignored (Livshits et al, 2002).

The biological effects of a population's adaptation to its environment are a particularly relevant consideration with regard to the ancient Egyptians given the geographic isolation of their habitation area, the hot, dry desert climate, their reliance on an agricultural economy from the Predynastic Period onwards, as well as limited gene flow (inbreeding) in the elite classes

(Masali, 1972). Ancient Egyptian body size was evaluated by Masali (1972) using a sample of skeletal remains from the Egyptian Osteological Collection held at the Institute of Anthropology of Turin. The majority of skeletons were from the rather unspecific “Dynastic Period” of Assiut

(n=127) and (n=133); an additional 60 skeletons were dated to Predynastic Period

(Naqada I) Gebelein (Masali, 1972). The parameters measured for each individual were stature, bone robustness (measured using the robustness perimeter to length index), girdle proportions

(breadth of the bony pelvis as a measure of the pelvic girdle and twice the length of the clavicle as a measure of the thoracic girdle), and the correlated distribution of longitudinal and transverse skeletal dimensions. The results of the study indicated that from the Predynastic

Period to Dynastic times the general body length underwent no important changes, while the transverse dimensions radically changed, especially in males. Stature, though slightly higher in

Predynastic Period individuals than Dynastic Period individuals, was found to change little over time, a result that has subsequently been contradicted by other researchers (Zakrzewski, 2003), and was comparable with neighbouring populations, excluding those from the Sudan (Masali,

1972). Compared with a number of other human populations, Dynastic Period Egyptians exhibited the lowest values for bone robustness, with Predynastic Period individuals exhibiting low–average values. In terms of girdle proportions, the ancient Egyptian populations studied appeared to have been mainly composed of individuals of both sexes with wide shoulders and narrow pelves. The expression of these characteristics was found to increase from Predynastic to Dynastic times. These results therefore appear to indicate the presence of a microevolutionary pattern from the Predynastic Period to Dynastic times, which manifested as a gradual reduction of variability, convergence of skeletal morphology between the sexes, and gracilisation (Masali, 1972). The physical characteristics and stature of Eighteenth and

Nineteenth Dynasty  were additionally investigated by Robins and Shute (1983), who


found that this sample of individuals exhibited Negroid limb characteristics, in that the distal segments were relatively long in comparison to the proximal segments (Robins & Shute, 1983).

This limb ratio is thought to be characteristic of individuals from warmer climates (Ruff, 1994), given that the distal segment of the limbs has strong heat dissipating qualities (Tilkens et al,

2007). Thus, considering the observations that the ancient Egyptians possessed more gracile body plans than European populations, and that the ratio of distal to proximal limb length reflects adaptation to a warm climate, it is not unreasonable to speculate that the application of metric sex estimation methods based on the skeletal size and proportions of modern populations to ancient Egyptians will result in substantially reduced rates of accuracy.
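
The ratio of distal to proximal limb segment length referred to above is conventionally expressed as an index, for example the crural index (tibia length relative to femur length) and the brachial index (radius length relative to humerus length). The following Python sketch shows the arithmetic only. The measurement values are hypothetical, and the exact measurements used by Robins and Shute (1983) are not specified in this section.

```python
def limb_index(distal_length_mm, proximal_length_mm):
    """Distal segment length as a percentage of proximal segment length."""
    if distal_length_mm <= 0 or proximal_length_mm <= 0:
        raise ValueError("lengths must be positive")
    return 100.0 * distal_length_mm / proximal_length_mm

# Hypothetical maximum bone lengths in millimetres, for illustration only.
crural = limb_index(distal_length_mm=380.0, proximal_length_mm=450.0)    # tibia / femur
brachial = limb_index(distal_length_mm=250.0, proximal_length_mm=320.0)  # radius / humerus
```

Higher index values indicate relatively longer distal segments, the pattern described in the text for individuals from warmer climates.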

1.5.2.3 The role of reference collections

Ideally, all new metric methods of sex estimation should be created using a documented skeletal collection of known sex and age at death. This ensures that the methods are based on accurate rather than estimated sex assessments, and increases the validity of the resulting technique. Unfortunately, skeletal reference collections of this type are only available for a small number of populations, among which the ancient Egyptians do not feature, and those that do exist are subject to a number of important biases and limitations.

Until relatively recently, the majority of metric sex estimation equations available for use in the disciplines of physical and forensic anthropology were created using known-sex skeletal collections of nineteenth- and twentieth-century White and Black North Americans of European or

African descent (Giles & Elliot, 1963; İşcan & Miller-Shaivitz, 1984b; Berrizbeitia, 1989; Holman

& Bennett, 1991; Seidemann et al, 1998; Wescott, 2000). A prime example of this type of collection is the Terry Collection, which was primarily amassed in the 1920s by Dr R. J. Terry

(1871–1966) and consists of skeletons collected from the cadavers of White and Black

Americans that were used in the anatomy classes of medical schools. The collection derives from the lower socioeconomic classes from St. Louis and Missouri, and predominantly consists of individuals whose bodies became the property of the state when they died and were not claimed, or whose relatives signed over the remains to the state. The bodies were subsequently turned over to the Washington University Medical School for cadaver research (Hunt &

Albanese, 2005). After the retirement of Dr Terry in 1941, the collection was passed to Mildred

Trotter (1899–1991), whose biggest contribution was to attempt to correct the collection’s


demographic composition by focusing on the collection of White females, which until this point had been greatly underrepresented as a result of social and economic factors during the early part of the twentieth century (Hunt & Albanese, 2005). However, this change in strategy introduced several sources of bias into the collection, not least because the changing attitudes of the time meant that the majority of these females were donated or willed to the collection

(Komar & Grivas, 2008). The findings of a recent study suggested that individuals who donate or will their bodies to anatomical collections before their death generally have a higher education level and come from a higher socioeconomic background than individuals who were donated by family members or the legal authorities after their death. Certain comparisons, particularly those between White and Black females, may therefore involve samples drawn from different socioeconomic groups, thereby introducing bias (Komar & Grivas, 2008).

This issue was investigated by Ericksen (1982), who found that “willed” females had numerically longer femora than the “regular” females; however, this trend did not reach statistical significance. In addition, length was found to be positively correlated with age in the Terry “willed” group and negatively correlated in the Terry “regular” group (Ericksen, 1982). The author suggested that this was likely to mean that the Terry “regular” females were shorter than the “willed” females, which has important implications for studies using the Terry Collection to construct stature estimation equations (Ericksen, 1982). However, whether these differences in femoral length represent secular changes or are evidence that the Terry Collection is not representative of the US population is not clear from this study. More recent research demonstrated that the collection is not representative of the living population in St. Louis in the first half of the twentieth century when compared to historical census data (Hunt & Albanese,

2005). A further study comparing the documented skeletal collection curated at the Maxwell

Museum at the University of New Mexico to annual demographic information from three relevant populations found that the collection differs significantly from all three populations in terms of age, sex, ethnicity/race, and cause and manner of death (Komar & Grivas, 2008). Thus, while collections such as the Terry Collection have been an important resource for physical and forensic anthropologists to develop and test metric methods of sex estimation, they are at best only really a model for one highly specific modern sample and at worst a ‘manufactured population’ that should not be considered a proxy for any racially or ethnically defined


population (Komar & Grivas, 2008). Either way, these types of collections certainly cannot be expected to serve as valid analogues for past human populations, particularly those that were known to have possessed gracile body plans such as the ancient Egyptians (Masali, 1972).

1.5.3 Molecular sex estimation

Recent advances in molecular genetics have provided alternative approaches to morphological and metric sex estimation, using DNA preserved within bones and bone fragments (Matheson &

Loy, 2001). Genetic sexing utilises molecular differences between the sexes for identification, the most obvious genetic difference between males and females being the sex chromosome complement (Matheson & Loy, 2001). Therefore, the most commonly used molecular systems for sex estimation from archaeological human remains are based on the differences between the amelogenin gene on the X chromosome and its counterpart, a pseudogene, on the Y chromosome (AMELX/AMELY) (Daskalaki et al, 2011).

One of the first studies to demonstrate the ability to extract and analyse DNA from ancient remains was published by Higuchi and colleagues (1984), who identified DNA sequences from the quagga, an extinct member of the modern zebra family. A year later, the first human DNA sequence data were obtained from a 2,400-year-old Egyptian mummy (Pääbo,

1985a; 1985b), although it is highly likely that this finding was the result of contamination

(Hagelberg & Clegg, 1991). The polymerase chain reaction (PCR), a molecular technique capable of generating millions of copies of short, specific fragments of DNA in vitro, was developed almost simultaneously and revolutionised the field of ancient DNA

(aDNA) research, as well as molecular genetics in general (Kaestle & Horsburgh, 2002). Since then, numerous studies have been published that characterise aDNA sequences from geographically distinct archaeological populations (Arnay-de-la-Rosa et al, 2007; Bauer et al,

2013; Faerman et al, 1995; Faerman et al, 1998; Gibbon et al, 2009; Kim et al, 2011; Lin et al,

1995; Matheson & Loy, 2001; Mays & Faerman, 2001; Meyer et al, 2000; Skoglund et al, 2013;

Stone et al, 1996).

For example, Luptáková and colleagues (2011) extracted DNA from 25 skeletons recovered from burial contexts in western Slovakia dating to the eighth–ninth, ninth, or twelfth–seventeenth centuries AD.

Separate amplifications of DXZ4 repetitive satellite sequences on the X chromosome, and the

SRY gene on the Y chromosome, were performed using nested PCR (Luptáková et al, 2011).


This method is thought to overcome the problem of allelic drop-out, a frequent issue associated with the amplification of degraded aDNA, which results in the lack of X or Y homologous amelogenin PCR products. Sex was additionally estimated using morphological methods for all individuals represented by sufficient cranial and pelvic material. After PCR amplification of DNA sequences, not one extraction gave successful SRY sequence identification. The authors suggested that failure in SRY locus detection may have been due to the high rate of DNA fragmentation. DXZ4 sequences of 91 base pairs in length were obtained in 23 of 25 samples, giving a relatively high amplification success rate of 92%. Sex was determined for all of these 23 individuals. In 20 individuals, sex was additionally estimated using morphological methods; for

17 of these individuals (85%), the results of the morphological and molecular sex estimation procedures were consistent, indicating both the authenticity of the amplification products

(Luptáková et al, 2011), and the accuracy of traditional methods of sex estimation that rely on observation of the shape and form of specific features of the skull and bony pelvis.
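
The two percentages quoted above follow directly from the counts reported in the text, as the short Python check below shows. The helper function is hypothetical and is included only to make the arithmetic explicit.

```python
def percentage(successes, total):
    """Express successes out of total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total

# Counts as reported in the text for Luptáková et al (2011).
amplification_rate = percentage(23, 25)  # DXZ4 amplified in 23 of 25 samples
concordance_rate = percentage(17, 20)    # both methods agreed in 17 of 20 individuals
discordant = 20 - 17                     # individuals with conflicting estimates
```

Note that the concordance rate is calculated only over the 20 individuals for whom both a molecular and a morphological estimate were available, not over all 25 sampled skeletons.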

Other studies, however, have found genetic techniques problematic and have reported disappointing levels of agreement between the results of morphological and molecular sex determination methods. In one study examining the skeletal remains of five Neolithic individuals recovered from a Pitted Ware Culture cemetery at Ajvide, Gotland, Sweden, the morphological and molecular sex assessments were consistent in only two of the five samples (Götherström et al, 1997). However, the authors noted that osteological identification of sex was not always clear due to the fragmentary nature of the remains, which further highlights problems in the use of modern techniques applied to skeletal populations in which females are generally considered to be relatively robust (Arnay-de-la-Rosa et al, 2007; Götherström et al, 1997).

Molecular methods of sex estimation are further hampered by concerns about authenticity, arising both from a high risk of allelic drop-out, and the danger of contamination from exogenous present day sources of DNA in archaeological material (Skoglund et al, 2013).

For example, the reliability of claims about the recovery of authentic DNA from ancient Egyptian mummies and skeletons has been questioned based on findings of the DNA decay rate (Marota et al, 2002). Drawing on two kinds of evidence, namely half-life calculations based on the rate of DNA depurination (determined in a previous study) and aspartic acid racemisation (AAR), which has been shown to be linked to DNA decay, Marota and colleagues (2002) suggested that the preservation limit for archaeological DNA in Egypt is likely to be less than 1,000 years. The authors do, however, state that particularly favourable burial conditions may allow human DNA to be preserved for an especially long time-span (Marota et al, 2002). Furthermore, a more recent study has suggested that the extent of AAR does not correlate with DNA amplification success, and that AAR is therefore not a useful screening tool for DNA in ancient bone samples

(Collins et al, 2009). Given these findings, it is therefore of paramount importance that aDNA is sufficiently authenticated, and considerable effort has gone into assuring that research results reflect endogenous target sequences rather than modern contaminants (O’Rourke et al, 2000).

Nevertheless, studies of the type discussed above further serve to highlight the importance of morphological techniques in the estimation of sex from archaeological human remains, as well as the refinement of metric sex estimation methods to make them more applicable to specific populations such as the ancient Egyptians.

1.6 Aims, objectives, and hypotheses

The aim of this project is to investigate the expression of sexual dimorphism in ancient Egyptian populations, changes in the degree of expression over time and by geographic location, and the effect that these changes have on the level of accuracy obtained using modern and population-specific metric sex estimation equations. This will be achieved by addressing the following objectives:

• To collect metric data from a wide range of skeletal elements and from a large sample of skeletons from different time periods and geographic locations within ancient Egypt
• To use the collected data to test the accuracy of popular and well-established methods of metric sex estimation created using modern population samples
• To use discriminant function analysis and logistic regression to create new metric sex estimation equations that are specific to ancient Egyptian samples from different time periods and/or cemetery sites
• To objectively test the accuracy of the newly created population-specific metric sex estimation functions and equations using a temporally and geographically distinct sample of ancient Egyptian skeletons
• To explore changes in the degree or pattern of sexual dimorphism over time using a sexual dimorphism index and statistical tests
• To establish the level of intra- and inter-observer error associated with the measurement of skeletal dimensions
• To relate the findings of this research to previous studies and explore the implications of the results in terms of ongoing studies sampling ancient Egyptian human skeletal remains.
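
As a rough illustration of the sexual dimorphism index named in the objectives above, the Python sketch below expresses the difference between the male and female means of one measurement as a percentage of the female mean. This particular formula is an assumption made for illustration, since the index used in this project is not defined in this section, and the measurement values are hypothetical.

```python
from statistics import mean

def sexual_dimorphism_index(male_values, female_values):
    """Percentage by which the male mean exceeds the female mean.

    Assumed formula: 100 * (male mean - female mean) / female mean.
    """
    male_mean = mean(male_values)
    female_mean = mean(female_values)
    return 100.0 * (male_mean - female_mean) / female_mean

# Hypothetical femoral head diameters (mm) for two time periods.
predynastic = sexual_dimorphism_index([46.0, 47.0, 45.0], [40.0, 41.0, 39.0])
dynastic = sexual_dimorphism_index([44.0, 45.0, 43.0], [40.0, 41.0, 39.0])
```

Comparing the index between samples gives a simple descriptive measure of change in dimorphism; whether such a change is statistically significant would still need to be assessed with formal tests.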

1.6.1 Hypotheses

The hypotheses presented in Table 1.6A are based on the research questions posed in Section

1.1.1 and were formulated with reference to the studies discussed in the preceding sections, and in the context of the methodological approach undertaken in this project, described in

Chapter 2.

Table 1.6A: Study hypotheses.

Sexual dimorphism

| Question | H0 | H1 |
|---|---|---|
| Did sexual dimorphism exist in ancient Egyptian populations? | There are no statistically significant differences between male and female skeletal size and proportions | Skeletal size and proportions are statistically significantly different between males and females |
| Did sexual dimorphism change over time? | There are no statistically significant differences in the degree of sexual dimorphism exhibited by populations from different time periods | The degree of sexual dimorphism exhibited by populations from different time periods is statistically significantly different |
| Did sexual dimorphism differ by geographic location? | There are no statistically significant differences in the degree of sexual dimorphism exhibited by populations from different cemetery sites/locations | The degree of sexual dimorphism exhibited by populations from different cemetery sites/locations is statistically significantly different |

Metric sex estimation

| Question | H0 | H1 |
|---|---|---|
| How applicable are modern methods of metric sex estimation to ancient Egyptian skeletal remains? | Modern methods of metric sex estimation produce unacceptably low accuracy rates when applied to ancient Egyptian skeletal remains | Modern methods of metric sex estimation can be accurately applied to ancient Egyptians |
| How accurate are population-specific methods of metric sex estimation? | Population-specific methods of metric sex estimation produce unacceptably low accuracy rates | Population-specific methods of metric sex estimation produce very high accuracy rates |
| How accurate are population-specific methods of metric sex estimation when tested on a temporally or geographically distinct ancient Egyptian sample? | Population-specific methods of metric sex estimation produce unacceptably low accuracy rates when tested on a temporally or geographically distinct ancient Egyptian sample | Population-specific methods of metric sex estimation produce very high accuracy rates when tested on a geographically or temporally distinct ancient Egyptian sample |
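
The hypotheses on metric sex estimation turn on how well an equation derived from one sample classifies individuals from a different sample. The Python sketch below illustrates that logic with a univariate sectioning point (the midpoint between the male and female means of a single measurement). This is a deliberately simpler stand-in for the discriminant function analysis and logistic regression named in the objectives, and all values and function names are hypothetical.

```python
from statistics import mean

def sectioning_point(male_values, female_values):
    """Midpoint between the male and female means of one measurement."""
    return (mean(male_values) + mean(female_values)) / 2.0

def classify(value, cutoff):
    """Assign 'M' to values above the cutoff and 'F' otherwise."""
    return "M" if value > cutoff else "F"

def accuracy(values, known_sexes, cutoff):
    """Proportion of individuals whose estimated sex matches the known sex."""
    hits = sum(classify(v, cutoff) == s for v, s in zip(values, known_sexes))
    return hits / len(values)

# Hypothetical reference sample (mm) used to derive the cutoff.
cutoff = sectioning_point([46.0, 47.0, 45.0], [40.0, 41.0, 39.0])

# Hypothetical, more gracile test sample from a different period or site.
test_values = [44.0, 42.5, 45.0, 40.0, 39.0, 41.0]
test_sexes = ["M", "M", "M", "F", "F", "F"]
holdout_accuracy = accuracy(test_values, test_sexes, cutoff)
```

In this toy example one relatively gracile male falls below the cutoff and is misclassified as female, which is the kind of error anticipated when a method is applied to a population smaller in body size than its reference sample.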


2 MATERIALS AND METHODS

2.1 Materials

2.1.1 Identification and selection of skeletal collections

The starting point for identification of collections of human skeletal remains that might be suitable for this research was a search of the literature for previous studies that had sampled ancient Egyptian skeletons. The skeletal samples described within these studies enabled the construction of a list of potentially suitable collections held around the world in museums and academic institutions, as shown in Table 2.1A.

Table 2.1A: Collections of ancient Egyptian skeletal remains previously sampled in published studies.

| Collection | Location | Previous studies |
|---|---|---|
| Peabody Museum of Archaeology and Ethnology, Egyptian collection | Harvard University, Boston, USA | Raxter, 2007; Raxter et al, 2008 |
| Naturhistorisches Museum Wien, Reisner Collection | Vienna, Austria | Davide, 1972; Raxter et al, 2008; Zakrzewski, 2003 & 2007 |
| University of Cambridge, Duckworth Collection | Cambridge, UK | Davide, 1972; Zakrzewski, 2003 & 2007 |
| Smithsonian Institution National Museum of Natural History, Egyptian Collection | Washington DC, USA | Raxter et al, 2008 |
| Natural History Museum, Egyptian Collection | London, UK | Zakrzewski, 2003 & 2007 |
| Department of Anthropology and Biology at the University of Turin, Marro Collection | Turin, Italy | Davide, 1972; Torre et al, 1980; Zakrzewski, 2003 & 2007 |
| The British Museum, Egyptian Collection | London, UK | Davide, 1972 |

Research into each collection listed above revealed that two were unsuitable for the aims of the current project: the British Museum collection, which consists primarily of mummified remains


and skeletons from Nubia, and the Smithsonian Institution collection, which contains a large number of individual skeletal elements but very few complete skeletons (Hunt, David; Personal

Communication, 2012). One further Egyptian collection, held at the Natural History Museum in

London, was initially shortlisted but later discounted when it was announced that the osteological storerooms and research work areas of the museum would be closed for a period of at least six months to allow for essential maintenance and refurbishment, as well as inspection and conservation of the bones and their storage materials (British Association for

Biological Anthropology and Osteoarchaeology mailing list; Personal Communication, 2012).

This left four possible sources of ancient Egyptian skeletal material for use in this study:

• The Peabody Museum of Archaeology and Ethnology, Harvard University, Boston, USA
• The Naturhistorisches Museum Wien (Natural History Museum), Vienna, Austria
• The Leverhulme Centre for Evolutionary Studies, Duckworth Laboratory, University of Cambridge, Cambridge, UK
• Department of Anthropology and Biology at the University of Turin, Turin, Italy.

Although not one of the institutions identified in the initial literature search, the Biological

Anthropology Department at the National Research Centre in Cairo, Egypt, was also put forward as a source of skeletal material by the original supervisor of the current research project,

Professor Rosalie David. Although Professor David was successful in establishing links with this department, the Egyptian Revolution, which began in January 2011, effectively ended the ability to travel to and conduct research in Egypt, and only relatively recently has sporadic access for archaeological teams and researchers resumed.

Data collection began in Boston, USA, at the Peabody Museum of Archaeology and Ethnology. This institution was selected as the starting point of the current research for both practical and logistical reasons. A large amount of information about the collections at this museum, and about the application process for visiting researchers, is available in the public domain, and the relevant members of staff were very quick to respond to email enquiries and to process the research application. In addition, the collection itself was thought to be very well suited to the aims of the current project, as it consists of a large number of complete skeletons from a range of different time periods and geographic locations within Egypt, as shown in Table 2.1B.


Table 2.1B: The Egyptian series from the Peabody Museum collection (Source: Herschensohn, Olivia; Personal Communication, 2011).

| Region | Time period | Number of adult skeletons | Number of subadult skeletons |
|---|---|---|---|
| Keneh | Predynastic | 174 | 18 |
| Mesaeed | Predynastic | 18 | 0 |
| Giza | Old Kingdom | 61 | 8 |
| Sheikh Farag | Middle Kingdom | 20 | 6 |
| Lisht | 12th Dynasty, c. 2000–1950 BC | 1 | 0 |
| Meir | Late 12th Dynasty, c. 1900–1800 BC | 2 | 0 |
| Thebes | 18th–20th Dynasty | 1 | 0 |
| | 20th Dynasty, c. 1160 BC | 1 | 0 |
| | 21st Dynasty | 4 | 0 |
| Siwa | 400–100 BC | 103 | 1 |
| Nuri | c. 362 BC | 1 | 0 |
| Meroe | AD 75 | 3 | 0 |
| Thebes | - | 2 | 1 |
| Mesaeed | - | 135 | 17 |
| Total | | 526 | 51 |

Data collection from this Egyptian series, which was primarily amassed by George A. Reisner

(1867–1942) during the early part of the twentieth century, was conducted over a one-month period. In that time, a total of 105 skeletons were analysed. These skeletons were predominantly derived from the Predynastic Period Keneh and Old Kingdom Giza sub-collections because, according to the Peabody Museum inventory, these contexts offered the greatest


number of complete skeletons. A detailed description of how and why skeletons were chosen for inclusion in the study sample is provided in Section 2.1.2.

The skeletons in the Old Kingdom Giza sub-collection of the Peabody Museum’s

Egyptian series were primarily excavated from Cemetery G2100 of the Western Cemetery (see

Section 2.1.3). Adjoining Cemetery G2100 is Cemetery G4000. This latter cemetery formed part of the German–Austrian expedition concession and was excavated by Hermann Junker (1877–

1962) of the University of Vienna between 1911 and 1913 (Filce Leek, 1980; Junker, 1914). The skeletons unearthed during this time are currently held at the Natural History Museum (NHM) of

Vienna and date to the early Old Kingdom (Fourth Dynasty). Further analysis of Old Kingdom skeletons from a neighbouring Giza cemetery therefore seemed the next logical step in the current project. Thus, the second data collection visit, lasting a period of two weeks, was to the

NHM, Vienna. This institution holds a total of 182 skeletons dating to Old Kingdom Giza. Of these skeletons, 10 are the remains of juvenile individuals, and were therefore not required for the present research (see Section 2.1.2), and 172 skeletons are the remains of adults (Guld,

1995). Of the adult skeletons, 57 were complete or nearly complete and were therefore included in the study sample. Two crania of unambiguous sex were additionally analysed giving a total sample size of 59 individuals from the NHM, Vienna collection.

The Duckworth Collection, held at the University of Cambridge in the UK, was selected for inclusion in the present study because it contains a large sample of crania from Giza that dates to a much later period of Egyptian history than the Old Kingdom material. This temporal separation would therefore allow analysis of changes occurring in the cranial dimensions and proportions of males and females over time. The sample in question dates to the Late Period (Twenty-Sixth to Thirtieth Dynasties) and is known as the ‘Gizeh “E” series’ or the ‘Pearson Collection’, so named after Professor Karl Pearson, a biometrician at University

College London and the first person to measure the crania (Filer, 1992; Pearson & Davin,

1924). A total of 1,726 skulls and crania are attributed to the Gizeh E series, as well as 813 isolated mandibles that cannot be associated with specific crania. They were excavated from a cemetery located south of the great pyramids at Giza in the winter of 1906 by W. M. Flinders

Petrie. In his report for the British School of Archaeology in Egypt, which was published in 1907,

Petrie states that:


“The later burials at Gizeh yielded very little that was worth note, although a large number of tombs were opened, and we collected about 1,400 skulls of about 600–300 BC, which are now at University College, London, for study in Prof. Karl Pearson’s department” (Petrie, 1907: 29).

In total, 154 crania from this collection were examined and measured during a two-week research trip. At this point, the total study sample size was considered to be sufficiently large to enable meaningful statistical analyses to be performed. As such, permission to access the Marro

Collection held within the Department of Anthropology and Biology at the University of Turin, Italy, was not sought.

2.1.2 Selection of skeletons

Skeletons from the three institutional collections described above were selected for inclusion in the study sample based on a number of predefined and collection-specific criteria. The predefined skeletal selection criteria for all three collection samples were:

• Adult individuals, as demonstrated by complete epiphyseal fusion of all long bones
• Presence of sufficient pelvic and/or cranial material to permit the initial assignment of sex
• Ability to unequivocally assign sex based on pelvic and/or cranial morphology
• No evidence of pathology or trauma affecting the metric proportions of the bones studied.

Collection-specific criteria relate to the preservation, quality, and completeness of individual skeletons within each of the three collections sampled. Complete skeletons were considered very important due to the requirement to collect a large amount of metric data from each individual (see Section 2.2.2.1). For the Peabody Museum and NHM, Vienna collections, complete skeletons were therefore preferentially selected for inclusion in the study sample over incomplete or fragmented skeletons. Among incomplete or fragmented skeletons, those with intact pubic bones were preferentially selected over those that retained only other parts of the bony pelvis commonly used for morphological sex estimation, such as the greater sciatic notch or sacrum.

Similarly, skeletons with fragments of the bony pelvis from which a reliable estimation of sex could be made were preferentially chosen over those that did not have any pelvic material. The

Peabody Museum collection was ideally suited to this process of selection given that detailed inventories were available that listed both the level of completeness and presence of pelvic


material for each skeleton within the Egyptian collection. The initial choice of sub-collection was therefore decided on the basis of completeness; this further dictated the initial choice of time period and cemetery site included in the study. As mentioned above, the study sample initially consisted of skeletons from Predynastic Period Keneh and Old Kingdom Giza. When these sub-collections were exhausted, the Middle Kingdom Sheikh Farag sub-collection was sampled, as it contained a high proportion of complete skeletons or skeletons that were incomplete but had pelvic material from which a reliable estimate of sex could be made. Choice of skeletons was again dictated by the preferential selection criteria outlined above. Finally, two named skeletons from the Peabody Museum collection, Yi-neferti and her son Khonsu, were included in the sample from this institution. These skeletons, which date to New Kingdom Thebes, are important because they represent two individuals of known (documented) sex.

The curatorial staff at the NHM Vienna were unable to provide an inventory for the

Egyptian series within their osteological collection. However, the Old Kingdom Giza sub-collection had previously been inventoried by an MSc-level student; thus, information presented within the student’s thesis was used to identify which individuals were represented by both cranial and post-cranial material, and which were represented by a cranium or skull only (Guld,

1995). With the exception of two skulls, which exhibited unambiguous sexually dimorphic morphological indicators and were part of a sample of skulls that was seriated (see Section

2.2.1.2), only skeletons with both cranial and post-cranial material were included in the study sample. Preferential choice of skeletons was again based on the process outlined above.

The absence of post-cranial material associated with the skulls and crania that make up the ‘Gizeh “E” series’ held at the University of Cambridge represented a significant challenge in the sampling of this collection. Choice of individuals from this collection was therefore based on one primary criterion: the presence of unambiguous morphological sex indicators allowing a definite estimate of sex. Unfortunately, formal seriation could not be used to assist in the estimates of sex because at the time of the research trip the Duckworth Laboratory research guidelines stated that the remains should be analysed one catalogue number at a time (the guidelines have since been revised to state that “if large numbers of remains need to be examined simultaneously for comparative purposes, the Laboratory’s research curator should be informed first”) (University of Cambridge, 2012). Despite this, small series of crania were


inspected while in the laboratory’s storage area before relocation of the required boxes to the research work areas for formal analysis. The initial ‘informal inspection’ of small series began with the first catalogue number in the series (E1) and proceeded sequentially.

2.1.2.1 Data collection

Having selected a skeleton that met the inclusion and suitability criteria outlined above, the analysis and data collection phase proceeded as follows. The skeleton was laid out in anatomical position. This included siding of bilateral elements and seriation of ribs and vertebrae. The skeleton was then photographed, with a label giving the skeleton ID number in clear view.

The initial step in the examination and recording of human skeletal remains is usually the creation of a detailed inventory that serves as both an important description of materials and as a basis for comparative analyses (Brickley, 2004: 6; Buikstra & Ubelaker, 1994: 5). Several systems for recording the completeness and preservation of skeletal remains are available; the system selected will largely depend on the specific research questions to be addressed (Brickley, 2004: 6). In the present study, a formal inventory procedure was not employed. This decision was based on a number of factors:

• Inventorying skeletal remains is a very time-consuming process and not a good use of the limited time available to access the three ancient Egyptian collections sampled in this research project

• The form used to record metric data served as its own inventory of sorts. If a bone was present, and the necessary landmarks intact, it was measured; if, on the other hand, the bone was absent, it clearly could not be measured and was indicated as such on the form. Bone frequencies could therefore be obtained from the recording forms. Similarly, the pelvic and cranial morphology recording forms provided data on the number of skeletons that consisted of pelvic material only, cranial material only, or both

• Digital photographs were taken of all skeletons laid out in anatomical position; these could be consulted at any point if additional details regarding completeness or bone frequency within a particular sample were required

• Traditional purposes of skeletal inventories, for example to collect data on non-metric traits, trauma, or pathologies, were not relevant to the present research.
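The second point above, that bone frequencies can be read off the metric recording forms, can be illustrated with a minimal sketch. It assumes the forms have been transcribed so that each skeleton is one record and an unmeasurable bone is stored as `None`; the field names, skeleton IDs, and values below are hypothetical and are not taken from the study's forms or data.

```python
from collections import Counter

# Hypothetical transcription of recording forms: one dict per skeleton.
# A measurement is None when the bone was absent or its landmarks were damaged.
forms = [
    {"skeleton_id": "A1", "femur_max_length": 432.0, "humerus_max_length": None},
    {"skeleton_id": "A2", "femur_max_length": 418.5, "humerus_max_length": 301.0},
    {"skeleton_id": "A3", "femur_max_length": None, "humerus_max_length": 295.5},
]


def bone_frequencies(rows, id_field="skeleton_id"):
    """Count, per measurement field, how many skeletons have a recorded value."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if field != id_field and value is not None:
                counts[field] += 1
    return dict(counts)


frequencies = bone_frequencies(forms)
print(frequencies)
```

A count obtained this way is a count of measurable bones, not of all bones present: a bone that survives but lacks intact landmarks is recorded as unmeasured and so is not counted.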

For each skeleton or isolated skull/cranium included in the study sample the following data were recorded using museum records: collection name, location within museum, skeleton ID, cemetery location/region, period, excavation date, grave, tomb or shaft number, and date of analysis. All data were recorded on specially constructed recording forms (see Appendix 7.1, Sections 7.1.1–7.1.4) and later transferred to Microsoft Excel spreadsheets. Digital photographs were taken of specific bones and features used in the estimation of sex and age at death for later reference. Photographs taken of the Gizeh ‘E’ cranial series consisted of anterior, lateral, and posterior views for each individual.

2.1.3 Cemetery sites and excavations

2.1.3.1 Keneh (Qena)

Keneh (Qena) is one of the seven Provinces of Upper Egypt. It is located around 652 kilometres south of Cairo on the east bank of the Nile, opposite the temple of Denderah (Budge, 1890; Figure 2.1A). Information about this site in antiquity is relatively scarce; however, it is mentioned by several authors as a source of clay and marl used to make pottery (Bard, 1999; Butzer, 1974), and as a port from where mined rock was shipped to other parts of Egypt such as Alexandria (Klemm & Klemm, 2001). Information about Keneh as a cemetery site is even more limited; it is likely that the skeletons attributed to this location and held at the Peabody Museum of Archaeology and Ethnology, Harvard University, Boston were in fact part of the c. 1,450 skeletons excavated from the nearby Predynastic Period cemetery known as Naga-el-Hai by Louis C. West, a member of the Harvard University/Boston Museum of Fine Arts joint expedition team, in 1913 (Bard, 1994; Reisner, 1930: 241–247). This cemetery was extensively plundered in antiquity, making it difficult to date; however, it is thought to span the Naqada II and III periods (c. 3500–3000 BC) (Bard, 1994).

Figure 2.1A: Map of ancient Egypt showing the location of Keneh (Qena) (Adapted by JJL from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).

The Predynastic Period skeletons from Keneh, curated at the Peabody Museum, have been examined by a number of previous researchers. Trinkaus (1975) observed a relatively high frequency of squatting facets and anterior rounding of the distal tibial articulation in a sample of skeletons from this sub-collection, which was taken as evidence of habitual squatting, as during normal locomotion the foot is seldom dorsiflexed sufficiently to approach the articular limits of the ankle. The Predynastic Period Keneh skeletons have additionally been sampled for research in the context of metric sex estimation (Raxter, 2007), stature estimation (Raxter et al, 2008), developmental lesions in teeth (Sognnaes, 1956), growth and development of the facial skeleton (Landauer, 1962), and the morphometrics of bipedal gait (Rhoads & Trinkaus, 1977; Wallace et al, 2008).

2.1.3.2 Sheikh Farag

The area known as Sheikh Farag is part of the site of Naga-ed-Dêr, a large necropolis that served as a burial ground for the village of Thinis (This) from the Predynastic Period to the Middle Kingdom. The village of Naga-ed-Dêr is situated on the east bank of the Nile, around 160 kilometres north of , opposite the town of Girga (Figure 2.1B). The site of Naga-ed-Dêr is located within the Thinite polity, one of the Predynastic proto-kingdoms out of which the unified state of Egypt emerged (Delrue, 2001: 21–66). The entire site stretches for two kilometres from the Coptic monastery and modern village of Naga-ed-Dêr to the area of Sheikh Farag, so named after the tomb of a local Islamic holy man. The ruined tomb sits on a spur of limestone, the promontory of which falls away from the edge to a depression about 100 metres wide and then rises to a high hill overlooking the ravines or wadis which separate it from the main cliff.

Figure 2.1B: Map of Egypt showing the location of Naga-ed-Dêr (Adapted by JJL from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).

Excavation of the Naga-ed-Dêr site was initiated in 1901 by George Reisner. His attention had been drawn to the site by J. E. Quibell (1867–1935), Chief Inspector of the Department of Antiquities, who was concerned that an ancient cemetery was being plundered by illicit excavators (Reisner, 1901: 23–24). Work began in February 1901 and continued intermittently until 1924, initially under the auspices of the Hearst Egyptian Expedition (1901–1904; Reisner, 1905: 132) and, from 1905 onwards, under the Joint Harvard University/Museum of Fine Arts (MFA), Boston Expedition. The initial focus of the excavations was the Predynastic and Early Dynastic period cemeteries, lying to the south of the site, which Reisner described as being “of unique importance” given that the discoveries and collections obtained were:

“…so complete that every stage of the early development of Egyptian civilization was followed in unbroken sequence to the end of the Middle Empire. The burials at this site were found in an unusual condition of preservation, and thus have provided a splendid opportunity for the determination of the race of these earliest inhabitants of Egypt” (Reisner, 1905: 132).

Excavation of the large Predynastic Period cemetery, designated N7000 by Reisner, was primarily undertaken by A. M. Lythgoe (1868–1934), a student of Reisner, between 1902 and 1904. During this time, at least 834 individuals from 635 numbered graves were unearthed (Podzorski, 1990; Savage, 1998). Unlike other excavations, where the human remains were largely ignored or only the crania were retained, all the Predynastic Period remains from Naga-ed-Dêr were excavated and subsequently turned over to Grafton Elliot Smith (1871–1937), who used the series to help compose his theories of the early racial character of Egypt and Nubia (Podzorski, 1990; Smith & Jones, 1910).

In addition to cemetery N7000, other notable cemeteries include N1500 (primarily First Dynasty), N3000 (Second Dynasty), N3500 (Second and Third Dynasties), and N500 and N700 (Third to Fifth Dynasties) (Mace, 1909: 1; Reisner, 1908; Savage, 1997). The ‘N’ in these designations refers to Naga-ed-Dêr and was used by Reisner to denote those cemeteries that were excavated during the years of the Hearst Expedition; similarly ‘SF’ (Sheikh Farag) was later used to denote cemeteries at the northern end of the site that were excavated by the Harvard–MFA Expedition. Unlike many of his contemporaries, Reisner was a meticulous archaeologist who did not believe in the practice of randomly selecting museum-quality objects from tombs at the expense of historical context (Reisner, 1908). His excavation methods therefore followed a number of set principles (Reisner, 1908):

• To have an organised staff of Europeans and of workmen trained in all branches of the work

• To follow careful methods of excavation and recording

• To excavate whole sites and whole cemeteries

• To make a complete record of all stages of the work using drawings, photographs, and notes

• To publish records on a tomb-by-tomb basis as often as is practical.

In his report of the Early Dynastic cemeteries at Naga-ed-Dêr, published in 1908, Reisner states that:

“The excavation of individual tombs, while interesting and at times valuable, does not provide that sufficiency of continuous material which is necessary to justify conclusions on the development of a civilization such as we have in Egypt. The discovery of beautiful objects is, of course, greatly to be desired; but the search for Museum specimens is an offence against historical and archaeological research which is utterly unworthy of any institution which pretends to be devoted to the advancement of knowledge” (Reisner, 1908: VIII).

The excavations at Naga-ed-Dêr therefore commenced in an organised and systematic fashion, beginning with the division of cemeteries into numbered strips to which teams of Egyptian workmen were assigned (Kroenke, 2010). For each tomb, the excavators recorded the measurements of the archaeological features, noted the relative positions of the tomb contents, and provided descriptions and sketches of many of the artefacts in the field records (Kroenke, 2010). Reisner additionally mandated the extensive use of photography, a practice that resulted in a little over 7,350 photographs of the Naga-ed-Dêr cemeteries (Der Manuelian, 1992). These photographs are primarily housed at the Museum of Fine Arts in Boston. Some selected photographs and documentation from the Hearst Expedition were additionally sent to the University of California (Kroenke, 2010).

The tomb of Sheikh Farag, and the cemetery named after it, lie on a hill to the north of the Predynastic Period cemetery. This site was designated Cemetery 9000 by Reisner. It contained a number of shaft graves and mud-brick mastabas of the Eighth to Twelfth Dynasties. The slope of the hill behind the plateau on which the tomb of Sheikh Farag sits also contained rock-cut tombs of the same period, which had been extensively plundered in both ancient and modern times (Reisner, 1908). Several other cemeteries, known as SF 200, SF 500, and SF 5000, were additionally attributed to the area of Sheikh Farag by members of the Harvard University–MFA expedition, which returned to the site in 1912, 1913, and 1923 (Bard, 1999). A total of 29 skeletons (20 adult and 9 juvenile) were excavated from the Sheikh Farag cemeteries; these are currently curated at the Peabody Museum of Archaeology and Ethnology at Harvard University, Boston, USA (Herschensohn, Olivia; Personal Communication, 2011). These skeletons have not been well studied by previous researchers, possibly because of the small size of the sample. Only one published paper could be found in which this sub-collection of the Peabody Museum’s entire Egyptian collection had been accessed for osteological analysis. In this paper, seven of the 20 adult skeletons were included as part of a larger sample used to develop population-specific metric sex estimation equations and sectioning points (Raxter, 2007). In comparison, the skeletons excavated from other cemeteries within the Naga-ed-Dêr site have been quite extensively studied in a number of different research contexts (Podzorski, 1990). These include analysis of activity patterns (Schrader, 2012), tumour prevalence (Strouhal, 1976), angulation of the basiocciput (Anderson, 1983), vertebral arch defects (Barkley, 1978), sex estimation (Derry, 1909), and tibial form and function (Derry, 1907). Unfortunately, the large Predynastic Period series from cemetery N7000 no longer exists in its entirety; within a few years of its arrival in Cairo the collection was broken up and a large part of it destroyed (Podzorski, 1990: 10).

2.1.3.3 Giza

2.1.3.3.1 Old Kingdom Contexts

The Giza Necropolis is located around nine kilometres into the desert from the old town of Giza on the Nile, some 25 kilometres southwest of Cairo (Figure 2.1C), and served as a burial ground within the Memphite region when the political capital of newly unified Egypt was shifted to Memphis in around 3100 BC, under King Narmer (Hoffman et al, 1986).

Figure 2.1C: Map of ancient Egypt showing the location of Giza (Adapted by JJL from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).

It sits on the so-called ‘Giza Plateau’, the top of a bed of limestone known as the Mokattam Formation (Kemp, 2006: 187). Although the site contains early Dynastic tombs, it was not until the Fourth Dynasty that the large-scale royal construction projects culminating in the great pyramids of Khufu (c. 2589–2566 BC), Khafra (c. 2558–2532 BC), and Menkaure (c. 2532–2503 BC) began. In the area surrounding the Great Pyramid of Khufu are two important fields of tombs, shown in Figure 2.1D, known as the Eastern Mastaba Field (or Eastern Cemetery), which was reserved primarily for members of the royal family, and the Western Mastaba Field (or Western Cemetery), which came to hold the tombs of the governing classes and high officials (Der Manuelian, 2009: 23; Reisner, 1942). It is from the Western Mastaba Field that the Old Kingdom Giza skeletal remains examined in this project were derived. They were excavated by George Reisner, Harvard University/Boston Museum of Fine Arts, and Hermann Junker (1877–1962), University of Vienna, in the early years of the 1900s.

Figure 2.1D: Plan of the Giza Necropolis showing the Eastern and Western cemeteries (Mastaba Fields) of the pyramid of Khufu. (Source: http://en.wikipedia.org/wiki/Giza_Necropolis. Accessed July 2013).

Reisner’s involvement in the excavation of the Giza Plateau began in 1903 when he and two colleagues, Professor Steindorff of Leipzig and Professor Schiaparelli of Turin, submitted applications for archaeological works to the Director of the Department of Antiquities (Reisner, 1911). All three applications were granted with the request to divide the site amicably among the three institutions. Reisner was allocated the pyramid of Menkaure, and by drawing lots, the northern strip of the Western Mastaba Field, shown in Figure 2.1E (Reisner, 1911).

Figure 2.1E: Overview plan of the Giza necropolis with the American, German–Austrian, and Egyptian concessions indicated. Figure 2.1 from Der Manuelian, 2009: 24.

Within the Western Mastaba Field, Reisner identified three principal, core, or ‘nucleus’ cemeteries, known as Cemetery G1200, G2100 and G4000, as well as a large isolated mastaba, G2000, and the Cemetery en Echelon, which adjoins Cemeteries G2100 and G4000 on the east (Reisner, 1942). Cemetery G4000, which formed the central component of the German–Austrian concession excavated by Junker, consists of 41 core mastabas and is the largest and most regularly laid-out of the nucleus cemeteries, occupying an area of solid, level rock (Der Manuelian, 2009: 23). Cemetery G2100 consists of 12 core mastabas with subsidiary burial shafts and tombs, built on sound but uneven rock (Der Manuelian, 2009: 28). Although Cemetery G2100 is attributed to the reign of Khufu in the Fourth Dynasty, the evidence suggests use and reuse through multiple phases and eras, most likely spanning the Fourth to Sixth Dynasties. In addition, long delays between construction and occupation, as well as the unfinished state of many of the mastabas, suggest that the ultimate owners of Cemetery G2100 tombs may not actually have been Khufu-era individuals (Der Manuelian, 2009: 28).

Excavation of Cemetery G2100 began on the 14th January 1904 using methods of excavation that Reisner and his team had been following for a number of years, namely “…to clear away the sand to the surface of decay, to make notes and photographs of this surface, and then to cut away the debris of decay to the surface on which the cemetery was built. As a rule, only those burial pits were opened which gave evidence of having been plundered” (Reisner, 1905: 136). Excavation of the American concession lasted for three seasons until 1906. During that time, Reisner and his team additionally identified the royal cemetery of Khufu in the Eastern Mastaba Field, as well as an intrusive cemetery reserved for priests and high officials. Using careful and meticulous archaeological techniques, for which Reisner was famed, he was able to show that the great cemetery had fallen into decay and had been covered with sand by the end of the Sixth Dynasty (Reisner, 1911). Following this successful three-year campaign, Reisner’s attention turned to the excavation of the Pyramid Temple of Menkaure, which he completed in seasons spanning 1906–1907, and 1909–1910 (Reisner, 1911). Work in the Western Mastaba Field resumed in 1912 with the excavation of the cemetery lying to the north of the pyramid of , on the opposing side of Cemetery G4000 to Cemetery G2100 (see Figure 2.1E) (Reisner, 1915). Excavations continued in 1913–1914, 1914–1915, and 1915–1916 with short campaigns in Cemetery G4000, formerly the German–Austrian concession (Reisner, 1930). The royal cemetery of Khufu in the Eastern Mastaba Field was opened in the 1924–1925 season and excavation work continued there until 1929. During this time, the intact ‘secret’ tomb of Queen Hetep-heres I, the mother of Khufu, was discovered and excavated in painstaking detail (Reisner, 1930: 246; Reisner, 1955).

The skeletal remains of the individuals interred in Cemetery G4000 are now housed in the Natural History Museum in Vienna, Austria, while those excavated from Cemetery G2100 by Reisner are curated at the Peabody Museum of Archaeology and Ethnology, Harvard University, Boston. These remains have been accessed by a number of previous researchers, who included them as part of larger samples in studies investigating stature and body proportions (Raxter et al, 2008; Zakrzewski, 2003), and population affinity (Zakrzewski, 2007), or as a complete series for the investigation of cranial pathology (Filce Leek, 1980; 1984).

2.1.3.3.2 Late Period Contexts

The Late Period cranial remains included in the study sample were excavated in the winter of 1906 by Petrie, who was leading a field school for the British School of Archaeology in Egypt. Work began in Giza (Gizeh) on the 1st December 1906 and continued until the beginning of April in the following year (Petrie, 1907: 1). The cemetery from which the crania were derived was located around a mile south of the great pyramid of Khufu, on the southward slope of the hill. Petrie’s attention had been drawn to the area because of the presence of a large quantity of stone chips. Large spaces were cleared to reveal only rubble-core masonry; however, slightly further up the hill a number of chambers were identified (Petrie, 1907: 28). Gradually, the funeral chapel of Thary was uncovered, as well as a large number of tombs. In his 1907 report, Petrie largely dismisses the skeletal remains found within these tombs as not being ‘note worthy’ (Petrie, 1907: 29). Furthermore, he describes the collection of a large number of skulls, at the request of Karl Pearson for study at University College London (UCL), with no mention of the post-cranial remains or what became of them. As such, very little information exists about this collection. Hand-written notes, which accompanied the crania and mandibles when they were transferred from UCL to the Duckworth Laboratory at the University of Cambridge, describe a sample of 1,726 specimens that date to the Twenty Sixth to Thirtieth Dynasties (Filer, 1992). Unfortunately, no other information is available.

Despite the lack of documentary information, the ‘Gizeh “E” series’ has been extensively investigated and measured by previous researchers. Table 2.1C provides a summary of previous studies in which the crania and/or mandibles were examined. In many of these studies the series was included as part of a larger time-successive sample with the aim of establishing the biological affinities of the ancient Egyptians, as discussed in Section 1.2.1.1.

Table 2.1C: Previous studies using the Gizeh “E” series.

Study | Number of crania examined | Purpose and/or area of research
Brace et al, 1993 | 53 | Investigation of the biological affinities of the ancient Egyptians
Filer, 1992 | 1,726 | A comparison of the type and nature of head injuries in Egyptian and Nubian populations
Hanihara, 2000 | 100 | Investigation of facial and frontal flatness in 112 major human populations
Howells, 1973 | 111 | Craniometric differences among human populations
Irish & Friedman, 2010 | 62 | Dental affinity of C-Group Nubians found at Hierakonpolis
Irish, 2006 | 62 | Dental evidence for Egyptian origins and affinities
Keita, 1988 | 51 | Evaluation of population affinity using discriminant function analysis of craniometrics
Keita, 1990 | 51 | Evaluation of population affinity using discriminant function analysis of craniometrics
Keita, 2004 | 111 | Investigation of craniofacial variation in Africa using data from the Howells database
Martin, 1936 | 432* | The metric estimation of sex from the mandible
Nikita et al, 2012b | 148 | Geometric morphometric analysis of cranial shape and gene flow in North Africa
Pearson & Davin, 1924 | c. 1,600 | Craniometric analysis and calculation of cranial capacity and indices
Pearson & Stoessiger, 1927 | c. 1,600 | Presentation of formulae for calculating cranial capacity from external cranial measurements
Pearson & Woo, 1935 | 800 | Investigation of the morphometric characters of the bones of the cranium
Smith, 1912 | 9 | Evaluation of “pygmy” crania
Woo, 1931 | 800 | Investigation of cranial asymmetry

*Mandibles only.

2.1.3.4 Thebes

The ancient city of Thebes lay on both banks of the Nile and covered an area of around 93 square kilometres. The modern town of Luxor, which occupies part of the site on the east bank, is 675 kilometres south of Cairo (Figures 2.1F and 2.1G).

Figure 2.1F: Map of ancient Egypt showing the location of Thebes (Adapted by JJL from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).

Figure 2.1G: Plan of Thebes. (Source: Redford, 2001: 385).

From the end of the Old Kingdom, Thebes was the main city of the fourth Upper Egyptian nome (administrative province), and later became the capital of Egypt during parts of the Eleventh Dynasty and during the New Kingdom (Redford, 2001: 381–388).

The Theban necropolis occupies a nine square kilometre area of the western bank of the Nile and contains the royal tombs and mortuary temples, as well as the houses of priests, soldiers, craftsmen, and labourers (Figure 2.1H). The Theban area, which includes the temples of Luxor and Karnak on the east bank, and the Valley of the Kings and the Valley of the Queens on the west bank, was made a UNESCO World Heritage site in 1979 (Redford, 2001: 381–388).

Figure 2.1H: Plan of the central area of the Theban necropolis, western Thebes (Source: Redford, 2001: 382).

Geographically and archaeologically, the Theban necropolis may be divided into four sections: the cultivable land between the Nile and desert edge; the desert edge itself, containing a number of important mortuary temples and shrines of New Kingdom rulers and commoners; small limestone hills containing hundreds of tombs of priests and noblemen, predominantly from the New Kingdom; and outlying wadis and desert plateaux, containing the remains of Palaeolithic work stations and New Kingdom workmen’s huts (Redford, 2001: 381–388).

The two named individuals from Thebes included in the study sample, Yi-Neferti (alt. Iy-Nofret; Iineferty; Aaneferta) and Khonsu (alt. Khons; Chonsu), were the wife and son, respectively, of Sennedjem (alt. Sen-Nudem), a prominent artisan of the Ramesside Period (Nineteenth and early Twentieth Dynasties, c. 1295–1069 BC) (Farid & Farid, 2001; Personal Communication: Herschensohn, Olivia; 2014; Snape, 2011: 235). Sennedjem was the owner of Tomb 1 (TT1) in the necropolis of Deir el-Medina, the village in western Thebes that housed the people who worked on the royal tombs in the Valley of the Kings (Farid & Farid, 2001; Porter & Moss, 1927: 53–54). The tomb is located in the Western Cemetery of Deir el-Medina on the slopes of the Qurnet Murai hill (Figure 2.1I), and was discovered by Gaston Maspero (1846–1916) in 1886. Despite this discovery, the contribution of Maspero to the unfolding of the history of Deir el-Medina is often dismissed because it belonged to a period of “unscientific” excavation at the site (Cooney, 2012; Reeves, 2000: 69–71). Several authors therefore begin their account of the investigation of Deir el-Medina in the period 1905–1909 when “serious scientific excavation” was undertaken by the Italian archaeologist Ernesto Schiaparelli (1856–1928) (Lesko, 1994: 7; Meskell, 1998; Reeves & Wilkinson, 1996: 22). In 1917 the concession was formally given to the French Institute of Oriental Archaeology (IFAO), and under the supervision of Bernard Bruyère (1878–1971), the site was completely cleared between 1922 and 1951 (Lesko, 1994; Meskell, 1998; Reeves & Wilkinson, 1996: 22).

Figure 2.1I: The site of Deir el-Medina. 1, village; 2, Western Cemetery; 3, Eastern Cemetery; 4, votive chapel; 5, Ramesside cemetery; 6, Ptolemaic Hathor temple; 7, Hathor chapel of Sety I; 8, great pit; 9, tombs of the Saite princesses. (Source: Toivari-Viitala, 2011).

The tomb of Sennedjem is considered vastly important as it is the only ‘intact’ and unrobbed tomb to be discovered from the Ramesside Period (Cooney, 2012; Snape, 2011: 235). Inscriptions within the tomb describe Sennedjem as a “servant in the place of truth”, which was quite a common title for the workers and the artisans who built and decorated the royal tombs in the Valley of the Kings (Farid & Farid, 2001). These workers are an important group outside the ruling elite of the New Kingdom. Their skills in building the royal tombs afforded them the benefit of well-constructed, well-equipped, and well-decorated tombs for themselves (Farid & Farid, 2001; Snape, 2011). Unlike the private tombs of prominent government officials, where many of the wall decorations depict their daily activities and duties, the decorations of the artisans’ tombs of Deir el-Medina are devoted largely to religious and mythological themes, as shown in Figure 2.1J (Farid & Farid, 2001).

Figure 2.1J: Scenes from the tomb of Sennedjem. The North wall (Source: Farid & Farid, 2001).

A total of 20 bodies were found inside TT1; of these, nine bodies, including those of Yi-Neferti and Khonsu, were enclosed in coffins (Cooney, 2012; Gillet, 1898: 123–124; 128–129). These coffins are presently curated at the Metropolitan Museum of Art in New York (Cooney, 2012). The coffin of Yi-Neferti is described as being of “peculiar” construction, consisting of: “… a lower casket into which the inner cover fits, thus replacing the cartonnage and obviating the necessity of a second complete case. The true cover originally fitted over all” (Gillet, 1898: 128). The true case shows: “…a face in dark brown color a lotus bloom on forehead and a lotus-petal chaplet”, as well as “her son Chonsu” who is shown reading from a papyrus written in hieratic script (Gillet, 1898: 128). Inside the casket was a reed mat which originally shrouded the mummy. Resting on the mat was the pasteboard mask, and below were the remains of the mummy (Gillet, 1898: 129). The outer mummy casket of Khonsu (Chonsu) is described as showing the owner’s face in a “…brownish colour; brows, black; ears, white; beard, black, long and curved at lower end; hair, in green and yellow stripes” (Gillet, 1898: 123).

The skeletal remains of Yi-Neferti and Khonsu are curated at the Peabody Museum of Archaeology and Ethnology, Harvard University, Boston. They were acquired in 1933 as part of an exchange with the Metropolitan Museum of Art (Personal Communication: Herschensohn, Olivia; 2014). Though originally mummified, the remains were unwrapped and de-fleshed prior to their transfer to the Peabody Museum. Despite the importance of these two named individuals, an examination of the literature suggests they have not been well-studied by previous researchers. They were accessed by Michelle Raxter for her studies on sex estimation (Raxter, 2007) and reconstruction of stature (Raxter et al, 2008), and are the subject of ongoing research being conducted by Professor Lisa Sabbahy at the American University in Cairo (Personal Communication: Sabbahy, Lisa; 2012).

2.2 Sex estimation

2.2.1 Morphological sex estimation

Ideally, standards for sex estimation should be developed and tested using skeletal samples of known (documented) sex. Unfortunately, large series of ancient Egyptian skeletons of this type are extremely rare and/or currently unavailable for study; therefore, sex must be established by other means. In line with the discussion and evaluation of morphological sex indicators given in Section 1.5, the sex of each individual in the study sample was estimated using three standard methods (Buikstra & Ubelaker, 1994: 16–19; Ferembach et al, 1980; Phenice, 1969):

1. Assessment of the Phenice characteristics

2. Morphological assessment of the bony pelvis (ossa coxae and sacrum)

3. Morphological assessment of the skull.

This approach was taken because it represents the standard methodology for estimating the sex of unknown skeletal remains recovered from forensic or archaeological contexts (Buikstra & Ubelaker, 1994: 16–19; White & Folkens, 2000: 362–369).

2.2.1.1 Issues and considerations

In the context of the present research, the standard methodological approach to estimating morphological sex is associated with a number of important issues:

• Very few ancient Egyptian skeletons of documented sex are available for study

• Only individuals with sufficient pelvic and/or cranial material present to permit morphological sex estimation could be included in the study sample

• Many of the standard morphological sex indicators demonstrate inter-population variation

• The standard morphological sex estimation methods are not specific to the ancient Egyptians; no population-specific methods currently exist

• Morphological sex estimation of unknown skeletal remains is associated with a level of error.

Only two individuals in the study sample, Yi-Neferti and her son Khonsu, excavated from New Kingdom Thebes, are of known sex. As a result, the creation and testing of metric sex estimation equations within this study is based predominantly on an estimated sex sample. Although there are a number of important limitations associated with this methodology, discussed in detail in Chapter 4, such an approach is not unprecedented. In previous studies the use of standard morphological indicators to estimate sex has allowed the creation of metric sex estimation standards for numerous archaeological populations, including prehistoric Scottish (MacLaughlin & Bruce, 1985), prehistoric Central Californian (Dittrick & Suchey, 1986), New Zealand Polynesian (Murphy, 2005), medieval Croatian (Šlaus & Tomičić, 2005), Predynastic/Old Kingdom Egyptian (Raxter, 2007), New Kingdom Egyptian (Dabbs, 2010), and ancient (Jomon period) Japanese (Özer & Katayama, 2008).

As described in Section 2.1.2, skeletons were preferentially selected for inclusion in the study based on level of completeness. However, even relatively complete skeletons had to be rejected if the pelvic and/or cranial material was insufficient to allow an unambiguous estimate of sex. Skeletons were further rejected if the individual morphological indicators of the bony pelvis only, the skull/cranium only, or both the bony pelvis and skull/cranium provided ambiguous results, even after weighting and ranking of traits and methods according to accuracy and reliability (see Section 2.2.1.5 below). Rejection of skeletons on this basis was not found to be necessary when sufficient pelvic material was available for analysis; however, it was a common occurrence when examining the Gizeh “E” cranial series. As a result, the majority of the crania from this sample included in the study demonstrated the most extreme male and female morphology, to the exclusion of individuals exhibiting intermediate morphology or a combination of both male and female morphological features.

In instances such as this where sex must be estimated from cranial material only, seriation is recommended (White & Folkens, 2000: 338). This involves arranging as many skulls or crania as possible in a series to compare features within a single biological population. Such a process has the advantage of allowing the assessment of sex of each individual to be made relative to other individuals in the series (White & Folkens, 2000: 338, 341). Other advantages of seriation include a reduction in error due to time-shift effects (caused by breaks in analysis at the end of each working day), and constant monitoring of assessments. However, other authors argue that seriation is problematic, notably because when dealing with a large sample, it would be very difficult to seriate the crania in a replicable manner. Furthermore, if there are multiple cranial indicators to consider, each must be examined in separate independent seriations, and it may not be possible to combine these into one seriation (Konigsberg & Hens, 1998).

The seriation approach was used during the period of data collection at the NHM, Vienna (where skulls are stored separately to post-cranial remains) to identify two individuals exhibiting unambiguous but not extreme male or female morphological features; these individuals were then subject to further analysis and measurement. Unfortunately, at the time of analysis, the research guidelines in place at the Duckworth Laboratory, University of Cambridge, precluded formal seriation because only one catalogue number could be examined at a time (see Section 2.1.2). As such, the Gizeh “E” series represents only a selected population sample of individuals demonstrating hyper male and hyper female morphological features, which is an important issue. Chapter 4 provides a discussion of the implications of constructing a study sample that consists only of the most sexually dimorphic or morphologically unambiguous individuals.

Section 1.5 in Chapter 1 describes and evaluates population differences in skeletal morphology and the impact that these differences have on the accuracy and reliability of morphological sex indicators. According to Williams and Rogers (2006), size of the mastoid process is a “high-quality” sex estimation trait, as defined by acceptable levels of intra-observer error and accuracy of ≤10% and ≥80%, respectively. However, some authors have suggested that little, if any, consideration should be given to this feature when examining ancient Egyptian skeletal remains as a result of observations that mastoids tend to be relatively large and inflated in this population compared to others (Zakrzewski, 2003; 2007). This observation has been noted even among more recent populations from Sinai (Hershkovitz et al, 1990). Despite this, several researchers sampling ancient Egyptian skeletal remains used size of the mastoid process as one of their sex estimation traits without any mention of females resembling males in this respect (Erfan et al, 2009; Howells, 1973: 14; Masali & Chiarelli, 1972; Raxter et al, 2008).

In fact, describing how sex was assigned by a previous researcher who examined crania from the ‘Gizeh “E” series’, Howells (1973: 14) writes that “A typical male had weight, & general massive appearance; large mastoids…” and “A typical female was light in weight, small mastoids…” (Howells, 1973: 14). William Howells measured thousands of crania from around the world; therefore, it is unlikely that he would have been unaware of any major population variation in this feature. However, given this division of opinion among expert morphometricians, it is clear that especial care should be taken with morphological assessment of this indicator in ancient Egyptian populations.

This confusion surrounding the use of particular sex estimation features is not unexpected given that morphological sex estimation standards tailored specifically to ancient Egyptian skeletal remains do not currently exist. As a result, the approach to morphological sex estimation in the present study was necessarily based on several key sources of evidence: recommendations for sex estimation given by the Workshop of European Anthropologists (Ferembach et al, 1980) and within the highly cited Standards for Data Collection from Human Skeletal Remains (Buikstra & Ubelaker, 1994; 1,932 citations on Google Scholar); rates of accuracy, precision, and reliability for specific traits taken from the published literature (discussed in Section 1.5 and presented in Section 2.2.1.5 below); methods and recommendations for the weighting or ranking of traits (Section 2.2.1.5); and the expert opinion of academics and researchers working with ancient Egyptian remains who have noted consistent differences in specific morphological features compared with other populations (Zakrzewski, 2003; 2007).

Finally, the level of error associated with assigning sex to an unknown individual based on skeletal morphology must be considered. In estimating sex an osteologist often starts at a baseline accuracy level of 50%, provided that 1) there is no prior information as to the sex of the unknown individual, or 2) the unknown individual has been sampled from a population in which the expected numbers of males and females are equal. In such instances, random guesswork will result in correct sex estimates in half of all cases. In the case of skeletal remains excavated from ancient Egypt there may be an a priori expectation of a predominance of males, because male graves and tombs tended to be more prominent and attractive to archaeologists. For some skeletal elements, for example the pubic bones, training, experience, and use of appropriate methods will allow for correct estimates in excess of 80% or even 90% (Bruzek, 2002; Ubelaker & Volk, 2002; White & Folkens 2000: 362). To examine my own ability in estimating sex using morphological methods, an inter-observer error test was performed using skulls and ossa coxae from the Department of Anatomy at the University of Manchester. The purpose of this test was to establish the level of observer–observer agreement in sex estimation using morphological methods, with the aim of identifying systematic bias in my own ability to assign sex to skeletal remains. Such a test does not provide an actual accuracy rate as would a test of one observer using a known-sex skeletal collection; however, it is still a useful undertaking. Furthermore, to be truly meaningful, a single-observer morphological sex estimation test should be performed on a known-sex sample of ancient Egyptian skeletal remains, given population differences in sexual dimorphism, which is not possible for reasons previously discussed. Following the morphological sex estimation methods outlined in the subsequent sections, two observers (the present author and IK-O) assigned sex to 20 ossa coxae and 10 skulls. Sex estimates were made by each observer without reference to the other. Results were compared using the kappa statistic (see Section 1.5.1.1).
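The comparison of two observers' estimates can be illustrated with a minimal sketch. It assumes the unweighted form of Cohen's kappa (the text does not state whether a weighted or unweighted statistic was used), and the function name and the ten ratings are hypothetical; they are not data from the inter-observer test.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two observers who rated the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both observers must rate the same non-empty set of items")
    n = len(ratings_a)

    # Observed agreement: proportion of items given the same sex by both observers.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement, from each observer's marginal category frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2

    if p_chance == 1.0:
        # Both observers used one and the same category throughout; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical estimates for 10 skulls (illustrative only, not data from the study).
observer_1 = ["M", "M", "F", "F", "M", "F", "M", "F", "F", "M"]
observer_2 = ["M", "M", "F", "F", "M", "F", "M", "F", "M", "M"]
kappa = cohens_kappa(observer_1, observer_2)
```

In this invented example the observers agree on 9 of 10 skulls (observed agreement 0.9) and the chance agreement is 0.5, so kappa is approximately 0.8.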

2.2.1.2 The Phenice characteristics

The so-called Phenice characteristics are three features of the paired pubic bones: the ventral arc; subpubic concavity; and the medial aspect of the ischiopubic ramus. Each feature was examined and assigned a sex based on its appearance. Table 2.2A and Figure 2.2A present the features observed and their appearance in males and females. The form of these features was recorded on the specially constructed recording form given in Appendix 7.1.1.

Table 2.2A: The three Phenice characteristics for sexing the pubic bones (Source: Phenice, 1969).

| Characteristic | Description | Male | Female |
|---|---|---|---|
| Ventral arc | Ridge of bone beginning at the superior-medial corner on the anterior pubic surface and curving down to the ischiopubic ramus | Absent | Present |
| Subpubic concavity | Concavity of ischiopubic ramus starting below the pubic face | Straight or slightly convex | Concave |
| Medial aspect of ischiopubic ramus | Area of the medial part of the ischiopubic ramus just below the pubic face | Wide and dull | Narrow and sharp |

Figure 2.2A: The Phenice characteristics. A-1: ventral arc on ventral surface of female pubis. B-2: slight ridge on ventral aspect of male pubis. C-3: subpubic concavity seen from dorsal aspect of female pubis and ischiopubic ramus. D: dorsal aspect of male pubis and ischiopubic ramus. E-4: ridge on medial aspect of female ischiopubic ramus. F-5: broad medial surface of male ischiopubic ramus (Source: Phenice, 1969).

According to Phenice (1969), the three characteristics of the pubic bones may be ranked in descending order of reliability as follows: the ventral arc; subpubic concavity; and the medial aspect of the ischiopubic ramus. The ventral arc is considered to be an objective discrete trait that is present in females and absent in males. A slight ridge may be present on the male os pubis (B-2 in Figure 2.2A), but this should never be confused with the ventral arc if carefully observed (Phenice, 1969). The subpubic concavity is not as objective as the presence or absence of the ventral arc, as a very small number of males may demonstrate a small amount of concavity. However, even when a trace of a subpubic concavity appears in a male it is difficult to confuse it with the well developed trait in the female (Phenice, 1969). Therefore, this, like the ventral arc, is essentially an objective, discrete criterion which does not require comparison of relative amounts of development between the sexes (Phenice, 1969). Finally, the appearance of the medial aspect of the ischiopubic ramus tends to exhibit a certain degree of intergradation between males and females. As a result, heavy reliance should be placed on this criterion only in the absence of the areas of the bone where the other two traits are found (Phenice, 1969).

In assessing and assigning sex using the Phenice characteristics in the present study all three indicators were considered where possible. In instances where there was ambiguity in the appearance of one of the traits, the estimate of sex was based on the other two. If only two traits could be visualised and they each gave a different sex estimate, the presence or absence of the ventral arc was used as the definitive criterion (if possible), as this trait is expected to be the least ambiguous (Phenice, 1969). According to Phenice (1969) intact pubic bones should almost always possess one trait that is definitely male or female. If the estimation of sex is based on this one definite trait, the estimate will be right in at least 96% of cases. This was demonstrated in a test using known-sex skeletal material from the Terry Collection, the collection upon which the method was formulated (Phenice, 1969).
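The decision rule described above can be sketched as follows. This is an illustrative reading of the text, not a formal algorithm given in the study: the function name, trait keys, and category labels are hypothetical, and the handling of a conflict in which the ventral arc itself is unobservable or ambiguous (returning no estimate) is an assumption.

```python
from typing import Optional

# Trait keys are hypothetical labels; order follows Phenice's (1969) reliability ranking.
TRAITS = ("ventral_arc", "subpubic_concavity", "medial_ramus")


def phenice_estimate(observations: dict) -> Optional[str]:
    """Return 'F', 'M', or None when no estimate can be made.

    observations maps each trait key to 'F', 'M', 'ambiguous',
    or None (trait could not be visualised).
    """
    # Keep only traits that gave a definite male or female appearance.
    definite = {t: observations.get(t) for t in TRAITS
                if observations.get(t) in ("F", "M")}
    if not definite:
        return None

    estimates = set(definite.values())
    if len(estimates) == 1:
        # All definite traits agree; ambiguous or missing traits are ignored.
        return estimates.pop()

    # Definite traits conflict: the ventral arc is the deciding criterion if available.
    # Assumption: without a definite ventral arc the conflict is left unresolved.
    return definite.get("ventral_arc")


example = phenice_estimate({
    "ventral_arc": "F",
    "subpubic_concavity": "F",
    "medial_ramus": "ambiguous",
})
```

With one ambiguous trait and two concordant female traits, as in `example`, the sketch returns a female estimate; with a male ventral arc and a female subpubic concavity it would return male.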

2.2.1.3 Morphological assessment of the bony pelvis

Table 2.2B lists the features and traits of the bony pelvis that were assessed, as well as their expected appearance in males and females. Each individual sex indicator was scored after its level of masculinity and femininity in accordance with Buikstra and Ubelaker (1994: 21), where:

0 = Indeterminate sex: there are insufficient data available for sex estimation

1 = Female: there is little doubt that the features represent a female

2 = Probable female: the features are more likely to be female than male

3 = Ambiguous sex: sexually diagnostic features are ambiguous

4 = Probable male: the features are more likely to be male than female

5 = Male: there is little doubt that the features represent a male.

Table 2.2B: Features of the bony pelvis used for the estimation of sex (Sources: Byers, 2008; Ferembach et al, 1980).

| Trait | Male | Female |
|---|---|---|
| Size | Large and rugged | Small and gracile |
| Ilium shape | High and vertical | Low and flat |
| Pelvic inlet | Heart shaped | Circular or elliptical |
| Pubic shape | Narrow and rectangular | Broad and square |
| Subpubic angle | Narrow; V-shaped | Wide; U-shaped |
| Obturator foramen | Large and ovoid; rounded rims | Small and triangular; sharp rims |
| Greater sciatic notch | Narrow | Wide |
| Preauricular sulcus | Absent or small | Well developed; deep |
| Shape of sacrum | Long and narrow; sharply curved | Short and broad; little curvature |

Scoring of the greater sciatic notch was aided by images and descriptions provided by Buikstra and Ubelaker (1994: 18; Figure 2.2B).

Figure 2.2B: Sex differences in the greater sciatic notch. Figure 2 from Buikstra & Ubelaker, 1994: 18. Drawing by P. Walker.

A description of each trait given in Table 2.2B was recorded on specially constructed recording forms (see Appendix 7.1.2) for each individual. Each trait was scored independently without reference to other features. An overall sex estimate based on morphological features of the bony pelvis was assigned by consideration of the weighting/ranking of traits, as well as rates of accuracy, precision, and reliability, as described in Section 2.2.1.5 below.

2.2.1.4 Morphological assessment of the skull

The morphological features of the skull used to estimate sex are listed in Table 2.2C. This table additionally provides a description of the male and female appearance of each feature. As with pelvic morphological features, each individual cranial feature was scored after its level of masculinity and femininity in accordance with Buikstra and Ubelaker (1994: 21). Scoring of each trait was done independently, without reference to other features.

Table 2.2C: Features of the skull used for the estimation of sex (Sources: Buikstra & Ubelaker, 1994; Byers, 2008; Ferembach et al, 1980; Rogers, 2005).

| Trait | Male | Female |
|---|---|---|
| General size | Large | Small |
| Architecture | Rugged | Smooth |
| Occipital area | Marked nuchal crests and protuberance; presence of bony ledge or ‘hook’ | Nuchal crests and protuberance not marked; smooth surface with no bony projections |
| Supraorbital ridges | Medium to large (protruding) | Small to medium |
| Glabella | Pronounced | Faint |
| Mastoid process | Medium to large, broad base | Small to medium, narrow base |
| Frontal eminences | Small | Large |
| Parietal eminences | Small | Large |
| Orbits | Squared, lower, rounded margins | Rounded, higher, sharp margins |
| Forehead | Sloping, less rounded | Vertical, full |
| Zygomatics | Heavier, more laterally arched | Lighter, more compressed |
| Palate | Larger, broader, more U-shaped | Smaller, parabolic |
| Occipital condyles | Larger | Smaller |
| Mandible | Larger, higher symphysis, broader ascending ramus | Small, with lower corpus and ramus dimensions |
| Mental eminence | Pronounced; squared | Minimal projection; pointed |
| Gonial angle | Approx. 90° | <90° |
| Gonial flare | Pronounced | Slight |

Scoring of five features, the nuchal crest, mastoid process, supra-orbital margin, supraorbital region/glabella, and the mental eminence, was aided by the images and descriptions provided by Buikstra and Ubelaker (1994: 19–20; Figure 2.2C). In accordance with the recommendations provided in Standards for Data Collection from Human Skeletal Remains, each cranium or mandible was held next to the appropriate diagram and in the same orientation to allow direct comparison (Buikstra & Ubelaker, 1994: 19).

Figure 2.2C: Scoring system for sexually dimorphic cranial features. Figure 4 from Buikstra & Ubelaker (1994: 20).

A description of each trait given in Table 2.2C was recorded on specially constructed recording forms (see Appendix 7.1.3) for each individual. An overall sex estimate based on morphological features of the skull or cranium was assigned by the ranking of traits and consideration of rates of accuracy, precision, and reliability, and other appropriate factors, as described in Section 2.2.1.5 below.

2.2.1.5 Weighting and ranking of morphological methods and traits

In considering how overall sex estimates for each individual in the study sample would be assigned, two questions were addressed: 1) How will the individual traits or indicators associated with each of the three standard morphological methods (Phenice characteristics, bony pelvis, skull/cranium) be weighted or ranked to produce an overall sex estimate? In other words, which traits will be considered of greatest importance when producing a sex estimate using each of the three methods? 2) In instances where the three standard morphological methods produce different overall estimates of sex, which will be given the greatest weight in the overall assignment of sex to each individual in the study sample?

Question two is relatively easy to answer given that it is widely accepted that the ossa coxae provide more accurate and reliable estimates of sex than the skull (Bruzek & Murail, 2006: 227; Buikstra & Ubelaker, 1994: 16; Byers, 2008: 177; Ferembach et al, 1980; Klales et al, 2012; Rösing et al, 2007; Walrath et al, 2004, amongst others). This is because the shape and size of the bony pelvis is directly related to biological function. Females give birth; therefore, the pelvis must be sufficiently wide and voluminous to allow the safe passage of a large foetal head through the birth canal (Bruzek & Murail, 2006: 227). As such, in instances where an estimate of sex using pelvic morphology (or the Phenice characteristics) differed from an estimate of sex using cranial morphology within the same skeleton, the overall assignment of sex was based on the pelvic estimate.

Question one is more problematic because it involves assessment of the relative importance or contribution of the individual pelvic or cranial sex indicators to the overall sex estimate. One method that has been proposed to achieve this involves the use of weighted averages (Acsádi & Nemeskéri, 1970; Ferembach et al, 1980). Numerical values are assigned to each of the diagnostic sex indicators according to a five-point scale, which ranges from ‒2 (hyperfeminine) to +2 (hypermasculine), with zero indicating ambiguous or “neutral” sex.

Individual features are multiplied by one, two, or three, based on their significance and importance for the estimation of sex (Walrath et al, 2004). A weight of three indicates the greatest importance and a weight of one the least. An ‘index of sexualisation (IS)’ is then calculated using the following formula (Walrath et al, 2004):

$$\mathrm{IS} = \frac{\sum (\text{score} \times \text{weight})}{\sum \text{weight}}$$

Positive IS values indicate a male specimen, while negative IS values indicate a female specimen. If the IS score is zero or approaching zero, the sex of the specimen is regarded as indeterminate (Acsádi & Nemeskéri, 1970 cited in Walrath et al, 2004). This procedure is recommended by the Workshop of European Anthropologists (Ferembach et al, 1980) and has been used by a number of researchers working with human remains from archaeological contexts (Hincak et al, 2007; Kjellström, 2004; Maat et al, 1997; Nagy, 2008; Walrath et al, 2004).
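As a minimal sketch of the index of sexualisation described above, the following computes the weighted mean of trait scores and classifies the result. The function names and the example scores are hypothetical. The text gives no numerical threshold for an IS value "approaching zero", so the `tolerance` argument is an assumption that the caller must supply.

```python
def sexualisation_index(scored_traits):
    """Weighted mean of trait scores.

    scored_traits: iterable of (score, weight) pairs, with score on the
    five-point scale from -2 (hyperfeminine) to +2 (hypermasculine)
    and weight equal to 1, 2, or 3.
    """
    pairs = list(scored_traits)
    for score, weight in pairs:
        if not -2 <= score <= 2:
            raise ValueError(f"score {score} is outside the -2..+2 scale")
        if weight not in (1, 2, 3):
            raise ValueError(f"weight {weight} must be 1, 2, or 3")
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        raise ValueError("at least one scored trait is required")
    return sum(score * weight for score, weight in pairs) / total_weight


def classify_is(index, tolerance):
    """Positive = male, negative = female, near zero = indeterminate.

    tolerance is a caller-chosen cut-off for 'approaching zero' (an assumption;
    no value is specified in the text).
    """
    if abs(index) <= tolerance:
        return "indeterminate"
    return "male" if index > 0 else "female"


# Hypothetical specimen: three traits scored +2, +1, and -1 with weights 3, 2, and 1.
example_index = sexualisation_index([(2, 3), (1, 2), (-1, 1)])
```

For the invented specimen the index is (6 + 2 − 1) / 6 = 7/6, a positive value, which would be read as male under any tolerance below that figure.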

This technique is, however, associated with several limitations. Notably, there is disagreement regarding which indicators should be considered most important for assessment of sex. As Kjellström (2004) points out, the shape of the orbit is considered to be a fairly reliable trait (weight two) by Acsádi and Nemeskéri (1970) but less reliable (weight one) by the Workshop of European Anthropologists (Ferembach et al, 1980). This point is particularly relevant to the present research, given that the size of the mastoid process, which has been described as unusually large and inflated in ancient Egyptian skeletal remains and therefore not a useful indicator of sex (Zakrzewski, 2003; 2007), is weighted as three in the Workshop of European Anthropologists recommendations (Ferembach et al, 1980). In addition, the external occipital protuberance, which has been assigned a weight of two, was shown in an independent test to exhibit population differences; it was additionally not found to be a definitive criterion for sex estimation (Gülekon & Turgut, 2003). To overcome this issue, some researchers have adapted the original weighting method to their own purposes (Kjellström, 2004); however, this reduces the objectivity and consistency of the technique among different researchers, which is one of its primary aims. Indeed, Walrath and colleagues (2004) noted statistically significant differences in the IS scores of different observers, suggesting that inter-observer comparisons of results are highly subjective and should, therefore, be limited (Walrath et al, 2004).

As a result of these issues and considerations, weighted means and the calculation of a sexualisation index were not used to assign sex in the present study. Instead, individual sex indicators of the bony pelvis and skull were ranked in order of importance by considering rates of accuracy, precision, and reliability presented in studies and independent tests in the published literature. In this context, ‘accuracy’ refers to the percentage of skeletons whose sex was correctly assigned in the sample upon which the method was created; ‘precision’ relates to the level of intra-observer error (the proportion of cases in which two separate rounds of assessment conflicted); and ‘reliability’ refers to the level of accuracy when the method is tested on an independent population (Bruzek & Murail, 2006: 229; Williams & Rogers, 2006).

Ranking of the three Phenice characteristics was performed by the author himself in the original study (see Section 2.2.1.2), as shown in Table 2.2D. The ranking is supported by precision and reliability data from the published literature. The accuracy of the method when applied to independent populations is, for the most part, supported by the reliability tests, although the level of experience of the observer and the characteristics of the population from which the test sample was derived have an impact on the reliability of the technique.

Table 2.2D: Accuracy, precision, and reliability of the Phenice characteristics in the estimation of sex.

| Characteristic | Rank | Accuracy | Precision* | Reliability – individual traits | Reliability – all traits |
|---|---|---|---|---|---|
| Ventral arc | 1 | - | K = 0.645 (substantial agreement)¹ | 98% in males; 91% in females² | - |
| Subpubic concavity | 2 | - | K = 0.579 (moderate agreement)¹ | 91% in males; 83% in females² | - |
| Medial aspect of ischiopubic ramus | 3 | - | K = 0.694 (substantial agreement)¹ | 93% in males; 81% in females² | - |
| All three characteristics | - | 96% | - | - | 95%¹; 96%³; 88%⁴; 81–84%⁵; 46–72% in males, 69–94% in females⁶ |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. *Not reported in original study. 1. Klales et al, 2012. Intra-observer error was assessed using weighted Kappa scores. The reliability rate given is cross-validated and for experienced observers. 2. Kelley, 1978. Reliability rates were calculated by dividing the number of observed male or female characteristics by the total number of males or females in the test sample. For example, the number of females exhibiting a ventral arc (156) divided by the total number of females (171) = 156/171 = 91.2%. 3. Sutherland & Suchey, 1991. 4. Ubelaker & Volk, 2002. Like the original study, this test was conducted using the Terry Collection. 5. Lovell, 1989. Reliability rate differed depending on experience of the observer. 6. MacLaughlin & Bruce, 1990. Reliability rate differed depending on test sample; rate was lowest in modern Scottish sample and highest in 17th–18th century English sample.

Based on the data presented in Table 2.2D, as well as the recommendations given by Phenice (1969), the ventral arc was considered the most important of the three sex indicators. Therefore, in instances where the estimate of sex using the Phenice characteristics method was ambiguous, sex was assigned based on the ventral arc estimate.

The next questions to address relate to how individual indicators of the bony pelvis should be ranked, and how sex will be assigned in instances where the Phenice method and the method using morphology of the bony pelvis (ossa coxae and sacrum) produce different sex estimates. Again, the accuracy, precision, and reliability of individual pelvic sex indicators, given in Table 2.2E, were considered.

Table 2.2E: Accuracy, precision, and reliability of the pelvic morphology method in the estimation of sex (Source: Rogers & Saunders, 1994; with additions).

| Characteristic | Rank† | Accuracy | Precision | Reliability |
|---|---|---|---|---|
| Obturator foramen | 2 | 93.8% | 3.2% | 85%¹ |
| True pelvis size and shape* | 3 | 85.8% | 0% | - |
| Shape of sacrum | 4 | 94.1% | 6.4% | - |
| Subpubic angle | 5 | 83.8% | 3.2% | 99–100%²; 91%³ |
| Pubic shape | 6 | 86.2% | 4.8% | 80–81% in males; 88–96% in females⁴ |
| Preauricular sulcus | 10 | 91.6% | 11.3% | 90–93%²; 89–100% in males, 70–76% in females⁵ |
| Greater sciatic notch | 12 | 85.7% | 6.5% | 71–79%²; 67–80% in males, 67–69% in females⁵; 33–91% in males, 84–96% in females⁴ |
| Ilium shape | 12 | 83.7% | 6.4% | - |
| Pelvic inlet | 15 | 80.0% | 9.7% | - |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. †Ranking scores are missing because the traits they represent were not used for sex estimation in the present study. *Considered as part of the general ‘size’ criterion listed in Table 2.2B. 1. Bierry et al, 2010. Reliability rate obtained using Fourier analysis. 2. Đurić et al, 2005. A blind test was conducted using a sample consisting of male individuals only. 3. Karakas et al, 2013. 4. Patriquin et al, 2003. Reliability rates differed in White and Black populations. 5. Bruzek, 2002. Reliability was tested in two different samples: 20th century French and 19th–20th century Portuguese.

Each trait was ranked for both accuracy and precision; the two ranking scores were then summed to give an overall rank. The relative importance of each trait was subsequently considered when assigning sex estimates to each individual in the study sample using the pelvic morphology method. The overall sex estimate was therefore always based on the highest ranked traits. This was the case even when a greater number of low-ranked traits gave the opposite sex estimate. For example, suppose that the shape of the obturator foramen and the subpubic angle indicated that the individual was female; the overall assignment of sex to the individual was therefore female, regardless of whether the greater sciatic notch, shape of the ilium, and size and shape of the pelvic inlet appeared to indicate a male individual. This of course required a certain degree of subjectivity and judgement; however, such is the case with a number of different aspects of osteology. Furthermore, the inter-observer error test of morphological sex assignment indicated an almost perfect level of agreement between the two observers in assigning sex estimates based on pelvic morphology (kappa statistic, K=0.903; see Section 3.2.2.1).

Table 2.2E demonstrates that in general, the accuracy and reliability of individual sex indicators of the bony pelvis are not as high as for the Phenice characteristics. As such, in instances where the estimate of sex using morphological indicators of the bony pelvis conflicts with the estimate of sex using the Phenice characteristics, the latter estimate was used to assign sex. This occurred infrequently and predominantly in instances where only low-ranked indicators of the bony pelvis were assessable.
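The order of precedence among the three morphological methods described in this section can be summarised in a short sketch. It assumes that each method has already yielded 'F', 'M', or no usable estimate; the function name and argument names are hypothetical.

```python
def overall_sex(phenice=None, pelvis=None, skull=None):
    """Combine method-level estimates ('F', 'M', or None) into one assignment.

    Precedence follows the text: the Phenice estimate overrides the pelvic
    morphology estimate, and either pelvic method overrides the cranial estimate.
    """
    for estimate in (phenice, pelvis, skull):
        if estimate in ("F", "M"):
            return estimate
    # No method gave an unambiguous estimate; such skeletons were excluded.
    return None


# Hypothetical conflict: Phenice says female, the other two methods say male.
assigned = overall_sex(phenice="F", pelvis="M", skull="M")
```

In the invented conflict above the assignment is female, because the Phenice estimate takes precedence over the other two methods.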

The final point to consider is how to rank the importance of individual sex indicators of the cranium to allow an overall assignment of sex using the cranial morphology method. An examination of the literature demonstrated that determination of the accuracy, precision and rank of a large number of cranial sex indicators has been performed in three separate studies using samples from three distinct skeletal populations. These were 19th century Canadian (Rogers, 2005), modern Brazilian (Suazo et al, 2009), and 20th century European (Williams & Rogers, 2006). The data for these three populations are given in Table 2.2F.

Table 2.2F: Accuracy, precision, and rank of individual sex indicators used in the cranial morphology method in three different skeletal populations.

| Trait | Canadian¹ Rank | Canadian¹ Acc. | Canadian¹ Prec. | Brazilian² Rank | Brazilian² Acc. | Brazilian² Prec. | European³ Rank | European³ Acc. | European³ Prec. |
|---|---|---|---|---|---|---|---|---|---|
| Zygomatics* | 1 | 70.3% | 6.1% | 7† | 79.7%† | 4.2%† | 4† | M: 100%+; F: 64% | 12.0%† |
| Supraorbital ridge | 1 | 60.9% | 2.0% | 10 | 78.8% | 6.6% | 1 | M: 84%; F: 88% | 6.0% |
| Mental eminence | 2 | 56.3% | 4.1% | 15 | 75.4% | 8.3% | 4 | M: 72%; F: 72% | 4.0% |
| Occipital area | 2 | 53.3% | 2.0% | 4 | 80.9% | 3.3% | 5 | M: 80%; F: 32% | 2.0% |
| Mastoid process | 3 | 44.7% | 0% | 1 | 84.8% | 1.6% | 1 | M: 88%; F: 92% | 8.0% |
| Mandible** | 4 | 51.1% | 4.1% | 3 | 81.4% | 3.3% | 5 | M: 60%; F: 44% | 18.0% |
| Forehead | 5 | 44.5% | 4.1% | 6 | 80.1% | 10.0% | 7 | NR | 16.0% |
| General size & architecture‡ | 5 | 38.0% | 0% | 5 | 80.6% | 3.3% | 2 | M: 80%; F: 88% | 10.0% |
| Palate | 6 | 36.6% | 4.1% | 16 | 72.9% | 8.3% | 7 | M: 76%; F: 56% | 10.0% |
| Orbits | 7 | 43.6% | 12.2% | 14 | 75.5% | 13.3% | 6ǁ | M: 72%¶; F: 76% | 11.0%ǁ |
| Frontal eminences | 7 | 31.9% | 4.1% | 13 | 75.5% | 10.0% | 7 | M: 76%; F: 24% | 8.0% |
| Parietal eminences | 7 | 28.9% | 2.0% | 11 | 80.7% | 6.6% | 9 | NR | 20.0% |
| Occipital condyles | 7 | 14.0% | 0% | 8 | 79.3% | 8.3% | 8 | NR | 12.5% |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. Acc., accuracy; prec., precision. *Refers to zygomatic extension and size. **Refers to mandibular symphysis and ramus size. ‡Considered separately in the present study. †Size and shape of the zygomatic bone and the zygomatic arches/extension were considered separately; therefore, the two ranks, accuracy rates, and intra-observer error rates were summed and averaged. ǁOrbital margins and shape/position were considered separately; therefore, the two ranks, accuracy rates, and intra-observer error rates were summed and averaged. +Zygomatic extension only. ¶Orbital margins only. NR, not reported. 1. Rogers, 2005. N.B. Accuracy rates were lower than might be expected for many of the traits because they produced a high number of indeterminate estimates. 2. Suazo et al, 2009. 3. Williams & Rogers, 2006; accuracy rates represent minimum accuracy, the lower of two rounds of tests.

To produce an overall ranked list of cranial traits based on all three studies, the individual ranks were summed and divided by three to produce an average rank for each trait, as shown in Table 2.2G. This table additionally shows the accuracy, precision, and rank of a number of additional cranial sex indicators, which were used for sex estimation in the present study but were not assessed by two of the studies described above. Reliability data from additional independent studies are also presented.
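The averaging step can be reproduced with a short sketch using the ranks listed in Table 2.2F (Canadian, Brazilian, European, in that order). The two gonial traits, which Table 2.2G reports from Williams and Rogers (2006) only, are entered with a single rank. The variable names are hypothetical, and assigning tied traits the same new rank with the next distinct mean taking the next integer is an assumption, although it reproduces the "New rank" column of Table 2.2G.

```python
from fractions import Fraction

# Ranks per study, copied from Table 2.2F (Canadian, Brazilian, European).
study_ranks = {
    "Mastoid process": (3, 1, 1),
    "Occipital area": (2, 4, 5),
    "Mandible": (4, 3, 5),
    "Zygomatics": (1, 7, 4),
    "Supraorbital ridge": (1, 10, 1),
    "General size & architecture": (5, 5, 2),
    "Forehead": (5, 6, 7),
    "Mental eminence": (2, 15, 4),
    "Occipital condyles": (7, 8, 8),
    "Orbits": (7, 14, 6),
    "Frontal eminences": (7, 13, 7),
    "Parietal eminences": (7, 11, 9),
    "Palate": (6, 16, 7),
    # Single-study ranks as given in Table 2.2G (Williams & Rogers, 2006 only).
    "Gonial angle": (5,),
    "Gonial flare": (8,),
}

# Exact fractions avoid spurious differences between tied means.
mean_rank = {trait: Fraction(sum(r), len(r)) for trait, r in study_ranks.items()}

# Dense ranking (assumption): tied means share a rank, next distinct mean gets the next integer.
distinct_means = sorted(set(mean_rank.values()))
new_rank = {trait: distinct_means.index(m) + 1 for trait, m in mean_rank.items()}

for trait in sorted(mean_rank, key=mean_rank.get):
    print(f"{trait}: mean rank {float(mean_rank[trait]):.1f}, new rank {new_rank[trait]}")
```

Run as written, the sketch places the mastoid process first (mean rank 1.7) and the palate last (mean rank 9.7, new rank 10), in line with Table 2.2G.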

Table 2.2G: Overall rank and accuracy, precision, and reliability of individual cranial sex indicators used to estimate sex in the present study.

| Trait | Mean rank | New rank | Accuracy range | Precision range | Reliability |
|---|---|---|---|---|---|
| Mastoid process | 1.7 | 1 | 45–92% | 0–8% | 52%¹ |
| Occipital area | 3.7 | 2 | 53–81% | 2–3% | 38–42%¹† |
| Mandible | 4 | 3 | 51–81% | 3–18% | 71%¹ |
| Zygomatics | 4 | 3 | 70–82% | 4–12% | 72%³ |
| Supraorbital ridge | 4 | 3 | 61–86% | 2–7% | 46%¹ |
| General size & architecture‡ | 4 | 3 | 38–88% | 0–10% | - |
| Gonial angle* | 5 | 4 | 80–86% | 6% | - |
| Forehead | 6 | 5 | 45–80% | 4–16% | 85%⁵ |
| Mental eminence | 7 | 6 | 56–75% | 4–8% | 45%¹ |
| Occipital condyles | 7.7 | 7 | 14–79% | 0–13% | - |
| Gonial flare* | 8 | 8 | 58–72% | 8% | 69%⁴ |
| Orbits | 9 | 9 | 44–76% | 11–13% | 29%¹¶; 71%²¶ |
| Frontal eminences | 9 | 9 | 32–76% | 4–8% | 47%¹ |
| Parietal eminences | 9 | 9 | 29–81% | 2–20% | - |
| Palate | 9.7 | 10 | 37–74% | 4–10% | 76%⁶ |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. *Based on study of Williams and Rogers (2006) only. ‡Considered separately in the present study. †Based on separate assessment of the occipital protuberance and nuchal crest. ¶Based on assessment of the supraorbital margin. 1. Đurić et al, 2005. 2. Graw et al, 1999. 3. Shveta et al, 2010. 4. Kemkes-Grottenthaler et al, 2002. 5. Inoue, 1990; reliability rate based on Fourier analysis. 6. Suazo et al, 2008.
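The averaging and re-ranking behind Table 2.2G can be sketched in a few lines of Python. This is an illustrative sketch only: the per-study ranks are those printed for four traits at the end of the preceding table, the helper names `mean_rank` and `dense_rank` are hypothetical, and treating the "New rank" column as a dense ranking of the mean ranks is an inference from the printed values rather than a stated procedure.

```python
from fractions import Fraction

# Ranks from the three source studies (Canadian, Brazilian, European) for the
# four traits whose individual ranks are printed at the end of the preceding table.
study_ranks = {
    "Occipital condyles": (7, 8, 8),
    "Orbits": (7, 14, 6),
    "Frontal eminences": (7, 13, 7),
    "Parietal eminences": (7, 11, 9),
}

def mean_rank(ranks):
    """Sum the per-study ranks and divide by the number of studies."""
    return Fraction(sum(ranks), len(ranks))

def dense_rank(values):
    """Assign 1, 2, 3, ... to distinct values in ascending order; ties share a rank."""
    order = {value: position + 1 for position, value in enumerate(sorted(set(values)))}
    return [order[value] for value in values]

means = {trait: mean_rank(ranks) for trait, ranks in study_ranks.items()}

# Mean ranks as printed in Table 2.2G, in table order.
table_means = [1.7, 3.7, 4, 4, 4, 4, 5, 6, 7, 7.7, 8, 9, 9, 9, 9.7]
new_ranks = dense_rank(table_means)

for trait, value in means.items():
    print(f"{trait}: mean rank {float(value):.1f}")
print(new_ranks)
```

Tied mean ranks share a new rank, which is why several traits occupy positions three and nine in the table.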

Table 2.2G demonstrates that when the results of three studies examining the accuracy and precision of cranial sex indicators in three distinct populations are combined, the mastoid process is ranked as the most important trait. However, this does not take into account the observations of other researchers who noted unusually large and inflated mastoids in ancient Egyptian cranial remains (Hershkovitz et al, 1990; Zakrzewski, 2003; 2007). As such, in the present study, little emphasis was placed on the estimate of sex using the mastoid process.

Instead, in instances where there was a high level of disagreement between the individual cranial indicators, the overall assignment of sex was based on visual assessment of traits in ranking positions two to five in the above table. For example, if the morphology of the zygomatics, mandible, and supraorbital ridges indicated a male individual, the overall sex assignment would be male regardless of whether the orbits, frontal eminences, and palate demonstrated female morphology.

Considering all the evidence presented above, the method by which an overall estimate of sex was assigned in the present study may be summarised as follows:

• The pelvic estimate of sex was used to assign sex when there was disagreement between the pelvic and cranial estimates
• The ventral arc estimate of sex was used to assign sex when there was disagreement with the other Phenice characteristics and/or indicators of the ossa coxae and sacrum
• The estimate of sex using the highest ranked indicators of the bony pelvis was used to assign sex when there was disagreement with low-ranked traits of the bony pelvis
• The estimate of sex using the highest ranked indicators of the cranium or skull was used to assign sex when there was disagreement with low-ranked traits of the cranium or skull
• Individuals assessed as being of ambiguous sex as a result of inconclusive morphological evidence were not included in the study sample.
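The precedence rules summarised above can be expressed as a simple priority order. The following Python sketch is an illustration under stated assumptions, not the procedure as implemented in the study: the function names are hypothetical, each estimate is coded as "M", "F", or `None` (missing or ambiguous), and falling back to the cranial estimate when no pelvic estimate is available is an assumption.

```python
from typing import Optional, Sequence

def first_decisive(estimates: Sequence[Optional[str]]) -> Optional[str]:
    """Return the first 'M' or 'F' in a priority-ordered sequence, else None."""
    for estimate in estimates:
        if estimate in ("M", "F"):
            return estimate
    return None

def overall_sex(pelvic: Optional[str], cranial: Optional[str]) -> Optional[str]:
    """Pelvic estimate takes precedence over the cranial estimate.

    A result of None marks an individual of ambiguous sex, who would be
    excluded from the study sample.
    """
    return first_decisive([pelvic, cranial])

print(overall_sex("F", "M"))    # pelvis and cranium disagree: pelvic estimate is used
print(overall_sex(None, "M"))   # assumption: cranial estimate used when pelvis is unavailable
print(overall_sex(None, None))  # inconclusive: excluded
```

The same `first_decisive` helper could be applied within a region by listing indicators from highest to lowest rank, for example with the ventral arc first among the pelvic indicators.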

2.2.2 Metric sex estimation

2.2.2.1 Selection of metric methods and skeletal dimensions

One of the key aims of this research was to test the accuracy and precision of the following groups of metric sex estimation methods when applied to ancient Egyptian skeletal remains:

• Methods created using modern population (c. 19th–20th century) reference samples of known sex (the “Modern Methods”)
• Methods created using ancient Egyptian population samples that are different and/or dissimilar to the present study sample (the “Population-Specific Methods”)
• Methods created using living Egyptians (the “Living Egyptian Methods”).

The term ‘accuracy’ refers to the extent to which a measured or calculated value corresponds to the real value, while the term ‘precision’ refers to the observed variability in repeated measurements taken on the same individual (Pederson & Gore, 1996: 77–96). However, given that the ancient Egyptian study sample is of estimated sex, accuracy in this context actually refers to consistency with the morphological sex estimate.

A total of 12 “Modern” methods representing the first category above were selected for inclusion and testing in the study. These methods were selected because they met one or more of a number of predefined criteria. These were that the method:

• Had been cited or recommended in osteological textbooks, handbooks, or standards for data collection from human skeletal remains
• Had received a moderate (≥30 in the case of papers published in 1998 or later) or high (≥50 in the case of papers published in 1997 or earlier) number of citations on Google Scholar
• Was created using a modern population sample of known (documented) sex (see Section 1.5.2.2)
• Included standard skeletal dimensions OR skeletal dimensions that were novel but provided the opportunity to estimate sex even in highly fragmented remains
• Had demonstrated an accuracy of greater than or equal to 80%, both in total and in males and females separately, when tested on the original study sample. In forensic contexts, 80% is the cut-off point at which methods of metric sex estimation are generally considered useful (Rogers, 1999).

Table 2.2H provides a summary of the 12 “Modern Methods” selected for testing in the study, including accuracy rates and precision (as indicated by the level of intra-observer error). In addition to the criteria given above, this group of methods was further selected because collectively it included a wide range of skeletal elements, as well as methods that only require a single measurement, which may prove to be highly valuable in instances where skeletal remains are fragmented or incomplete.

Table 2.2H: Summary of “Modern” metric methods tested, including original study populations and published accuracy rates.

| No. | Bone/dimension | Study | Original population | Published accuracy, % | Precision, mean % |
|---|---|---|---|---|---|
| 1 | Cranium | Giles & Elliot, 1963 | 20th C, White & Black, American | 82–86* | NR |
| 2 | C2 | Wescott, 2000 | 20th C, White & Black, American | M: 80–86*†; F: 80–85*† | 1.5 |
| 3 | Femoral head diameter | Krogman & İşcan, 1986 | White & Black, American | 90 | NR |
| 4 | Femoral neck diameter | Seidemann et al, 1998 | 19th & 20th C, White & Black, American | 93† | NR |
| 5 | Femoral shaft circumference | İşcan & Miller-Shaivitz, 1984a | 19th/20th C, White & Black, American | 84 | NR |
| 6 | Tibia (univariate) | İşcan & Miller-Shaivitz, 1984b | 20th C, White & Black, American | White M: 75–83*; F: 80–92*. Black M: 76–83*; F: 80–90* | NR |
| | Tibia (multivariate) | | | White M: 75–88*; F: 85–87*. Black M: 83–88*; F: 85–95* | |
| 7 | Humeral head diameter | Spradley & Jantz, 2011 | 20th C, White & Black, American | White 83.0†; Black 86.0† | NR |
| 8 | Humerus, radius and ulna | Holman & Bennett, 1991 | 20th C, White & Black, American | M: 80–86*; F: 80–86* | NR |
| 9 | Radial head diam. | Berrizbeitia, 1989 | 20th C, White & Black, American | Maximum 83†; Minimum 82† | NR |
| 10 | MC1 | Scheuer & Elkington, 1993 | White, British | 94 | ‘Not significant’ |
| 11 | MT1 | Robling & Ubelaker, 1997 | 20th C, White & Black, American | M: 91†; F: 92† | 1.0 |
| 12 | Multiple bones | Stewart, 1979: 123 | NR | 93–99* | NR |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. C2, second cervical vertebra; diam., diameter; MC1, metacarpal 1; MT1, metatarsal 1; C, century; NR, not reported; M, male; F, female. *Depending on function used. †Cross-validated accuracy. ‘Black’ denotes equations developed in populations of African ancestry; ‘White’ denotes equations developed in populations of European ancestry.

To date, only two studies have presented metric sex estimation methods that are specific to the ancient Egyptians. These methods, summarised in Table 2.2I, were tested on the study sample to establish the level of accuracy of “Population-Specific Methods” when applied to a different and/or dissimilar sample from the same general population.

Table 2.2I: Summary of “Population-Specific” methods tested, including original study populations and published accuracy rates.

| No. | Bone/dimension | Study | Original population | Published accuracy, % | Precision, % |
|---|---|---|---|---|---|
| 1 | Long bones: FHD, CNF, HHD | Raxter, 2007 | Primarily Predynastic Period and Old Kingdom, Egyptian | 89.0 for all three dimensions | NR |
| 2 | Scapula | Dabbs, 2010 | New Kingdom, Egyptian | 84–88*† | NR |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. FHD, femoral head diameter; CNF, circumference of tibia at nutrient foramen; HHD, humeral head diameter. *Depending on the function used. †Cross-validated accuracy. NR, not reported.

A number of studies have been published which present sex estimation methods based on and/or created for use in the living Egyptian population (see Section 1.5). Of the available studies, two were selected for testing on the current study sample. Both studies present sectioning points for length of the first metacarpal; they were primarily selected because they allow a direct comparison of accuracy with one of the “Modern” methods (created by Scheuer & Elkington, 1993). The two “Living Egyptian” methods are summarised in Table 2.2J.


Table 2.2J: Summary of “Living Egyptian” methods tested, including original study populations and published accuracy rates.

| No. | Bone/dimension | Study | Original population | Published accuracy, % | Precision, % |
|---|---|---|---|---|---|
| 1 | MC1 | Eshak et al, 2011 | Adult living Egyptian | M: 67; F: 80 | NR |
| 2 | MC1 | El Morsi & Al Hawary, 2013 | Adult living Egyptian | M: 72; F: 71 | ‘Not statistically significant’ (data not given) |

Accuracy and reliability rates are weighted (by sample size) or given for males and females separately if a weighted mean could not be calculated from the available data. M, male; F, female; MC1, metacarpal 1; NR, not reported.

Thus, a total of 16 metric methods – 12 “Modern”, two “Population-Specific”, and two “Living Egyptian” – were included for testing in the study. The inclusion of these groups of methods created using such different reference samples allowed evaluation of a number of research questions:

• The applicability of methods created using modern population samples to archaeological populations
• The applicability of population-specific methods to different or dissimilar samples from the same population
• The applicability of living Egyptian methods to Egyptians who lived in a highly distinct and widely separated period of history.

As can be seen from Tables 2.2H, 2.2I, and 2.2J, few studies evaluated the level of precision associated with the collection of metric data that the newly-created methods required. Furthermore, only six studies reported using a cross-validation procedure to obtain the overall accuracy rate associated with the method. These are important limitations of the original studies and will be discussed further in Section 2.5.5.1.

A total of 63 skeletal dimensions were used in the present study. These dimensions were included because they are essential to the testing of the 16 metric methods summarised above. For example, to test the equations presented within the Giles and Elliot (1963) “Modern” method, a total of 11 specific dimensions of the cranium, listed and defined in Table 2.2K, are required. Similarly, testing of the equations created by Wescott (2000) required measurement of eight specific dimensions of the second cervical vertebra, and so on. All the dimensions required to test the 16 metric methods included in this study are listed and defined in Table 2.2K. The acronyms used in this table were taken from the publications in which the dimensions were described and defined and may not always correspond with the standard acronyms cited in Brothwell (1981) or Buikstra and Ubelaker (1994).

Table 2.2K: Definitions of skeletal dimensions recorded for metric analysis.

| Bone | Dimension | Definition |
|---|---|---|
| Cranium (Definitions: Giles & Elliot, 1963) | Glabello-occipital length (GO) | Most anterior point of frontal bone in midline to most distal point on occiput in midline |
| | Maximum width (MW) | Greatest breadth across cranial vault, perpendicular to median sagittal plane (avoiding supramastoid crest) |
| | Basion-bregma height (BB) | Midpoint on anterior border of FM (basion) to intersection of coronal and sagittal sutures (bregma) |
| | Maximum diameter bi-zygomatic (DB) | Max. width between lateral surfaces of zygomatic arches measured perpendicular to median sagittal plane |
| | Prosthion-nasion height (PN) | Lowest point on alveolar border between central incisors (prosthion) to midpoint of nasofrontal suture (nasion) |
| | Basion-nasion (BN) | From basion to nasion |
| | Basion-prosthion (BP) | From basion to prosthion |
| | Nasal breadth (NB) | Max. breadth of nasal aperture perpendicular to nasal height |
| | Palate external breadth (PB) | Max. breadth of palate taken outside of alveolar borders |
| | Opisthion-forehead length (OF) | Max. distance from midpoint on posterior border of FM (opisthion) to forehead in midline |
| | Mastoid length (ML) | Measured perpendicular to plane determined by lower borders of orbits and upper borders of EAM |
| Second cervical vertebra (C2) (Definitions: Wescott, 2000) | Maximum sagittal length (XSL) | The sagittal length of the vertebra from the most anterior point on the body to the most posterior point on the spinous process |
| | Maximum height of dens (XDH) | The height from the most inferior edge of the anterior border of the body to the most superior point on the dens |
| | Dens sagittal diameter (DSD) | The maximum sagittal (antero-posterior) diameter of the dens |
| | Dens transverse diameter (DTD) | The diameter of the dens measured perpendicular to the sagittal diameter |
| | Length of vertebral foramen (LVF) | The internal length of the vertebral foramen, measured at the inferior edge of the foramen in the median plane |
| | Maximum breadth across superior facets (SFB) | The maximum breadth between the most lateral edges of the superior articular facets |
| | Superior facet sagittal diameter (SFS) | The maximum sagittal diameter of the superior articular facet |
| | Superior facet transverse diameter (SFT) | The maximum transverse diameter of the superior articular facet measured perpendicular to the sagittal diameter |
| Femur | Maximum diameter of femoral head (FHD) | The maximum diameter of the femoral head, wherever it occurs (Krogman & İşcan, 1986) |
| | Supero-inferior femoral neck diameter (FND) | The minimum diameter of the femoral neck in the supero-inferior direction (Seidemann et al, 1998) |
| | Femoral shaft circumference (FSC) | Femoral circumference at midshaft (İşcan & Miller-Shaivitz, 1984a) |
| | Maximum femoral length (XFL) | Measured perpendicular to a line defined by the distal-most points of the two distal condyles (Stewart, 1979) |
| | Minimum femoral transverse diameter (FTD) | The minimum diameter of the femoral shaft, wherever it occurs (Stewart, 1979) |
| | Epicondylar breadth of femur (EBF) | Width of the distal end of the femur (Stewart, 1979) |
| Tibia (Definitions: İşcan & Miller-Shaivitz, 1984b) | Tibial length (TL) | Medial malleolus to the lateral condyle |
| | Circumference at nutrient foramen (CNF) | Circumference of tibia at level of nutrient foramen |
| | Minimum shaft circumference (MSC) | Min. circumference of shaft, wherever it occurs (usually distal end) |
| | Antero-posterior diameter (APD) | Antero-posterior diameter at level of nutrient foramen |
| | Transverse breadth (TB) | Transverse breadth at level of nutrient foramen |
| | Proximal epiphyseal breadth (PEB) | Max. distance between the condyles (usually slightly below articular surfaces) |
| | Distal epiphyseal breadth (DEB) | From medial malleolus to centre of fibular notch |
| Humerus | Vertical (maximum) humeral head diameter (HHD) | Direct distance between the most superior and inferior points on the border of the articular surface (Buikstra & Ubelaker, 1994: 80; Spradley & Jantz, 2011) |
| | Maximum length of humerus (XHL) | Most superior point on the head to the most inferior point on the trochlea (Stewart, 1979) |
| | Maximum epicondylar width of humerus (EWH) | Most medial point on medial epicondyle to most lateral point on lateral epicondyle (Stewart, 1979) |
| Radius | Maximum length of radius (XRL) | From the head to the tip of the styloid process (Holman & Bennett, 1991) |
| | Radius SBB (RSBB) | Most lateral point on styloid process to deepest point of ulnar notch (Holman & Bennett, 1991) |
| | Maximum head diameter (MAXD) | Max. diameter of radial head, wherever it occurs (Berrizbeitia, 1989) |
| | Minimum head diameter (MIND) | Min. diameter of radial head, wherever it occurs (Berrizbeitia, 1989) |
| Ulna (Definitions: Holman & Bennett, 1991) | Maximum length of ulna (XUL) | From top of the olecranon process to the tip of the styloid process |
| | Ulna SBB (USBB) | Most medial point on head to the most lateral point on styloid process |
| Metacarpal 1 (MC1) (Definitions: Scheuer & Elkington, 1993) | Interarticular length (IAL) | Centre of proximal articular surface to apex of head |
| | Medio-lateral breadth of base (BML) | Most medial point to most lateral point in mediolateral plane |
| | Antero-posterior breadth of base (BAP) | Measured at right angles to the above measurement |
| | Medio-lateral breadth of head (HML) | Measured between the anterior tubercles |
| | Antero-posterior breadth of head (HAP) | Max. width of articular surfaces of head at right angles to above measurement |
| | Maximum midshaft diameter (MS) | Max. diameter at midpoint between most proximal and distal points |
| Metatarsal 1 (MT1) (Definitions: Robling & Ubelaker, 1997) | Length (L) | Most distal point on head to most proximal point on lateral edge of proximal articular surface |
| | Supero-inferior head height (SIH) | Max. height of head perpendicular to line between most plantar points on crests of medial trochlear surface |
| | Medio-lateral head width (MLH) | Max. width of head perpendicular to line between tubercle for medial metatarsophalangeal ligament and medioplantar margin of head |
| | Supero-inferior base height (SIB) | Measured on metaphyseal ridge |
| | Medio-lateral base width (MLB) | Measured at metaphysis perpendicular to lateral side |
| | Midshaft diameter (MSD) | Measured perpendicular to flat lateral side |
| Bony pelvis (Definitions: Stewart, 1979) | Ischial length (IL) | From where long axis of ischium crosses ischial tuberosity to intersection of long axes of pubis and ischium in acetabulum |
| | Pubic length (PL) | From acetabular intersection defined above to upper extremity of symphyseal articular facet of pubis |
| | Height of sciatic notch (HSN) | Measured as a perpendicular line dropped from a point on the PIIS where upper border of notch meets auricular surface to anterior border of notch |
| | Acetabulo-sciatic breadth (ASB) | From median point on anterior border of sciatic notch to acetabular border |
| Clavicle | Maximum length of clavicle (XCL) | Max. distance from sternal to scapular articular ends (Stewart, 1979) |
| Scapula (Definitions: Dabbs, 2010) | Maximum length of scapula (XHS) | Max. distance between the most superior and inferior points of the scapular body |
| | Maximum length of spine (XLS) | Max. distance between the lateral extent of the acromion process (avoiding bone development due to degenerative joint disease) and the midpoint of the base of the scapular spine where it transects the medial border |
| | Breadth of infraspinous body (BXB) | Max. distance from the origin of the lateral margin, just inferior to the glenoid prominence, and the midpoint of the base of the scapular spine where it transects the medial border |
| | Height of glenoid prominence (HAX) | Max. distance between the superior and inferior external margins of the glenoid prominence, using spreading calipers (avoiding bone development). This is an outside measure of the glenoid prominence |
| | Breadth of glenoid prominence (BCB) | Max. distance between the dorsal and ventral external margins of the glenoid prominence, using spreading calipers (avoiding bone development). This is an outside measure of the glenoid prominence breadth |

FM=foramen magnum; EAM=external auditory meatus; SBB=semibistyloid breadth; PIIS=posterior inferior iliac spine; Max.=maximum; Min.=minimum.

All dimensions were based on standard anatomical distances, with one notable exception. This exception is opisthion–forehead length, which is proposed by Giles and Elliot (1963) in their method for estimating sex from cranial measurements. In contrast to the other 10 dimensions they used, the definition of opisthion–forehead length is not based on standard landmarks presented by previous researchers. Furthermore, the dimension is not listed as a standard measurement in a number of the well-known osteological handbooks (Bass, 2005; Brothwell, 1981; Buikstra & Ubelaker, 1994; Schwartz, 1995; White & Folkens, 2000; White & Folkens, 2005). Giles and Elliot (1963) define opisthion–forehead length as “The maximum distance from opisthion (the midpoint on the posterior border of the foramen magnum) to the forehead in the midline”. Although the opisthion is a clear and well-defined landmark, defined in this way, the ‘forehead’ is a non-homologous landmark that will vary in location slightly depending on the sagittal profile of the frontal bone. Thus, the ‘forehead’ was taken to be a point on the midline of the frontal bone that was most distant from the opisthion, as indicated by the red arrow in Figure 2.2D below (the opisthion is not marked on this figure).

Figure 2.2D: The human cranium showing location of the “forehead landmark” (red arrow). Figure from Buikstra & Ubelaker, 1994: 72, with additions.


The dimensions used in the study were measured using digital sliding calipers, digital spreading (cranial) calipers, an osteometric board, and a tape measure, as appropriate. Measurements were taken to the nearest millimetre, 0.1 mm, or 0.01 mm, depending on the equipment used (the spreading calipers measured to 0.1 mm, whereas the digital sliding calipers gave measurements to two decimal places). It is retrospectively noted, however, that calipers that provide measurements to the nearest 0.01 mm may not be suitable for use on dry bone, given its exfoliative nature. As such, it is unlikely that a measurement taken to the nearest 0.01 mm will ever be reliably repeated. This issue, which may be termed ‘pseudo-precision’ (using greater precision in measurements, or in calculations using those measurements, than can be justified in terms of biological reality), has been noted and investigated by other authors. Hayek and colleagues (2001) performed a series of tests to establish whether pseudo-precision and the number of decimal places have an impact on statistical outcomes. Though describing pseudo-precision as “biologically and statistically offensive”, Hayek et al. (2001) found it to have no serious impact on univariate descriptive statistics, multivariate analyses, or the inferential results of real data sets. In the present study, the inter-observer error test between EJM and IK-O was performed using sliding calipers that measured to 0.1 mm only; similarly, the skeletal dimensions included in the inter-observer error test between EJM and MR were all measured to the nearest millimetre or 0.1 mm. On the basis of this discussion, the results of the intra- and inter-observer error tests, and the conclusions reached based on the results, are still considered to be sound (see Section 3.2). For bilateral elements, the left side was recorded, with substitution of the right side in instances where the left could not be measured. Alternatively, both sides were measured if this was required in order to test a particular method. The complete set of 63 dimensions, or as many as it was possible to record, was measured for each individual in the skeletal sample.

2.2.2.2 Measuring procedures and application of discriminant functions

Given that the purpose of this section of the project was to test published methods of metric sex estimation, the procedures for measuring the required bones as stated by the author of the published study were followed wherever possible. This ensured that the test was fair, having followed the methodology that the creator of the method had intended. Any special instructions for application of the metric sex estimation equations were also closely followed.


The metric sex estimation equations summarised previously and tested in this project are predominantly discriminant functions (see Section 2.5.3) and usually take the form:

D = (a₁ × B₁) ± (a₂ × B₂) ± (a₃ × B₃) … ± C

In this equation, D is the numerical value of the discriminant function, a₁–a₃ are measured values, B₁–B₃ are unstandardised discriminant function coefficients, and C is a constant. To derive D, each measured value is multiplied by its coefficient, the products are summed, and the constant is added or subtracted. The value of D is subsequently used to classify each individual in the study by assigning them to one of two groups, male or female. This is achieved by comparing the value of D to a sectioning point, which is based on the male and female group centroids. If the group centroids are sufficiently different, the number of correct group assignments will exceed chance (Kinnear & Gray, 2009: 525–563).
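As a minimal illustration of how a discriminant score is derived and compared with a sectioning point, the general form above can be written as a short Python sketch. The function names, the measurement labels `x1` and `x2`, and all numerical values are hypothetical and chosen only for the arithmetic; they are not taken from any of the published methods. The ± signs of the general equation are carried by signed coefficients and a signed constant.

```python
def discriminant_score(measurements, coefficients, constant):
    """Multiply each measurement by its coefficient, sum the products, add the signed constant."""
    products = (measurements[name] * coefficient for name, coefficient in coefficients.items())
    return sum(products) + constant

def classify(score, sectioning_point):
    """Scores above the sectioning point are male, below are female, equal are indeterminate."""
    if score > sectioning_point:
        return "M"
    if score < sectioning_point:
        return "F"
    return "indeterminate"

# Hypothetical two-variable function (illustrative values only).
coefficients = {"x1": 0.5, "x2": -0.2}
constant = -20.0
individual = {"x1": 50.0, "x2": 20.0}

score = discriminant_score(individual, coefficients, constant)  # 25 - 4 - 20
print(score, classify(score, sectioning_point=0.0))
```

Writing subtraction as a negative coefficient keeps one code path for every function, whatever mix of added and subtracted terms it contains.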

Each equation was applied in turn to all individuals in the study sample for whom the required measurements were available. Where several different equations were presented as part of the same method, each equation was tested separately using the data collected from the three collections of ancient Egyptian skeletal remains sampled, and an overall accuracy rate for the method given where appropriate. The overall accuracy rate for the method as a whole was obtained by considering the number of correct and incorrect sex estimates for each individual using the separate functions presented and assigning sex based on majority. Given differences in skeletal morphology and sexual dimorphism of populations of African (‘Black’) or European (‘White’) ancestry (see Sections 1.2.1.1.2 and 1.5.1.3), several researchers developed different equations for populations pertaining to different ancestral groups. Where possible, equations developed using populations of unknown ancestry, that is, those consisting of pooled ‘Black’ and ‘White’ individuals, were preferentially selected for testing. In instances where pooled functions were not available, the equations for ‘Black’ and ‘White’ populations were tested separately. The numerical scores resulting from the test of metric sex estimation equations were recorded in Microsoft Excel spreadsheets, along with the sectioning point associated with each equation, and the overall assignment of sex (male or female). A comparison of sex estimates using morphological versus metric methods was only made after all discriminant scores had been calculated and recorded. The metric estimate of sex was deemed to be correct if it was in agreement with the morphological estimate. Accuracy rates (or in this context ‘rates of consistency’) in per cent were calculated for males and females separately by dividing the total number of correct sex estimates for the equation by the number of individuals to whom the equation or method could be applied, and multiplying the result by 100. A weighted mean accuracy rate for males and females combined was obtained by adding the counts of correct sex estimation across the two sexes and dividing by the total number of cases across the sexes, then multiplying the result by 100. This was achieved by adding the two percentages and dividing by two. Where appropriate, the total accuracy rate for a ‘method’ as a whole (i.e. a method that presents a number of different equations or functions) was found by adding the total accuracy rates in per cent and dividing by the number of equations or functions. The discriminant functions tested on the ancient Egyptian sample used in this study are provided in the sections below.
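The rate calculations described above can be sketched as follows. This is an illustrative Python sketch with hypothetical counts and hypothetical function names; the combined rate is computed from pooled counts, as in the count-based description, and leaving tied majority votes unassigned is an assumption rather than a documented rule.

```python
def consistency_rate(correct, total):
    """Percentage of metric estimates that agree with the morphological estimate."""
    return 100.0 * correct / total

def pooled_rate(correct_m, n_m, correct_f, n_f):
    """Weighted mean for both sexes: pooled correct counts over pooled cases."""
    return 100.0 * (correct_m + correct_f) / (n_m + n_f)

def majority_sex(estimates):
    """Assign sex by majority across the functions of one method."""
    males = estimates.count("M")
    females = estimates.count("F")
    if males > females:
        return "M"
    if females > males:
        return "F"
    return "indeterminate"  # assumption: ties are left unassigned

# Hypothetical counts: 40 of 50 males and 27 of 30 females estimated consistently.
male_rate = consistency_rate(40, 50)
female_rate = consistency_rate(27, 30)
combined = pooled_rate(40, 50, 27, 30)
simple_mean = (male_rate + female_rate) / 2

print(male_rate, female_rate, combined, simple_mean)
print(majority_sex(["M", "M", "F", "M", "F"]))
```

With unequal group sizes the pooled rate (83.75% here) differs from the simple mean of the two percentages (85%); the two coincide only when the male and female groups are the same size.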

2.2.2.2.1 “Modern Methods”

Giles and Elliot (1963) cranial method

This method requires measurement of 11 dimensions of the cranium (listed and defined in Table 2.2K above), predominantly using spreading calipers, as well as sliding calipers for some of the smaller dimensions (nasal breadth, palate external breadth, and mastoid length). All 11 dimensions proposed by Giles and Elliot (1963) were collected, although only nine of these are required for the discriminant functions. The functions presented in Table 2.2L below were created from a pooled sample of both White and Black individuals. The functions created for White-only or Black-only populations were not tested as part of this study. Each dimension is multiplied by its coefficient, given in the table, and added or subtracted as indicated. The resulting score is then compared with the sectioning point. Scores above the sectioning point are considered male, and scores below the sectioning point are considered female.

Table 2.2L: Cranial functions for pooled White and Black populations (Giles & Elliot, 1963).

Function no. 3 6 9 10 13 16 17 18 21

Measurements 8 8 6 6 6 4 4 8 5

GO 6.083 4.692 5.538 5.550 2.184 3.833 4.850 1.165

MW -1.000 1.000 2.308 -1.000 1.000 1.000 -1.500

BB 9.500 8.769 10.308 6.150

BN 4.615 1.000 4.100 2.417 -0.100 1.659

DB 28.250 21.308 21.538 19.800 5.867 6.224 11.267 19.350 3.976


BP 2.250 -4.385 -0.100 1.050 -1.00

PN 9.917 7.385 7.600 2.483 4.067 7.150 1.541

PB -19.167 -10.400 -3.567 -9.900

ML 25.417 21.077 22.154 5.867 6.122

Sectioning pt. 6237.95 5972.03 6119.50 3686.41 1094.99 1495.40 2551.52 3922.26 891.48

Accuracy (%) 85.8 86.0 83.3 83.6 85.8 82.4 82.4 83.8 83.8

GO, glabello-occipital length; MW, maximum width; BB, basion-bregma height; BN, basion-nasion; DB, maximum diameter bizygomatic; BP, basion-prosthion; PN, prosthion-nasion height; PB, palate external breadth; ML, mastoid length; pt., point. Sectioning point (SP): greater than SP = male; less than SP = female.

Wescott (2000) C2 method

This method required measurement of eight dimensions of the second cervical vertebra. The dimensions are shown in Figure 2.2E, and listed and defined in Table 2.2K above. Measurements were taken using digital sliding calipers.

Figure 2.2E: Line drawing of the second cervical vertebra from a superior view (A) and lateral view (B) illustrating measurements used. Figure 1 from Wescott, 2000.

Five of the eight dimensions are required for use in five different discriminant functions (Table 2.2M), created using a pooled sample of White and Black individuals. To calculate the discriminant score, each measurement was multiplied by its coefficient, the products were added or subtracted as indicated in Table 2.2M, and the constant was applied (the constants in Table 2.2M are negative, so their absolute values were subtracted). The score was then compared to the sectioning point, which for all five functions is zero. Scores above zero are indicative of a male individual; scores below zero are indicative of a female individual. Scores of exactly zero indicate that sex cannot be assigned and the individual was therefore recorded as being of indeterminate sex.

Table 2.2M: Second cervical vertebra (C2) functions (Wescott, 2000).

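| Function no. | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Measurements | 1 | 2 | 3 | 4 | 5 |
| XSL | 0.6488 | 0.5836 | 0.5490 | 0.5882 | 0.5343 |
| SFS | | 0.4359 | 0.3234 | 0.2987 | 0.3005 |
| SFT | | | 0.3021 | 0.2796 | 0.2142 |
| LVF | | | | -0.1694 | -0.1671 |
| XDH | | | | | 0.1183 |
| Constant | -32.159 | -36.6899 | -38.0016 | -36.3804 | -37.1515 |
| Sectioning point | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Male accuracy (%) | 80.3 | 81.8 | 85.6 | 81.2 | 80.6 |
| Female accuracy (%) | 83.1 | 85.1 | 80.3 | 85.1 | 84.4 |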

XSL, maximum sagittal length; SFS, superior facet sagittal diameter; SFT, superior facet transverse diameter; LVF, length of vertebral foramen; XDH, maximum height of dens. Sectioning point 0.0: >0.0 = male; <0.0 = female.
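The following Python sketch shows one way the five C2 functions could be applied to an individual, skipping any function whose required measurements are unavailable. The coefficients and constants are those of Table 2.2M, read on the assumption that function n uses the first n measurements listed; the measurement values, the dictionary layout, and the function name `apply_c2_functions` are hypothetical.

```python
# Coefficients and constants from Table 2.2M (Wescott, 2000), keyed by function number.
C2_FUNCTIONS = {
    1: ({"XSL": 0.6488}, -32.159),
    2: ({"XSL": 0.5836, "SFS": 0.4359}, -36.6899),
    3: ({"XSL": 0.5490, "SFS": 0.3234, "SFT": 0.3021}, -38.0016),
    4: ({"XSL": 0.5882, "SFS": 0.2987, "SFT": 0.2796, "LVF": -0.1694}, -36.3804),
    5: ({"XSL": 0.5343, "SFS": 0.3005, "SFT": 0.2142, "LVF": -0.1671, "XDH": 0.1183}, -37.1515),
}

def apply_c2_functions(measurements):
    """Return {function number: (score, sex)} for every function that can be applied."""
    results = {}
    for number, (coefficients, constant) in C2_FUNCTIONS.items():
        if not all(name in measurements for name in coefficients):
            continue  # a required dimension is missing, so this function is skipped
        score = sum(measurements[name] * c for name, c in coefficients.items()) + constant
        if score > 0:
            sex = "M"
        elif score < 0:
            sex = "F"
        else:
            sex = "indeterminate"
        results[number] = (score, sex)
    return results

# Hypothetical measurements in millimetres.
complete = apply_c2_functions({"XSL": 52.0, "SFS": 18.0, "SFT": 17.0, "LVF": 16.0, "XDH": 38.0})
partial = apply_c2_functions({"XSL": 47.0, "SFS": 15.5})

for number, (score, sex) in complete.items():
    print(number, round(score, 4), sex)
print(sorted(partial))
```

For the fragmentary example only functions 1 and 2 are evaluated, which mirrors the practice of applying each equation only to individuals for whom the required measurements were available.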

Krogman & İşcan (1986) femoral head diameter method

This method requires measurement of a single dimension, the vertical diameter of the femoral head, using sliding calipers. This diameter is defined as the maximum diameter of the femoral head taken in the coronal plane. The measurement of head diameter taken in this manner is not strictly vertical (more accurately, it is from supero-lateral to infero-medial) though these measurements are taken within a vertical plane, hence traditionally they have been referred to as vertical diameters. No calculations are required; the measurement is simply compared to a sectioning point, given in Table 2.2N. Measurements greater than 45 mm are considered indicative of a male individual, while measurements less than 45 mm indicate a female.

Individuals whose measurement equalled 45 mm exactly were recorded as ‘indeterminate sex’ using this method.
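The comparison described above can be illustrated with a minimal sketch. The function name and the example diameters are hypothetical and are not taken from the original study; only the 45 mm sectioning point and the three possible outcomes come from the text.

```python
def classify_by_sectioning_point(value_mm, sectioning_point_mm):
    """Classify one measurement against a single sectioning point."""
    if value_mm > sectioning_point_mm:
        return "male"
    if value_mm < sectioning_point_mm:
        return "female"
    # A value exactly on the sectioning point cannot be assigned to either sex.
    return "indeterminate"


FEMORAL_HEAD_SECTIONING_POINT_MM = 45.0  # Krogman & Iscan (1986)

# Hypothetical femoral head diameters (mm), for illustration only.
for diameter_mm in (47.2, 41.8, 45.0):
    sex = classify_by_sectioning_point(diameter_mm, FEMORAL_HEAD_SECTIONING_POINT_MM)
    print(diameter_mm, sex)
```

The same function could in principle be reused for any of the univariate methods in this section by passing a different sectioning point.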

Table 2.2N: Maximum diameter of femoral head sectioning points (Krogman & İşcan, 1986).

Male Female

Femoral head diameter (mm) >45 <45

Accuracy (%) 90 90

Seidemann et al (1998) supero-inferior femoral neck diameter method

Only one measurement was required for this method. Sliding calipers were used to measure the neck of the femur at the minimum diameter in a supero-inferior direction, as shown in Figure 2.2F.

Figure 2.2F: The supero-inferior femoral neck diameter measurement (SID). Figure 1 from Seidemann et al, 1998.

To estimate sex using this method, a simple calculation is required, as shown in Table 2.2O. This function was developed for use with individuals of unknown ancestry. The supero-inferior femoral neck diameter (SID) is multiplied by 0.510 and the constant shown in Table 2.2O subtracted from the result. The discriminant score is then compared with the sectioning point (zero). Scores above the sectioning point are indicative of a male individual; scores below the sectioning point are indicative of a female individual.

Table 2.2O: Supero-inferior femoral neck diameter function (Seidemann et al, 1998).

Unknown ancestry

SID 0.510

Constant -15.356

Sectioning point 0.0

Accuracy (%) 93

Supero-inferior femoral neck diameter (SID). Sectioning point 0.0: >0.0 = male; <0.0 = female.
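As a hedged illustration of the calculation, the sketch below computes the discriminant score from a single SID value. It assumes that the constant is applied with the sign shown in Table 2.2O (that is, 15.356 is subtracted from the product), and it treats a score of exactly zero as indeterminate, which is an assumption carried over from the other methods rather than something stated for this one. The function names and example diameters are hypothetical.

```python
SID_COEFFICIENT = 0.510   # Table 2.2O
CONSTANT = -15.356        # Table 2.2O; assumed to be added with its sign
SECTIONING_POINT = 0.0


def seidemann_score(sid_mm):
    """Discriminant score for one supero-inferior femoral neck diameter (mm)."""
    return SID_COEFFICIENT * sid_mm + CONSTANT


def classify_score(score, sectioning_point=SECTIONING_POINT):
    """Scores above the sectioning point are male, below are female."""
    if score > sectioning_point:
        return "male"
    if score < sectioning_point:
        return "female"
    return "indeterminate"  # assumption: not specified for this method


# Hypothetical SID values (mm), for illustration only.
for sid_mm in (32.0, 28.0):
    score = seidemann_score(sid_mm)
    print(sid_mm, round(score, 3), classify_score(score))
```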

İşcan & Miller-Shaivitz (1984a) femoral shaft circumference method

To test this method, a soft-material (paper) tape measure was used to measure femoral circumference at midshaft. In accordance with the original study, the tape was made to follow the contours of the bone, even on femora with prominent lineae asperae. As shown in Table 2.2P, the measurements were compared with the sectioning point (86 mm). Measurements above the sectioning point were considered male, measurements below the sectioning point were considered female, and measurements that equalled the sectioning point were considered indeterminate sex.

Table 2.2P: Femoral shaft circumference method (İşcan & Miller-Shaivitz, 1984a).

Male Female Accuracy (%)

>86 mm <86 mm 84.0

İşcan & Miller-Shaivitz (1984b) tibial method

This method required measurement of seven tibial dimensions, listed and defined in Table 2.2K above. The first dimension, tibial length, was measured using an osteometric board. Both circumference measurements were taken using a paper tape measure, and in each case the tape was made to follow the contours of the bone. The remaining four measurements (antero-posterior diameter, transverse breadth, proximal epiphyseal breadth, and distal breadth) were taken using a digital sliding caliper. In measuring proximal epiphyseal breadth, care was taken not to incorporate osteoarthritic exostoses.

Both univariate (single sectioning point) and multivariate (discriminant function) methods were created, as shown in Tables 2.2Q and 2.2R. Separate sectioning points and functions were presented for White and Black populations; all were tested separately on the study sample. As with previous methods described above, to derive an estimate of sex using the univariate methods, the measurements of proximal epiphyseal breadth, circumference at the nutrient foramen, and distal epiphyseal breadth were compared with the sectioning points given in Table 2.2Q below. Measurements greater than the sectioning point indicated a male individual; measurements less than the sectioning point indicated a female individual. To apply the discriminant functions, each measurement was multiplied by its coefficient, and added or subtracted as indicated in Table 2.2R; the constant was then subtracted from the total. The resulting score was compared with the sectioning point (zero). Scores above the sectioning point were considered male, scores below the sectioning point were considered female, and scores that equalled the sectioning point were considered indeterminate. The discriminant functions require six of the seven tibial dimensions; transverse breadth was found not to be discriminatory in the original study and was therefore not included in the functions.

Table 2.2Q: Univariate sectioning points for tibial measurements (İşcan & Miller-Shaivitz, 1984b).

Variable                                      Ancestry   Male (mm)   Female (mm)   Male accuracy (%)   Female accuracy (%)

Proximal epiphyseal breadth (PEB)             White      >74         <73           82.5                92.3
Proximal epiphyseal breadth (PEB)             Black      >75         <74           82.5                90.0

Circumference at nutrient foramen (CNF)       White      >92         <91           75.0                79.5
Circumference at nutrient foramen (CNF)       Black      >96         <95           77.5                82.5

Distal epiphyseal breadth (DEB)               White      >45         <44           82.5                87.2
Distal epiphyseal breadth (DEB)               Black      >46         <45           80.0                80.0

Table 2.2R: Multivariate discriminant functions for tibial measurements (İşcan & Miller-Shaivitz, 1984b).

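Function 4 (sectioning point 0.0)
PEB: White 0.234; Black 0.254
DEB: White 0.097; Black 0.013
Constant: White -21.359; Black -19.083
Accuracy (%): White M 85.0, F 84.6; Black M 87.5, F 92.5

Function 5 (sectioning point 0.0)
DEB: White 0.254; Black 0.170
MSC: White 0.072; Black 0.121
Constant: White -16.843; Black -16.944
Accuracy (%): White M 75.0, F 84.6; Black M 82.5, F 85.0

Function 6 (sectioning point 0.0)
TL: White -0.001; Black 0.002
PEB: White 0.287; Black 0.251
Constant: White -20.148; Black -19.196
Accuracy (%): White M 82.5, F 87.2; Black M 87.5, F 92.5

Function 7 (sectioning point 0.0)
PEB: White 0.235; Black 0.240
CNF: White 0.042; Black 0.019
Constant: White -20.794; Black -19.326
Accuracy (%): White M 87.5, F 84.6; Black M 87.5, F 92.5

Function 8 (sectioning point 0.0)
TL: White -0.010; Black 0.001
PEB: White 0.208; Black 0.231
DEB: White 0.087; Black 0.009
CNF: White 0.055; Black 0.017
Constant: White -20.422; Black -19.498
Accuracy (%): White M 85.0, F 84.6; Black M 85.0, F 92.5

Function 9 (sectioning point 0.0)
TL: -0.008
APD: 0.114
PEB: White 0.201; Black 0.233
DEB: White 0.126; Black 0.034
CNF: 0.059
MSC: -0.083
Constant: White -20.451; Black -19.527
Accuracy (%): White M 87.5, F 84.6; Black M 87.5, F 95.0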

TL, tibial length; CNF, circumference at nutrient foramen; MSC, minimum shaft circumference; APD, antero-posterior diameter; PEB, proximal epiphyseal breadth; DEB, distal epiphyseal breadth. Sectioning point 0.0: >0.0 = male; <0.0 = female.
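The multivariate procedure can be sketched as follows, using the coefficients listed for Function 4 (White) in Table 2.2R. This is an illustrative sketch only: it assumes that the constant is applied with the sign shown in the table, and the function names and tibial measurements are hypothetical.

```python
def discriminant_score(measurements, coefficients, constant):
    """Sum of coefficient x measurement over the function's variables, plus the constant."""
    return sum(coefficients[name] * measurements[name] for name in coefficients) + constant


def classify_score(score, sectioning_point=0.0):
    """Scores above the sectioning point are male, below are female, equal are indeterminate."""
    if score > sectioning_point:
        return "male"
    if score < sectioning_point:
        return "female"
    return "indeterminate"


# Function 4, White (Table 2.2R); constant assumed to be added with its sign.
FUNCTION_4_WHITE = {"PEB": 0.234, "DEB": 0.097}
FUNCTION_4_WHITE_CONSTANT = -21.359

# Hypothetical tibial measurements (mm), for illustration only.
tibia = {"PEB": 76.0, "DEB": 46.0}
score = discriminant_score(tibia, FUNCTION_4_WHITE, FUNCTION_4_WHITE_CONSTANT)
print(round(score, 3), classify_score(score))
```

Keeping the coefficients in a dictionary means that measurements not used by a given function (for example transverse breadth) are simply ignored.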

Spradley & Jantz (2011) humeral head diameter method

The humeral head diameter method requires only a single measurement using a digital sliding caliper. In accordance with standard measuring techniques for this dimension, the vertical humeral head diameter was measured from the most superior to the most inferior points on the border of the articular surface (Buikstra & Ubelaker, 1994: 80). This measurement was then compared with the sectioning points for both White and Black populations given in Table 2.2S below. Values above the sectioning point were considered male, values below the sectioning point were considered female, and values that equalled the sectioning point were considered indeterminate.

Table 2.2S: Humeral head sectioning points (Spradley & Jantz, 2011).

White Black

Sectioning point* 46 44

Accuracy (%) 83 86

*Values above the sectioning point are considered male; values below the sectioning point are considered female; values that equal the sectioning point are considered indeterminate.

Holman & Bennett (1991) arm bones method

This method required five measurements: maximum length of the humerus, radius, and ulna, which were measured using an osteometric board, and two measurements which represent an approximation of bistyloid (wrist) breadth. For convenience, these are referred to as the semibistyloid breadth (SBB) of the radius and ulna. The SBB of the radius was measured from the most lateral point on the styloid process to the deepest point of the ulnar notch. The SBB of the ulna was measured from the most medial point on the head to the most lateral point on the styloid process. Both these measurements were taken at a right angle to the long axis of the bone using a digital sliding caliper. A total of 21 discriminant functions were created; the seven shown in Table 2.2T and tested in this study are based on the pooled sample of White and Black individuals. As with the other discriminant functions described previously, each measurement was multiplied by its coefficient, summed, and the constant subtracted from the total. The score was then compared with the sectioning point, which was zero for all seven functions. Scores above zero indicated a male individual; those below the sectioning point indicated a female individual; scores equal to zero indicated an individual of indeterminate sex.

Table 2.2T: Humerus, radius and ulna functions (Holman & Bennett, 1991).

Function no. 1 2 3 4 5 6 7

Radius SBB 0.800243 0.888901 0.956186 0.982161

Ulna SBB 0.658418 0.721568 0.881177 0.870315 0.893665

Radius length 0.032589 0.046121 0.059728

Ulna length 0.030833 0.046487

Humerus length 0.058092

Constant -45.1420 -41.2248 -40.0423 -37.6819 -31.8971 -35.8044 -29.7653

Sectioning point 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Male accuracy (%) 86.3 86.3 80.4 76.5 80.4 82.4 80.4

Female accuracy (%) 84.3 80.4 86.3 84.3 82.4 84.3 82.4

Sectioning point 0.0: >0.0 = male; <0.0 = female.

Berrizbeitia (1989) radial head diameter method

This method involves measurement of both the maximum and minimum diameters of the radial head, and was performed by rotating the blunt surface of a sliding caliper around the head until these measurements were identified. The sectioning points given in Table 2.2U were based on a pooled sample consisting of both White and Black individuals, and were the same for both right and left radii. To estimate sex using this method, the measured diameters were compared with the relevant sectioning points given in the table. For example, for the maximum head diameter, measurements less than or equal to 21 mm were considered female, measurements greater than or equal to 24 mm were considered male, and measurements that fell within the range 21.1–23.9 mm were considered indeterminate. Similarly, using the minimum diameter of the radial head, measurements less than or equal to 20 mm were considered female, measurements greater than or equal to 23 mm were considered male, and measurements that fell in the range 20.1–22.9 mm were considered indeterminate.

Table 2.2U: Radial head sectioning points (Berrizbeitia, 1989).

Male (mm) Female (mm) Accuracy (%)

Maximum head diameter ≥24 ≤21 83

Minimum head diameter ≥23 ≤20 82
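Unlike the single sectioning points used elsewhere, this method leaves a band of values between the female and male limits that cannot be classified. A minimal sketch of that logic is given below; the function name and example diameters are hypothetical, while the limits are those of Table 2.2U.

```python
def classify_radial_head(diameter_mm, female_max_mm, male_min_mm):
    """Classify a radial head diameter using separate female and male limits."""
    if diameter_mm <= female_max_mm:
        return "female"
    if diameter_mm >= male_min_mm:
        return "male"
    # Values strictly between the two limits fall in the indeterminate range.
    return "indeterminate"


# (female maximum, male minimum) in mm, from Table 2.2U.
MAXIMUM_DIAMETER_LIMITS = (21.0, 24.0)
MINIMUM_DIAMETER_LIMITS = (20.0, 23.0)

# Hypothetical maximum head diameters (mm), for illustration only.
for diameter_mm in (20.5, 22.4, 24.0):
    print(diameter_mm, classify_radial_head(diameter_mm, *MAXIMUM_DIAMETER_LIMITS))
```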

Scheuer & Elkington (1993) first metacarpal method

This method was developed to enable sex estimation from each of the metacarpals and the first proximal phalanx, and therefore six different regression equations are presented. For the purposes of this study, only the first metacarpal (MC1) equation was tested. This decision was taken because MC1 is easy to identify, and because the equation based on its measurement produced the highest rate of correct sex classification in a test of the six equations. A total of six dimensions of MC1 were measured using a digital sliding caliper. The regression equation for this bone is given in Table 2.2V. To derive an estimate of sex, the constant is taken as the starting point and each measurement multiplied by its coefficient is then subtracted or added as indicated in the table. The resulting score was then compared with the sectioning point of 1.5. Scores greater than the sectioning point indicated a female individual; scores less than the sectioning point indicated a male individual.

Table 2.2V: First metacarpal function (Scheuer & Elkington, 1993).

Variable Coefficient

Constant 4.58

Length -0.0092

Base M/L -0.0240

Base A/P -0.0619

Head M/L -0.0118

Head A/P 0.0108

Midshaft -0.132

Sectioning point 1.5

Accuracy (%) 94.0

Sectioning point 1.5: >1.5 = female; <1.5 = male. M/L, medio-lateral; A/P, antero-posterior.
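Because this equation starts from the constant and classifies in the opposite direction to the discriminant functions above (higher scores are female), a short sketch may help to avoid confusion. The coefficients are those listed in Table 2.2V; the dictionary keys, function names and MC1 measurements are hypothetical, and treating a score of exactly 1.5 as indeterminate is an assumption, since the text does not specify this case.

```python
MC1_CONSTANT = 4.58
MC1_COEFFICIENTS = {          # Table 2.2V
    "length": -0.0092,
    "base_ml": -0.0240,
    "base_ap": -0.0619,
    "head_ml": -0.0118,
    "head_ap": 0.0108,
    "midshaft": -0.132,
}
MC1_SECTIONING_POINT = 1.5


def mc1_score(measurements):
    """Constant plus each measurement (mm) multiplied by its signed coefficient."""
    return MC1_CONSTANT + sum(
        MC1_COEFFICIENTS[name] * measurements[name] for name in MC1_COEFFICIENTS
    )


def classify_mc1(score):
    """Note the reversed direction: above 1.5 is female, below 1.5 is male."""
    if score > MC1_SECTIONING_POINT:
        return "female"
    if score < MC1_SECTIONING_POINT:
        return "male"
    return "indeterminate"  # assumption: not specified for this method


# Hypothetical MC1 measurements (mm), for illustration only.
larger = {"length": 45, "base_ml": 15, "base_ap": 15, "head_ml": 15, "head_ap": 13, "midshaft": 12}
smaller = {"length": 41, "base_ml": 13, "base_ap": 13, "head_ml": 13, "head_ap": 12, "midshaft": 10}
for bone in (larger, smaller):
    score = mc1_score(bone)
    print(round(score, 4), classify_mc1(score))
```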

Robling & Ubelaker (1997) first metatarsal method

As with the MC1 method, the first metatarsal (MT1) method of Robling and Ubelaker (1997) presents five separate equations to estimate sex from each metatarsal. Only the MT1 function was included for testing in the present study because the first metatarsal is easy to recognise and to side, and because, of all five functions, it produced the highest accuracy rate. The method presents six dimensions of the metatarsals; all were collected from individuals in the present study sample despite only three of them being required to test the function. All dimensions were measured using a digital sliding caliper and following the landmark definitions provided in Table 2.2K. The discriminant function for MT1 given in Table 2.2W is based on a pooled sample of White and Black individuals. The separate functions for White and Black populations were not tested. An individual of unknown sex is assigned to a group by calculating their “male” score (using the male function in the table) and “female” score (using the female function in the table). Sex is then indicated by the greater of the two scores. For example, if a “male” score of 182.19 was derived using the male function, and a “female” score of 185.85 was derived using the female function, the individual would be classified as female as 185.85 is greater than 182.19.

Table 2.2W: First metatarsal function (Robling & Ubelaker, 1997).

Variable Male Female

Medio-lateral head width 4.57723 4.01449

Supero-inferior base height 9.97772 9.23746

Medio-lateral base width 2.58050 1.76753

Constant -241.21741 -189.65654

Accuracy (%) 91.0 92.0

N.B. Sex is indicated by the greater of the two resulting scores.

Stewart (1979) multiple bones method

The method presented by Stewart (1979) required measurement of several different bones, including the femur, os coxa, humerus, and clavicle. Table 2.2K lists and defines the dimensions required to test this method. Long bone lengths were measured using an osteometric board. All other dimensions were measured using a digital sliding caliper. A total of six discriminant functions were tested in this study: two developed for use in White populations and requiring measurement of both the right and left sides, and four developed for use in Black populations (Table 2.2X). Discriminant scores for each function were calculated by multiplying each measurement by its coefficient, and adding or subtracting as indicated in Table 2.2X. The resulting scores were then compared to the appropriate sectioning point, which differed for each function. A score above the sectioning point was considered male; a score below the sectioning point was considered female.

Table 2.2X: Functions for the multiple bones method (Stewart, 1979).

Measurements White Black

Function no. 1R 2L 3 4 5 6

Maximum femoral length 1.000 1.000 0.070 1.000 1.000

Maximum diameter of femoral head 30.234 30.716 58.140 31.400 16.530 1.980

Minimum femoral transverse diameter -3.535 -12.643

Epicondylar breadth of femur 20.004 17.565

Ischial length 16.250 11.120 6.100 1.000

Pubic length -63.640 -34.470 -13.800 -1.390

Height of the sciatic notch

Acetabulo-sciatic breadth

Maximum length of humerus 2.680 2.450

Epicondylar width of humerus 27.680 16.240

Maximum length of clavicle 16.090

Sectioning point 3040.32 2656.51 4099.00 1953.00 665.00 68.00

Accuracy (%) 94.4 94.3 98.5 97.5 96.9 93.5

Sectioning points (SP): >SP = male; <SP = female.

2.2.2.2.2 “Population-Specific Methods”

Raxter (2007) long bones method

This method required measurement of three dimensions – maximum femoral head diameter, circumference of tibia at the nutrient foramen, and humeral head diameter. The circumference was measured using a paper tape measure and following the contours of the bone, as described previously. The two diameters were measured using a digital sliding caliper. The sectioning points used to estimate sex are given in Table 2.2Y. As with previous methods, each collected measurement was compared with the appropriate sectioning point. As shown in the table below, scores above the sectioning point were considered male, and scores below the sectioning point were considered female. A score that equalled the sectioning point was considered indeterminate.

Table 2.2Y: Sectioning points of the population-specific long bones method (Raxter, 2007).

Variable Male (mm) Female (mm) Accuracy (%)

Femoral head diameter >42 <42 89.0

Circumference of tibia at nutrient foramen >85 <85 89.0

Humeral head diameter >41 <41 89.0

Dabbs (2010) scapula method

To apply this population-specific method of sex estimation, five dimensions of the scapula were measured using a digital sliding caliper. The five measurements are listed and defined in Table 2.2K. For three of the measurements (maximum length of the spine, height of the glenoid fossa, and breadth of the glenoid fossa), care was taken to avoid bone development due to degenerative joint disease, as recommended by the original study’s author. A total of five discriminant functions, presented in Table 2.2Z, were tested on the collected measurements. As with other functions described previously, each measurement was multiplied by its coefficient, summed, and the constant subtracted from the result. The discriminant score was then compared with the relevant sectioning point for each function given in Table 2.2Z. A score greater than the sectioning point was considered male; a score less than the sectioning point was considered female.

Table 2.2Z: Scapula functions (Dabbs, 2010).

Function 1 2 3 4 5

Maximum length of spine 0.097 0.024

Maximum length of scapula 0.136 0.112 0.064

Breadth of infraspinous body 0.186

Height of the glenoid fossa 0.864 0.674 0.571 0.658 0.662

Breadth of the glenoid fossa 0.493 0.219

Constant -30.788 -35.834 -45.538 -53.039 -54.335

Sectioning point -0.068 0.0 0.0 -0.046 -0.047

Accuracy (%) 84.6 87.5 87.5 88.0 84.0

Sectioning points (SP): >SP = male; <SP = female.

2.2.2.2.3 “Living Egyptian Methods”

The final two methods tested each present a single sectioning point for length of the first metacarpal and are based on measurements obtained from living Egyptians presenting at hospital for a hand radiograph or CT scan. Metacarpal lengths in these studies were measured from the midpoint of the base to the tip of the distal end, which is the same procedure used in the present study; hence, the test is fair. Tables 2.2AA and 2.2BB present the sectioning points for these two methods.

“Living Egyptian” Method 1 – Eshak et al, 2011

Table 2.2AA: MC1 sectioning point for “Living Egyptian” Method 1 (Eshak et al, 2011).

Variable Male (mm) Female (mm) Accuracy (%)

Interarticular length of MC1 >42.55 <42.55 Male: 66.7; Female: 80.0

“Living Egyptian” Method 2 – El Morsi & Al Hawary, 2013

Table 2.2BB: MC1 sectioning point for “Living Egyptian” Method 2 (El Morsi & Al Hawary, 2013).

Variable Male (mm) Female (mm) Accuracy (%)

Interarticular length of MC1 >46.5 <46.5 Male: 72.0; Female: 71.4

It is interesting to note the difference in the sectioning points, despite both being derived from the living Egyptian population. There are several reasons that could explain this, which are described and discussed in Chapter 4. The collected MC1 measurements were compared with each sectioning point in turn. As indicated in Tables 2.2AA and 2.2BB, a value above the sectioning point indicated a male; a value below the sectioning point indicated a female. The results obtained using these two “Living Egyptian” methods were compared both with each other and with the “Modern” MC1 method created by Scheuer and Elkington (1993) and described above.

2.3 Intra-observer and inter-observer error

The ability to replicate measurements reliably is an essential component of osteometric-based studies such as this one. It is to be expected that the same measurements taken on different days or by different researchers will vary to some extent, and that this variation may largely be accounted for by inconsistency in the technique of the researcher(s) taking the measurements. Clearly, it is advantageous to minimise technical variability in measurements by testing and refining established methodologies. Tests of measurement error should address the issues of precision and reliability (Pederson & Gore, 1996: 77–96). Precision refers to the observed variability in repeated measurements taken on the same individual. Provided there are no systematic errors, a high level of precision corresponds to low variability in successive measurements and suggests that there will be a high probability that a single measurement will be close to its true value (Pederson & Gore, 1996: 77–96). Measures of precision usually have the same units as the units of the variable under consideration (Pederson & Gore, 1996: 77–96). Reliability, on the other hand, is a measure of the proportion of between-subject variance which is free from measurement error (Ulijaszek & Lourie, 1994: 30–55). This measure is often expressed as a correlation coefficient and therefore has no units (Pederson & Gore, 1996: 77–96).

In the present study, precision and reliability were evaluated using tests of intra-observer error, to establish the level of error associated with repeated measurements taken by the same observer, and inter-observer error, to establish the level of error associated with repeated measurements taken by different observers. These tests are essential for determining which skeletal dimensions are reliable and precise enough to be used in statistical analyses or by other researchers, and should therefore be included in all studies that use osteometric or even discrete variable data (Williams & Rogers, 2006).

2.3.1 Test of intra-observer error

A test of intra-observer error was used to determine the level of error associated with repeated measurements made by the same observer, in this case the present author, for each of the 63 skeletal dimensions used in the study. A random number generator was used to select a subsample of skeletons or isolated crania from each of the three collections examined. All skeletal dimensions were then remeasured on a separate day after all original measurements had been collected. In total, measurements were retaken for 11 skeletons from the Peabody Museum collection, five skeletons from the NHM, Vienna collection, and six crania from the Duckworth Laboratory collection. Precision was assessed using three separate measures:

1. Per cent intra-observer error

2. Paired samples t-test

3. Technical error of measurement (TEM).

Per cent intra-observer error was calculated for each of the 63 dimensions that were re-measured in the intra-observer error test subsample using Equation 2.3A, where $M_1$ is the original measurement and $M_2$ is the retaken measurement:

Equation 2.3A $\text{per cent error} = \dfrac{M_1 - M_2}{M_1} \times 100$

The mean per cent absolute error was subsequently calculated for each dimension. The presence of statistically significant differences between the original and retaken measurements for each dimension was assessed using a paired samples t-test (SPSS 19.0; IBM, Somers, NY). In accordance with standard scientific practice, differences were considered significant if the P-values equalled or were less than 0.05 (see Pg 212).
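A minimal sketch of the per cent error calculation is given below. The function names and the example measurements are hypothetical; the paired samples t-test itself was run in SPSS and is not reproduced here.

```python
def per_cent_error(m1, m2):
    """Equation 2.3A: signed difference relative to the original measurement, in per cent."""
    return (m1 - m2) / m1 * 100


def mean_absolute_per_cent_error(pairs):
    """Mean of the absolute per cent errors over (original, retaken) pairs for one dimension."""
    errors = [abs(per_cent_error(m1, m2)) for m1, m2 in pairs]
    return sum(errors) / len(errors)


# Hypothetical (original, retaken) values in mm for a single dimension.
pairs = [(100.0, 98.0), (50.0, 51.0), (80.0, 80.0)]
print(round(mean_absolute_per_cent_error(pairs), 2))
```

Taking absolute values before averaging prevents positive and negative errors from cancelling one another out.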

The TEM is defined as the standard deviation (SD) of repeated measurements taken independently of one another on the same individual. The intra-observer TEM was calculated for each dimension using Equation 2.3B, where $D$ is the difference between measurements, and $N$ is the number of individuals measured (Ulijaszek & Lourie, 1994: 30–55):

Equation 2.3B $\mathrm{TEM} = \sqrt{\dfrac{\sum D^2}{2N}}$ (mm)

To allow comparison of TEM between different dimensions, per cent or relative TEM was additionally calculated using the following equation: %TEM = (TEM ÷ mean) × 100 (Ulijaszek & Kerr, 1999). In this form, the %TEM represents an estimate of error magnitude relative to the size of the measurement, and is analogous to the coefficient of variation, where the standard deviation is divided by the mean (Weinberg et al, 2005). Smaller percentages represent more precise measurements.
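The TEM and %TEM calculations can be sketched as follows. The function names and measurement pairs are hypothetical, and the sketch assumes that the "mean" in the %TEM equation is the grand mean of all original and retaken measurements for the dimension, which the text does not state explicitly.

```python
import math


def technical_error_of_measurement(pairs):
    """Equation 2.3B: sqrt(sum(D^2) / 2N) for N (first, second) measurement pairs."""
    sum_squared_differences = sum((m1 - m2) ** 2 for m1, m2 in pairs)
    return math.sqrt(sum_squared_differences / (2 * len(pairs)))


def relative_tem(pairs):
    """%TEM: TEM divided by the mean measurement, expressed in per cent."""
    all_values = [value for pair in pairs for value in pair]
    grand_mean = sum(all_values) / len(all_values)  # assumption: grand mean of both sessions
    return technical_error_of_measurement(pairs) / grand_mean * 100


# Hypothetical (original, retaken) values in mm for one dimension.
pairs = [(10.0, 11.0), (20.0, 19.0), (30.0, 30.0), (40.0, 42.0)]
print(round(technical_error_of_measurement(pairs), 3), round(relative_tem(pairs), 2))
```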

Acceptable levels of %TEM are not widely reported in the physical anthropological literature. Furthermore, standard values of acceptability for per cent error in linear measurements are difficult to define because they depend on a number of different factors, for example: 1) the accuracy of the instrument used, 2) the difficulty of locating landmarks repeatedly and the type of measurement being taken (e.g. measurements of bone lengths are more easily reproducible than measurements of bone widths), and 3) the research question, i.e. the size of effect or difference between samples that is being investigated.

The intra-observer reliability of the measurement of each dimension was assessed using an equation which calculates the reliability coefficient, $R$, a value that ranges from 0 (not reliable) to 1 (complete reliability; Equation 2.3C). In this equation, SD is the sample standard deviation of all measurements, which when squared represents the sample variance (Goto & Mascie-Taylor, 2007; Ulijaszek & Lourie, 1994: 30–55):

Equation 2.3C $R = 1 - \dfrac{\mathrm{TEM}^2}{\mathrm{SD}^2}$

As previously stated, this coefficient reveals what proportion of the between-subject variance in a measured population is free from measurement error. In the case of a measurement with an R-value of 0.9, 90% of the variance could be attributed to factors other than measurement error (Ulijaszek & Lourie, 1994: 30–55). Although there are no recommended values for R, values greater than 0.95 are generally taken to indicate small errors and good quality control (Goto & Mascie-Taylor, 2007; Ulijaszek & Kerr, 1999). Despite the use of the reliability coefficient in other physical anthropological studies (Fields et al, 1995; Goto & Mascie-Taylor, 2007), it could be argued that this measure is not appropriate in the context of sexual dimorphism studies, given that the research question concerns between-group variation rather than total group variation.

The reliability of the measuring instruments used in the study was further assessed by examining the Pearson correlation coefficient, r, which places a numerical value on the degree of agreement between test and retest measurements taken on the same individual. As a general rule, an r of 0.80 or higher indicates reliability (Spatz, 2001).
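The two reliability measures described above can be sketched together. The function names and example data are hypothetical. The sketch assumes that the SD used for R is the sample standard deviation of the pooled original and retaken measurements, and it computes Pearson's r directly from its definition rather than calling a statistics package.

```python
import math
import statistics


def reliability_coefficient(pairs):
    """R = 1 - TEM^2 / SD^2 for (first, second) measurement pairs of one dimension."""
    tem_squared = sum((m1 - m2) ** 2 for m1, m2 in pairs) / (2 * len(pairs))
    pooled = [value for pair in pairs for value in pair]  # assumption: pooled sessions
    return 1 - tem_squared / statistics.stdev(pooled) ** 2


def pearson_r(pairs):
    """Pearson correlation between test and retest measurements."""
    first = [m1 for m1, _ in pairs]
    second = [m2 for _, m2 in pairs]
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    covariance = sum((a - mean_first) * (b - mean_second) for a, b in pairs)
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    return covariance / math.sqrt(ss_first * ss_second)


# Hypothetical (original, retaken) values in mm for one dimension.
pairs = [(10.0, 11.0), (20.0, 19.0), (30.0, 30.0), (40.0, 42.0)]
print(round(reliability_coefficient(pairs), 4), round(pearson_r(pairs), 4))
```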

2.3.2 Test of inter-observer error

A test of inter-observer error was used to determine the level of error associated with repeated measurements made by different observers for each of the 63 skeletal dimensions used in the study. Unfortunately, a colleague was not available during the three data collection trips to retake the collected measurements. As such, the test of inter-observer error was conducted retrospectively using two sources of data:

1. Repeated measurements taken by the current author and a departmental colleague (IK-O) using skeletal material from the Tissue Bank located within the KNH Centre for Biomedical Egyptology at the University of Manchester.

2. Measurements collected by an independent researcher (Dr Michelle Raxter, University of South Florida) on the Predynastic Period Keneh and Old Kingdom Giza collections of skeletons held at the Peabody Museum of Archaeology and Ethnology, Harvard University, Boston.

The use of data collected by an independent researcher, named in point two above, meant that it was not necessary to collect additional measurements. Rather, the existing data collected by the present author as part of this study were compared with those previously collected by Dr Raxter on the same individuals. In total, 65 individuals were measured by us both, and of the 63 skeletal dimensions used in this study, five (maximum lengths of the humerus, radius, femur and tibia, and femoral head diameter) were also measured by Dr Raxter.

In comparison, the inter-observer error test involving a departmental colleague required the collection of additional data from the unprovenanced but very well preserved skeletal material held within the KNH Centre’s Tissue Bank. The test sample consisted of two crania, three ossa coxae, two femora, two humeri, two tibiae, two scapulae, one radius, and one first metacarpal. This sample allowed 46 of the 63 skeletal dimensions to be measured. Each observer, the present author (EJM) and a trained osteologist (IK-O), used the same recording form, the same equipment (for example, the same type and brand of sliding or cranial calipers), and worked from the same list of skeletal landmark and/or dimension definitions. Each observer collected their measurements independently and without reference to the other.

This methodology resulted in two comparative sets of data: EJM vs. Dr Raxter and EJM vs. IK-O. These data sets were subsequently combined to produce an overall inter-observer error test sample and data set. As with the test of intra-observer error, precision and reliability were evaluated by calculating mean per cent absolute error, TEM, %TEM, R, and r for each skeletal dimension using the equations presented above. The same equation used to calculate intra-observer TEM could also be used to calculate inter-observer TEM because there were only two observers. With more than two observers, the calculation becomes more complex (Ulijaszek & Kerr, 1999). Acceptable inter-observer error precision rates are not available from the physical anthropological literature.

2.3.3 Other calculations and equations

To investigate changes over time in ancient Egyptian body size and proportions, sexual dimorphism indices were calculated for each skeletal dimension included in the study using two methods. The first was to calculate the per cent difference between means (%D) shown in Equation 2.3D (López-Martin et al, 2006; Pomeroy, 2013; Ruff, 1987), and the second was to calculate a simple sexual dimorphism index by dividing the male mean by the female mean.

Equation 2.3D $\%D = \dfrac{\text{male mean} - \text{female mean}}{\text{female mean}} \times 100$

The use of %D to examine the degree of sexual dimorphism exhibited by different skeletal elements or dimensions is in accordance with other studies sampling both human populations (Black, 1978b; Holt, 2003; Pomeroy, 2013; Ruff, 1987; Wells, 2012) and bird species (Galarza et al, 2008). By convention, dimorphism values are commonly presented in per cent form, for example, the female value as a per cent of the male value, or vice versa (Wells, 2012). This approach has an advantage over crude sex differences in measurement values as it adjusts for population variability in body size. It might be assumed that taller populations will have greater absolute sexual dimorphism than shorter populations; hence, it is more appropriate to assess relative dimorphism (Wells, 2012). For the simple index (male mean divided by female mean), a score of 1 indicates that male and female dimensions are the same size and no sexual dimorphism is present. A score greater than 1 indicates that male dimensions are larger than female dimensions, with higher scores indicating a greater degree of sexual dimorphism. A score of less than 1 indicates that female dimensions are larger than male dimensions, with lower scores (closer to zero) indicating a greater degree of sexual dimorphism.
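The two indices are closely related, as the following sketch illustrates. The function names and the example means are hypothetical and are not drawn from the study sample.

```python
def per_cent_dimorphism(male_mean, female_mean):
    """%D: difference between the sex means as a per cent of the female mean."""
    return (male_mean - female_mean) / female_mean * 100


def dimorphism_index(male_mean, female_mean):
    """Simple index: male mean divided by female mean (1 = no dimorphism)."""
    return male_mean / female_mean


# Hypothetical mean maximum femoral lengths (mm), for illustration only.
male_mean, female_mean = 450.0, 420.0
print(round(per_cent_dimorphism(male_mean, female_mean), 2))
print(round(dimorphism_index(male_mean, female_mean), 4))
```

Because the index equals 1 + %D ÷ 100, the two measures rank dimensions identically; they differ only in how the result is expressed.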

Previous authors have noted that while it is a common occurrence for studies to make comparisons across populations or time periods regarding the level of sexual dimorphism, not all of these studies attempt to test the significance of the difference in sexual dimorphism among groups (Relethford & Hodges, 1985). A number of studies have attempted to address this issue (Bennett, 1981; Chakraborty & Majumder, 1982); however, the methods presented tend to be mathematically complicated and difficult to compute. In the present study, the method of Relethford and Hodges (1985) was used to test whether the differences in sexual dimorphism of the femur and tibia exhibited by two temporally distinct subsamples of ancient Egyptian skeletons (from the Predynastic Period and Old Kingdom) were statistically significant. This method was chosen because it is relatively simple to compute and can be applied to summary statistics. The femur and tibia were chosen because any differences in the sexual dimorphism exhibited by dimensions of these bones may be more informative in terms of patterns of mobility, subsistence strategy and growth than other skeletal dimensions. The method of Relethford and Hodges is based on a linear regression model and is calculated using Equation 2.3E below, with terms given in Equations 2.3F, 2.3G, and 2.3H. All that is required for this procedure is the sample size, mean, and standard deviation for each sex for both samples. Thus, in these equations, M refers to the number of males in a sample, F the number of females in a sample, s the standard deviation of the sample, and Ȳ the mean of the sample. The subscripts 1 and 2 refer to samples 1 or 2, and the subscripts m and f refer to male or female.


Equation 2.3E $t = \dfrac{b_1 - b_2}{\sqrt{AB / df}}$, where $df = (M_1 + F_1 + M_2 + F_2) - 4$.

Equation 2.3F $A = \dfrac{M_1 + F_1}{M_1 \times F_1} + \dfrac{M_2 + F_2}{M_2 \times F_2}$

Equation 2.3G $B = (M_1 - 1) \times s_{m1}^2 + (F_1 - 1) \times s_{f1}^2 + (M_2 - 1) \times s_{m2}^2 + (F_2 - 1) \times s_{f2}^2$

Equation 2.3H $(b_1 - b_2) = (\bar{Y}_{m1} - \bar{Y}_{f1}) - (\bar{Y}_{m2} - \bar{Y}_{f2})$.
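As a minimal sketch, Equations 2.3E to 2.3H can be computed directly from summary statistics. The function name, argument order and example values below are hypothetical and serve only to show the arithmetic.

```python
import math


def relethford_hodges_t(M1, F1, mean_m1, mean_f1, sd_m1, sd_f1,
                        M2, F2, mean_m2, mean_f2, sd_m2, sd_f2):
    """Return (t, df) for the difference in sexual dimorphism between two samples."""
    df = (M1 + F1 + M2 + F2) - 4                                    # Equation 2.3E
    A = (M1 + F1) / (M1 * F1) + (M2 + F2) / (M2 * F2)               # Equation 2.3F
    B = ((M1 - 1) * sd_m1 ** 2 + (F1 - 1) * sd_f1 ** 2
         + (M2 - 1) * sd_m2 ** 2 + (F2 - 1) * sd_f2 ** 2)           # Equation 2.3G
    diff = (mean_m1 - mean_f1) - (mean_m2 - mean_f2)                # Equation 2.3H
    t = diff / math.sqrt(A * B / df)                                # Equation 2.3E
    return t, df


# Hypothetical summary statistics (not study data): ten individuals per
# sex per sample, all standard deviations equal to 2.
t_value, df = relethford_hodges_t(10, 10, 10.0, 8.0, 2.0, 2.0,
                                  10, 10, 9.0, 8.0, 2.0, 2.0)
```

The resulting t value would then be compared with the critical value of the t distribution for the returned degrees of freedom.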

2.4 Age at death estimation

Skeletal maturity was a key criterion for inclusion in the study sample (see Section 2.1.2), and was assessed by examining the extent of epiphyseal fusion of the long bones and clavicle.

Unfortunately, chronological age, defined as the number of months or years that have elapsed between an individual's birth and any given point in time (Lynnerup et al, 2010), was not known for any of the skeletons in the study sample. Age at death was therefore estimated for all individuals included in the skeletal sample in order to construct demographic profiles of the cemetery sites and collections examined, as well as the population sample as a whole.

Currently, there are no population-specific methods for estimating age at death in ancient

Egyptian skeletal remains. Five standard techniques were therefore used:

1. Pubic symphysis method (Brooks & Suchey, 1990)

2. Auricular surface method (Buckberry & Chamberlain, 2002)

3. Sternal rib end method (İşcan et al, 1984a, 1984b; İşcan et al, 1985; İşcan et al, 1987)

4. Cranial sutures method (Meindl & Lovejoy, 1985)

5. Tooth wear method (Lovejoy, 1985).

These methods are generally considered the ‘standards’ for age at death estimation, as postulated by several authors (Buikstra & Ubelaker, 1994; Byers, 2008; Garvin & Passalacqua,

2012; White & Folkens, 2000), and have been used in previous studies sampling ancient

Egyptian skeletal remains (Buzon & Bombak, 2010; Raxter et al, 2008; Zaki et al, 2009). Each individual in the study sample was assigned to one of three age categories in accordance with the recommendations of Buikstra and Ubelaker (1994: 36): Young adult (20–34 years), Middle adult (35–49 years), and Old adult (50+ years). Age was classified as ‘indeterminate’ if there was insufficient skeletal material from which an age at death estimate could be derived. This methodology allowed the construction of age frequency tables for each sample and/or skeletal


collection included in the present study. The age at death recording form is shown in Appendix

7.2.

2.5 Statistical analyses

All statistical analyses were undertaken using the SPSS 20.0 (IBM, Somers, NY) statistical software package for Windows. The Shapiro-Wilk test was used to test all variables (skeletal dimensions) for normality. This test was selected because, compared with others such as the

Kolmogorov-Smirnov test, it is the most powerful for all types of distribution and sample sizes

( & Pala, 2003; Razali & Wah, 2011). Exploration of the data included tests for the presence of outliers and extreme scores, and calculation of descriptive statistics, such as minimum and maximum measurements, the overall mean measurement, male and female mean measurements, and the standard deviation.

To reduce the dominating effect of large dimensions, all data were Z-scored. A Z-Score may be defined as the difference between a raw score and the mean of its distribution, expressed in units of standard deviation

(Taylor Fitz-Gibbon & Lyons Morris, 1987: 33). This procedure allows comparisons to be made across different distributions because all scores are placed on the same standard scale. It has been widely used in a number of different anthropometric contexts as a way of adjusting for body size (Cole et al, 2005; De Meer et al, 1993; Dibley et al, 1987; May et al, 2013; Muhe et al,

1997; Uhl et al, 2013; Zakrzewski, 2007, amongst others). Determining the relative contributions of size and shape in distance measurements is a common concern in osteometric studies

(Pietrusewsky, 2008: 499). Here, size may be defined as the magnitude of a vector of measurements of an organism, while shape may be defined as a function of relative proportion normalised by size (Corruccini, 1987). Although size is widely regarded as a fundamental aspect of any organism’s biology, the goal of many osteometric or anthropometric studies is to assess similarity or differences among populations or sexes after size is taken into account, controlled for, or “factored out” (Jungers et al, 1995). However, the idea of extricating shape from size is controversial. Some researchers opine that shape is more relevant than size

(Corruccini, 1987), while others are more sceptical. For example, Jungers (1984) suggests that

“…after size is removed, the biological meaning of that which remains is not always entirely clear”. More recently, authors have argued the importance of size over shape in particular areas of study such as human evolution. McNulty and Vinyard (2015), for example, present the


following argument: “Given that shape frequently changes with size during growth and across species, it is difficult to ignore the functional and adaptive consequences of size difference in metric analyses”. However, it is important to bear in mind that the removal of size effects does not mean that size-related data are discarded; rather, the effects of shape and size are separated so that each may be studied independently of the other. In the present study, both the raw data and Z-Scores were analysed using Analysis of Variance (ANOVA; see Section 2.5.3) to assess the relative importance of size and shape in skeletal distance measurements.
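The Z-scoring step can be illustrated with a short sketch. It assumes the sample standard deviation (n - 1 denominator) is used, which the text does not specify, and the measurements are hypothetical.

```python
import statistics


def z_scores(values):
    """Standardise values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


# Hypothetical measurements in mm: a large and a small dimension.
femur_length = [452.0, 431.0, 440.0, 418.0, 461.0]
femoral_head = [46.0, 43.5, 44.0, 41.0, 47.5]

z_femur = z_scores(femur_length)
z_head = z_scores(femoral_head)
```

After standardisation both dimensions sit on the same scale, so the larger dimension no longer dominates simply because its raw values are larger.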

In all hypothesis tests, the alpha level, or threshold against which the P-value of the test statistic was measured, was set at 0.05 in accordance with standard scientific practice. This value, which is essentially arbitrary (Kusuoka & Hoffman, 2002), applies to the normal or standard Gaussian distribution and was first suggested by Fisher (1936) who stated that:

“The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty” (Fisher, 1936: 46).

In other words, the alpha value is the probability of making a type I error, that is, of rejecting a null hypothesis that is actually true. In comparison, the P-value of the test statistic may be defined as the probability, under the null hypothesis, of obtaining a result equal to or more extreme than what was actually observed (Figure 2.5A; Goodman, 1999).

Figure 2.5A: The relationship between the alpha (α) level and P-value of the test statistic. The bell-shaped curve represents the probability of every possible outcome under the null hypothesis. Both α and the P-value are “tail areas” under this curve. A statistically significant P-value is less than alpha. Figure 3 from Goodman, 1999.
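The tail areas discussed above can be checked numerically. The sketch below assumes a two-tailed test on the standard normal distribution; the function name `two_tailed_p` is illustrative.

```python
import math


def two_tailed_p(z):
    """Two-tailed P-value for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


ALPHA = 0.05

p_at_1_96 = two_tailed_p(1.96)  # close to the 0.05 threshold
p_at_2 = two_tailed_p(2.0)      # Fisher's "twice the standard deviation"
trials_per_false_indication = 1.0 / p_at_2

is_significant = p_at_2 < ALPHA
```

With a deviate of exactly 2, the P-value is slightly below 0.05, which is consistent with the "once in 22 trials" figure in the quotation from Fisher.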


A number of different statistical procedures were used in this project. The hypotheses outlined in Section 1.6.1 were tested using independent and paired samples t-tests, as appropriate, and two-factor Analysis of Variance (ANOVA). Principal Component Analysis

(PCA), a form of Projection Pursuit, was used to explore the data set with the aim of finding multivariate patterns and structures in multi-dimensional data by projecting them on a lower dimensional subspace (Huber, 1985). Finally, logistic regression and discriminant function analysis were used to analyse the relationships between multiple independent variables

(skeletal dimensions) and the single categorical dependent variable (sex), and to create equations and functions for predicting membership to one of the two categories of the dependent variable.

2.5.1 Missing data

Missing data is one of the most pervasive problems in data analysis, the seriousness of which depends on the pattern of missing data, how much is missing, and why it is missing (Tabachnick

& Fidell, 1996: 60). In studies such as this one, it is relatively common to have missing data because archaeological human remains are subject to taphonomic and destructive processes, which means that not all bones will be intact and measurable. However, missing data has important consequences because many multivariate statistical techniques require complete data sets for each case (Brown et al, 2012; Clavel et al, 2014). Authors tend to agree that the pattern of missing data is more important than the amount missing (Meyers et al, 2006: 56–57; Tabachnick & Fidell, 1996: 60). Missing values scattered randomly through a data matrix pose less serious problems than nonrandomly missing values. The latter are serious no matter how few of them there are because they affect the ability to generalise the results (Tabachnick &

Fidell, 1996: 60).

Randomly missing data may be divided into two broad categories. Observations are said to be ‘missing completely at random’ (MCAR) if none of the variables in the data set contains missing values that are related to the value of the variable under scrutiny. Alternatively, missing data are ‘missing at random’ (MAR) if, after controlling for other variables, the variable under scrutiny cannot predict the distribution of the missing data (Meyers et al, 2006: 57). In

SPSS, the MCAR assumption is tested using Little’s chi-square test (Little, 1988).


There are a number of established procedures for dealing with missing data. These may be broadly divided into two categories: deletion or imputation. The easiest and most common approach is to remove any cases from the analysis that have missing data. This is known as ‘listwise’ deletion and is usually the default in SPSS. Under this method, a single missing value on just a single variable in the analysis is cause for the case to be excluded

(Meyers et al, 2006: 60). This approach is reasonable when the number of deleted cases is not too high; however, an obvious concern with listwise deletion is the restriction on sample size and the impact that this may have on statistical power (Clavel et al, 2014). A slightly more liberal approach is ‘pairwise’ deletion, in which statistics are calculated from all available cases that have valid values for the variable in question, regardless of whether values for other variables are missing (Meyers et al, 2006: 60). As such, no cases are necessarily completely excluded from the analysis. A particular problem with this approach is that the parameters of the statistical model will be based on different sets of data with different statistics, such as the sample size and standard error (Kang, 2013). An alternative approach to dealing with missing data involves imputation procedures. This group of methods attempts to impute or substitute for a missing value some other value that is deemed to be a reasonable guess or estimate (Meyers et al,

2006: 60). Perhaps the simplest way of doing this is to replace all missing values for a particular variable with the mean value for that variable. The argument for using so-called ‘mean substitution’ is based on the accepted rubric that the sample mean is the best estimate of the population mean (Meyers et al, 2006: 62). However, the true values for the missing cases would almost certainly vary over at least a modest range of scores. Therefore, by substituting the same single value for a missing value, the variability of the variable will be artificially reduced

(Meyers et al, 2006: 62). Furthermore, this approach adds no new information but only increases the sample size and leads to an underestimate of the errors. As such, mean substitution is not generally recommended (Kang, 2013). A slightly more sophisticated approach is to use regression to impute missing values. This procedure preserves all cases by replacing the missing data with a probable value estimated using a prediction equation and based on cases with complete data (Kang, 2013; Meyers et al, 2006: 63). Although considered a better approach than mean substitution, problems with this procedure can arise when missing values occur on multiple independent variables, or on the dependent variable. There is also a tendency


to “overfit” the missing values because they are predicted from other independent variables.

This overfitting may produce samples that may not reflect or generalise to the population from which they were drawn (Meyers et al, 2006: 63; Tabachnick & Fidell, 1996: 64). Furthermore, the analysis of imputed data as though it were a complete data set results in underestimation of standard errors and overestimation of test statistics (Meyers et al, 2006: 63).
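The practical differences between listwise deletion, pairwise deletion and mean substitution can be seen in a small sketch. The data layout (a list of dictionaries with `None` for missing values) and the measurements are hypothetical; SPSS handles these steps internally.

```python
import statistics

# Hypothetical cases (mm); None marks a bone that could not be measured.
cases = [
    {"femur": 452.0, "tibia": 370.0},
    {"femur": 431.0, "tibia": None},
    {"femur": None, "tibia": 352.0},
    {"femur": 440.0, "tibia": 361.0},
    {"femur": 418.0, "tibia": 340.0},
]

# Listwise deletion: drop any case with a missing value on any variable.
listwise = [c for c in cases if None not in c.values()]

# Pairwise deletion: use every valid value for the variable in question.
pairwise_femur = [c["femur"] for c in cases if c["femur"] is not None]

# Mean substitution: replace each missing femur value with the observed mean.
femur_mean = statistics.mean(pairwise_femur)
mean_filled = [c["femur"] if c["femur"] is not None else femur_mean
               for c in cases]

# The substituted value adds no spread, so the standard deviation shrinks.
sd_observed = statistics.stdev(pairwise_femur)
sd_mean_filled = statistics.stdev(mean_filled)
```

Here listwise deletion retains three of five cases, pairwise deletion retains four femur values, and mean substitution restores the full sample size at the cost of an artificially reduced standard deviation.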

One way to overcome some of the limitations associated with single approaches for dealing with missing data discussed above is to use a combination of procedures. One such consolidated method is expectation maximisation (EM) imputation. EM is a type of maximum likelihood method that can be used to create a new data set, in which all missing values are imputed with values estimated by the maximum likelihood methods. This procedure was introduced as early as 1950 by Ceppellini et al. in the context of gene frequency estimation; however, the standard reference on the EM algorithm and its convergence is Dempster et al.

(1977). The EM approach begins with the expectation step. In this step, the algorithm estimates the responsibilities (posterior probabilities) of the missing variables given the observed data and current parameter settings. In the maximisation step, the log-likelihood is maximised with the updated responsibilities, and the means, covariances and mixing coefficients re-estimated

(Aggarwal & Reddy, 2014: 66). The expectation step is then repeated with the new parameters and so on until the system stabilises, that is, when the covariance matrix for the subsequent iteration is virtually the same as that for the preceding iteration (Kang, 2013). EM has several advantages compared with regression imputation. For example, because EM always starts with the full covariance matrix, it is possible to generate regression estimates for any set of predictors, no matter how few cases there may be in a particular missing data pattern (Meyers et al, 2006: 64). Furthermore, tests comparing numerous different imputation methods have found the EM approach to be among the best-performing methods (Clavel et al, 2014; D’Angelo et al, 2012), producing estimates of descriptive statistics and correlation coefficients that were close to those of the original variables in a complete data set from which values had been purposefully deleted (Musil et al, 2002). EM does, however, have limitations. For example, EM is model specific; each proposed data model requires a unique likelihood function (Dong &

Peng, 2013). In addition, the accuracy or inaccuracy of the maximum likelihood estimation process is not accounted for in the variances of the resulting estimators. This leads to smaller


variances, smaller confidence intervals, and therefore a greater risk of finding significant differences between variables when there are no actual differences (type I error; Blankers et al,

2010). Nevertheless, EM is a powerful tool when used in appropriate contexts (Dong & Peng,

2013).
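The alternation between expectation and maximisation steps can be sketched for a deliberately simple case. The example below is not the SPSS routine: it assumes two jointly normal variables, with `x` fully observed and `y` partly missing, and the function name and data are hypothetical.

```python
def em_impute(x, y, n_iter=500, tol=1e-10):
    """EM for a bivariate normal model; x complete, y has None for missing."""
    n = len(x)
    observed = [i for i in range(n) if y[i] is not None]
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Starting values taken from the cases with y observed.
    mu_y = sum(y[i] for i in observed) / len(observed)
    s_yy = sum((y[i] - mu_y) ** 2 for i in observed) / len(observed)
    s_xy = sum((x[i] - mu_x) * (y[i] - mu_y) for i in observed) / len(observed)

    for _ in range(n_iter):
        slope = s_xy / s_xx
        resid_var = s_yy - slope * s_xy
        # Expectation step: expected y (and its conditional variance) where missing.
        e_y = [y[i] if y[i] is not None else mu_y + slope * (x[i] - mu_x)
               for i in range(n)]
        extra = [0.0 if y[i] is not None else resid_var for i in range(n)]
        # Maximisation step: re-estimate the mean, covariance and variance.
        new_mu_y = sum(e_y) / n
        new_s_xy = sum((x[i] - mu_x) * (e_y[i] - new_mu_y) for i in range(n)) / n
        new_s_yy = sum((e_y[i] - new_mu_y) ** 2 + extra[i] for i in range(n)) / n
        change = (abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy)
                  + abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:  # estimates have stabilised
            break

    slope = s_xy / s_xx
    imputed = [y[i] if y[i] is not None else mu_y + slope * (x[i] - mu_x)
               for i in range(n)]
    return imputed, mu_y


# Hypothetical data: the last two y values are missing.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_values = [1.0, 3.0, 2.0, 5.0, None, None]
y_imputed, mu_y_hat = em_impute(x_values, y_values)
```

In this simple missing-data pattern the iterations settle on the values implied by regressing `y` on `x` in the complete cases, which makes the sketch easy to check by hand.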

2.5.2 Independent and paired samples t-tests

A t-test is used to determine whether the means of two groups are statistically significantly different from each other (Dancey & Reidy, 2004). The null hypothesis for this test is that the means are equal (H0: µ1 = µ2). If the null hypothesis is true, any difference between the two sample group means (denoted in the above equation by the subscripts 1 and 2) is due to chance (Spatz, 2001). However, if the t-value calculated from the study data is greater than the critical value (less probable than alpha), the null hypothesis is rejected and it can be concluded that the two samples came from populations with different means (Spatz, 2001).

An independent samples t-test was used to compare the male and female means for each of the 63 skeletal measurements included in the study. This test was additionally used to compare male and female means from different populations, in this instance different time periods (Predynastic Period, Old Kingdom, Late Period) and different geographic locations

(Upper Egypt, Lower Egypt). Homogeneity of variances was tested using Levene’s statistic.

When Levene’s statistic had a P-value for F of greater than 0.05 the variances were assumed to be homogeneous; as a result, the Equal Variances Assumed line of values for the t-test was used. When Levene’s statistic had a P-value for F of less than 0.05 the variances were assumed to be heterogeneous and the Equal Variances Not Assumed line of values for the t-test was used (Kinnear & Gray, 2009: 525–563). A paired samples t-test was used in the intra- and inter-observer error tests to determine if there were any statistically significant differences between repeated measurements taken on or by the same individual for each of the 63 skeletal dimensions included in the study.
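The two lines of SPSS output referred to above correspond to the pooled-variance (Student) and unequal-variance (Welch) forms of the t-test. The following sketch computes both from summary statistics; the function names and values are hypothetical.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal variances assumed: returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal variances not assumed: returns (t, Welch-Satterthwaite df)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se, df


# Hypothetical male and female summary statistics (not study data).
t_equal, df_equal = pooled_t(10, 10.0, 2.0, 10, 8.0, 2.0)
t_unequal, df_unequal = welch_t(10, 10.0, 2.0, 10, 8.0, 2.0)
```

When the group sizes and standard deviations are equal, as in this example, the two forms give the same t value and degrees of freedom; they diverge as the variances become more heterogeneous.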

2.5.3 Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical technique used to test equality among two or more means by comparing variance within groups relative to variance between groups (Larson,

2008). The estimate of within groups variance is considered random or error variance

(Tabachnick & Fidell, 1996: 37). The second type of variance estimate comes from differences


in group means and is considered a reflection of group differences. If these two estimates of variance do not differ appreciably, it is concluded that all the group means come from the same sampling distribution of means, and that slight differences between them are due to random error (Tabachnick & Fidell, 1996: 38). On the other hand, if the group means differ more than expected it is concluded that they were drawn from different sampling distributions of means.

The null hypothesis that all the means are equal may therefore be rejected (Tabachnick & Fidell,

1996: 38). Differences among variances are evaluated as ratios, which form an F distribution.

The larger the value of F, the test statistic, the more evidence there is that the means of the groups differ (Altman & Bland, 1996; Tabachnick & Fidell, 1996: 38).
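The variance ratio described above can be illustrated with the simplest case, a one-factor ANOVA. This is a simplification of the two-factor design used in the study, and the function name and data are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: differences among group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: random or error variation.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical measurements for three groups (illustrative values only).
f_value, df_b, df_w = one_way_f([[1.0, 2.0, 3.0],
                                 [2.0, 3.0, 4.0],
                                 [5.0, 6.0, 7.0]])
```

A large F value, as here, indicates that the group means differ by more than would be expected from within-group variation alone.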

The ANOVA procedure is essentially a specialised application of the general linear model (GLM), a complex set of statistical methods linked by a unifying conceptual framework.

The GLM can be used to test almost any hypothesis about a dependent variable that is measured numerically (Miller & Haden, 2006: 1). In this study, two-factor ANOVA was used to examine changes in skeletal size and body proportions over time by comparing samples with respect to two factors or independent variables: sex and time period. This type of factorial experiment allowed two types of effects to be tested: main effects, which are associated with individual factors or variables, and interaction effects, which are created when the factors or variables are crossed with each other (Larson, 2008; Kinnear & Gray, 2009: 525–563). In other words, an interaction occurs when the effect of one discrete factor or independent variable on the response variable depends on the value of the other discrete factor or independent variable.

In the present study, the two independent variables each had several conditions (male and female, and Predynastic Period, Old Kingdom, and Late Period). This procedure was therefore used to compare male and female means from samples dating to the principal time periods, and for male and female means from samples pertaining to Upper versus Lower Egypt (in this instance sex and location are the independent variables).

This approach clearly requires comparison of a number of different groups, which has the potential to introduce issues and errors associated with multiple comparisons testing. The statistical problem that arises from the use of multiple comparisons tests is that any subsequent tests of hypotheses will be performed on the outcome with the same data on which the global test was performed. This can result in an uncontrolled type I error rate; the rate of rejecting the


null hypothesis when it should not be rejected (Cabral, 2008). For example, for a single comparison and an alpha level of 0.05, the probability of correctly retaining a true null hypothesis is 1 – 0.05 = 0.95, and the probability of falsely rejecting it is 0.05. However, with a second comparison, the corresponding probabilities are 0.95 x 0.95 = 0.9025 and 1 – 0.9025 = 0.0975. In other words, by allowing two attempts to reject the null hypothesis, the chances of falsely rejecting it (making a type I error) have almost doubled (Kusuoka & Hoffman, 2002).

Multiple comparisons adjustments are therefore essential in all situations where the main or global hypothesis can only be proven by means of multiple significance tests (Bender & Lange,

2001). The most common method of controlling the type I error rate is to divide the alpha level by the number of comparisons being performed and to test each comparison against this reduced threshold. This is the basis of the

Bonferroni correction (Kusuoka & Hoffman, 2002). The procedure was developed to control the maximum experimentwise error rate, also known as the familywise error rate, under any complete or partial null hypothesis. The familywise error rate is the probability of rejecting falsely at least one true individual null hypothesis, irrespective of which and how many of the other individual null hypotheses are true (Bender & Lange, 2001). The application of multiple comparisons adjustment procedures therefore enables one to conclude which tests are significant and which are not, but with control of the appropriate error rate (Bender & Lange,

2001).
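The arithmetic behind the familywise error rate and the Bonferroni correction can be sketched as follows. The calculation assumes independent comparisons, and the function names are illustrative.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Probability of at least one false rejection across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold under the Bonferroni correction."""
    return alpha / n_comparisons


# Two comparisons at alpha = 0.05, as in the worked example in the text.
fwer_two = familywise_error_rate(0.05, 2)

# Three pairwise time-period comparisons, tested at the corrected threshold.
per_test_alpha = bonferroni_alpha(0.05, 3)
fwer_three_corrected = familywise_error_rate(per_test_alpha, 3)
```

Testing each of the three comparisons at the corrected threshold keeps the familywise error rate at or just below the original 0.05.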

Although the Bonferroni correction is simple and applicable in essentially any multiple test or multiple comparisons situation, it has also been criticised for being low powered, overly conservative, and for increasing the rate of type II errors (accepting the null hypothesis when it should be rejected) (Bender & Lange, 2001; Kusuoka & Hoffman, 2002; Nakagawa, 2004;

Perneger, 1998). However, many of these concerns stem from the improper and overzealous application of the correction (Nakagawa, 2004; Perneger, 1998). In the present study, two-factor

ANOVA using a GLM procedure and with the Bonferroni correction as a post hoc test was used to compare male and female mean skeletal measurements from the principal time periods included in the study. For the cranial measurements, this test involved comparison of the

Predynastic Period vs. the Old Kingdom; the Predynastic Period vs. the Late Period; and the

Old Kingdom vs. the Late Period. This is considered a family of tests; therefore, use of the

Bonferroni correction to adjust for multiple testing is justified (Kusuoka & Hoffman, 2002). For


the tests of post-cranial skeletal dimensions, each of the independent variables (sex and time period) consisted of two groups only (male and female, and Predynastic Period and Old Kingdom; post-cranial remains for the Late Period sample are not available). This analysis is therefore subject to the comparisonwise error rate only, which equates to the normal type I error rate (Kusuoka & Hoffman, 2002). As such, the Bonferroni correction was applied only when comparing main effects, for example, the effect on the dependent variable (skeletal dimension) of one independent variable (in this case time period) under each condition of the second independent variable (sex). All ANOVA procedures were performed using both the raw data and Z-Scores.

2.5.4 Projection pursuit and PCA

Projection pursuit is a procedure for searching high-dimensional data for interesting low-dimensional projections via the optimisation of a criterion function called the projection pursuit index (Huber, 1985; Lee et al, 2005). This idea originated with Kruskal (1969; cited in Huber,

1985); however, it was Friedman and Tukey (1974) who coined the term “projection pursuit” to describe a technique for exploratory analysis of multivariate data. It is a useful technique for an initial data analysis, especially when data are in a high dimensional space. A problem many multivariate analysis techniques face is “the curse of dimensionality”, which is a result of the fact that most of high dimensional space is empty (Lee et al, 2005). In addition, research results show that in high dimensional space, the concepts of proximity, distance or nearest neighbour may not even be qualitatively meaningful (Aggarwal et al, 2001; Clarke et al, 2008). Projection pursuit methods allow exploration of multivariate data in interesting low dimensional spaces.

The definition of an “interesting” projection depends on the projection pursuit index and on the application or purpose (Lee et al, 2005). Many projection pursuit indices have been developed to define interesting projections. Because most low-dimensional projections are approximately normal (Huber, 1985), most of the projection pursuit indices are focused on non-normality (Lee et al, 2005).

Many of the methods of classical multivariate analysis are special cases of projection pursuit. Notable examples include discriminant function analysis and principal components analysis (PCA; Huber, 1985). However, PCA may more appropriately be described as a technique for reducing dimensions. It was first described by Pearson in 1901 as a way to


represent a system of “…points in plane, three, or higher dimensional space by the ‘best-fitting’ straight line or plane.” In other words, PCA is a mathematical algorithm that reduces the dimensionality of the data while retaining most of the variation in the data set. This is accomplished by transformation to a new set of variables, the principal components, which are uncorrelated, and which are ordered so that the first few retain most of the variation present in all of the original variables (Jolliffe, 2002: 1). The principal components are thought to reflect underlying processes that have created the correlations among variables (Tabachnick & Fidell,

1996: 635). The specific goals of PCA are to (Tabachnick & Fidell, 1996: 636):

• Summarise patterns of correlations among observed variables

• Reduce a large number of observed variables to a smaller number of components

• Provide an operational definition (regression equation) for an underlying process by using observed variables

• Test a theory about the nature of the underlying process.
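The transformation to uncorrelated, variance-ordered components can be sketched for the two-variable case, where the eigenvalues of the covariance matrix have a closed form. This is an illustration only: the study used SPSS with many variables, and the function name and data here are hypothetical.

```python
import math
import statistics


def pca_2d(x, y):
    """Return (eig1, eig2, axis1, scores1) for two variables."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

    # Eigenvalues of the 2 x 2 covariance matrix (component variances).
    centre = (var_x + var_y) / 2.0
    spread = math.sqrt(((var_x - var_y) / 2.0) ** 2 + cov ** 2)
    eig1, eig2 = centre + spread, centre - spread

    # Direction of the first principal component.
    angle = 0.5 * math.atan2(2.0 * cov, var_x - var_y)
    axis1 = (math.cos(angle), math.sin(angle))

    # Scores: centred data projected onto the first component.
    scores1 = [(a - mean_x) * axis1[0] + (b - mean_y) * axis1[1]
               for a, b in zip(x, y)]
    return eig1, eig2, axis1, scores1


# Hypothetical paired measurements (illustrative values only).
x_values = [2.0, 4.0, 6.0, 8.0]
y_values = [1.0, 3.0, 2.0, 4.0]
eig1, eig2, axis1, scores1 = pca_2d(x_values, y_values)
proportion_pc1 = eig1 / (eig1 + eig2)
```

The two eigenvalues sum to the total variance of the original variables, so `proportion_pc1` gives the share of variation retained if only the first component is kept.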

PCA has been used extensively in physical anthropology. Topics investigated using PCA include, but are by no means limited to, allometry (Bailey et al, 2014; Corruccini, 1983), ancestry

(Kenyhercz et al, 2014; Neves et al, 2007; Neves & Pucciarelli, 1991), ancient DNA (Der

Sarkissian et al, 2013; Raff et al, 2011; Rothschild et al, 2001), palaeodemography (Mensforth,

1990), odontometrics (Harris & Bailit, 1988), body composition (Mueller & Wohlleb, 1981;

Shields et al, 2006), population continuity (Van Gerven et al, 1977; Zakrzewski, 2007), sexual dimorphism (Betti, 2014; Papaioannou et al, 2012; Van Gerven, 1972), evolution (Gordon et al,

2008b; Kennedy et al, 1984), sex estimation (Kranioti et al, 2009), and skeletal morphology

(Evteev et al, 2014; Frelat et al, 2012; Frieß & Baylac, 2003; Harvati, 2003; Howells, 1957;

Katsavrias & Halazonetis, 2005; Lockwood et al, 2002).

PCA is a standard tool in modern data analysis, and has been widely used in a number of different fields in addition to physical anthropology, including agriculture, biology, chemistry, climatology, demography, ecology, genetics, geology, meteorology, oceanography, and psychology (Jolliffe, 2002: 9). Despite this, there are a number of limitations associated with

PCA. For example, rotation of the solution, which is often performed to achieve a simpler structure that is easier to interpret (Meyers et al, 2006: 495–496), may destroy some of the


properties of PCA. In particular, the rotated components may become correlated or lose the maximal variance property (Stata.com, 2015). In addition, PCA is a linear transformation; thus any data set that is non-linear will not be represented sufficiently after data reduction. Furthermore, PCA assumes that the directions with the largest variances are of most interest, which might not necessarily be true (Blighe, 2013). It also follows the assumption that the original data variables are correlated; if they are not, then PCA cannot reduce the data (Blighe, 2013; Parsons et al,

2009). However, these limitations predominantly refer to instances where PCA is used as the sole analytical tool for a large data set (Blighe, 2013). In reality, most applications of PCA are exploratory in nature; PCA is used primarily as a tool for reducing the number of variables or examining patterns of correlations among variables. Under these circumstances, both the theoretical and practical limitations of PCA are relaxed in favour of a frank exploration of the data (Tabachnick & Fidell, 1996: 639).

In the present study, exploratory PCA was conducted using both the raw data and Z-

Scores in accordance with other authors (Andrews & Williams, 1973). Standardisation of the data by Z-scoring is recommended when there are likely to be big size differences between the constituents of the sample. In such instances, the principal discriminating factor between them is likely to be one of size, and to achieve maximum discrimination this must be taken into account (Andrews & Williams, 1973). However, large size differences tend to swamp smaller but biologically significant differences in morphology, and in such instances it is useful to employ both an unstandardised method (raw data) to obtain maximum discrimination, and a standardised one (Z-Scores) to identify some of the other factors involved (Andrews & Williams,

1973). Another method of standardisation is logarithmic transformation, which removes scale effects by linearising the relationships between variables. This standardisation technique has been widely used in physical anthropological studies (Kurki, 2007; Mitteroecker et al, 2013; Wu et al, 2010, amongst others); however, some researchers have criticised it for making the analysis more indirect and the results more complicated to interpret (Hammer, 2002). Others have suggested that log transformations remove the investigator one more step from the biological relationships in the data or even obscure the relationships that are of interest (Jungers et al, 1995).


2.5.5 Discriminant function analysis and logistic regression

Logistic regression and discriminant function analysis are two multivariate statistical approaches that can be used to predict a binary categorical variable, such as sex, from a group of independent variables, such as skeletal measurements (Albanese, 2003). Often, the two techniques reveal the same patterns in a set of data; however, they do so in different ways and require different assumptions. That said, the multivariate strategy of forming a composite of weighted independent variables remains central, despite differences in the ways in which it is accomplished (Spicer, 2005: 123).

2.5.5.1 Overview of discriminant function analysis

Discriminant function analysis is a statistical procedure used to predict group membership from a set of predictor variables (Tabachnick & Fidell, 1996: 507). The primary goals of this procedure are to find the dimension or dimensions along which groups differ and to find classification functions to predict group membership (Tabachnick & Fidell, 1996: 509). A linear combination of independent variables is generated that maximises the probability of correctly assigning observations to predetermined groups (Quinn & Keough, 2002: 435–442), in this instance male and female. The function generated will consist of coefficients and a constant that were chosen to maximise the difference between the group centroids (Kinnear & Gray, 2009: 525–563). The more widely separated the group centroids relative to within-group variation, the more successful the function will be in predicting group membership. Two statistics serve as measures of the ability of the functions to discriminate between groups (Kinnear & Gray, 2009: 525–563):

1. Canonical correlation – the maximised correlation between the dependent (sex) and independent (skeletal dimensions) variables

2. Eigenvalue – the measure of separation achieved by the discriminant function.
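The two statistics listed above can be illustrated with a minimal sketch for the simplest case of a single predictor and two groups. Here the eigenvalue reduces to the ratio of between-group to within-group sums of squares, and the canonical correlation is the square root of eigenvalue ÷ (1 + eigenvalue). The function name and the measurements are hypothetical and were invented for illustration; they are not data from the study sample.

```python
import math

def discriminant_summary(males, females):
    """Eigenvalue and canonical correlation for one predictor and two groups."""
    pooled = males + females
    grand_mean = sum(pooled) / len(pooled)
    ss_between = 0.0
    ss_within = 0.0
    for group in (males, females):
        group_mean = sum(group) / len(group)
        # Separation of this group's centroid from the grand mean
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Spread of individual values around their own group centroid
        ss_within += sum((x - group_mean) ** 2 for x in group)
    eigenvalue = ss_between / ss_within
    canonical_r = math.sqrt(eigenvalue / (1 + eigenvalue))
    return eigenvalue, canonical_r

# Hypothetical femoral head diameters (mm), for illustration only
males = [46.0, 47.5, 48.0, 45.5, 49.0]
females = [41.0, 42.5, 40.5, 43.0, 42.0]
eigenvalue, canonical_r = discriminant_summary(males, females)
print(f"eigenvalue = {eigenvalue:.2f}, canonical correlation = {canonical_r:.2f}")
```

The wider the gap between the group means relative to the spread within each group, the larger both values become, with the canonical correlation approaching 1.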

Mathematically, discriminant analysis is equivalent to Multivariate Analysis of Variance (MANOVA); however, in MANOVA, the focus is on making comparisons rather than on predicting category or group membership (Kinnear & Gray, 2009: 525–563).

The major underlying assumptions of discriminant function analysis are (Burns & Burns, 2009: 591):

• The observations are a random sample

• Each predictor variable is normally distributed

• There must be at least two groups or categories, with each case belonging to only one group so that the groups are mutually exclusive and collectively exhaustive (all cases can be placed in a group)

• Each group or category must be well defined, clearly differentiated from any other group(s) and natural

• The groups or categories should be defined before collecting the data

• The attribute(s) used to separate the groups should discriminate quite clearly between the groups so that group or category overlap is clearly non-existent or minimal

• Group sizes of the dependent variable should not be grossly different.

There are three main types of discriminant analysis: direct, sequential, and stepwise (statistical).

In direct discriminant function analysis, all predictors (independent variables) enter the equations at once and each predictor is assigned only the unique association it has with groups (Tabachnick & Fidell, 1996: 528). Sequential discriminant function analysis is used to evaluate contributions to prediction of group membership by predictors as they enter the equations in an order determined by the researcher. The researcher assesses improvement in classification when a new predictor is added to a set of prior predictors (Tabachnick & Fidell, 1996: 529).

When the researcher has no reasons for assigning some predictors higher priority than others, statistical criteria alone can be used to determine the order of entry. This is the basis of stepwise discriminant function analysis (Tabachnick & Fidell, 1996: 532). In this procedure, the subset of variables with the greatest discriminating ability is selected on the basis of the squared partial correlation and the significance level from an analysis of covariance (Wescott, 2000). Entry of predictors is determined by changes in the value of Wilks’ lambda. When a new variable is added, the value of Wilks’ lambda will decrease, indicating that the function is more effective in separating groups (Kinnear & Gray, 2009: 528). The significance of the change in this statistic when a variable is entered or removed is obtained from an approximate F test. The two values of interest are known as ‘minimum partial F-to-Enter’ and ‘maximum partial F-to-Remove’. A variable is added to the function if its F value exceeds the ‘minimum partial F-to-Enter’ value. Similarly, a variable is excluded from the analysis if its F value falls below the ‘maximum partial F-to-Remove’ value (Kinnear & Gray, 2009: 528). In the present research, the values of ‘F-to-Enter’ and ‘F-to-Remove’ were set at 3.84 and 2.71, corresponding to probability values of 0.05 and 0.1, respectively (Kinnear & Gray, 2009: 525–563). As a result of these statistical criteria, it is possible that the functions produced may consist of one predictor variable (skeletal dimension) only.
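The correspondence between the F thresholds and the probability values quoted above can be checked with a short sketch. It assumes one numerator degree of freedom and a very large denominator degree of freedom, in which case the critical F value equals the square of a standard normal deviate; this large-sample simplification and the function names are illustrative assumptions rather than a description of the software's internal calculation.

```python
from statistics import NormalDist

def f_critical_large_sample(p):
    """Critical F for 1 numerator df and a very large denominator df.

    In that limit F is the square of a standard normal deviate, so the
    two-tailed normal quantile can be squared to obtain the threshold.
    """
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z

def stepwise_decision(partial_f, in_model, f_to_enter=3.84, f_to_remove=2.71):
    """Apply the entry and removal rule to a single variable."""
    if in_model:
        return "remove" if partial_f < f_to_remove else "keep"
    return "enter" if partial_f > f_to_enter else "leave out"

print(round(f_critical_large_sample(0.05), 2))  # 3.84
print(round(f_critical_large_sample(0.10), 2))  # 2.71
print(stepwise_decision(5.2, in_model=False))   # enter
```

Setting the removal threshold below the entry threshold prevents a variable from being entered and removed repeatedly on successive steps.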

Although stepwise discriminant function analysis is the most popular way of running this procedure (Huberty, 1984), some researchers suggest that it should only be used in the context of descriptive discriminant analysis (an approach which attempts to discover which continuous variables contribute to the separation of groups; Salkind, 2010: 348). This is because separation, and not classification, statistics and criteria are used for entry and removal of variables (Huberty, 1984). Other authors have criticised the use of stepwise discriminant analysis because they consider it misleading to attempt to select a single “best” set of discriminating variables, and argue that even if a “best” set exists in some useful sense, stepwise methods are not guaranteed to find it (Baxter, 1994). Despite this, it has been counter-argued that stepwise discriminant analysis is the best approach when a researcher has a very large number of independent variables (Huberty, 1984) and no reasons for assigning some predictors higher priority than others (Tabachnick & Fidell, 1996: 532), both of which are true in the present study. As such, the use of stepwise discriminant function analysis in this research is justified.

Classification is based on the classification coefficients derived from the sample under study and they usually work too well for the sample from which they were derived. Furthermore, bias enters the classification if the coefficients used to assign a case to a group are derived, in part, from the case itself (Tabachnick & Fidell, 1996: 545). The problem of producing over-optimistic results when developing a classification model and evaluating its statistical performance on the same data was first observed in the early 1930s (Larson, 1931) and led to the suggestion that testing the model on new data would yield a good estimate of its performance (Arlot & Celisse, 2010; Geisser, 1975). Thus, to gain a better understanding of how well classification coefficients generalise to a new sample of cases, validation and cross-validation procedures were developed (Tabachnick & Fidell, 1996: 544), both of which are based on the idea of data splitting. In most statistic-based studies, only a limited amount of data is available. A single data split yields two separate samples: a ‘training’ sample, used to develop the model, and a ‘hold-out’ or ‘validation’ sample, used to test the model. This approach to validation is simple; however, it suffers from several limitations. For example, if the sample is small, using only a proportion of it to develop a model is a waste of valuable information (Hawkins et al, 2003). Furthermore, the procedure can have a high variance given that it depends heavily on which data points end up in the training sample and which end up in the validation or hold-out sample; thus, the evaluation may be significantly different depending on how the division is made. By contrast, cross-validation involves multiple data splits (Arlot & Celisse, 2010). For example, in leave-one-out cross-validation, a single sample of size n is used. Each member of the sample is removed in turn, the full modelling method is applied to the remaining n–1 members, and the fitted model is then applied to the hold-back or ‘left-out’ member (Browne, 2000; Hawkins et al, 2003). Early applications of this approach to classification were demonstrated by Mosier (1951) and Lachenbruch and Mickey (1968); however, it was Allen (1968) who presented one of the first applications in multiple regression, followed by Geisser (1975) in other procedures. A number of more recent studies recommend cross-validation when evaluating all predictive or classification models (Harrell et al, 1996), a recommendation that has been almost universally adopted by researchers reporting the accuracy rates associated with newly-created discriminant functions for sex estimation purposes (Bongiovanni & Spradley, 2012; Case & Ross, 2007; Dabbs & Moore-Jansen, 2010; Franklin et al, 2013; Hassett, 2011; Macaluso, 2011; Ogawa et al, 2013; Tise et al, 2013, amongst many others).

To evaluate the discriminating ability of the variables selected using the stepwise discriminant function analysis procedure and to reduce bias, a ‘leave-one-out’ cross-validation classification procedure was used to gain an estimate of the classification accuracy rate associated with each function based on the reference sample. Compared with the standard classification method, the leave-one-out cross-validation procedure is thought to provide a more realistic estimate of the ability of the functions to separate groups (Cowal & Pastor, 2008).
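The difference between the standard (resubstitution) classification and leave-one-out cross-validation can be sketched as follows. For brevity, the sketch replaces the full discriminant function with a one-variable sectioning point (the midpoint between the male and female means) and assumes that males have the larger mean; the classifier, the function names and the measurements are all hypothetical simplifications rather than the procedure run in SPSS for this study.

```python
def sectioning_point(males, females):
    """Midpoint between the two group means (a one-variable classifier)."""
    return (sum(males) / len(males) + sum(females) / len(females)) / 2

def classify(value, cut):
    # Assumes that the male mean is the larger of the two
    return "M" if value > cut else "F"

def resubstitution_accuracy(males, females):
    """Fit and test on the same cases (the optimistic estimate)."""
    cut = sectioning_point(males, females)
    cases = [(x, "M") for x in males] + [(x, "F") for x in females]
    return sum(classify(x, cut) == sex for x, sex in cases) / len(cases)

def leave_one_out_accuracy(males, females):
    """Refit on n-1 cases and classify the single left-out case, n times."""
    cases = [(x, "M") for x in males] + [(x, "F") for x in females]
    hits = 0
    for i, (value, sex) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        train_m = [x for x, s in training if s == "M"]
        train_f = [x for x, s in training if s == "F"]
        cut = sectioning_point(train_m, train_f)
        hits += classify(value, cut) == sex
    return hits / len(cases)

# Hypothetical measurements (mm), for illustration only
males = [44.0, 45.5, 46.0, 47.0, 48.5, 43.0]
females = [40.0, 41.0, 42.0, 42.5, 43.5, 44.5]
print(resubstitution_accuracy(males, females))
print(leave_one_out_accuracy(males, females))
```

Because the left-out case never contributes to the sectioning point used to classify it, the second figure is not inflated by the case itself.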

2.5.5.2 Overview of logistic regression

Logistic regression is a statistical procedure that allows the prediction of group membership from a set of variables that may be continuous, discrete, dichotomous, or a mix. Logistic regression is related to and answers the same questions as discriminant function analysis; however, it is more flexible, requires fewer assumptions, and is more statistically robust (Burns & Burns, 2009: 568; Tabachnick & Fidell, 1996: 575). The key assumptions of logistic regression are (Burns & Burns, 2009: 569):

• The dependent variable must be a dichotomous one

• The categories (groups) must be mutually exclusive and exhaustive; a case can only be in one group and every case must be a member of one of the groups.

A linear relationship between the dependent and independent variables is not assumed, and the independent variables need not be interval-level, nor normally distributed, nor linearly related, nor of equal variance within each group.

As a flexible alternative to discriminant function analysis, the popularity of logistic regression is growing. However, when assumptions regarding the distributions of predictors are met, discriminant function analysis may be a more powerful and efficient analytic strategy (Tabachnick & Fidell, 1996: 579). On the other hand, discriminant function analysis sometimes overestimates the size of the association with dichotomous predictors (Tabachnick & Fidell, 1996: 579).

Logistic regression analysis produces coefficients for each measurement included in a model as well as a constant. The logistic regression model takes the natural logarithm of the odds (the logit) as a regression function of the predictors (LaValley, 2008). In order to use this information to assess the sex of an individual, the logit or log odds is first calculated using the following formula:

$$L_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

where the logit ($L_i$) is a linear function of the independent variable(s) $X$, $\beta_0$ is the value for the constant, $\beta_1$ is the first coefficient, $X_1$ is the first measurement, and so on (Viciano et al, 2013).

The logistic model is an S-shaped function of the form:

$$P = \frac{1}{1 + e^{-L_i}}$$

where $P$ is the probability of the event. Calculated probabilities are always between 0 and 1. If $P$ is greater than 0.5, then the individual is considered male. If $P$ is less than 0.5, the individual is considered female. For example, if $P$ = 0.84, there is an 84% probability that the individual is male, and only a 16% ($1 - P$) probability that the individual is female (Albanese, 2003).
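The two-step calculation described above (logit first, then probability) can be sketched as follows. The constant, coefficient and measurement are hypothetical values chosen for illustration and are not equations developed in this study; the treatment of a probability of exactly 0.5 as indeterminate is likewise an assumption, since the rule above covers only values above or below 0.5.

```python
import math

def logit(constant, coefficients, measurements):
    """Log odds: the constant plus each coefficient times its measurement."""
    return constant + sum(b * x for b, x in zip(coefficients, measurements))

def probability_male(logit_value):
    """Logistic (S-shaped) transformation of the logit to a probability."""
    return 1 / (1 + math.exp(-logit_value))

def estimate_sex(p):
    if p > 0.5:
        return "male"
    if p < 0.5:
        return "female"
    return "indeterminate"  # assumption: the rule does not cover exactly 0.5

# Hypothetical one-variable equation and measurement (mm)
constant = -45.0
coefficients = [1.05]
measurements = [44.5]

li = logit(constant, coefficients, measurements)
p = probability_male(li)
print(f"logit = {li:.3f}, P(male) = {p:.2f}, estimate = {estimate_sex(p)}")
```

A logit of zero corresponds to a probability of exactly 0.5; positive logits favour male and negative logits favour female.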


An important aspect of logistic regression is how well the model agrees with the observed data. This is called the goodness of fit of the model. The odds ratio values describe the model as it is applied to the data. If the model and the data are not in good agreement, then the odds ratios are not very meaningful (LaValley, 2008). Several authors have pointed out that although goodness of fit is crucial for the assessment of the validity of logistic regression results in medical research, it is often not included in published articles (Bagley et al, 2001; Bender & Grouven, 1996; Hosmer et al, 1991). This, however, does not appear to be an issue in the physical anthropological literature (Acharya et al, 2011a; Urbanová et al, 2013; Viciano et al, 2013).

Goodness of fit is usually evaluated in two steps. The first step is to generate global measures of how well the model fits the whole set of observations; the second step is to evaluate individual observations to see whether any are problematic for the regression model (LaValley, 2008). In SPSS, goodness of fit is evaluated using the Hosmer–Lemeshow test. This is a chi-square test which calculates a c-statistic; the c-statistic measures how well the model can discriminate between observations at different levels of the outcome (LaValley, 2008; Paul et al, 2013). The minimum value of c is 0.5; the maximum is 1.0. According to the researchers who created this method, c-values of 0.7 to 0.8 show acceptable discrimination, values of 0.8 to 0.9 indicate excellent discrimination, and values of >0.9 show outstanding discrimination (Hosmer & Lemeshow, 2000: 162). The Hosmer–Lemeshow test evaluates whether the logistic regression model is well calibrated so that probability predictions from the model reflect the occurrence of events in the data. Obtaining a significant result on the test would indicate that the model is not well calibrated, so the fit is not good (LaValley, 2008). Thus in the present research, logistic regression equations were rejected if the result of the Hosmer–Lemeshow goodness of fit test demonstrated a c-value below 0.7.
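The discrimination bands quoted above can be illustrated with a minimal sketch. It takes the c-statistic in its usual sense, as the proportion of male–female pairs in which the male receives the higher predicted probability of being male (ties counted as one half); this reading, the handling of the band boundaries, and the probabilities themselves are assumptions made for illustration and do not reproduce SPSS output.

```python
def c_statistic(probabilities, sexes):
    """Proportion of male-female pairs ranked correctly by predicted P(male)."""
    male_p = [p for p, s in zip(probabilities, sexes) if s == "M"]
    female_p = [p for p, s in zip(probabilities, sexes) if s == "F"]
    concordant = 0.0
    for pm in male_p:
        for pf in female_p:
            if pm > pf:
                concordant += 1.0
            elif pm == pf:
                concordant += 0.5  # tied pairs count as half
    return concordant / (len(male_p) * len(female_p))

def interpret(c):
    """Bands after Hosmer & Lemeshow; boundary handling is an assumption."""
    if c > 0.9:
        return "outstanding"
    if c >= 0.8:
        return "excellent"
    if c >= 0.7:
        return "acceptable"
    return "below threshold"

# Hypothetical predicted probabilities of being male
probabilities = [0.9, 0.4, 0.6, 0.2]
sexes = ["M", "M", "F", "F"]
c = c_statistic(probabilities, sexes)
print(c, interpret(c))  # 0.75 acceptable
```

A value of 0.5 means the model ranks pairs no better than chance, whereas 1.0 means that every male is assigned a higher probability than every female.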

2.5.5.3 Discriminant function analysis versus logistic regression

Discriminant function analysis is the most commonly used approach in skeletal sex estimation methods, as evidenced by a huge body of literature presenting discriminant functions for use by other researchers. In comparison, logistic regression has been relatively underused for this purpose, despite being a very powerful statistical procedure (Albanese, 2003; Viciano et al, 2013). Table 2.5A provides a summary of previous studies using discriminant function analysis or logistic regression in metric sex estimation methods.

Table 2.5A: Summary of studies using discriminant function analysis or logistic regression in metric sex estimation methods.

| Discriminant function analysis: bone | Discriminant function analysis: studies | Logistic regression: bone | Logistic regression: studies |
|---|---|---|---|
| Skull | Franklin et al, 2013; Gapert et al, 2009; Giles & Elliot, 1963; Giles, 1964; Kajanoja, 1966; Kalmey & Rathbun, 1996; Ogawa et al, 2013; Patil & Mody, 2005; Walker, 2008 | Dentition | Acharya et al, 2011a; Viciano et al, 2013 |
| Dentition | Hassett, 2011; Macaluso, 2011 | Os coxa and femur | Albanese, 2003; Albanese et al, 2008 |
| Vertebrae | Wescott, 2000 | Os coxa | Karakas et al, 2013; Klales et al, 2012 |
| Hyoid | Kindschuh et al, 2010 | Long bones | Holland, 1991; Saunders & Hoppa, 1997 |
| Ribs | Çöloğlu et al, 1998 | Metacarpals | Scheuer & Elkington, 1993 |
| Sternum | Bongiovanni & Spradley, 2012 | Tarsals | Harris & Case, 2012 |
| Os coxa | Schulter-Ellis et al, 1983, 1985 | | |
| Sacrum | Flander, 1978 | | |
| Scapula | Dabbs, 2010; Dabbs & Moore-Jansen, 2010 | | |
| Long bones – upper limb | Holman & Bennett, 1991; Kranioti & Michalodimitrakis, 2009; Mall et al, 2001; Purkait, 2001; Ríos Frutos, 2005 | | |
| Long bones – lower limb | İşcan & Miller-Shaivitz, 1984b; Raxter, 2007; Seidemann et al, 1998 | | |
| Metacarpals | Falsetti, 1995 | | |
| Metatarsals | Robling & Ubelaker, 1997 | | |
| Hand and foot bones | Case & Ross, 2007 | | |
| Carpals | Mastrangelo et al, 2011a, 2011b | | |
| Tarsals | Bidmos & Asala, 2003; Bidmos & Dayal, 2003 | | |

Despite the apparent popularity of discriminant function analysis, there are a number of reasons why logistic regression may be a better choice for use in studies wishing to predict a binary dependent variable (such as sex). Firstly, discriminant function analysis works under the assumption that the independent variables are normally distributed (Acharya et al, 2011a; Albanese, 2003; Pohar et al, 2004; Press & Wilson, 1978; Viciano et al, 2013). Although skeletal metric data are usually normally distributed when samples are large, logistic regression does not require that this assumption is met to optimise prediction accuracy and categorical variables can be used as independent variables along with metric data (Albanese, 2003; Viciano et al, 2013). Secondly, logistic regression does not require equal variance-covariance matrices in the two groups (male and female), a condition that is necessary for discriminant function analysis (Albanese, 2003; Viciano et al, 2013). Therefore, logistic regression analysis is considered superior to discriminant function models by some authors because it is more flexible in its assumptions. Furthermore, even when discriminant function analysis satisfies the required assumptions, logistic regression still performs well (Acharya et al, 2011a; Albanese, 2003; Viciano et al, 2013). An additional major benefit of logistic regression is that the probability of the event is calculated. Separate posterior probability statistics must be considered for an analogous approach when using discriminant function analysis; however, this approach is not undertaken in most discriminant function sex estimation methods (Albanese, 2003).

When choosing between logistic regression and discriminant function analysis, some authors have suggested that if the sample data are normally distributed with equal covariance matrices, discriminant function analysis should always be used (Pohar et al, 2004; Press & Wilson, 1978). Tests have demonstrated that discriminant function analysis is a consistently better method than logistic regression when the normality assumptions are met; however, the differences between the methods become negligible with a sample size of 50 or more, when the methods allocate differently only about 0.5% of the cases (Pohar et al, 2004). In instances where the independent variables are not normally distributed, the use of discriminant function analysis is theoretically wrong, because one of its key underlying assumptions has been violated. Decisions regarding whether to use discriminant function analysis or logistic regression may additionally be based on practical issues, for example sample size. It is generally recommended that the sample size in discriminant function analysis should provide at least 20 cases for each independent variable. A sample size smaller than 20 cases can result in discriminant coefficients that are not stable across samples and are therefore not trustworthy (Spicer, 2005: 146). It is also recommended that the smallest group size in the dependent variable category be at least 20, with an absolute minimum greater than the number of independent variables (Spicer, 2005: 146). The sample size required for logistic regression is typically greater than that required for discriminant function analysis; usually a minimum of 50 cases per independent variable is recommended (Spicer, 2005: 134).
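The sample size recommendations above can be expressed as a simple check. This is a minimal sketch of one way to encode the rules as stated (20 cases per independent variable and a smallest group of at least 20 for discriminant function analysis; 50 cases per independent variable for logistic regression); the function name, its arguments and the example figures are hypothetical and do not describe how equations were actually screened in this study.

```python
def meets_sample_size(method, n_cases, n_predictors, smallest_group):
    """Check a candidate equation against the recommended minimum sample sizes."""
    if method == "dfa":
        return (
            n_cases >= 20 * n_predictors       # 20 cases per predictor
            and smallest_group >= 20           # smallest group at least 20
            and smallest_group > n_predictors  # absolute minimum
        )
    if method == "logistic":
        return n_cases >= 50 * n_predictors    # 50 cases per predictor
    raise ValueError(f"unknown method: {method}")

# Hypothetical example: 60 cases, 3 predictors, smallest group of 25
print(meets_sample_size("dfa", 60, 3, 25))       # True
print(meets_sample_size("logistic", 60, 3, 25))  # False
```

The example shows how the same subsample can be adequate for discriminant function analysis yet too small for logistic regression with the same number of predictors.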

A number of studies in the physical anthropological literature that compared the predictive powers of logistic regression and discriminant function analysis for sex estimation purposes found the former to be superior to the latter (Acharya et al, 2011a; Urbanová et al, 2013). To explore these findings further, the present study employed both logistic regression and discriminant function analysis to develop metric sex estimation equations. This practice of using both statistics is recommended by some authors because the techniques are derived in different ways and function independently of one another (Marino, 1995). Using these methods, the collected data were analysed in different ways in order to generate equations with the greatest possible classification accuracy: as a complete sample, by principal time period, by location (Upper or Lower Egypt), and by skeletal region, for example skull, bony pelvis, upper limb, or lower limb. Equations were rejected if they did not meet the minimum sample size requirements outlined above.


2.6 Test sample and methods

In forensic anthropological contexts, expert testimony which draws on methods of untested accuracy is generally considered inadmissible in a court of law (Fienberg, 1997: 101–123; Good, 2001). Though not explicitly stated in the physical anthropological literature, it is the general consensus of experts in the field that the accuracy of newly-created, population-specific metric sexing methods should be tested on samples other than those upon which they were created (Buckberry, Jo; Personal Communication, 2012). For these reasons, collaboration was established with a departmental colleague who has access to a cemetery collection of human skeletal remains from Old Kingdom and Ptolemaic Period Saqqara (Figure 2.6A) through her work as the osteologist on a project run by the Polish Centre of Mediterranean Archaeology at the University of Warsaw. The primary aim was therefore to test the accuracy of metric sex estimation equations developed as part of this PhD project (see Section 3.3.2) on a different sample.


Figure 2.6A: Map of ancient Egypt showing the location of Saqqara (Adapted by JJL from: http://oi.uchicago.edu/research/lab/map/maps/egypt.html, with additions/amendments. Accessed February 2013).

2.6.1 Test sample

The skeletal remains of adult individuals used in the test sample were recovered from the late Old Kingdom (Fifth–Sixth Dynasties; c. 2494–2181 BC) and Ptolemaic Period (c. 332–30 BC) cemeteries at Saqqara, located immediately west of the Step Pyramid complex of Pharaoh Djoser (c. 2667–2648 BC). Presently located around 40 kilometres south-west of the capital city of Cairo, the Saqqara necropolis was one of several extensive cemetery sites (including neighbouring Giza) that served the population of the ancient Memphite region as burial grounds from the Early Dynastic (c. 3000–2686 BC) to the Byzantine Period (AD 395–641). To date, the Saqqara-West site, which represents only a fraction of the entire Saqqara necropolis, has yielded 650 burials uncovered during excavation works conducted annually since 1996 by a team from the Polish Centre of Mediterranean Archaeology, University of Warsaw (Myśliwiec et al, 2004; Myśliwiec, 2008; Myśliwiec et al, 2010; Kuraszkiewicz, In Press).

The sample consists of 119 adult skeletons, some of which also demonstrate preserved soft tissue, excavated and examined between 2006 and 2012 by Iwona Kozieradzka-Ogunmakin (IK-O; University of Manchester) at the Saqqara-West necropolis. The sex of nine individuals was positively identified as male by the presence of mummified external genitalia and/or well-preserved soft tissue of the face exhibiting facial hair. These individuals were classified as the ‘known sex’ subsample. The remaining 110 individuals were confidently sexed using non-metric identification of sex-specific characteristics of the bony pelvis and skull.

Features examined on the bony pelvis included the presence or absence of a ventral arc, subpubic concavity and preauricular sulcus, the width of the sciatic notch, and subpubic angle, and the shape and size of the sacrum (Buikstra & Ubelaker, 1994). Cranial features examined included the form of the supraorbital ridges, glabella, and orbits, the mandibular angle, and the rugosity of the occipital region (Buikstra & Ubelaker, 1994). The size of the mastoid processes was also considered, although care was taken with this indicator given that it tends to be inflated in ancient Egyptian populations (Zakrzewski, 2003; 2007). As previously discussed, there is a certain degree of error associated with morphological estimates of sex. However, all of the ‘known sex’ individuals were assigned their correct sex based on consideration of the skeletal indicators listed above, suggesting a high degree of accuracy in the estimates of sex assigned to the test sample as a whole. Age estimations were primarily based on pubic symphyseal and auricular surface changes, and further supplemented by the degree of dental attrition and cranial suture closure. Skeletal elements exhibiting pathological changes were not included. The majority of individuals (n=91) date to the Ptolemaic Period; 28 individuals date to the late Old Kingdom. Table 2.6A provides a summary of the Saqqara-West sample, showing the number of male and female individuals from the two different time periods.

Table 2.6A: Summary of the Saqqara-West necropolis skeletal sample.

| Sex | Time period: Old Kingdom | Time period: Ptolemaic Period | Total |
|---|---|---|---|
| Male | 19* | 62† | 81 |
| Female | 9 | 29 | 38 |
| Total | 28 | 91 | 119 |

*Of these, one individual was of known sex due to the preservation of soft tissue.
†Of these, eight individuals were of known sex due to the preservation of soft tissue.

2.6.2 Test methods

The inter-observer error test between IK-O and the present author, described in Section 2.3.2, demonstrated equivalence of the two researchers both in terms of how morphological sex was assigned to isolated crania and ossa coxae (see Section 3.2.2.1) and how metric measurements were collected. Mean per cent error did not exceed the critical value of 15% for any of the dimensions tested. Two dimensions, HML (9.43%) and PL (8.66%), exceeded the critical level of 7.5% for per cent TEM; however, these dimensions were not included in the functions/equations used in this blind test. Section 3.3.2 presents the complete set of inter-observer error test results. The test of metric sex estimation functions and equations developed by the present author using data collected by IK-O on a different sample was therefore considered fair and unbiased. It must be noted, however, that the sample sizes of some skeletal elements were very small, which may suggest that the results are not scientifically robust. Any further research performed in collaboration with IK-O will require an inter-observer error test using a large sample size.

Metric data were collected as part of the skeletal analysis performed on site at Saqqara by IK-O. This test utilised five dimensions of the cranial and post-cranial skeleton. These dimensions were selected for two reasons:

1. They were included in the discriminant functions and logistic regression equations developed using data from the present study sample and are therefore required to test the functions and equations

2. They were common to the two pieces of research conducted independently by IK-O and the present author.

Definitions of the following dimensions are given in Table 2.2K: glabello-occipital length of cranium (GO), maximum width of cranium (MW), maximum bizygomatic diameter of cranium (DB), femoral head diameter (FHD), and proximal epiphyseal breadth of tibia (PEB). Testing of the discriminant functions and logistic regression equations followed the same procedures described previously. A summary of the discriminant functions and logistic regression equations tested, as well as the results of the test, is given in Section 3.5.


3 RESULTS

3.1 Descriptive demographics of sample

3.1.1 Sample size and sex distribution

A total of 318 individuals were included in the study sample. Of these, 162 individuals were represented by both cranial and post-cranial material, while 156 individuals were represented by cranial material only. Table 3.1A presents the sex distribution of the sampled skeletons broken down by geographic location and time period.

Table 3.1A: The sex distribution of the Egyptian samples included in the study.

| Collection | Region | Period | Number of skeletons: total | Number of skeletons: male | Number of skeletons: female |
|---|---|---|---|---|---|
| Peabody Museum | Keneh | Predynastic | 43 | 21 | 22 |
| Peabody Museum & NHM, Vienna | Giza | Old Kingdom | 106 | 69 | 37 |
| Peabody Museum | Sheikh Farag | Middle Kingdom | 13 | 7 | 6 |
| Peabody Museum | Thebes | 20th Dynasty | 2 | 1 | 1 |
| Duckworth Lab. | Giza | Late Period | 154 | 85 | 69 |
| Total, n (%) | | | 318 (100) | 183 (57.5) | 135 (42.5) |

NHM, Natural History Museum. Lab., laboratory.

Sex estimates were based on pelvic and/or cranial morphology, as described previously. Table 3.1B provides a summary of the number of individuals whose skeletal remains included at least one os pubis, pelvic material (at least one os coxa plus the sacrum, at least one os coxa only, or the sacrum only), or cranial material (the skull, the cranium only, or the mandible only).


Table 3.1B: Frequency of skeletons with pelvic or cranial material in the study sample.

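| Sample | Pubic bones: ≥1 | Pubic bones: absent | Pelvic material: ≥1 os coxa + sac. | Pelvic material: ≥1 os coxa only | Pelvic material: sac. only | Pelvic material: absent | Cranial material: skull | Cranial material: cran. only | Cranial material: mand. only | Cranial material: absent |
|---|---|---|---|---|---|---|---|---|---|---|
| Complete (n=318) | 101 | 61 | 110 | 28 | 6 | 18 | 165 | 140 | 3 | 10 |
| Predyn. Keneh (n=43) | 21 | 22 | 24 | 4 | 6 | 9 | 18 | 14 | 3 | 8 |
| OK Giza (n=106) | 66 | 38 | 71 | 24 | 0 | 9 | 90 | 14 | 0 | 2 |
| MK Sheikh Farag (n=13) | 12 | 1 | 13 | 0 | 0 | 0 | 10 | 3 | 0 | 0 |
| 20th D Thebes (n=2) | 2 | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| LP Giza (n=154) | N/A | N/A | N/A | N/A | N/A | N/A | 45 | 109 | 0 | 0 |

Predyn., Predynastic Period; OK, Old Kingdom; MK, Middle Kingdom; 20th D, 20th Dynasty; LP, Late Period; sac., sacrum; cran., cranium; mand., mandible; N/A, not applicable.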

Table 3.1B demonstrates that for the subsample of skeletons represented by both cranial and post-cranial material there were very few individuals for whom no pelvic material was available for sex estimation (n=18). This included individuals whose skeletal remains lacked pubic bones, ossa coxae and a sacrum, or individuals who only had very small or highly fragmented pieces of these bones from which an estimate of sex could not be made. For these individuals, as with the crania from the Late Period Giza sample, sex was estimated on the basis of cranial morphology alone.

3.1.2 Age at death estimation

Table 3.1C provides a summary of the age at death demographics of the complete skeletal population sampled in the study, and broken down by principal time periods.


Table 3.1C: Age distribution of study sample.

| Sample | Young adult (20–34 years) | Middle adult (35–49 years) | Old adult (50+ years) | Indeterminate |
|---|---|---|---|---|
| Complete, n | 93* | 189‡ | 28 | 8 |
| Predynastic Period Keneh, n | 14 | 25 | 4 | 0 |
| Old Kingdom Giza, n | 37 | 52 | 14 | 3 |
| Late Period, n | 38 | 101 | 10 | 5 |

*Includes 4 individuals from Sheikh Farag; ‡Includes 9 individuals from Sheikh Farag and 2 individuals from Thebes.

As can be seen in Table 3.1C, the majority of individuals in the complete sample (n=189) were estimated to be middle adults (aged between 35 and 49 years) at their time of death; the age category assigned least frequently was old adult (aged 50 years or above at the time of death).

This pattern is additionally observed when the sample is broken down into principal time periods. Overall, a total of 8 individuals could not be assigned an age at death using the methods described in Section 2.4 as a result of insufficient skeletal evidence or damage to the skeletal age indicators examined.

3.1.3 Exploration of data

3.1.3.1 Outliers and extreme scores

Stem-and-leaf boxplots of each of the 63 dimensions included in the study were used to identify the presence of measurements considered to be outliers or extreme scores. In graphs of this type, the box represents the portion of the distribution falling between the 25th and 75th percentiles (lower and upper quartiles). The whiskers of the plot connect the largest and smallest values that are not outliers or extreme scores (Kinnear & Gray, 2009: 113–114). An outlier, denoted by o, is defined as a value more than 1.5 box-lengths away from the box; an extreme score, denoted by *, is more than 3 box-lengths away from the box (Kinnear & Gray, 2009: 113–114). Figure 3.1A provides an example of the structure of a boxplot with both outliers and extreme scores indicated.
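The outlier and extreme score definitions given above can be sketched as a simple rule. The sketch uses the inclusive quartile method from the Python standard library, which may differ slightly from the hinges SPSS uses to draw the box; the function name and the measurements are hypothetical and are included for illustration only.

```python
from statistics import quantiles

def flag_values(values):
    """Label each value as 'ok', 'outlier' (>1.5 box-lengths from the box)
    or 'extreme' (>3 box-lengths from the box)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box_length = q3 - q1
    labels = []
    for v in values:
        # Distance from the nearest edge of the box (zero if inside it)
        if v < q1:
            distance = q1 - v
        elif v > q3:
            distance = v - q3
        else:
            distance = 0.0
        if distance > 3 * box_length:
            labels.append("extreme")
        elif distance > 1.5 * box_length:
            labels.append("outlier")
        else:
            labels.append("ok")
    return labels

# Hypothetical measurements (mm); only the last value is unusually large
measurements = [10, 11, 12, 13, 14, 15, 16, 17, 18, 25]
print(flag_values(measurements)[-1])  # outlier
```

Replacing the final value of 25 with 40 would move it more than three box-lengths from the box, so it would be flagged as an extreme score instead.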


Figure 3.1A: Boxplot for transverse breadth of tibia (TB) showing three measurements considered to be outliers (two male and one female) and one male measurement considered to be an extreme score. The outliers and extreme score are labelled with skeleton ID numbers.

In total, 118 individual measurements were considered to be outliers. A further 14 measurements (of the cranium [BB, PN, PB, OF; n=8], tibia [TB; n=1], humerus [EWH; n=3], os coxa [HSN; n=1], and scapula [BCB; n=1]) were identified as extreme scores. Several sources of variation may cause outliers and extreme scores:

1. Measurement error – error that resulted from using an incorrect skeletal landmark to measure a particular dimension. Measurement error, which was evaluated in a test of intra-observer error, was not deemed to be sufficiently great to account for the generation of outliers or extreme scores (see Section 3.2).

2. Human error – error that occurred as a result of incorrectly reading the measurement from the equipment used, or error that was introduced during the recording of measurements on the recording forms and/or transposition of measurements to Microsoft Excel spreadsheets.

3. Abnormal specimens in the study sample – juvenile and pathological specimens (including those from the Duckworth Laboratory collection previously identified as adult “pygmies”) were not included in the study sample; therefore, measurements considered to be outliers or extreme scores would most likely be the result of inclusion in the sample of individuals who were unusually large or small, or who were of non-Egyptian origin. The latter situation could have occurred as the result of misattribution of provenance by museum curators, or as a result of migration of non-Egyptians into Egypt.

After careful consideration of these factors and the measurement values identified as outliers and extreme scores, one measurement of the latter category (palate breadth, PB) was removed from the data set as it was deemed to be an anatomically impossible measurement and therefore most likely the result of human error. To reflect natural human and sample variation, all other outliers and extreme scores were included in the data set and subsequent statistical analyses, unless it was necessary to remove them.

3.1.3.2 Distribution of data

The Shapiro-Wilk test was used to test each variable (skeletal dimension) for normality. Missing values were excluded on a listwise basis, which means that cases (individual skeletons in the study sample) were excluded from the analysis if a single variable was missing. To adjust for multiple testing, the alpha level of 0.05 was divided by the number of tests being performed (n=63, corresponding with the number of skeletal variables) to give an adjusted significance level of 0.0008. Variables with a P-value below 0.0008 were considered to be non-normally distributed. The results of the Shapiro-Wilk test are given in Table 3.1D. This table further presents skewness and kurtosis values. Skewness relates to the symmetry of a distribution; a skewed variable is one whose mean is not in the centre of the distribution (Meyers et al, 2006: 48; Tabachnick & Fidell, 1996: 71). Kurtosis relates to the clustering of scores toward the centre of a distribution, or in other words, the ‘peakedness’ of a distribution (Meyers et al, 2006: 48; Tabachnick & Fidell, 1996: 71). When a distribution is normal, the values of skewness and kurtosis are zero.

There are a number of different opinions as to what is an unacceptable level of skewness and kurtosis (Meyers et al, 2006: 48). Some authors suggest that skewness and kurtosis values are acceptable if they fall within the arbitrary range of –1.00 to +1.00 (Meyers et al, 2006: 50), while others suggest a more definitive assessment strategy for detecting deviations from normality by comparing a skewness or kurtosis value to its respective standard error (Kim, 2013; Meyers et al, 2006: 50; Tabachnick & Fidell, 1996: 73). For example, if the value of skewness or kurtosis is twice its standard error, the distribution differs significantly (with >95% probability) from the normal distribution (Miles & Shevlin, 2001: 74). Variables were therefore considered to be non-normally distributed if they met the following criteria:

• Shapiro-Wilk significance level P<0.0008

• Skewness or kurtosis values 3.5 times their standard error (to adjust for multiple testing).

The complete set of results for the normality tests is given in Table 3.1D.
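
As a rough illustration of how these criteria combine, the sketch below applies them to summary values from Table 3.1D. Two points are assumptions rather than statements from the text: the standard errors are computed with the conventional sample-size formulas (which reproduce the tabulated values, for example 0.15 and 0.29 for n=275), and a variable is flagged only when both the Shapiro-Wilk criterion and the skewness/kurtosis criterion are met. The function names are hypothetical.

```python
import math

ALPHA = 0.05
N_TESTS = 63
ADJUSTED_ALPHA = ALPHA / N_TESTS  # approximately 0.0008

def se_skewness(n):
    """Conventional standard error of skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Conventional standard error of (excess) kurtosis for a sample of size n."""
    return 2.0 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def is_non_normal(p_value, skewness, kurtosis, n, ratio=3.5):
    """Assumed rule: both criteria must hold for a variable to be flagged."""
    shape_flag = (abs(skewness) > ratio * se_skewness(n)
                  or abs(kurtosis) > ratio * se_kurtosis(n))
    return p_value < ADJUSTED_ALPHA and shape_flag

# Summary values taken from Table 3.1D.
pn_flag = is_non_normal(p_value=0.00003, skewness=0.72, kurtosis=3.20, n=223)
hsn_flag = is_non_normal(p_value=0.001, skewness=-0.94, kurtosis=2.65, n=122)
go_flag = is_non_normal(p_value=0.421, skewness=0.14, kurtosis=-0.32, n=275)
```

Under these assumptions PN is flagged, whereas HSN is not, because its Shapiro-Wilk significance level (0.001) does not fall below the adjusted threshold even though its skewness and kurtosis are large relative to their standard errors.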

Table 3.1D: Results of the Shapiro-Wilk normality test, and skewness and kurtosis values for each variable.

Dimension | Valid cases, n | Shapiro-Wilk statistic | Shapiro-Wilk DF | Shapiro-Wilk Sig. | Skewness value | Skewness SE | Kurtosis value | Kurtosis SE

GO 275 0.994 275 0.421 0.14 0.15 -0.32 0.29

MW 271 0.996 271 0.765 0.03 0.15 0.38 0.30

BB 246 0.974 246 0.0002 -0.32 0.16 1.75 0.31

DB 199 0.986 199 0.050 -0.16 0.17 -0.68 0.34

PN 223 0.965 223 0.00003 0.72* 0.16 3.20* 0.32

BN 242 0.994 242 0.528 0.03 0.16 -0.29 0.31

BP 221 0.994 221 0.564 0.09 0.16 -0.39 0.33

NB 226 0.984 226 0.014 0.42 0.16 0.86 0.32

PB 202 0.982 202 0.012 -0.49 0.17 0.95 0.30

OF 240 0.855 240 <0.00001 -2.10* 0.16 8.62* 0.31

ML 288 0.994 288 0.258 -0.27 0.14 0.14 0.29

XSL 83 0.987 83 0.565 0.32 0.26 0.52 0.52

XDH 103 0.987 103 0.388 0.35 0.24 0.30 0.47

DSD 113 0.995 113 0.941 -0.10 0.23 0.10 0.45

DTD 113 0.990 113 0.556 0.22 0.23 0.17 0.45

LVF 110 0.988 110 0.414 0.24 0.23 0.62 0.46

SFB 88 0.982 88 0.245 -0.28 0.26 -0.35 0.51

SFS 99 0.979 99 0.114 0.31 0.24 0.65 0.48

SFT 104 0.991 104 0.697 -0.21 0.24 0.49 0.47

FHD 136 0.984 136 0.106 -0.03 0.21 -0.72 0.41

FND 143 0.987 143 0.208 0.23 0.20 -0.38 0.40

FSC 141 0.985 141 0.128 -0.07 0.20 -0.70 0.41

XFL 112 0.989 112 0.481 0.24 0.23 -0.23 0.45

FTD 142 0.991 142 0.524 -0.05 0.20 -0.33 0.40

EBF 97 0.987 97 0.429 0.05 0.25 -0.45 0.49

TL 114 0.986 114 0.301 0.04 0.23 -0.13 0.45

CNF 141 0.988 141 0.248 0.09 0.20 -0.52 0.41

MSC 135 0.984 135 0.117 0.001 0.21 -0.74 0.41

APD 141 0.982 141 0.063 -0.14 0.20 -0.78 0.41

TB 141 0.935 141 <0.00001 1.15* 0.20 4.16* 0.41

PEB 92 0.957 92 0.004 -0.40 0.25 -0.42 0.50

DEB 125 0.993 125 0.818 0.16 0.22 -0.20 0.43

HHD 132 0.987 132 0.268 0.06 0.21 -0.63 0.42

XHL 112 0.989 112 0.513 -0.10 0.23 -0.20 0.45

EWH 145 0.982 145 0.059 -0.22 0.20 -0.24 0.40

XRL 112 0.988 112 0.397 -0.09 0.23 -0.56 0.45

RSBB 119 0.989 119 0.427 0.12 0.22 -0.25 0.44

MAXD 68 0.979 68 0.289 -0.28 0.29 -0.64 0.57

MIND 63 0.986 63 0.708 -0.06 0.30 -0.49 0.60

XUL 96 0.980 96 0.139 -0.04 0.25 -0.35 0.49

USBB 103 0.985 103 0.318 0.37 0.24 0.07 0.47

IAL 102 0.993 102 0.903 -0.08 0.24 -0.33 0.47

BML 102 0.990 102 0.670 -0.12 0.24 -0.41 0.47

BAP 100 0.995 100 0.965 -0.01 0.24 -0.29 0.48

HML 102 0.992 102 0.828 -0.03 0.24 -0.32 0.47

HAP 100 0.991 100 0.735 0.01 0.24 -0.29 0.48

MS 103 0.994 103 0.953 0.05 0.24 -0.33 0.47

L 115 0.991 115 0.700 0.05 0.23 -0.49 0.45

SIH 115 0.990 115 0.556 0.03 0.23 -0.18 0.45

MLH 112 0.991 112 0.682 -0.06 0.23 -0.47 0.45

SIB 113 0.993 113 0.821 -0.001 0.23 -0.30 0.45

MLB 88 0.990 88 0.728 0.23 0.26 -0.05 0.51

MSD 115 0.984 115 0.190 0.03 0.23 -0.77 0.45

IL 62 0.980 62 0.387 -0.12 0.30 -0.74 0.60

PL 54 0.976 54 0.355 -0.31 0.33 -0.28 0.64

HSN 122 0.957 122 0.001 -0.94 0.22 2.65 0.44

ASB 125 0.989 125 0.453 -0.25 0.22 -0.36 0.43

XCL 95 0.981 95 0.169 -0.31 0.25 -0.42 0.49

XHS 27 0.964 27 0.450 0.30 0.45 -0.57 0.87

XLS 48 0.980 48 0.585 -0.11 0.34 -0.42 0.67

BXB 55 0.986 55 0.746 -0.27 0.32 -0.20 0.63

HAX 118 0.965 118 0.003 -0.30 0.22 -0.81 0.44

BCB 119 0.986 119 0.242 0.05 0.22 -0.72 0.44

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. DF, degrees of freedom; Sig., significance; SE, standard error. *Non-normally distributed.

Based on the criteria outlined above, the data for four skeletal variables (basion-bregma height, BB; prosthion-nasion height, PN; and opisthion-forehead length, OF, of the cranium; and transverse breadth of tibia, TB) were found to be non-normally distributed according to the Shapiro-Wilk test and skewness/kurtosis values. Data transformation, a mathematical procedure that can be used to modify variables that violate the statistical assumption of normality, was performed using the log10 function in SPSS; however, this had no effect on the distribution of the data for two of the variables (BB and OF). The decision was therefore made to remove the individual measurements of BB, PN, OF, and TB considered to be outliers or extreme scores (n=13 in total). Subsequent normality tests demonstrated that the data for all four variables were normally distributed, as indicated by a Shapiro-Wilk significance level of 0.038, 0.503, 0.054, and 0.553 for BB, PN, OF, and TB, respectively.

3.1.3.3 Descriptive statistics of skeletal dimensions

Table 3.1E provides a summary of the descriptive statistics for each of the 63 skeletal dimensions included in the study. In this table, N denotes the number of individuals for whom the dimension could be measured. The minimum and maximum values for the dimension are additionally provided, as well as the overall mean, and the male and female means, for each dimension. Definitions of the acronyms used are available in Table 2.2K.

Table 3.1E: Descriptive statistics of skeletal dimensions.

Dimension | N | Range (mm) | Min. (mm) | Max. (mm) | Mean* (mm) | M mean (mm) | F mean (mm)

GO 275 37.50 164.70 202.20 182.39 186.67 176.78

MW 271 34.60 121.90 156.50 138.34 140.47 135.58

BB 245 26.10 121.10 147.20 132.99 135.39 129.94

DB 199 28.80 111.00 139.80 125.85 129.80 121.05

PN 220 21.30 58.80 80.10 69.38 71.32 66.96

BN 242 22.70 89.10 111.80 100.13 102.67 96.87

BP 221 24.10 81.30 105.40 93.18 95.14 90.72

NB 226 12.22 19.08 31.30 24.80 25.25 24.21

PB 202 22.36 48.70 71.06 60.94 62.08 59.48

OF 234 27.40 130.40 157.80 145.00 147.80 141.39

ML 288 23.90 17.80 41.70 31.41 33.62 28.46

XSL 83 17.35 38.80 56.15 46.79 48.12 45.15

XDH 103 13.70 30.30 44.00 36.00 37.11 34.44

DSD 113 5.07 8.30 13.37 10.97 11.33 10.49

DTD 113 4.36 8.10 12.46 9.97 10.20 9.66

LVF 110 7.78 11.90 19.68 15.43 15.45 15.39

SFB 88 10.29 38.70 48.99 44.01 45.06 42.51

SFS 99 6.80 15.30 22.10 17.77 18.16 17.18

SFT 104 8.24 13.16 21.40 16.54 17.11 15.75

FHD 136 16.30 35.10 51.40 42.56 44.86 39.48

FND 143 14.60 23.50 38.10 30.04 31.80 27.47

FSC 141 32.00 69.00 101.00 84.72 89.00 78.21

XFL 112 133.00 382.00 515.00 438.57 454.33 417.56

FTD 142 10.59 19.20 29.79 24.63 25.61 23.11

EBF 97 25.50 63.30 88.80 74.34 77.33 69.69

TL 114 121.00 322.00 443.00 370.23 383.13 351.15

CNF 141 41.00 72.00 113.00 90.94 95.83 83.29

MSC 135 29.00 58.00 87.00 72.41 76.23 66.33

APD 141 16.70 23.00 39.70 32.07 34.17 28.80

TB 138 10.80 15.80 26.60 21.52 22.54 20.01

PEB 92 27.80 53.60 81.40 69.10 72.39 63.97

DEB 125 17.50 34.50 52.00 42.43 44.14 39.60

HHD 132 17.89 33.00 50.89 41.42 43.83 38.15

XHL 112 95.00 265.00 360.00 308.44 319.58 293.02

EWH 145 28.12 43.21 71.33 58.18 61.17 54.18

XRL 112 82.00 200.00 282.00 241.32 251.64 226.52

RSBB 119 11.19 23.87 35.06 28.71 30.00 26.81

MAXD 68 8.60 16.80 25.40 21.52 22.79 19.47

MIND 63 8.35 16.25 24.60 20.15 21.37 18.40

XUL 96 74.00 226.00 300.00 260.99 270.82 247.23

USBB 103 7.91 12.40 20.31 16.11 16.82 15.07

IAL 102 14.11 36.97 51.08 43.99 45.60 41.51

BML 102 6.66 11.70 18.36 14.96 15.73 13.77

BAP 100 8.20 10.50 18.70 14.53 15.14 13.53

HML 102 6.73 10.37 17.10 13.75 14.51 12.56

HAP 100 5.58 10.00 15.58 12.74 13.29 11.91

MS 103 5.90 8.30 14.20 11.20 11.80 10.25

L 115 17.89 52.40 70.29 60.77 62.50 57.64

SIH 115 8.00 14.19 22.19 18.15 18.84 16.89

MLH 112 8.91 16.00 24.91 20.21 21.15 18.50

SIB 113 11.29 21.40 32.69 27.35 28.39 25.52

MLB 88 9.20 14.20 23.40 18.42 19.10 17.09

MSD 115 6.68 9.70 16.38 12.96 13.58 11.83

IL 62 20.83 64.90 85.73 75.66 78.53 71.41

PL 54 25.60 72.60 98.20 85.83 85.32 86.77

HSN 122 32.03 30.60 62.63 51.53 52.87 49.22

ASB 125 16.40 26.50 42.90 35.54 37.13 32.81

XCL 95 49.00 119.00 168.00 144.65 150.29 136.56

XHS 27 50.50 120.50 171.00 141.89 150.79 130.76

XLS 48 41.40 111.50 152.90 134.17 139.79 123.92

BXB 55 35.93 72.40 108.33 92.95 97.07 85.15

HAX 118 13.51 28.79 42.30 37.15 38.97 34.20

BCB 119 11.27 20.18 31.45 26.14 27.58 23.78

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. *Overall mean for variable. Min., minimum; Max., maximum; M mean, male mean; F mean, female mean.

3.1.3.4 Variability statistics of skeletal dimensions

Table 3.1F provides a summary of the variability statistics for each of the 63 skeletal dimensions included in the study. Variability statistics for each of the skeletal dimensions broken down according to sex, time period or cemetery site are given in Appendix 7.3. The standard error of the mean is a measure of the deviation of a sample mean from the population mean, and can be calculated using the equation:

SE of mean = SD ÷ (√n), where SD is the standard deviation and n is the sample size.

The variance is the standard deviation squared, and the 95% confidence interval defines the limits between which the population mean is likely to be found 95% of the time. The lower and upper confidence limits are calculated using the equation:

Mean ± (1.96 x SE of the mean), where SE is the standard error.
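
The two equations above can be checked against a row of Table 3.1F. The short sketch below (the function name is hypothetical) recomputes the standard error and 95% confidence limits for GO from its tabulated mean, standard deviation and sample size.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Return (SE of mean, lower limit, upper limit) for a 95% interval."""
    se = sd / math.sqrt(n)            # SE of mean = SD / sqrt(n)
    return se, mean - z * se, mean + z * se

# GO in Table 3.1F: N=275, mean=182.39 mm, SD=7.26 mm.
se, lower, upper = mean_confidence_interval(mean=182.39, sd=7.26, n=275)
```

Rounded to two decimal places, this gives a standard error of 0.44 and limits of 181.53 and 183.25, matching the tabulated values for GO.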

Table 3.1F: Variability statistics of the skeletal dimensions included in the study.

Dimension | N | Mean | SE of mean | SD | Variance | 95% confidence interval, lower | 95% confidence interval, upper

GO 275 182.39 0.44 7.26 52.66 181.53 183.25

MW 271 138.34 0.35 5.70 32.48 137.66 139.02

BB 245 132.99 0.36 5.58 31.08 132.29 133.69

DB 199 125.85 0.44 6.18 38.21 124.98 126.71

PN 220 69.38 0.29 4.27 18.20 68.81 69.94

BN 242 100.13 0.30 4.68 21.90 99.53 100.73

BP 221 93.18 0.32 4.81 23.11 92.54 93.81

NB 226 24.80 0.13 1.99 3.97 24.54 25.06

PB 202 60.94 0.25 3.50 12.26 60.46 61.43

OF 234 145.00 0.37 5.59 31.30 144.28 145.73

ML 288 31.41 0.25 4.21 17.76 30.93 31.90

XSL 83 46.79 0.35 3.18 10.15 46.10 47.49

XDH 103 36.00 0.27 2.75 7.56 35.46 36.53

DSD 113 10.97 0.09 0.96 0.92 10.79 11.15

DTD 113 9.97 0.08 0.80 0.65 9.82 10.12

LVF 110 15.43 0.14 1.44 2.08 15.15 15.70

SFB 88 44.01 0.25 2.35 5.51 43.52 44.51

SFS 99 17.77 0.13 1.26 1.58 17.52 18.02

SFT 104 16.54 0.14 1.45 2.11 16.25 16.82

FHD 136 42.56 0.30 3.54 12.55 41.96 43.17

FND 143 30.04 0.26 3.12 9.74 29.52 30.56

FSC 141 84.72 0.60 7.16 51.31 83.52 85.91

XFL 112 438.57 2.70 28.54 814.79 433.23 443.92

FTD 142 24.63 0.18 2.19 4.82 24.26 24.99

EBF 97 74.34 0.51 5.08 25.85 73.31 75.36

TL 114 370.23 2.25 23.97 574.59 365.78 374.68

CNF 141 90.94 0.75 8.92 79.50 89.45 92.42

MSC 135 72.41 0.58 6.78 45.95 71.26 73.57

APD 141 32.07 0.31 3.68 13.56 31.46 32.69

TB 138 21.53 0.18 2.09 4.39 21.18 21.88

PEB 92 69.10 0.55 5.23 27.38 68.01 70.18

DEB 125 42.43 0.31 3.50 12.22 41.81 43.05

HHD 132 41.42 0.34 3.85 14.84 40.76 42.09

XHL 112 308.44 1.87 19.82 392.83 304.73 312.15

EWH 145 58.18 0.42 5.10 26.01 57.35 59.02

XRL 112 241.32 1.65 17.45 304.36 238.05 244.59

RSBB 119 28.71 0.21 2.32 5.40 28.29 29.14

MAXD 68 21.52 0.26 2.13 4.53 21.00 22.03

MIND 63 20.15 0.24 1.94 3.78 19.66 20.63

XUL 96 260.99 1.71 16.76 280.83 257.59 264.39

USBB 103 16.11 0.16 1.62 2.62 15.79 16.42

IAL 102 43.99 0.29 2.90 8.41 43.42 44.56

BML 102 14.96 0.14 1.42 2.03 14.68 15.24

BAP 100 14.53 0.16 1.59 2.52 14.22 14.85

HML 102 13.75 0.14 1.37 1.88 13.48 14.02

HAP 100 12.74 0.12 1.20 1.43 12.50 12.98

MS 103 11.20 0.12 1.21 1.46 10.96 11.43

L 115 60.77 0.35 3.80 14.48 60.06 61.47

SIH 115 18.15 0.15 1.59 2.52 17.85 18.44

MLH 112 20.21 0.18 1.91 3.65 19.85 20.56

SIB 113 27.35 0.21 2.21 4.88 26.94 27.76

MLB 88 18.42 0.19 1.80 3.26 18.04 18.80

MSD 115 12.96 0.14 1.48 2.19 12.68 13.23

IL 62 75.66 0.68 5.33 28.45 74.30 77.01

PL 54 85.83 0.80 5.87 34.45 84.23 87.43

HSN 122 51.53 0.42 4.68 21.88 50.69 52.37

ASB 125 35.54 0.31 3.48 12.11 34.92 36.15

XCL 95 144.65 1.12 10.95 119.97 142.42 146.88

XHS 27 141.89 2.58 13.39 179.34 136.59 147.18

XLS 48 134.17 1.51 10.49 110.01 131.12 137.22

BXB 55 92.95 1.08 8.02 64.39 90.78 95.12

HAX 118 37.15 0.29 3.11 9.69 36.58 37.71

BCB 119 26.14 0.23 2.48 6.15 25.69 26.59

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

3.1.3.5 Comparison of means

The results of an independent samples t-test used to compare male and female means for each of the 63 skeletal dimensions included in the study are provided in Table 3.1G. To adjust for multiple statistical testing, a significance level of P<0.0008 was used.

Table 3.1G: Results of an independent samples t-test comparing male and female means for all 63 skeletal dimensions included in the study.

Dimension | N, male | N, female | Mean, male (mm) | Mean, female (mm) | Levene’s Test* F | Levene’s Test* Sig. | T-value | P-value

GO 156 119 186.67 176.78 3.772 0.053 15.177 <0.0001

MW 153 118 140.47 135.58 0.350 0.554 7.739 <0.0001

BB 137 108 135.39 129.94 1.794 0.182 8.669 <0.0001

DB 109 90 129.80 121.05 0.057 0.812 14.014 <0.0001

PN 122 98 71.32 66.96 0.078 0.780 8.749 <0.0001

BN 136 106 102.67 96.87 0.272 0.603 12.112 <0.0001

BP 123 98 95.14 90.72 0.652 0.420 7.619 <0.0001

NB 129 97 25.25 24.21 0.305 0.581 4.026 <0.0001

PB 114 88 62.08 59.48 0.082 0.775 5.610 <0.0001

OF 132 102 147.80 141.39 5.669 0.018 10.279** <0.0001**

ML 165 123 33.62 28.46 0.222 0.638 12.885 <0.0001

XSL 46 37 48.12 45.15 0.001 0.973 4.737 <0.0001

XDH 60 43 37.11 34.44 3.385 0.069 5.520 <0.0001

DSD 65 48 11.33 10.49 0.508 0.478 5.042 <0.0001

DTD 65 48 10.20 9.66 4.221 0.042 3.918** <0.0001**

LVF 63 47 15.45 15.39 1.861 0.175 0.242 0.810

SFB 52 36 45.06 42.51 0.000 0.989 5.909 <0.0001

SFS 59 40 18.16 17.18 0.009 0.923 4.116 <0.0001

SFT 60 44 17.11 15.75 0.281 0.598 5.330 <0.0001

FHD 78 58 44.86 39.48 0.054 0.817 13.260 <0.0001

FND 85 58 31.80 27.47 2.528 0.114 11.111 <0.0001

FSC 85 56 89.00 78.21 0.159 0.691 12.949 <0.0001

XFL 64 48 454.33 417.56 0.315 0.576 8.742 <0.0001

FTD 86 56 25.61 23.11 0.001 0.969 8.007 <0.0001

EBF 59 38 77.33 69.69 0.182 0.671 10.632 <0.0001

TL 68 46 383.13 351.15 0.274 0.602 9.233 <0.0001

CNF 86 55 95.83 83.29 0.819 0.367 11.182 <0.0001

MSC 83 52 76.23 66.33 1.630 0.204 11.745 <0.0001

APD 86 55 34.17 28.80 0.506 0.478 11.998 <0.0001

TB 83 55 22.54 20.01 1.568 0.213 8.576 <0.0001

PEB 56 36 72.39 63.97 0.966 0.328 12.238 <0.0001

DEB 78 47 44.14 39.60 0.158 0.692 9.047 <0.0001

HHD 76 56 43.83 38.15 0.027 0.871 12.217 <0.0001

XHL 65 47 319.58 293.02 1.648 0.202 9.323 <0.0001

EWH 83 62 61.17 54.18 0.587 0.445 11.089 <0.0001

XRL 66 46 251.64 226.52 0.030 0.864 10.616 <0.0001

RSBB 71 48 30.00 26.81 0.118 0.732 9.896 <0.0001

MAXD 42 26 22.79 19.47 0.657 0.421 9.582 <0.0001

MIND 37 26 21.37 18.40 0.037 0.849 9.155 <0.0001

XUL 56 40 270.82 247.23 0.016 0.899 9.446 <0.0001

USBB 61 42 16.82 15.07 4.100 0.046 6.672** <0.0001**

IAL 62 40 45.60 41.51 0.308 0.580 9.586 <0.0001

BML 62 40 15.73 13.77 0.025 0.875 9.171 <0.0001

BAP 62 38 15.14 13.53 2.633 0.108 5.630 <0.0001

HML 62 40 14.51 12.56 0.229 0.633 9.764 <0.0001

HAP 60 40 13.29 11.91 0.168 0.683 6.857 <0.0001

MS 63 40 11.80 10.25 1.322 0.253 8.102 <0.0001

L 74 41 62.50 57.64 1.178 0.280 8.288 <0.0001

SIH 74 41 18.84 16.89 0.818 0.368 7.789 <0.0001

MLH 72 40 21.15 18.50 0.006 0.937 9.419 <0.0001

SIB 72 41 28.39 25.52 0.431 0.513 8.462 <0.0001

MLB 58 30 19.10 17.09 1.408 0.239 5.820 <0.0001

MSD 74 41 13.58 11.83 0.895 0.346 7.384 <0.0001

IL 37 25 78.53 71.41 0.270 0.605 6.803 <0.0001

PL 35 19 85.32 86.77 0.716 0.401 -0.867 0.390

HSN 77 45 52.87 49.22 0.038 0.845 4.477 <0.0001

ASB 79 46 37.13 32.81 0.974 0.326 8.323 <0.0001

XCL 56 39 150.29 136.56 1.959 0.165 7.611 <0.0001

XHS 15 12 150.79 130.76 0.462 0.503 5.800 <0.0001

XLS 31 17 139.79 123.92 0.600 0.442 7.273 <0.0001

BXB 36 19 97.07 85.15 0.194 0.661 7.401 <0.0001

HAX 73 45 38.97 34.20 0.701 0.404 12.110 <0.0001

BCB 74 45 27.58 23.78 2.558 0.112 12.065 <0.0001

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. *Levene’s Test for Equality of Variances. **Equal variances not assumed. N, number of individuals for whom the dimension could be measured; Sig., significance.

The independent samples t-test revealed statistically significant differences between male and female mean measurements for all 63 skeletal dimensions included in the study (P<0.0001), excluding length of the vertebral foramen of the second cervical vertebra (LVF; P=0.810) and pubic length (PL; P=0.390).
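
For readers unfamiliar with how Levene's test affects the reported T-value, the following sketch shows the two standard forms of the independent samples t statistic: the pooled-variance form (equal variances assumed) and the Welch form (equal variances not assumed, as marked ** in Table 3.1G). The function name is hypothetical and the code computes only the statistic and its degrees of freedom; obtaining the P-value additionally requires the t distribution, which is outside this sketch.

```python
import math
import statistics

def two_sample_t(a, b, equal_variances=True):
    """Return (t, degrees of freedom) for two independent samples."""
    na, nb = len(a), len(b)
    mean_diff = statistics.mean(a) - statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    if equal_variances:
        # Pooled-variance form, used when Levene's test is not significant.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        df = na + nb - 2
    else:
        # Welch form with Welch-Satterthwaite degrees of freedom.
        se = math.sqrt(va / na + vb / nb)
        df = (va / na + vb / nb) ** 2 / (
            (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return mean_diff / se, df

# Hypothetical measurements in mm; not taken from the study data.
males = [44.1, 45.3, 43.8, 46.0, 44.9]
females = [39.2, 40.1, 38.7, 39.9, 39.5]
t_pooled, df_pooled = two_sample_t(males, females, equal_variances=True)
t_welch, df_welch = two_sample_t(males, females, equal_variances=False)
```

A positive T-value here indicates that the male mean exceeds the female mean, which is the direction observed for every dimension in Table 3.1G except PL.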

3.1.3.6 Z-Scores

Z-scores were calculated to reduce the weighting associated with large dimensions, which otherwise would have a dominating effect in multivariate analyses. Like the raw data, the Z-scores represent a large data set and are therefore not presented here. Instead, the Z-scored data were saved in the SPSS Data Editor and were subsequently used in the ANOVA and exploratory principal components analysis tests.
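
As a brief illustration of the standardisation step, a Z-score expresses each measurement as its distance from the variable's mean in units of standard deviation. The sketch below is a minimal version using the sample standard deviation (n − 1 denominator); the function name and values are hypothetical, and the choice of denominator is an assumption about the SPSS procedure used.

```python
import statistics

def z_scores(values):
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return [(v - mean) / sd for v in values]

# Hypothetical long-bone lengths (mm) and small diameters (mm).
lengths = z_scores([420.0, 438.0, 455.0, 441.0])
diameters = z_scores([9.8, 10.0, 10.3, 9.9])
```

After standardisation both variables have the same scale, so a dimension measured in hundreds of millimetres no longer outweighs one measured in tens.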

3.2 Intra- and inter-observer error

3.2.1 Intra-observer error

A total of 22 randomly selected individuals were included in the intra-observer error test sample. Of these, 16 individuals were represented by a complete skeleton; six were represented by a cranium only. Overall, the tests of intra-observer error revealed that the majority of measurements used in the study can be reliably measured. Across all 63 dimensions, the average mean per cent absolute error is 1.65%, with no individual measurement exceeding 9%. Figure 3.2A shows the dimensions for which mean per cent absolute error exceeded 1%. The five dimensions that exhibited the highest mean per cent absolute error were: opisthion-forehead length (OF; 5.24%) and mastoid length (ML; 5.92%) of the cranium, medio-lateral breadth of base (BML; 4.53%) and antero-posterior breadth of base (BAP; 8.65%) of the first metacarpal, and breadth of infraspinous body of the scapula (BXB; 6.43%).

[Bar chart: mean per cent error (%) on the vertical axis (0 to 9); skeletal dimension on the horizontal axis.]

Figure 3.2A: Dimensions included in the study demonstrating mean per cent error >1%.

OF, opisthion-forehead length; ML, mastoid length; XSL, maximum sagittal length of C2; XDH, maximum height of dens; DSD, dens sagittal diameter; DTD, dens transverse diameter; LVF, length of vertebral foramen of C2; SFT, superior facet transverse diameter of C2; FND, femoral neck diameter; CNF, circumference at nutrient foramen of tibia; MSC, minimum shaft circumference of tibia; APD, antero-posterior diameter of tibia; DEB, distal epiphyseal breadth of tibia; RSBB, radius semi-bistyloid breadth; USBB, ulna semi-bistyloid breadth; BML, medio-lateral breadth of base of MC1; BAP, antero-posterior breadth of base of MC1; HML, medio-lateral breadth of head of MC1; HAP, antero-posterior breadth of head of MC1; MS, maximum shaft diameter of MC1; SIH, supero-inferior head height of MT1; MLH, medio-lateral head width of MT1; SIB, supero-inferior base height of MT1; MLB, medio-lateral base width of MT1; MSD, midshaft diameter of MT1; IL, ischial length; PL, pubic length; HSN, height of sciatic notch; BXB, breadth of infraspinous body of scapula; HAX, height of glenoid prominence of scapula.

The lowest mean per cent absolute errors (<0.20%) were obtained for measurement of the maximum width of the cranium (MW; 0.13%), the bizygomatic width of the cranium (DB; 0.17%), the maximum breadth across the superior facets of C2 (SFB; 0.18%), the maximum length of humerus (XHL; 0.18%), and the maximum length of ulna (XUL; 0.18%).

The intra-observer technical error of measurement (TEM) did not exceed 8.50 mm for any individual dimension. A total of 12 dimensions exhibited TEM scores of greater than 1.00 mm, as shown in Figure 3.2B.

[Bar chart: technical error of measurement (TEM), mm, on the vertical axis (0 to 9); skeletal dimension on the horizontal axis (OF, ML, XFL, TL, CNF, MSC, XRL, BAP, IL, PL, HSN, BXB). Plotted values range from 1.13 mm to 8.35 mm.]

Figure 3.2B: Skeletal dimensions that exhibited TEM scores >1.00 mm.

OF, opisthion-forehead length; ML, mastoid length; XFL, maximum length of femur; TL, tibial length; CNF, circumference of tibia at nutrient foramen; MSC, minimum shaft circumference of tibia; XRL, maximum length of radius; BAP, antero-posterior breadth of base of MC1; IL, ischial length; PL, pubic length; HSN, height of sciatic notch; BXB, breadth of infraspinous body of scapula.

The lowest TEMs were observed for nasal breadth (NB; 0.14 mm), dens transverse diameter (DTD; 0.14 mm), maximum breadth across superior facets of C2 (SFB; 0.09 mm), superior facet sagittal diameter of C2 (SFS; 0.14 mm), maximum radial head diameter (MAXD; 0.13 mm), minimum radial head diameter (MIND; 0.10 mm), maximum midshaft diameter of metacarpal 1 (MS; 0.14 mm), and length of metatarsal 1 (L; 0.14 mm). Four dimensions demonstrated relative or per cent TEM in excess of 5%. These were: OF (5.81%), ML (5.35%), BAP (9.00%), and BXB (5.36%).
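
The error statistics reported in this section can be illustrated with the sketch below. The formulas used here are the commonly applied anthropometric definitions and are an assumption on the part of this example; the definitions actually used in the study are those given in the methods chapter. In this sketch, TEM is the square root of the summed squared differences divided by twice the number of individuals, per cent TEM is TEM relative to the overall mean, R is one minus the ratio of TEM squared to the total variance, and per cent absolute error is taken relative to the original measurement. The function name and data are hypothetical.

```python
import math
import statistics

def measurement_error(first, second):
    """Return (TEM, %TEM, R, mean per cent absolute error) for repeated measures."""
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    tem = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    all_values = list(first) + list(second)
    pct_tem = 100.0 * tem / statistics.mean(all_values)
    # R falls below zero when TEM squared exceeds the total variance.
    reliability = 1.0 - tem ** 2 / statistics.variance(all_values)
    pct_errors = [100.0 * abs(a - b) / a for a, b in zip(first, second)]
    return tem, pct_tem, reliability, statistics.mean(pct_errors)

# Hypothetical original and retaken measurements (mm); not study data.
original = [42.1, 44.0, 39.5, 46.2, 41.0]
retaken = [42.3, 43.8, 39.5, 46.0, 41.4]
tem, pct_tem, r_coef, mean_pct_error = measurement_error(original, retaken)
```

Under this definition of R, a negative value such as that reported for OF simply indicates that the measurement error variance exceeded the total variance in the test sample.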

The results of the paired samples t-test demonstrated that for four of the 63 skeletal dimensions included in the study, the retaken measurements showed a statistically significant shift in a particular direction using a significance level of 0.05. In other words, the retaken measurements for these dimensions were found to be predominantly higher or lower than the original measurements in a manner that was unlikely to have occurred by chance. Table 3.2A provides a summary of these four dimensions, including T- and P-values, and the direction of change. However, using a significance level of 0.05 there is a one in 20 chance that the null hypothesis will be rejected when it is in fact true. Thus, for 60 statistical tests, it is expected that a true null hypothesis will be rejected on three occasions, which resembles the results obtained here. Therefore, if a significance level of P<0.0008 is used (0.05 ÷ 63) to adjust for the issue of multiple testing, the results of the paired samples t-test are not statistically significant for any of the 63 skeletal variables included in the study.

Table 3.2A: Results of a paired samples t-test showing the dimensions exhibiting statistically significant differences between the original and retaken measurements.

Dimension T-value P-value Direction of change

LVF (n=13) -2.316 0.039 9 of 13 retaken measurements were higher than the original

FHD (n=14) -2.369 0.020 10 of 14 retaken measurements were higher than the original

FTD (n=15) 2.681 0.018 12 of 15 retaken measurements were lower than the original

XHL (n=13) -2.739 0.018 5 of 13 retaken measurements were higher than the original (8 retaken measurements were equal to the original)

LVF, length of vertebral foramen of C2; FHD, femoral head diameter; FTD, femoral transverse diameter; XHL, maximum length of humerus.

Examination of the Pearson correlation coefficient for each dimension revealed that the method of measuring the skeletal dimensions included in the study was reliable (r ≥0.80) for all 63 dimensions excluding opisthion-forehead length of cranium (OF; r =0.416) and antero-posterior breadth of base of metacarpal 1 (BAP; r =-0.153). Similarly, the coefficient of reliability (R) was greater than 0.95 for all but eight dimensions, suggesting that less than 5% of the variance associated with the means of the original and retaken measurements was the result of measurement error. The eight exceptions were OF (R=–1.231), mastoid length (ML, R=0.839), maximum height of dens (XDH, R=0.900), medio-lateral breadth of base of metacarpal 1 (BML, R=0.757), BAP (R=0.321), medio-lateral base width of metatarsal 1 (MLB, R=0.929), height of sciatic notch (HSN, R=0.897), and breadth of infraspinous body of scapula (BXB, R=0.589).

Overall, it can be concluded that the majority of dimensions measured in the study are reliable and replicable. The dimensions OF and BAP failed to meet the critical level for both the Pearson correlation coefficient (r) and coefficient of reliability (R); six additional dimensions (ML, XDH, BML, MLB, HSN, and BXB) failed to meet the critical value of R. Based on all the results presented above, four dimensions (OF, ML, BAP, and BXB) were excluded from subsequent statistical analyses. The complete set of results for the intra-observer error test is available in Appendix 7.4.1.

3.2.2 Inter-observer error

A total of 43 of the original 63 skeletal dimensions included in the study were assessed in the test of inter-observer error between EJM and IK-O using an unprovenanced collection of skeletal elements from the KNH Centre for Biomedical Egyptology’s Tissue Bank at the University of Manchester. Total mean per cent absolute error for all 43 dimensions was 3.01%, with no individual measurement exceeding 15%. The five dimensions with the highest per cent error were HML (14.29%), PL (11.54%), ML (10.53%), MIND (8.70%), and NB (6.04%). TEM did not exceed 9 mm for any of the 43 dimensions. TEM was highest for PL (8.49 mm), CNF (5.22 mm), TL (4.95 mm), and OF (3.96 mm). The two dimensions with the highest relative or per cent TEM (%TEM) were HML (9.43%) and PL (8.66%). The lowest %TEM was obtained for FTD (0.00%), EWH (0.00%), IAL (0.00%), and DB (0.06%). It was possible to calculate R for 31 of the 43 skeletal dimensions included in this inter-observer error test. Of these 31 dimensions, the R values of eight dimensions failed to reach the critical level of 0.95. These dimensions are shown in Figure 3.2C.

[Bar chart: R value on the vertical axis (-0.2 to 1.0); skeletal dimension on the horizontal axis (NB, PB, OF, ML, CNF, BXB, HAX, BCB). Plotted values range from -0.038 to 0.936.]

Figure 3.2C: Skeletal variables demonstrating values of R <0.95 in the inter-observer error test between EJM and IK-O. NB, nasal breadth; PB, palate breadth; OF, opisthion-forehead length; ML, mastoid length; CNF, circumference at nutrient foramen of tibia; BXB, breadth of infraspinous body of scapula; HAX, height of the glenoid fossa of scapula; BCB, breadth of the glenoid fossa of scapula.

Owing to the small number of skeletal elements included in the inter-observer error test between EJM and IK-O, correlation analyses and the paired samples t-test were not performed, and no dimensions were excluded from further analyses. The full set of results for the inter-observer error test between EJM and IK-O can be found in Appendix 7.4.2.

The results of the inter-observer error test using individuals from the Peabody Museum collection that were measured by both EJM and MR are shown in Table 3.2B.

Table 3.2B: Results of the inter-observer error test between EJM and MR using skeletal remains from the Peabody Museum collection.

Dimension | N | TEM, mm | %TEM | Mean per cent error, % | R | Pearson correlation, r | T-value | P-value

Column groups: Precision (TEM, %TEM, mean per cent error); Reliability (R, Pearson correlation); Paired samples t-test (T-value, P-value).

FHD 55 0.34 0.81 1.01 0.991 0.993 -3.418 0.001

XFL 45 1.59 0.37 0.33 0.996 0.996 0.591 0.557

TL 43 3.34 0.92 1.17 0.974 0.996 14.231 <0.001

XHL 47 2.60 0.86 0.76 0.977 0.978 -1.095 0.279

XRL 5 0.32 0.14 0.09 >0.999 1.000 -1.000 0.374

FHD, femoral head diameter; XFL, maximum femoral length; TL, tibial length; XHL, maximum humeral length; XRL, maximum radial length; TEM, technical error of measurement.

As can be seen, mean per cent absolute error did not exceed 2% for any of the five dimensions; relative or per cent TEM did not exceed 1%. Furthermore, none of the reliability values fell below the critical levels for R or Pearson’s correlation, r, of 0.95 and 0.80, respectively. However, statistically significant results of the paired samples t-test were obtained for femoral head diameter (FHD) and tibial length (TL), suggesting systematic differences in the techniques used to measure these dimensions between the two observers. Indeed, for each of the 43 individuals for whom TL was measured by both observers, the measurement produced by EJM was greater than that produced by MR. The implications of this finding will be discussed in greater depth in Chapter 4, Discussion.

3.2.2.1 Morphological sex estimation error

A total of 30 specimens (20 ossa coxae and 10 skulls) were independently sexed by two observers (the present author and IK-O) using the Phenice traits and morphological indicators, as described in Sections 2.2.1.2–2.2.1.5. Sex assessments made by the two observers were compared using the kappa statistic (K), which provides a measure of agreement among observers corrected for chance. Kappa values are scaled between 0 and 1, with 0 indicating the amount of agreement expected if scores were assigned randomly to specimens, and 1 indicating perfect agreement (Walker, 2006). The results of this test are presented in Table 3.2C.

Table 3.2C: Kappa values and level of agreement for assessment of inter-observer morphological sex estimation.

Kappa P-value Agreement

Phenice traits 0.792 <0.0004 Substantial

Ossa coxae morphological indicators 0.903 <0.00002 Almost perfect

Skull morphological indicators 0.583 0.065 Moderate

Levels of agreement: K=0.0, no agreement; K=0.01 to 0.20, slight agreement; K=0.21 to 0.40, fair agreement; K=0.41 to 0.60, moderate agreement; K=0.61 to 0.80, substantial agreement; and K=0.81 to 1.0, almost perfect to perfect agreement (Landis & Koch, 1977).

As can be seen in Table 3.2C, there was a substantial to near perfect level of agreement between the two observers in estimating sex using features of the ossa coxae (including the Phenice traits). The level of agreement for sex estimates using the skull was lower, but still in excess of what would have been expected from chance alone, suggesting a good level of consistency and repeatability in the morphological sex estimation technique employed in the present study.
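For readers wishing to reproduce this type of comparison, the following is a minimal sketch of the kappa statistic for two observers, together with the Landis and Koch (1977) bands quoted beneath Table 3.2C. The function names and example ratings are hypothetical. Treating negative kappa values as "no agreement" is an assumption, since the quoted bands do not cover them.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two observers' categorical scores."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each observer's marginal proportions
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Undefined (division by zero) if both observers use a single category
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Agreement label following the bands quoted beneath Table 3.2C."""
    if kappa <= 0:
        return "no agreement"  # assumption: negative values grouped here
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"

# Hypothetical sex assessments of four specimens by two observers
observer_1 = ["M", "M", "F", "F"]
observer_2 = ["M", "F", "F", "F"]
k = cohen_kappa(observer_1, observer_2)
print(round(k, 3), landis_koch(k))
```

Applied to the kappa values in Table 3.2C, `landis_koch` returns the same agreement labels as those given in the table.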

3.3 Principal components analysis

Principal components analysis (PCA) was performed using both the raw data (with outliers and extreme scores removed, as described above, to ensure normality of the distribution) and Z-Scores. Provided that PCA is used descriptively and exploratively, as in the present study, assumptions regarding the distributions of variables are not in force (Tabachnick & Fidell, 1996: 640). If variables are normally distributed, the solution is enhanced; however, to the extent that normality fails, the solution may be weakened but is still worthwhile (Tabachnick & Fidell, 1996: 640).

Four skeletal dimensions, OF, ML, BAP, and BXB, were not included in the analysis as the results of the intra-observer error test demonstrated that they could not be precisely and reliably measured. This left 59 of the original set of 63 skeletal dimensions that were eligible for inclusion in the PCA tests. Unfortunately, PCA could not be performed using all 59 skeletal dimensions, as the large amount of missing data (dimensions that could not be measured for a particular skeleton) meant that the number of valid cases included in the analysis was far too low. Two approaches were therefore undertaken. The first was to analyse only variables that relate to measurements of the cranium, for which there was a sufficient number of cases. The second was to perform expectation maximisation (EM) imputation, a procedure that uses a maximum likelihood approach for estimating missing values (Meyers et al, 2006: 63). As described in Section 2.5.1 above, the EM algorithm is a two-step iterative process. During the E step, regression analyses are used to estimate the missing values. Maximum likelihood procedures are then used to make estimates of parameters such as correlations using the missing data replacements. This is known as the M step (Meyers et al, 2006: 64). The SPSS program iterates through the E and M steps until convergence or no change occurs between the steps (Meyers et al, 2006: 64).

3.3.1 Principal components analysis of cranial variables

3.3.1.1 Raw data

The results presented below are based on the analysis using the raw data; PCA of cranial variables using Z-Scores produced exactly the same results. A total of 160 valid cases were included in the analysis of nine cranial variables (OF and ML were excluded). The result of the Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy test (0.783) suggests that the data are suitable for PCA (based on a heuristic cut-off point of 0.70; Meyers et al, 2006: 521). This is confirmed by the highly significant result of Bartlett’s Test of Sphericity (P<0.001); the statistically significant result allows rejection of the null hypothesis of lack of sufficient correlation between variables. Table 3.3A provides the communalities for the initial and extraction principal components solution. The communality of a test is the proportion of the variance of the test that has been accounted for by the components extracted (Kinnear & Gray, 2009: 573).

Table 3.3A: Communalities for the initial and extraction principal components solution.

Variable Initial Extraction

GO 1.000 0.774

MW 1.000 0.759

BB 1.000 0.614

DB 1.000 0.751

PN 1.000 0.452

BN 1.000 0.849


BP 1.000 0.882

NB 1.000 0.790

PB 1.000 0.648

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1.

Three components with eigenvalues of greater than or equal to 1.00, in accordance with the Kaiser criterion (Kaiser, 1960), were extracted and rotated. These three components accounted for 72% of the total variance, as shown in Table 3.3B.

Table 3.3B: Total variance accounted for by each of the components after the initial, extraction, and rotation phases of the analysis (only the first four components are shown).

         Initial eigenvalues                    Extraction sums of squared loadings    Rotation sums of squared loadings
Comp.    Total   % of variance   Cumulative %   Total   % of variance   Cumulative %   Total   % of variance   Cumulative %
1        4.41    49.00           49.00          4.41    49.00           49.00          2.98    33.10           33.10
2        1.09    12.13           61.14          1.09    12.13           61.14          1.90    21.14           54.24
3        1.02    11.29           72.43          1.02    11.29           72.43          1.64    18.19           72.43
4        0.68    7.54            79.96          -       -               -              -       -               -

Comp., component.
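The retention rule and the percentages in Table 3.3B can be illustrated with a short sketch. It assumes that the analysis was based on the correlation matrix, so that each of the nine standardised variables contributes one unit of variance and a component's share is its eigenvalue divided by nine. Variable names are illustrative.

```python
# First four initial eigenvalues as reported in Table 3.3B
eigenvalues = [4.41, 1.09, 1.02, 0.68]
n_variables = 9  # nine cranial variables entered into the analysis

# Kaiser criterion: retain components with eigenvalues of at least 1.00
retained = [ev for ev in eigenvalues if ev >= 1.00]

# Percentage of total variance explained by each retained component
percent = [100 * ev / n_variables for ev in retained]
cumulative = sum(percent)

print(len(retained), [round(p, 1) for p in percent], round(cumulative, 1))
```

Because the eigenvalues in the table are rounded to two decimal places, the percentages recovered in this way agree with the tabulated values only to within rounding.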

The unrotated component matrix is shown in Table 3.3C. The values in this table are correlations between the variables and the components, also known as the “loadings” of the nine variables on the three components extracted.

Table 3.3C: Unrotated component matrix showing component loadings.

Variable Component

1 2 3

GO 0.868 -0.079 -0.122

DB 0.826 -0.120 0.235

BN 0.806 0.022 -0.446

BB 0.691 -0.369 0.017

MW 0.686 -0.486 0.229

PN 0.668 -0.059 0.048


PB 0.622 0.417 0.295

NB 0.458 0.571 0.504

BP 0.580 0.442 -0.592

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1.
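The relationship between the loadings in Table 3.3C, the extraction communalities in Table 3.3A, and the eigenvalues in Table 3.3B can be checked with a short sketch. For an orthogonal solution, the communality of a variable is the sum of its squared loadings across the extracted components, and the eigenvalue of a component is the sum of the squared loadings in its column. The dictionary layout is illustrative only.

```python
# Unrotated loadings on components 1 to 3, as given in Table 3.3C
loadings = {
    "GO": (0.868, -0.079, -0.122),
    "DB": (0.826, -0.120, 0.235),
    "BN": (0.806, 0.022, -0.446),
    "BB": (0.691, -0.369, 0.017),
    "MW": (0.686, -0.486, 0.229),
    "PN": (0.668, -0.059, 0.048),
    "PB": (0.622, 0.417, 0.295),
    "NB": (0.458, 0.571, 0.504),
    "BP": (0.580, 0.442, -0.592),
}

# Communality: variance of each variable reproduced by the three components
communalities = {var: sum(x * x for x in row) for var, row in loadings.items()}

# Eigenvalue: variance accounted for by each component across all variables
eigenvalues = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]

for var, h2 in communalities.items():
    print(f"{var}: {h2:.3f}")
print([round(ev, 2) for ev in eigenvalues])
```

The values recovered in this way match those in Tables 3.3A and 3.3B to within rounding of the published loadings.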

Table 3.3D shows the varimax rotated component matrix. The purpose of rotation is not to change the number of components extracted or the amount of variance accounted for, but to achieve a simpler structure which is easier to interpret (Meyers et al, 2006: 495–496). Simple structure is obtained when the correlations of the variables with the components are either very high (values near 1) or very low (values near 0). Rotation is an iterative process of “twisting” or “pivoting” the component structure in the multidimensional space until a satisfactory “fit” is established (Meyers et al, 2006: 496). The variables that correlate most strongly (show a correlation >0.6) with each component are highlighted with red boxes.

Table 3.3D: Rotated component matrix showing component loadings.

Variable Component

1 2 3

MW 0.866 -0.010 0.089

BB 0.756 0.201 0.045

DB 0.744 0.192 0.401

GO 0.681 0.506 0.233

PN 0.550 0.280 0.266

BP 0.046 0.919 0.189

BN 0.508 0.764 0.083

NB 0.085 0.053 0.883

PB 0.265 0.242 0.720

Variance, % 33.1 21.1 18.2

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1.
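The iterative “pivoting” described above can be sketched with the classical pairwise form of the varimax procedure, applied here to the unrotated loadings of Table 3.3C. This is an illustrative implementation of the raw varimax criterion, not the SPSS routine used in the study. Statistical packages commonly apply Kaiser normalisation before rotation, so the output need not reproduce Table 3.3D exactly, and the order and signs of components may differ. Function names are hypothetical.

```python
import math

names = ["GO", "DB", "BN", "BB", "MW", "PN", "PB", "NB", "BP"]
# Unrotated loadings from Table 3.3C, rows in the order given in `names`
unrotated = [
    [0.868, -0.079, -0.122], [0.826, -0.120, 0.235], [0.806, 0.022, -0.446],
    [0.691, -0.369, 0.017], [0.686, -0.486, 0.229], [0.668, -0.059, 0.048],
    [0.622, 0.417, 0.295], [0.458, 0.571, 0.504], [0.580, 0.442, -0.592],
]

def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        sq = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in sq) / p - (sum(sq) / p) ** 2
    return total

def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Rotate pairs of components until no rotation angle exceeds tol."""
    L = [list(row) for row in loadings]  # work on a copy
    p, k = len(L), len(L[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in L]
                v = [2 * row[a] * row[b] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximises the criterion for this pair of columns
                phi = 0.25 * math.atan2(D - 2 * A * B / p,
                                        C - (A * A - B * B) / p)
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    row[a], row[b] = (row[a] * c + row[b] * s,
                                      -row[a] * s + row[b] * c)
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return L

rotated = varimax(unrotated)
for name, row in zip(names, rotated):
    print(name, " ".join(f"{x: .3f}" for x in row))
```

Because the rotation is orthogonal, each variable's communality and the total variance explained are unchanged. Only the distribution of variance among the components is altered, which is why PC1 falls from 49.0% to 33.1% in Table 3.3B while the cumulative figure remains 72.43%.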

When PCA is used to explore data relating to anatomical measurements, the first unrotated principal component (PC) almost always has positive coefficients and reflects overall ‘size’ of the individual or anatomical part. Later PCs usually contrast some of the measurements with others, and can often be interpreted as defining certain aspects of ‘shape’ (Jolliffe, 2002: 64).

After rotation, the groups of variables on each component tend to reflect different structural components of the object under analysis. The unrotated component matrix given in Table 3.3C above shows that principal component 1 (PC1) represents the overall size of the whole cranium, and accounts for 49.0% of the total variance (Table 3.3B). After rotation, the amount of variance accounted for by PC1 dropped to 33.1%. The variables contributing strongly to this component include those that define the maximum width (MW), length (GO), and height (BB) of the cranium; in other words, these variables define the overall size of the cranial vault only. The second rotated PC accounts for 21.1% of the total variance and includes basion to nasion length (BN) and basion to prosthion length (BP), two measures of the forward extension of the facial skeleton. The final rotated component, PC3, accounts for 18.2% of the total variance and includes two facial breadths that contribute to the shape of the facial skeleton: nasal breadth (NB) and palate breadth (PB). The PC analyses were subsequently performed separately for males and females (Table 3.3E).

Table 3.3E: Rotated component matrix showing component loadings in males and females separately.

Males Females

Variable Component Variable Component

1 2 3 1 2 3

BN 0.850 0.154 -0.046 MW 0.865 0.141 -0.049

BP 0.838 -0.113 0.135 BB 0.759 -0.052 0.105

GO 0.579 0.505 0.218 DB 0.660 0.505 0.181

MW 0.088 0.789 0.004 PB 0.026 0.789 0.044

DB -0.144 0.607 0.410 NB 0.029 0.732 0.109

PN 0.044 0.560 0.038 PN 0.148 0.531 0.160

BB 0.419 0.520 -0.002 BN 0.334 0.068 0.868

NB -0.036 0.014 0.853 BP -0.358 0.246 0.814

PB 0.231 0.134 0.749 GO 0.489 0.208 0.555

Variance, % 22.5 20.9 17.0 Variance, % 25.1 20.3 20.1

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1.


As can be seen in Table 3.3E above, the form of the first three PCs is similar between males and females, although they account for different proportions of the variance. In males, PC1 accounts for 22.5% of the total variance and represents the size of the forward extension of the facial skeleton (BN and BP); in females, these variables form PC3 and account for 20.1% of the total variance. The greatest amount of variance (25.1%) in females is accounted for by two variables which represent the width of the cranium (MW and DB). In males, these variables form PC2 and account for 20.9% of the total variance. PC3 accounts for 17.0% of the total variance in males and represents the width of features of the facial skeleton (NB and PB); in females, these variables form PC2 and account for 20.3% of the total variance. Overall, the first three components account for a large proportion of the total variation: 60.4% in males and 65.5% in females. These results suggest that different structural components make different contributions to the overall size and shape of the cranium in males and in females. In males, the greatest contribution to overall size is made by dimensions that represent the forward extension of the facial skeleton, whereas in females, the greatest contribution is made by width measurements.

3.3.2 Principal components analysis with expectation maximisation imputation

3.3.2.1 Raw data

Little’s Missing Completely At Random (MCAR) test resulted in a P-value for the chi-square statistic of 0.824. The null hypothesis that the data are missing completely at random therefore cannot be rejected. Using EM imputation to estimate missing values, PCA was performed using the complete sample of 318 cases. The KMO measure of sampling adequacy was 0.480, suggesting that the correlations may not be adequate for PCA. According to some authors, correlation matrices with a KMO value of less than 0.5 are inappropriate (Budaev, 2010). Bartlett’s Test of Sphericity was statistically significant (P<0.001); however, previous studies have demonstrated the rejection of the null hypothesis (that there is no correlation between variables) even when the correlation matrix is inappropriate (Budaev, 2010). Dziuban & Shirkey (1974) recommend that Bartlett’s test “may be used as a lower bound to the quality of the matrix. That is, if one fails to reject the [correlation] hypothesis, the matrix need be subjected to no further analysis. On the other hand, rejection of the [correlation] hypothesis on the Bartlett test is not a clear indication that the matrix is psychometrically sound”. Based on the results of the KMO and Bartlett tests, no further analyses of the outcomes of this PCA test were undertaken.

3.3.2.2 Z-Scores

Little’s Missing Completely At Random (MCAR) test resulted in a P-value for the chi-square statistic of 0.865. The KMO measure of sampling adequacy was 0.720, suggesting that the correlations are adequate for PCA. This is confirmed by the highly significant result of Bartlett’s Test of Sphericity (P<0.001); the statistically significant result allows rejection of the null hypothesis of lack of sufficient correlation between variables. Using the Kaiser criterion for the retention of eigenvalues greater than 1.00, a nine-component solution was obtained. These nine components accounted for 84.0% of the total variance. Alternatively, applying a stricter criterion of retention of eigenvalues greater than 2.00, three components were extracted which accounted for 70.1% of the total variance. Table 3.3F shows the rotated component loadings for these three components. Only the four variables that correlated most strongly with each component are shown.

Table 3.3F: Rotated component matrix showing component loadings.

Variable Component

1 2 3

ZMSC 0.843 0.301 0.260

ZRSBB 0.811 0.212 0.263

ZBCB 0.804 0.352 0.267

ZCNF 0.801 0.278 0.299

ZXFL 0.462 0.734 0.316

ZTL 0.526 0.712 0.321

ZXHL 0.540 0.702 0.264

ZXHS 0.405 0.694 0.183

ZSFT 0.362 0.099 0.738

ZBN 0.168 0.443 0.726

ZSFB 0.410 0.349 0.717

ZSFS 0.271 0.164 0.706


Variance, % 37.1 17.1 15.9

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1. The Z-prefix indicates that the Z-Scores were used.

As can be seen from the results presented above, principal component 1 (PC1) accounted for 37.1% of the total variance and includes variables that primarily relate to breadth or circumference measurements of the limbs. The second PC accounted for 17.1% of the total variance and includes three measures of long bone length. The final component, PC3, accounted for 15.9% of the total variance and includes three dimensions of the second cervical vertebra (SFT, SFB, and SFS). The rotated component loadings for males and females separately are shown in Tables 3.3G and 3.3H. Only the variables that correlated most strongly (correlation >0.5) with each component are shown. Using the Kaiser criterion for the retention of eigenvalues greater than 1.00, a 14-component solution was obtained for males. These 14 components accounted for 85.5% of the total variance. Alternatively, applying a stricter criterion of retention of eigenvalues greater than 2.00, seven components were extracted which accounted for 63.5% of the total variance. In females, an 11-component solution was obtained, which accounted for 84.1% of the total variance, when using the Kaiser criterion for the retention of eigenvalues greater than 1.00. Retaining eigenvalues greater than 2.00 resulted in a six-component solution which accounted for 65.1% of the total variance.

Table 3.3G: Rotated component matrix showing component loadings in males.

Variable Component

1 2 3 4 5 6 7

ZRSBB 0.765 0.097 0.217 -0.089 0.162 0.269 -0.217

ZMAXD 0.757 0.325 0.173 0.271 0.060 0.111 0.045

ZEBF 0.754 0.238 0.300 0.154 0.170 -0.038 0.130

ZHHD 0.742 0.742 0.251 0.266 0.251 0.022 -0.046

ZXHL 0.243 0.818 0.168 0.200 0.127 0.129 -0.061

ZTL 0.288 0.811 0.144 0.158 0.199 0.119 0.145

ZXFL 0.192 0.810 0.164 0.157 0.088 0.107 0.146

ZXRL 0.365 0.807 0.179 0.150 0.061 0.099 -0.006


ZCNF 0.299 0.150 0.840 0.078 0.248 0.062 -0.017

ZAPD 0.250 0.207 0.803 0.031 0.039 0.049 0.168

ZMSC 0.338 0.213 0.798 0.064 0.308 0.113 -0.114

ZSFS 0.064 0.068 0.163 0.818 0.107 -0.023 0.042

ZSFB 0.342 0.187 0.152 0.871 0.016 -0.003 0.129

ZSFT 0.292 -0.035 0.051 0.765 0.046 -0.008 -0.016

ZMSD 0.237 0.173 0.365 0.193 0.710 0.024 0.113

ZPL -0.012 0.427 0.262 -0.130 0.553 -0.035 -0.089

ZMLB 0.374 0.196 0.129 -0.112 0.539 -0.055 -0.073

ZDB 0.191 0.154 0.369 0.240 -0.195 0.644 0.132

ZNB 0.328 0.061 -0.174 -0.202 0.066 0.570 0.151

ZMW 0.128 -0.007 0.374 0.167 0.100 0.545 0.055

ZBP 0.121 0.075 0.046 0.246 0.152 0.501 0.836

ZLVF -0.011 0.280 0.016 -0.160 -0.028 0.000 0.595

Variance, % 20.5 13.4 10.6 8.1 6.7 5.0 4.5

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1. The Z-prefix indicates that the Z-Scores were used.

Table 3.3H: Rotated component matrix showing component loadings in females.

Variable Component

1 2 3 4 5 6

ZFND 0.738 0.232 0.168 0.403 0.236 -0.039

ZBML 0.723 0.348 0.282 0.082 0.005 0.047

ZMIND 0.709 0.499 0.228 0.174 0.251 0.045

ZMAXD 0.704 0.530 0.124 0.186 0.248 0.064

ZXHL 0.199 0.854 0.208 0.058 0.273 -0.046

ZXFL 0.256 0.826 0.088 0.226 0.230 -0.152

ZXRL 0.356 0.821 0.222 0.105 0.171 -0.020

ZTL 0.253 0.817 0.243 0.225 0.204 -0.083

ZXUL 0.294 0.807 0.242 0.099 0.103 -0.005


ZMSD 0.264 0.189 0.774 0.092 0.285 0.112

ZMLB 0.362 0.204 0.733 -0.037 0.004 -0.161

ZUSBB 0.316 0.086 0.609 0.344 0.182 -0.136

ZBN 0.121 0.117 0.050 0.753 0.114 -0.055

ZBP -0.033 0.048 0.119 0.725 -0.269 -0.372

ZSFB 0.196 0.361 0.094 0.613 0.448 0.074

ZMW 0.162 0.246 0.034 0.045 0.704 0.077

ZTB 0.331 0.203 0.442 0.056 0.624 0.232

ZCNF 0.457 0.276 0.485 0.141 0.557 0.019

ZLVF 0.109 0.426 -0.039 0.160 -0.043 -0.662

ZBB 0.139 0.295 -0.156 0.300 0.321 0.608

Variance, % 20.4 16.9 12.5 8.9 8.4 4.5

Definitions of the bone dimension (variable) acronyms may be found in Table 2.2K in Section 2.2.2.1. The Z-prefix indicates that the Z-Scores were used.

The components presented in Tables 3.3G and 3.3H above are slightly more difficult to interpret than those presented for the raw cranial data; however, some interesting similarities between the results for males and females are noteworthy. The skeletal dimension MAXD, the maximum diameter of the radial head, correlated strongly with the first PC in both males and females. In addition, after adjusting for the dominating effect of large dimensions (by using Z-Scores), the main source of variation in both males and females relates to diameters of proximal or distal ends of the long bones. This PC accounts for around 20% of the total variation in both sexes.

The form of PC2 additionally shows a high degree of similarity between the sexes, with 13.4% of the total variance in males and 16.9% of the total variance in females being accounted for by the length of the humerus, tibia, femur, and radius. In males, the third component (tibial shape) accounts for 10.6% of the total variance, while the shape of the superior articular facets of the second cervical vertebra (C2) accounts for 8.1% of the variation. In females, PC4 relates to the forward extension of the facial skeleton and accounts for 8.9% of the total variance. PC6 in females is the only component to include a negative coefficient among its strongest loadings, which suggests that the main source of variation is between individuals with a large basion-bregma measurement relative to the length of the vertebral foramen of C2. The implications and importance of these findings will be discussed in greater detail in Chapter 4.

3.4 Sex estimation

3.4.1 Accuracy of previously developed metric sex estimation methods

3.4.1.1 Accuracy of “modern methods”

Table 3.4A provides a summary of the number of correct sex estimates and per cent accuracy rates (or in this instance per cent consistency rates) obtained when the 12 “modern” metric sex estimation methods selected for inclusion in the study were tested using data collected from the complete study sample of ancient Egyptian skeletal remains. In this table, N is used to denote the number of individuals to which the method or individual equation could be applied, n the number of correct sex estimates, and % the accuracy or consistency rate associated with the method or equation. The number of correct sex estimates and accuracy rates are presented for individual equations and the method as a whole. They are further presented for males and females separately. The total accuracy rate in per cent is the weighted mean for males and females combined (weighted by sample size).
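The weighting described above can be sketched as follows: pooling the correct estimates and the sample sizes for males and females gives the sample-size-weighted mean of the two sex-specific rates. The function name is illustrative, and the example uses the overall counts reported for the cranium method in Table 3.4A.

```python
def accuracy_rates(male_correct, male_n, female_correct, female_n):
    """Return (total, male, female) accuracy rates in per cent."""
    male = 100 * male_correct / male_n
    female = 100 * female_correct / female_n
    # Pooling counts is equivalent to weighting each rate by its sample size
    total = 100 * (male_correct + female_correct) / (male_n + female_n)
    return total, male, female

# Overall result for the cranium method (Giles & Elliot, 1963), Table 3.4A
total, male, female = accuracy_rates(103, 108, 79, 90)
print(f"{total:.1f} {male:.1f} {female:.1f}")  # 91.9 95.4 87.8
```

Individuals classified as indeterminate remain in the denominator N, so they reduce the accuracy rate in the same way as incorrect estimates.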

Table 3.4A: Correct sex estimates and accuracy rates associated with 12 “modern” metric sex estimation methods when tested on ancient Egyptian skeletal remains.

No.  Method (bone) / equation            Total, %  Male, n/N (%)                        Female, n/N (%)
1    Cranium (Giles & Elliot, 1963)
       Overall                           91.9      103/108 (95.4)                       79/90 (87.8)
       Function 3                        88.2      83/87 (95.4)                         59/74 (79.7)
       Function 6                        88.9      97/101 (96.0)                        72/89 (80.9)
       Function 9                        88.7      100/105 (95.2)                       72/89 (80.9)
       Function 10                       86.4      72/88 (81.8)                         68/74 (91.9)
       Function 13                       88.9      84/88 (95.5)                         60/74 (81.1)
       Function 16                       89.9      101/108 (93.5)                       77/90 (85.6)
       Function 17                       86.5      85/103 (82.5)                        81/89 (91.0)
       Function 18                       87.6      74/87 (85.1)                         67/74 (90.5)
       Function 21                       88.5      90/103 (87.4)                        79/88 (89.8)


2    C2 (Wescott, 2000)
       Overall                           57.8      14/46 (30.4)                         34/37 (91.9)
       Function 1                        57.8      14/46 (30.4)                         34/37 (91.9)
       Function 2                        55.7      13/45 (28.9)                         31/34 (91.2)
       Function 3                        59.0      15/44 (34.1)                         31/34 (91.2)
       Function 4                        61.5      17/44 (38.6)                         31/34 (91.2)
       Function 5                        63.2      17/44 (38.6)                         31/32 (96.9)

3    Femoral head diameter (Krogman & İşcan, 1986)
                                         69.1      37/78 (47.4)                         57/58 (98.3)
4    Femoral neck diameter (Seidemann et al, 1998)
                                         80.4      64/85 (75.3)                         51/58 (87.9)
5    Femoral shaft circumference (İşcan & Miller-Shaivitz, 1984a)
                                         73.0      50/85 (58.9) [1/85 (1.2) Indet.]     53/56 (94.6) [1/56 (1.8) Indet.]


6    Tibia (İşcan & Miller-Shaivitz, 1984b)
       Univariate
         Overall – White                 60.8      36/88 (40.9) [13/88 (14.8) Indet.]   51/55 (92.7) [1/55 (1.8) Indet.]
         Overall – Black                 52.4      22/88 (25.0) [8/88 (9.1) Indet.]     53/55 (96.4) [1/55 (1.8) Indet.]
         PEB – White                     55.4      17/56 (30.4) [5/56 (8.9) Indet.]     34/36 (94.4)
         PEB – Black                     44.6      6/56 (10.7) [6/56 (10.7) Indet.]     35/36 (97.2)
         CNF – White                     76.6      59/86 (68.6) [5/86 (5.8) Indet.]     49/55 (89.1) [1/55 (1.8) Indet.]
         CNF – Black                     68.1      43/86 (50.0) [3/86 (3.5) Indet.]     53/55 (96.4)
         DEB – White                     52.8      22/78 (28.2) [13/78 (16.7) Indet.]   44/47 (93.6)
         DEB – Black                     48.8      17/78 (21.8) [4/78 (5.1) Indet.]     44/47 (93.6) [1/47 (2.1) Indet.]
       Multivariate
         Overall – White                 66.1      37/77 (48.1) [2/77 (2.6) Indet.]     45/47 (95.7)
         Overall – Black                 55.6      24/77 (31.2) [10/77 (13.0) Indet.]   45/47 (95.7)
         Function 4 – White              62.2      23/55 (41.8)                         33/35 (94.3)
         Function 4 – Black              63.3      24/55 (43.6)                         33/35 (94.3)
         Function 5 – White              61.7      31/74 (41.9)                         43/46 (93.5)
         Function 5 – Black              60.8      29/74 (39.2)                         44/46 (95.7)
         Function 6 – White              75.3      32/51 (62.7)                         32/34 (94.1)
         Function 6 – Black              61.2      20/51 (39.2)                         32/34 (94.1)
         Function 7 – White              73.9      34/56 (60.7)                         34/36 (94.4)
         Function 7 – Black              65.2      26/56 (46.4)                         34/36 (94.4)
         Function 8 – White              62.4      21/51 (41.2)                         32/34 (94.1)
         Function 8 – Black              51.8      12/51 (23.5)                         32/34 (94.1)
         Function 9 – White              58.8      18/51 (35.3)                         32/34 (94.1)
         Function 9 – Black              43.5      3/51 (5.9)                           34/34 (100)


7    Humeral head diameter (Spradley & Jantz, 2011)
       White                             51.5      14/76 (18.4)                         54/56 (96.4)
       Black                             65.0      33/76 (43.4)                         54/56 (96.4)
8    Humerus, radius and ulna (Holman & Bennett, 1991)
       Overall                           52.1      15/70 (21.4) [2/70 (2.9) Indet.]     46/47 (97.9)
       Function 1                        52.3      10/52 (19.2)                         36/36 (100)
       Function 2                        50.0      9/56 (16.1)                          38/38 (100)
       Function 3                        74.0      37/63 (58.7)                         40/41 (97.6)
       Function 4                        70.5      26/52 (50.0)                         36/36 (100)
       Function 5                        48.9      7/54 (13.0)                          38/38 (100)
       Function 6                        48.8      5/48 (10.4)                          36/36 (100)
       Function 7                        48.3      3/48 (6.3)                           39/39 (100)
9    Radial head diameter (Berrizbeitia, 1989)
       Maximum                           45.6      8/42 (19.0) [33/42 (78.6) Indet.]    23/26 (88.5) [2/26 (7.7) Indet.]
       Minimum                           44.4      4/37 (10.8) [30/37 (81.1) Indet.]    24/26 (92.3) [2/26 (7.7) Indet.]
10   MC1 (Scheuer & Elkington, 1993)     84.2      48/57 (84.2)                         32/38 (84.2)
11   MT1 (Robling & Ubelaker, 1997)      41.4      7/58 (12.1)                          29/29 (100)


12   Multiple (Stewart, 1979)
       Function 1                        90.1      39/41 (95.1)                         25/30 (83.3)
       Function 2                        89.6      43/45 (95.6)                         26/32 (81.3)
       Function 3                        39.3      2/19 (10.5)                          9/9 (100)
       Function 4                        37.8      1/24 (4.2)                           13/13 (100)
       Function 5                        38.1      1/27 (3.7)                           15/15 (100)
       Function 6                        34.7      0/32 (0)                             17/17 (100)

N, the number of individuals to which the method or equation could be applied; n, the number of correct sex estimates; %, the accuracy rate associated with the method or equation; No., number; C2, second cervical vertebra; Indet., indeterminate sex; MC1, metacarpal 1; MT1, metatarsal 1.

To facilitate analysis, the accuracy rate associated with each method, both in total (the weighted mean) and in males and females separately, was plotted on a horizontally-oriented bar chart (Figure 3.4A). The methods were ranked according to total accuracy, with the most accurate shown at the top and the least accurate at the bottom. It was deemed inappropriate to calculate overall accuracy rates for two methods: the tibia method developed by İşcan and Miller-Shaivitz (1984b), and the multiple bones method developed by Stewart (1979), as they both consisted of several different parts. Separate accuracy rates have therefore been presented for the univariate and multivariate equations of the former method, and the functions of the latter method demonstrating the highest and lowest accuracy (Function 1 and Function 6, respectively). As a result of this, Figure 3.4A appears to show the accuracy rates associated with 14 rather than 12 “modern” metric sex estimation methods.

[Figure 3.4A: horizontal bar chart. Legend: Total, Male, Female. Horizontal axis: Accuracy rate, % (0 to 100). Methods, from top to bottom: Cranium; Multiple - Function 1‡; Metacarpal 1; Femoral neck diameter; Femoral shaft circumference; Femoral head diameter; Tibia - Multivariate*; Humeral head diameter*; Second cervical vertebra; Tibia - Univariate*; Humerus, radius, ulna; Radial head diameter**; Metatarsal 1; Multiple - Function 6‡.]

Figure 3.4A: Weighted total, male, and female accuracy rates associated with 12 modern metric sex estimation methods, ranked in descending order according to total accuracy.

*Mean of separate equations for Black and White individuals. **Mean of separate equations for the maximum and minimum radial head diameter. ‡In the absence of an overall accuracy rate, the functions exhibiting the highest and lowest accuracy have been presented separately.

As can be seen in Table 3.4A and Figure 3.4A, only three of the 12 “modern” methods tested produced male and female accuracy rates that reached or exceeded the 80% cut-off mark at which metric sex estimation methods are considered useful. Of these three methods, the highest overall accuracy rate (91.9%), which is a weighted average of the male and female rates, was produced by the cranium method developed by Giles and Elliot (1963). This overall accuracy rate for the method as a whole was obtained by considering the number of correct and incorrect sex estimates for each individual using the separate functions presented by Giles and Elliot (1963) and assigning sex based on majority. Two further methods, Functions 1 and 2 of the multiple method developed by Stewart (1979) and the metacarpal 1 (MC1) method developed by Scheuer and Elkington (1993), also produced accuracy rates that exceeded the 80% cut-off mark in males and females separately. Based on these results, it is possible to suggest that the three methods mentioned above may be of value to other researchers requiring metric methods of tested accuracy to estimate sex in ancient Egyptian skeletal samples.

However, it is important to bear in mind that the test was conducted using an estimated sex reference sample; therefore, when one considers the rate of error associated with morphological sex estimates in conjunction with the error rate associated with the metric sex estimates, the 80% cut-off point may not in reality be reached. For example, supposing the morphological sex of a known-sex skeletal sample was estimated and was correct in 80 of 100 cases, then the accuracy rate would be 80%. If the morphological sex of an unknown-sex sample was subsequently estimated in the same manner, before testing a metric sex estimation technique and obtaining an accuracy rate of 80%, the overall accuracy rate is in fact 80% of 80% (0.80 × 0.80), which is 64%, well below the 80% accuracy cut-off point. However, it is pleasing to note that the morphological sex of the two named individuals from Thebes included in the study sample was estimated correctly in each case using the methodology set out in Section 2.2.1 and without prior knowledge of the names and sexes. In addition, the results of the inter-observer test of morphological sex estimation demonstrated moderate to almost perfect levels of agreement between the present author and a second observer (Table 3.2C), indicating that the morphological procedures used in the study were repeatable. This suggests that the error rate associated with the assignment of morphological sex to the individuals included in the study sample was low and unlikely to be affected by systematic error. These issues will be considered further in Chapter 4.
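The compounding of the two error sources in the worked example above can be expressed as a short sketch. It adopts the same simplifying assumption as the example, namely that the accuracy of the reference sex estimates and the consistency rate of the metric method simply multiply; the function names are illustrative only.

```python
def compound_accuracy(reference_accuracy, consistency_rate):
    """Overall accuracy when a method is tested against estimated sex."""
    return reference_accuracy * consistency_rate

def required_consistency(target_accuracy, reference_accuracy):
    """Consistency rate needed to reach a target overall accuracy."""
    return target_accuracy / reference_accuracy

print(f"{compound_accuracy(0.80, 0.80):.2f}")     # 0.64
print(f"{required_consistency(0.80, 0.90):.3f}")  # 0.889
```

Under this assumption, a reference sample sexed with 90% accuracy would require a metric method to reach a consistency rate of roughly 89% before the 80% cut-off point is met overall.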

Of the remaining nine “modern” methods tested, three produced total weighted accuracy rates that were worse than what would have been achieved using simple guesswork. These were Function 6 of the multiple method developed by Stewart (1979), which produced a total weighted accuracy rate of 34.7%; the minimum and maximum radial head diameter methods of Berrizbeitia (1989), which produced total accuracy rates of 44.4% and 45.6%, respectively; and the first metatarsal method of Robling and Ubelaker (1997), which produced a total accuracy rate of 41.4%. These accuracy rates are the result of the tendency of the methods to classify individuals as female, which suggests that ancient Egyptian males had skeletal measurements and proportions that were closer in size to those of females than of males in the modern European or American populations used to create the “modern” techniques. As such, female accuracy rates of 100%, as can be seen in Table 3.4A, do not indicate that the method in question was highly accurate in correctly classifying the sex of females if they are associated with a very low male accuracy rate. Rather, they simply indicate that the majority of individuals in the study sample were classified as female regardless of their morphological sex. To put it another way, these methods have no discriminatory power when applied to ancient Egyptian skeletal remains and should not be used to estimate sex in samples from this population.

Four of the 12 modern metric sex estimation methods tested were unable to classify the sex of a number of individuals in the study sample, resulting in estimates of indeterminate sex. The proportion of individuals who could not be assigned a sex using these methods was generally low, with the exception of those produced using the radial head diameter method developed by Berrizbeitia (1989), which assigned 78.6% and 81.1% of male individuals to the category of indeterminate sex using the maximum and minimum radial head diameters, respectively. This issue is largely the result of the non-overlapping sectioning points associated with the method. While measurements of greater than or equal to 23 mm indicate a male and less than or equal to 20 mm indicate a female (Berrizbeitia, 1989), measurements that fall in the range 20.01 to 22.99 mm cannot be assigned to either group.
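The effect of the non-overlapping sectioning points can be illustrated with a minimal sketch of the classification rule as it is described above. The function name and the example measurements are hypothetical; only the two thresholds are taken from the text.

```python
def classify_radial_head(diameter_mm, male_min=23.0, female_max=20.0):
    """Assign sex from radial head diameter using two sectioning points."""
    if diameter_mm >= male_min:
        return "male"
    if diameter_mm <= female_max:
        return "female"
    # Values between the two sectioning points cannot be assigned to a group
    return "indeterminate"

# Hypothetical measurements (mm); not data from the study
for diameter in (19.4, 21.5, 22.8, 23.6):
    print(diameter, classify_radial_head(diameter))
```

Any sample in which most male measurements fall between the two thresholds will therefore produce a high proportion of indeterminate estimates, as observed for the males in the present study.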

Two methods, using the tibia (İşcan & Miller-Shaivitz, 1984b) and humeral head diameter (Spradley & Jantz, 2011), present separate metric sex estimation equations for ‘Black’ and ‘White’ populations (see previous discussions in Sections 1.2.1.1.2 and 1.5.1.3). When applied to ancient Egyptian skeletal remains, the ‘White’ univariate equations of the former method were found to produce higher accuracy rates than the ‘Black’ equations, both in total and in males separately. In females, the ‘Black’ equations for proximal epiphyseal breadth (PEB) and circumference at nutrient foramen (CNF) were found to be more accurate, whereas the ‘Black’ and ‘White’ equations for distal epiphyseal breadth (DEB) produced equal accuracy rates. A similar pattern was observed using the multivariate equations of the tibia method. The ‘White’ equations were more accurate than the ‘Black’ equations, both in total and in males separately, for all functions excluding Function 4. In females, the accuracy rates obtained using the ‘Black’ and ‘White’ equations were equal for all functions excluding Function 5 and Function 9, where the accuracy rates using the ‘Black’ equations were greater than those using the ‘White’ equations. In comparison, the accuracy rates obtained from the humeral head diameter (HHD) method were greater using the ‘Black’ equation than the ‘White’ equation, both in total and in males separately. The female accuracy rates using the two different equations were equal.

3.4.1.2 Accuracy of “population-specific” methods

The number of correct sex estimates and accuracy (consistency) rates associated with metric sex estimation equations that were specifically created for use in ancient Egyptian populations when applied to a different Egyptian population are shown in Table 3.4B. Accuracy rates are depicted graphically in Figure 3.4B.

Table 3.4B: Correct sex estimates and accuracy rates associated with two “population-specific” metric sex estimation methods when tested on the study sample.

| No. | Method / bone | Total, % | Male, n/N (%) | Female, n/N (%) |
|---|---|---|---|---|
| 13 | Long bones (Raxter, 2007) | | | |
| | FHD | 90.4 | 72/78 (92.3) | 51/58 (87.9) |
| | CNF | 82.3 | 80/86 (93.0) [1/86 (1.2) Indet.] | 36/55 (65.5) [2/55 (3.6) Indet.] |
| | HHD | 87.1 | 66/76 (86.8) [1/76 (1.3) Indet.] | 49/56 (87.5) [1/56 (1.8) Indet.] |
| 14 | Scapula (Dabbs, 2010) | | | |
| | Overall | 88.1 | 69/73 (94.5) [2/73 (2.7) Indet.] | 35/45 (77.8) [1/45 (2.2) Indet.] |
| | Function 1 | 86.4 | 68/73 (93.2) | 34/45 (75.6) |
| | Function 2 | 87.2 | 70/72 (97.2) | 32/45 (71.1) |
| | Function 3 | 100 | 15/15 (100) | 12/12 (100) |
| | Function 4 | 95.8 | 13/14 (92.9) | 10/10 (100) |
| | Function 5 | 95.8 | 13/14 (92.9) | 10/10 (100) |

N, the number of individuals to which the method or equation could be applied; n, the number of correct sex estimates; %, the accuracy rate associated with the method or equation; No., number; FHD, femoral head diameter; CNF, circumference of the tibia at nutrient foramen; HHD, humeral head diameter; Indet., indeterminate sex.

[Bar chart: total, male, and female accuracy rates for Functions 1 to 5 of the scapula method and for FHD, HHD, and CNF of the long bones method. Horizontal axis: accuracy rate, %, from 0 to 100.]

Figure 3.4B: Weighted total, male, and female accuracy rates associated with two population-specific metric sex estimation methods, ranked in descending order according to total accuracy.

FHD, femoral head diameter; HHD, humeral head diameter; CNF, circumference of the tibia at nutrient foramen.

Scapula: method of Dabbs (2010); Long bones: method of Raxter (2007).

As can be seen in Table 3.4B and Figure 3.4B, the male accuracy rates associated with all individual equations or components of the two population-specific metric sex estimation methods exceeded the 80% cut-off mark. However, the accuracy rates associated with a number of individual equations were unacceptably low in females. Using the long bones method developed by Raxter (2007), only 65.5% of the female sample was correctly sexed by measuring the circumference of the tibia at the nutrient foramen (CNF). An overall accuracy rate for females of 77.8% was obtained using the scapula method developed by Dabbs (2010); accuracy rates of 75.6% and 71.1% were obtained for the female subsample using Function 1 and Function 2, respectively. In comparison, Functions 3, 4 and 5 of this method produced accuracy rates of 100% in females and were additionally very accurate in correctly classifying the sex of males. Based on these results, the FHD and HHD sectioning points of Raxter (2007) and Functions 3–5 developed by Dabbs (2010) may be of value to other researchers, given that these methods produced acceptable levels of accuracy in males and females separately in this independent test using a different population sample.

A small proportion of individuals in the study sample was categorised as being of indeterminate sex using the CNF and humeral head diameter (HHD) equations developed by Raxter (2007). This problem arises because of the nature of the sectioning point, or the way it was reported. According to Raxter’s publication, the sectioning point for the HHD method is exactly 41 mm. Thus, to be classified as male using the HHD method, measurements must be greater than 41 mm; to be classified as female, measurements must be less than 41 mm. As a result, individuals with an HHD measurement of exactly 41 mm cannot be assigned to a sex group. It appears that Raxter obtained this sectioning point by adding the mean male and female HHD measurements (43.91 mm and 38.12 mm, respectively), dividing by two (which equals 41.015 mm) and then rounding to the nearest 1 mm. However, this sectioning point does not take into account differences in the sample sizes of males and females. As such, a weighted sectioning point of 41.55 mm may be more appropriate and may reduce or eliminate assessments of indeterminate sex. The estimates of indeterminate sex using the scapula method were not associated with a particular function; rather, they were introduced via the method with which the overall accuracy rate was obtained. For example, assuming that four of the five separate functions could be applied to a single individual, if two were to indicate male sex and two were to indicate female sex, the lack of a majority would necessitate an overall sex assessment of indeterminate.
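The arithmetic behind the HHD sectioning point can be sketched as follows. The two group means are those quoted above; the function names and the tie-handling rule (a measurement equal to the sectioning point is returned as indeterminate) are assumptions made for illustration, following the description in the text.

```python
def midpoint(mean_male, mean_female):
    """Unweighted sectioning point: halfway between the two group means."""
    return (mean_male + mean_female) / 2


def classify_hhd(hhd_mm, sectioning_point=41.0):
    """Classify by humeral head diameter; a value equal to the
    sectioning point cannot be assigned to either sex."""
    if hhd_mm > sectioning_point:
        return "male"
    if hhd_mm < sectioning_point:
        return "female"
    return "indeterminate"


unrounded = midpoint(43.91, 38.12)   # mean male and female HHD from the text
rounded = round(unrounded)           # rounded to the nearest 1 mm

print(round(unrounded, 3), rounded)
print(classify_hhd(41.0))            # falls exactly on the rounded point
print(classify_hhd(41.0, unrounded)) # the unrounded point avoids the tie
```

Under these assumptions, a sectioning point reported to more decimal places than the measurements themselves cannot coincide with any recorded value, which is one way such ties might be avoided.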

3.4.1.3 Accuracy of “living Egyptian” methods

Table 3.4C shows the number of correct sex estimates and accuracy (consistency) rates associated with metric sex estimation equations that were specifically created for use in the living Egyptian population, notably in forensic contexts, when applied to the ancient Egyptian population sampled in the present study. Accuracy rates are depicted graphically in Figure 3.4C.


Table 3.4C: Correct sex estimates and accuracy rates associated with two “living Egyptian” metric sex estimation methods when tested on the study sample.

| No. | Method / bone | Total, % | Male, n/N (%) | Female, n/N (%) |
|---|---|---|---|---|
| 15 | MC1 (Eshak et al, 2011) | 83.2 | 57/61 (93.4) | 27/40 (67.5) |
| 16 | MC1 (El Morsi & Al Hawary, 2013) | 59.4 | 20/61 (32.8) | 40/40 (100) |

N, the number of individuals to which the method or equation could be applied; n, the number of correct sex estimates; %, the accuracy rate associated with the method or equation; No., number; MC1, metacarpal 1.

[Bar chart: total, male, and female accuracy rates for length of MC1 (Eshak et al, 2011): 83.2, 93.4, and 67.5, respectively; and for length of MC1 (El Morsi & Al Hawary, 2013): 59.4, 32.8, and 100, respectively. Horizontal axis: accuracy rate, %, from 0 to 100.]

Figure 3.4C: Weighted total, male, and female accuracy rates associated with two “living Egyptian” metric sex estimation methods, ranked in descending order according to total accuracy.

MC1, metacarpal 1.

Although the methods of Eshak and colleagues (2011) and El Morsi and Al Hawary (2013) are both based on interarticular length of the first metacarpal, the sectioning points for the two methods differ considerably. To be classified as male using the former method, the length of MC1 must exceed 42.55 mm, whereas to be classified as male using the latter method, the length of MC1 must exceed 46.5 mm, a difference of almost 4 mm. This is interesting given that both methods were created using a sample of the living Egyptian population. Several reasons to explain the difference in sectioning points are considered in Section 2.2.2.2.3 above. When applied to the ancient Egyptian sample included in the present study, the method of Eshak et al. (2011) was found to classify males with a high degree of accuracy (93.4%) but was less accurate in classifying females (67.5%). This suggests that the sectioning point of 42.55 mm is too low for the ancient Egyptian sample, as a large proportion of female MC1 measurements exceeded the sectioning point and were therefore classified as male. As a result, the male rate of correct sex classification is meaningless; it does not indicate that the method has good discriminatory power, but rather that the method will automatically classify a large proportion of individuals in the sample as male because of the low sectioning point. In comparison, the method of El Morsi and Al Hawary (2013) correctly classified all females in the study sample, but only 32.8% of males. This suggests that the sectioning point of 46.5 mm is too high for ancient Egyptians, as the interarticular length of MC1 of a large proportion of males failed to exceed the sectioning point and these individuals were therefore classified as female. Again, these results do not indicate that the method is accurate in classifying females, as this is simply an effect of using a sectioning point that is too high for the target population.

Based on these findings, neither of the “living Egyptian” methods of estimating sex based on MC1 length should be used to estimate the sex of ancient Egyptian remains as the methods were unable to produce accuracy rates of 80% or above in both males and females.
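The acceptance rule applied here (80% or above in males and females separately) can be sketched using the counts reported in Table 3.4C. The function name, the dictionary keys, and the default threshold argument are illustrative assumptions; the counts are those given in the table.

```python
def accuracy_summary(male_correct, male_n, female_correct, female_n,
                     threshold=80.0):
    """Return sex-specific and weighted total accuracy rates (%), plus a
    flag showing whether BOTH sexes reach the threshold."""
    male = 100 * male_correct / male_n
    female = 100 * female_correct / female_n
    total = 100 * (male_correct + female_correct) / (male_n + female_n)
    return {
        "male": round(male, 1),
        "female": round(female, 1),
        "total": round(total, 1),
        "acceptable": male >= threshold and female >= threshold,
    }


# Counts taken from Table 3.4C.
eshak = accuracy_summary(57, 61, 27, 40)
el_morsi = accuracy_summary(20, 61, 40, 40)

print(eshak)
print(el_morsi)
```

Checking each sex separately, rather than the total alone, is what exposes a sectioning point that simply assigns most individuals to one group.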

By contrast, the MC1 method created by Scheuer and Elkington (1993) using a “modern” population sample produced an accuracy (or consistency) rate of 84.2% in both males and females separately. This method does, however, require a number of different measurements of the first metacarpal, including the antero-posterior and medio-lateral widths of the head and base, and therefore may not be applicable if MC1 is fragmented.

3.4.2 Discriminant function analysis

Stepwise discriminant function analysis (DFA) was performed using different combinations of skeletal dimensions to create functions with the greatest possible discriminating ability. The procedure was performed using the complete study sample (n=318) and the principal time periods (Old Kingdom and Late Period). The Predynastic Period subsample was not analysed separately because of its relatively small size. Cases with missing data were deleted on a listwise basis, that is, if a case had one or more specified variables (skeletal dimensions) missing, the case was excluded from the analysis. Within this section, only discriminant functions that met the following criteria are presented:

• Number of valid cases for analysis ≥40 (≥20 males and ≥20 females)
• Number of valid cases in classification sample ≥40 (≥20 males and ≥20 females)
• Canonical correlation >0.5
• Cross-validated accuracy rate of discriminant function ≥80% (overall and in males and females separately).
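Listwise deletion and the inclusion criteria listed above can be sketched as follows. This is an illustration only, not the SPSS procedure used in the study; the function names, the dictionary layout of a case, and the toy measurements are hypothetical.

```python
def listwise_complete(cases, variables):
    """Listwise deletion: keep a case only if every requested variable
    is present (not None)."""
    return [c for c in cases if all(c.get(v) is not None for v in variables)]


def meets_sample_criteria(cases, min_per_sex=20):
    """At least 20 males and 20 females (hence at least 40 cases)."""
    males = sum(1 for c in cases if c["sex"] == "M")
    females = sum(1 for c in cases if c["sex"] == "F")
    return males >= min_per_sex and females >= min_per_sex


def meets_function_criteria(canonical_r, cv_total, cv_male, cv_female):
    """Canonical correlation above 0.5 and cross-validated accuracy of
    at least 80% overall and in each sex."""
    return canonical_r > 0.5 and min(cv_total, cv_male, cv_female) >= 80.0


# Toy cases with hypothetical measurements; the second case lacks DB.
cases = [
    {"sex": "M", "GO": 180.0, "DB": 130.0},
    {"sex": "F", "GO": 172.0, "DB": None},
    {"sex": "F", "GO": 170.0, "DB": 121.0},
]
complete = listwise_complete(cases, ["GO", "DB"])
print(len(complete))
print(meets_sample_criteria(complete))
print(meets_function_criteria(0.778, 87.0, 86.5, 87.6))
```

The final call uses the canonical correlation and cross-validated rates reported below for cranial function 1.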

As described in Section 2.5.5.1, variables (skeletal dimensions) are added to or removed from a function based on their values of ‘F to Enter’ and ‘F to Remove’. As such, the number of variables included in a discriminant function may not be the same as the number of variables selected for the initial analysis. For example, of the nine cranial variables available for the analysis, only three might be included in a discriminant function, the other six being excluded because they did not increase the discriminatory power of the function. Similarly, the number of valid cases included in the initial analysis might not equal the number of cases in the classification or cross-validation sample. For example, let us assume that there are 150 cases for which all nine cranial dimensions were measured. If only three of the dimensions are included in the discriminant function it is likely that there will be more than 150 cases for which the three selected dimensions were measured. As such, the size of the classification sample will be greater than the size of the analysis sample.

For each new discriminant function created the following data, constituting the complete set of pertinent results of each analysis, are presented:

• Canonical discriminant function statistics
  ◦ Number of valid cases
  ◦ Prior probabilities (to adjust for the difference in sample size between the two groups [male and female], prior probabilities were calculated from group sizes)
  ◦ Eigenvalue (variance explained by the discriminant function)
  ◦ Wilks’ lambda (significance of discriminant function under the null hypothesis that the group centroids are equal)
  ◦ Unstandardised canonical discriminant function coefficients for the variables entered into the function
  ◦ Group centroids (mean value of discriminant scores for each group)
  ◦ Sectioning point (based on the unweighted or weighted mean of the group centroids, depending on whether the sample sizes for the two groups are equal or unequal).
• Canonical discriminant function classification statistics
  ◦ Original and cross-validated predicted group membership for males and females
  ◦ Total original and cross-validated cases correctly classified.

For the first set of DFA results presented below, a description of the relevance, importance, and meaning of the various statistics is provided. For subsequent sets of DFA results, the data are presented without additional comment, unless required.

3.4.2.1 Complete sample

3.4.2.1.1 The cranium

A total of nine dimensions were eligible for this analysis (OF and ML were excluded based on results of the intra-observer error test discussed above). Table 3.4D(i) provides a summary of the canonical discriminant function and relevant statistics associated with cranial function 1. The classification results are presented in Table 3.4D(ii).

Table 3.4D(i): Cranial discriminant function 1 (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 87 | M: 0.544 | 1.532 | 0.395 (<0.001) | GO: 0.092 | M: 1.127 | 0.005 |
| F: 73 | F: 0.456 | | | DB: 0.146 | F: -1.343 | |
| T: 160 | | | | BP: 0.055 | | |
| | | | | Con.: -40.272 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.
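As a minimal sketch of how the coefficients and sectioning point in Table 3.4D(i) are applied: the coefficients, constant, and sectioning point are those reported in the table, whereas the function names, the treatment of a score exactly equal to the sectioning point, and the example measurements are hypothetical and purely illustrative.

```python
# Unstandardised coefficients and constant from Table 3.4D(i).
COEFFICIENTS = {"GO": 0.092, "DB": 0.146, "BP": 0.055}
CONSTANT = -40.272
SECTIONING_POINT = 0.005


def discriminant_score(measurements):
    """Multiply each dimension by its coefficient, sum the products,
    and add the (negative) constant."""
    return sum(COEFFICIENTS[k] * measurements[k] for k in COEFFICIENTS) + CONSTANT


def classify(measurements):
    """Score above the sectioning point: male; below: female.
    An exact tie is returned as indeterminate (an assumption)."""
    score = discriminant_score(measurements)
    if score > SECTIONING_POINT:
        return "male"
    if score < SECTIONING_POINT:
        return "female"
    return "indeterminate"


# Hypothetical measurements in mm, not drawn from the study sample.
individual_a = {"GO": 180.0, "DB": 130.0, "BP": 100.0}
individual_b = {"GO": 170.0, "DB": 120.0, "BP": 95.0}

print(round(discriminant_score(individual_a), 3), classify(individual_a))
print(round(discriminant_score(individual_b), 3), classify(individual_b))
```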


The canonical correlation for the function is 0.778 (not shown in table). This is a measure of the association between the discriminant function and the outcome variable (in this case sex); a value above 0.5 indicates a strong relationship (Meyers et al, 2006: 271). The square of the canonical correlation, in this case 0.605 or 60.5%, indicates the proportion of the variance explained in the dependent variable. The prior probabilities shown in Table 3.4D(i) demonstrate that there is a marginally higher probability of being classified as male, even before any other statistics are calculated, because the male sample is larger than the female sample. The eigenvalue is a measure of the separation of the outcome or dependent variable groups (in this case male and female) achieved by the discriminant function. As a general rule, the higher the value the better the separation; however, in this context, the eigenvalue has no upper limit and should therefore be interpreted with caution. Eigenvalues can additionally be converted to the percentage of the between-group variance accounted for by the discriminant function (Kinnear & Gray, 2009: 527). Wilks’ lambda is used to test the null hypothesis that the means of all the independent variables (skeletal dimensions) are equal across both groups (male and female) of the dependent variable (sex). A statistically significant result for Wilks’ lambda, as demonstrated in Table 3.4D(i) above, allows the null hypothesis to be rejected. One may therefore conclude that the means of the independent variables are not equal and thus that there is a relationship between the independent variables and the dependent variable.
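For a single discriminant function, the eigenvalue, canonical correlation, and Wilks’ lambda are algebraically linked, which offers a quick consistency check on the values reported for cranial function 1. The sketch below uses the standard single-function identities (squared canonical correlation = eigenvalue / (1 + eigenvalue); Wilks’ lambda = 1 / (1 + eigenvalue)); the function names are illustrative.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by the eigenvalue of a single function."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))


def wilks_lambda(eigenvalue):
    """Wilks' lambda implied by the eigenvalue of a single function."""
    return 1 / (1 + eigenvalue)


eigenvalue = 1.532  # cranial function 1, Table 3.4D(i)
r = canonical_correlation(eigenvalue)
lam = wilks_lambda(eigenvalue)

print(round(r, 3))      # canonical correlation
print(round(r * r, 3))  # proportion of variance explained
print(round(lam, 3))    # Wilks' lambda
```

The squared canonical correlation and Wilks’ lambda sum to one, so the 60.5% of variance explained and the lambda of 0.395 reported above are two views of the same quantity.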

The coefficients given in Table 3.4D(i) are the unstandardised canonical discriminant function coefficients, which are used to calculate the discriminant score. Each variable (skeletal dimension) is multiplied by its coefficient, the products are summed, and the constant is added or subtracted as indicated by the preceding sign. In comparison, the standardised canonical discriminant function coefficients indicate the partial contribution of each variable in the discriminant function controlling for other variables in the function (Meyers et al, 2006: 261). In other words, they may be used to assess each variable’s unique contribution to the discriminant function. For the cranial function presented in Table 3.4D(i) above, the standardised canonical discriminant function coefficients are 0.662, 0.484, and 0.238 for DB, GO, and BP, respectively (not shown in table). This suggests that DB makes the biggest contribution to the separation of groups using the function. The group centroids are the mean value of the discriminant score for a given group (male or female) of the dependent variable (sex). The means are presented in standardised (Z-Score) form and are based on the variate, that is, the weighted linear composite making up the discriminant function (Meyers et al, 2009: 273). The centroids are used to establish the sectioning point for classifying cases. If the groups of the dependent variable are of equal size, the best sectioning point is halfway between the values of the functions at group centroids (the mean). If the groups of the dependent variable are unequal in size, the appropriate sectioning point is the weighted mean of the two values (Meyers et al, 2009: 273), calculated using the formula:

[(n1 x C1) + (n2 x C2)]/2

where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively.
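The formula can be checked against the values in Table 3.4D(i) with a short sketch. The first function implements the formula exactly as stated above; the second, included only for comparison, is the conventional sample-size-weighted mean, which divides by n1 + n2 rather than by 2. Both function names are illustrative, and the comparison is an observation about the arithmetic rather than a description of how the statistical software derives its classification rule.

```python
def sectioning_point_as_stated(n1, c1, n2, c2):
    """Formula as given in the text: [(n1 x C1) + (n2 x C2)] / 2."""
    return (n1 * c1 + n2 * c2) / 2


def weighted_mean_of_centroids(n1, c1, n2, c2):
    """Conventional weighted mean, dividing by the total sample size."""
    return (n1 * c1 + n2 * c2) / (n1 + n2)


# Cranial function 1: 87 males (centroid 1.127), 73 females (centroid -1.343).
stated = sectioning_point_as_stated(87, 1.127, 73, -1.343)
conventional = weighted_mean_of_centroids(87, 1.127, 73, -1.343)

print(round(stated, 3))        # agrees with the 0.005 reported in the table
print(round(conventional, 5))  # very close to zero
```

With the centroids rounded to three decimal places, the stated formula reproduces the tabulated sectioning point, while the conventional weighted mean is approximately zero.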

The classification results for cranial function 1 presented above are given in Table 3.4D(ii).

Table 3.4D(ii): Classification results associated with cranial function 1 (complete sample).

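| Cranial function 1 | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 90 | 14 | 104 | 193 |
| | | Female | 9 | 80 | 89 | |
| | % | Male | 86.5 | 13.5 | 100 | 88.1% |
| | | Female | 10.1 | 89.9 | 100 | |
| Cross-validated* | Count | Male | 90 | 14 | 104 | 193 |
| | | Female | 11 | 78 | 89 | |
| | % | Male | 86.5 | 13.5 | 100 | 87.0% |
| | | Female | 12.4 | 87.6 | 100 | |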

Table 3.4D(ii) presents the number and proportion of correct and incorrect classifications for the original and ‘leave-one-out’ or cross-validation samples, as well as the overall accuracy rate associated with the function. Cross-validation is done only for those cases in the analysis. In cross-validation, each case is classified by the function derived from all cases other than that case. As shown in the table, when tested on 193 individuals with the appropriate measurements the cranial function created using the complete study sample produced a cross-validated accuracy rate of 87%. This is the mean of the accuracy rates for males and females. As can be seen, using the cross-validation procedure, 90 males were classified as male using the function, while 14 were misclassified as female giving an accuracy rate for males of 86.5%. Similarly, 78 females were correctly classified as female while 11 were misclassified as male. This gives an accuracy rate of 87.6%. For each case classified, SPSS provides a posterior probability, which is analogous to the calculation of the probability of the event (for example male or not male) in logistic regression.
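The leave-one-out principle described above can be sketched with a deliberately simplified classifier. This is not the discriminant analysis performed in SPSS: it assumes a single measurement per individual and a sectioning point taken as the midpoint of the two group means, recomputed each time from all cases except the one being classified. The function names and the toy data are hypothetical.

```python
def midpoint_rule(training_cases):
    """Sectioning point: midpoint of the male and female means."""
    males = [x for x, sex in training_cases if sex == "M"]
    females = [x for x, sex in training_cases if sex == "F"]
    return (sum(males) / len(males) + sum(females) / len(females)) / 2


def leave_one_out_accuracy(cases):
    """Classify each case with a rule derived from all OTHER cases and
    return the percentage classified correctly."""
    correct = 0
    for i, (value, sex) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]   # omit the case being tested
        cut = midpoint_rule(training)
        predicted = "M" if value > cut else "F"
        correct += predicted == sex
    return 100 * correct / len(cases)


# Hypothetical measurements (mm) with known sex.
cases = [
    (44.0, "M"), (43.0, "M"), (45.5, "M"), (40.5, "M"),
    (38.0, "F"), (39.0, "F"), (37.5, "F"), (41.5, "F"),
]
print(leave_one_out_accuracy(cases))
```

Because no case contributes to the rule used to classify it, the cross-validated rate is a more conservative estimate of performance than the original (resubstitution) rate.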

A second cranial function was created using the complete study sample and based on only those cranial variables that were measured by the present author and IK-O. This function, cranial function 2, which is presented in Table 3.4E(i), is therefore available for testing on the Saqqara-West cemetery population of skeletal remains. The classification results associated with cranial function 2 are given in Table 3.4E(ii).

Table 3.4E(i): Cranial discriminant function 2 (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 109 | M: 0.548 | 1.401 | 0.416 (<0.001) | GO: 0.108 | M: 1.070 | -0.005 |
| F: 90 | F: 0.452 | | | DB: 0.146 | F: -1.296 | |
| T: 199 | | | | Con.: -38.163 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4E(ii): Classification results associated with cranial function 2 (complete sample).

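| Cranial function 2 | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 95 | 14 | 109 | 199 |
| | | Female | 12 | 78 | 90 | |
| | % | Male | 87.2 | 12.8 | 100 | 86.9% |
| | | Female | 13.3 | 86.7 | 100 | |
| Cross-validated* | Count | Male | 94 | 15 | 109 | 199 |
| | | Female | 12 | 78 | 90 | |
| | % | Male | 86.2 | 13.8 | 100 | 86.4% |
| | | Female | 13.3 | 86.7 | 100 | |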

3.4.2.1.2 The femur

A total of six variables were available for this analysis. Table 3.4F(i) provides a summary of the canonical discriminant function and relevant statistics associated with the femoral function. The classification results are presented in Table 3.4F(ii).

Table 3.4F(i): Femoral discriminant function (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 48 | M: 0.585 | 1.751 | 0.363 (<0.001) | FHD: 0.235 | M: 1.100 | -0.001 |
| F: 34 | F: 0.415 | | | FSC: 0.089 | F: -1.553 | |
| T: 82 | | | | XFL: 0.016 | | |
| | | | | Con.: -24.278 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4F(ii): Classification results associated with the femoral function (complete sample).

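| Femoral function | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 55 | 4 | 59 | 107 |
| | | Female | 7 | 41 | 48 | |
| | % | Male | 93.2 | 6.8 | 100 | 89.7% |
| | | Female | 14.6 | 85.4 | 100 | |
| Cross-validated* | Count | Male | 55 | 4 | 59 | 107 |
| | | Female | 8 | 40 | 48 | |
| | % | Male | 93.2 | 6.8 | 100 | 88.8% |
| | | Female | 16.7 | 83.3 | 100 | |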

3.4.2.1.3 The tibia

A total of seven variables were available for this analysis. Table 3.4G(i) provides a summary of the canonical discriminant function and relevant statistics associated with the tibial function. The classification results are presented in Table 3.4G (ii).

Table 3.4G(i): Tibial discriminant function (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 50 | M: 0.595 | 1.953 | 0.339 (<0.001) | MSC: 0.122 | M: 1.139 | 0.000 |
| F: 34 | F: 0.405 | | | PEB: 0.283 | F: -1.675 | |
| T: 84 | | | | DEB: -0.155 | | |
| | | | | Con.: -21.728 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4G(ii): Classification results associated with the tibial function (complete sample).

| Tibial function | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 51 | 2 | 53 | 88 |
| | | Female | 2 | 33 | 35 | |
| | % | Male | 96.2 | 3.8 | 100 | 95.5% |
| | | Female | 5.7 | 94.3 | 100 | |
| Cross-validated* | Count | Male | 51 | 2 | 53 | 88 |
| | | Female | 3 | 32 | 35 | |
| | % | Male | 96.2 | 3.8 | 100 | 94.3% |
| | | Female | 8.6 | 91.4 | 100 | |

3.4.2.1.4 The upper limb

A total of nine variables of the humerus, radius and ulna were available for this analysis. Table 3.4H(i) provides a summary of the canonical discriminant function and relevant statistics associated with the upper limb function. The classification results are presented in Table 3.4H(ii).

Table 3.4H(i): Upper limb discriminant function (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 24 | M: 0.545 | 2.295 | 0.304 (<0.001) | XUL: 0.046 | M: 1.351 | 0.002 |
| F: 20 | F: 0.455 | | | MIND: 0.511 | F: -1.621 | |
| T: 44 | | | | Con.: -22.332 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4H(ii): Classification results associated with the upper limb function (complete sample).

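| Upper limb function | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 30 | 1 | 31 | 51 |
| | | Female | 1 | 19 | 20 | |
| | % | Male | 96.8 | 3.2 | 100 | 96.1% |
| | | Female | 5.0 | 95.0 | 100 | |
| Cross-validated* | Count | Male | 30 | 1 | 31 | 51 |
| | | Female | 1 | 19 | 20 | |
| | % | Male | 96.8 | 3.2 | 100 | 96.1% |
| | | Female | 5.0 | 95.0 | 100 | |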

3.4.2.1.5 The first metacarpal (MC1)

A total of five variables were available for this analysis (BAP was excluded based on the results of the intra-observer error test). Table 3.4I(i) provides a summary of the canonical discriminant function and relevant statistics associated with the MC1 function. The classification results are presented in Table 3.4I(ii).

Table 3.4I(i): MC1 discriminant function (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 57 | M: 0.588 | 1.450 | 0.408 (<0.001) | IAL: 0.227 | M: 0.998 | 0.003 |
| F: 40 | F: 0.412 | | | HML: 0.458 | F: -1.422 | |
| T: 97 | | | | MS: 0.434 | | |
| | | | | Con.: -21.032 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4I(ii): Classification results associated with the MC1 function (complete sample).

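| MC1 function | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 56 | 4 | 60 | 100 |
| | | Female | 3 | 37 | 40 | |
| | % | Male | 93.3 | 6.7 | 100 | 93.0% |
| | | Female | 7.5 | 92.5 | 100 | |
| Cross-validated* | Count | Male | 55 | 5 | 60 | 100 |
| | | Female | 5 | 35 | 40 | |
| | % | Male | 91.7 | 8.3 | 100 | 90.0% |
| | | Female | 12.5 | 87.5 | 100 | |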

3.4.2.1.6 The lower limb

A total of 13 variables of the femur and tibia were available for this analysis. Table 3.4J(i) provides a summary of the canonical discriminant function and relevant statistics associated with the lower limb function 1. The classification results are presented in Table 3.4J(ii).

Table 3.4J(i): Lower limb discriminant function 1 (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 35 | M: 0.574 | 1.929 | 0.341 (<0.001) | FSC: 0.116 | M: 1.177 | -0.0075 |
| F: 26 | F: 0.426 | | | PEB: 0.139 | F: -1.585 | |
| T: 61 | | | | XFL: 0.017 | | |
| | | | | Con.: -26.898 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4J(ii): Classification results associated with the lower limb function 1 (complete sample).

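| Lower limb function 1 | | Actual group | Predicted: Male | Predicted: Female | Total | Overall |
|---|---|---|---|---|---|---|
| Original | Count | Male | 45 | 2 | 47 | 81 |
| | | Female | 3 | 31 | 34 | |
| | % | Male | 95.7 | 4.3 | 100 | 93.8% |
| | | Female | 8.8 | 91.2 | 100 | |
| Cross-validated* | Count | Male | 45 | 2 | 47 | 81 |
| | | Female | 3 | 31 | 34 | |
| | % | Male | 95.7 | 4.3 | 100 | 93.8% |
| | | Female | 8.8 | 91.2 | 100 | |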

A second discriminant function was created using only those variables of the femur and tibia that were measured by both the present author and IK-O (FHD and PEB). This function, lower limb function 2, is presented in Table 3.4K(i), and is available for testing on the Saqqara-West skeletal population. The classification results associated with this function are given in Table 3.4K(ii).

Table 3.4K(i): Lower limb discriminant function 2 (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
|---|---|---|---|---|---|---|
| M: 51 | M: 0.593 | 1.929 | 0.341 (<0.001) | FHD: 0.247 | M: 1.137 | -0.004 |
| F: 35 | F: 0.407 | | | PEB: 0.178 | F: -1.657 | |
| T: 86 | | | | Con.: -22.718 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4K(ii): Classification results associated with the lower limb function 2 (complete sample).

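| Lower limb function 2 | | | Predicted male | Predicted female | Total | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| Original | Count | Male | 48 | 3 | 51 | 86 |
| | | Female | 3 | 32 | 35 | |
| | % | Male | 94.1 | 5.9 | 100 | 93.0% |
| | | Female | 8.6 | 91.4 | 100 | |
| Cross-validated* | Count | Male | 48 | 3 | 51 | 86 |
| | | Female | 3 | 32 | 35 | |
| | % | Male | 94.1 | 5.9 | 100 | 93.0% |
| | | Female | 8.6 | 91.4 | 100 | |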
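The percentages in the classification tables follow directly from the counts. The short Python sketch below, in which the function name is hypothetical, reproduces the sex-specific and overall rates for lower limb function 2 from the counts in Table 3.4K(ii). It assumes that the overall rate is the pooled proportion of correctly classified cases, which is the reading that matches the tabulated values.

```python
def classification_rates(male_as_male, male_as_female, female_as_male, female_as_female):
    """Return (male %, female %, overall %) correctly classified, to one decimal place."""
    n_male = male_as_male + male_as_female
    n_female = female_as_male + female_as_female
    male_rate = 100 * male_as_male / n_male
    female_rate = 100 * female_as_female / n_female
    # Overall rate pooled over all cases (assumption: not the mean of the two rates)
    overall = 100 * (male_as_male + female_as_female) / (n_male + n_female)
    return round(male_rate, 1), round(female_rate, 1), round(overall, 1)


# Counts from Table 3.4K(ii)
rates = classification_rates(48, 3, 3, 32)
print(rates)
```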

3.4.2.1.7 First principal component variables

For this analysis, the four variables that correlated most strongly with principal component one (PC1) after performing PCA using Z-Scores and with EM imputation were selected (MSC, RSBB, BCB, and CNF). Table 3.4L(i) provides a summary of the canonical discriminant function and relevant statistics associated with the PC1 function. The classification results are presented in Table 3.4L(ii).

Table 3.4L(i): PC1 variables discriminant function (complete sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
| --- | --- | --- | --- | --- | --- | --- |
| M: 53 | M: 0.663 | 1.827 | 0.348 (<0.001) | BCB: 0.489 | M: 0.964 | 0.5205 |
| F: 27 | F: 0.338 | | | PEB: 0.069 | F: -1.893 | |
| T: 80 | | | | Con.: -19.050 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4L(ii): Classification results associated with the first principal component variables with the highest loadings (complete sample).

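| PC1 function | | | Predicted male | Predicted female | Total | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| Original | Count | Male | 66 | 2 | 68 | 105 |
| | | Female | 4 | 33 | 37 | |
| | % | Male | 97.1 | 2.9 | 100 | 94.3% |
| | | Female | 10.8 | 89.2 | 100 | |
| Cross-validated* | Count | Male | 66 | 2 | 68 | 105 |
| | | Female | 5 | 32 | 37 | |
| | % | Male | 97.1 | 2.9 | 100 | 93.3% |
| | | Female | 13.5 | 86.5 | 100 | |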

Of the new discriminant functions presented above, the greatest cross-validated accuracy was obtained using dimensions of the upper limb (96.1%), followed by dimensions of the tibia alone (94.3%) and dimensions of the lower limb (93.8% for the first function; 93.0% for the second function). The function created using the dimensions that correlated most strongly with the first principal component also produced a high cross-validated accuracy rate (93.3%). The lowest accuracy rates were obtained using dimensions of the cranium (87.0% for cranial function 1; 86.4% for cranial function 2). However, these accuracy rates are still above the 80% cut-off point at which metric sex estimation methods are considered useful.

3.4.2.2 Old Kingdom Giza

Despite performing discriminant function analysis using multiple combinations of variables, none of the resulting functions based on the Old Kingdom Giza sample alone met the sample size, accuracy, and statistical criteria specified above. For a large proportion of the analyses, the number of females with the relevant variables failed to meet the requirement for a minimum of 20 cases. A further issue was that for some of the discriminant functions, the proportion of females that were correctly classified using cross-validation failed to meet the 80% cut-off point, despite classification rates in excess of this value both overall (the mean of the male and female rates) and for males separately.

3.4.2.3 Late Period Giza

A total of nine variables were available for this analysis (OF and ML were excluded based on the results of the intra-observer error test). Table 3.4M(i) provides a summary of the canonical discriminant function and relevant statistics associated with cranial function 1. The classification results are presented in Table 3.4M(ii).


Table 3.4M(i): Cranial discriminant function 1 (Late Period Giza sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
| --- | --- | --- | --- | --- | --- | --- |
| M: 68 | M: 0.557 | 2.241 | 0.309 (<0.001) | GO: 0.145 | M: 1.323 | 0.000 |
| F: 54 | F: 0.443 | | | MW: -0.073 | F: -1.666 | |
| T: 122 | | | | DB: 0.156 | | |
| | | | | BP: 0.060 | | |
| | | | | Con.: -41.595 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4M(ii): Classification results associated with cranial function 1 (Late Period Giza sample).

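| Cranial function 1 | | | Predicted male | Predicted female | Total | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| Original | Count | Male | 76 | 5 | 81 | 147 |
| | | Female | 4 | 62 | 66 | |
| | % | Male | 93.8 | 6.2 | 100 | 93.9% |
| | | Female | 6.1 | 93.9 | 100 | |
| Cross-validated* | Count | Male | 76 | 5 | 81 | 147 |
| | | Female | 5 | 61 | 66 | |
| | % | Male | 93.8 | 6.2 | 100 | 93.2% |
| | | Female | 7.6 | 92.4 | 100 | |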

To produce a discriminant function that could be tested on the skeletal material from the Saqqara-West cemetery, discriminant function analysis was performed using just the cranial variables that were measured by both the present author and IK-O (GO, MW, and DB). Cranial function 1 presented above could not be tested on the Saqqara population because IK-O does not routinely measure BP. The resulting function, cranial function 2, is presented in Table 3.4N(i). This table additionally provides a summary of the relevant statistics associated with the function. The classification results are presented in Table 3.4N(ii).

Table 3.4N(i): Cranial discriminant function 2 (Late Period Giza sample).

| Valid cases, N (%) | Prior probability | Eigenvalue | Wilks’ Lambda (significance) | Coefficients* | Centroids | Sectioning point** |
| --- | --- | --- | --- | --- | --- | --- |
| M: 81 | M: 0.551 | 2.178 | 0.315 (<0.001) | GO: 0.160 | M: 1.323 | -0.0105 |
| F: 66 | F: 0.449 | | | MW: -0.074 | F: -1.624 | |
| T: 147 | | | | DB: 0.166 | | |
| | | | | Con.: -39.949 | | |

M, male; F, female; T, total; Con., constant. *Unstandardised canonical discriminant function coefficients. **The sectioning point between males and females was derived by taking the weighted average of the function centroids, using the formula [(n1 x C1) + (n2 x C2)]/2, where n1 and C1 are the sample size and group centroid for males, respectively, and n2 and C2 are the sample size and group centroid for females, respectively. A score greater than the sectioning point indicates a male, lower than the sectioning point a female. Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

Table 3.4N(ii): Classification results associated with cranial function 2 (Late Period Giza sample).

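| Cranial function 2 | | | Predicted male | Predicted female | Total | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| Original | Count | Male | 76 | 5 | 81 | 147 |
| | | Female | 6 | 60 | 66 | |
| | % | Male | 93.8 | 6.2 | 100 | 92.5% |
| | | Female | 9.1 | 90.9 | 100 | |
| Cross-validated* | Count | Male | 76 | 5 | 81 | 147 |
| | | Female | 7 | 59 | 66 | |
| | % | Male | 93.8 | 6.2 | 100 | 91.8% |
| | | Female | 10.6 | 89.4 | 100 | |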

3.4.3 Logistic regression analysis

Logistic regression analysis (LRA) was performed using different combinations of skeletal dimensions to create equations with the greatest possible sex-prediction ability. The procedure was performed using the complete study sample (n=318) and the Late Period Giza subsample. The Predynastic Period and Old Kingdom subsamples were not analysed separately because of their relatively small sizes. Cases with missing data were deleted on a listwise basis. Within this section, only logistic regression equations that met the following criteria are presented:

• Number of valid cases for analysis ≥50

• Hosmer–Lemeshow test (P-value associated with chi-square statistic) >0.05

• Hosmer–Lemeshow test (c-statistic) ≥0.7

• Accuracy rate for males and females separately ≥80%.

For each new logistic regression equation created the following data, constituting the complete set of pertinent results of each analysis, are presented:

• Likelihood ratio test

• Omnibus test of model coefficients

• Pseudo R²

• Hosmer–Lemeshow test

• Wald test.

For the first set of LRA results presented below, a description of the relevance, importance, and meaning of the various statistics is provided. For subsequent sets of LRA results, the data are presented without additional comment, unless required. For all analyses, females were coded as ‘0’ and males as ‘1’. The outputs for logistic regression are based on two models. The first, known as the intercept model, provides the statistics of the baseline or ‘intercept-only’ approach to prediction of category membership. In other words, these results are computed with only the constant in the equation and none of the independent or predictor variables. The second model, known as the complete model, provides the statistics of prediction from the regression model with the independent or predictor variables (in this case skeletal dimensions) included in the regression equation as a complete block.


3.4.3.1 Complete sample

3.4.3.1.1 The cranium

A total of nine variables were available for this analysis (OF and ML were excluded based on the results of the intra-observer error test). Table 3.4O(i) provides a summary of the pertinent ‘intercept model’ statistics.

Table 3.4O(i): Intercept model statistics for cranial equation 1 (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 87 | 54.5 | 1.222 | 0.269 |
| F: 73 | | | |
| T: 160 (50.3) | | | |

Constant is included in the model. The cut value is 0.50.

As shown in Table 3.4O(i), a total of 160 cases were included in the analysis. The ‘percentage correct’ column of this table demonstrates the consequences of including only the constant or intercept in the model. This leads to the prediction of the more frequent outcome for every case. In the table above, it can be seen that males are more frequent than females; therefore, the more common outcome is that an individual will be male. If the model were always to predict that an individual was male, without reference to any other information provided by the independent variables, this would be correct in 54.5% of cases.

The Wald test is analogous to the t-test. In the table above, the level of significance associated with the Wald statistic demonstrates that the constant or intercept by itself does not significantly improve prediction, which is what would be expected given the proportion of correct cases. The ‘complete model’ statistics are presented in Table 3.4O(ii).
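For an intercept-only model, the Wald statistic can be recovered from the group sizes alone. The Python sketch below assumes the standard large-sample result that the intercept equals the log odds of being male and that its standard error is the square root of (1/n males + 1/n females); the text does not state how the software computes the value, and the function name is hypothetical.

```python
import math


def intercept_only_wald(n_male, n_female):
    """Intercept (log odds of male) and Wald statistic for an intercept-only model."""
    b0 = math.log(n_male / n_female)            # males coded 1, females coded 0
    se = math.sqrt(1 / n_male + 1 / n_female)   # assumed large-sample standard error
    return b0, (b0 / se) ** 2


# Group sizes from Table 3.4O(i)
b0, wald = intercept_only_wald(87, 73)
print(round(wald, 3))
```

Under these assumptions, the group sizes in Table 3.4O(i) give a Wald statistic that agrees with the tabulated value of 1.222.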


Table 3.4O(ii): Complete model statistics for cranial equation 1 (complete sample).

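| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | 1.802 | 0.987 | 75.00 | 0.597 | 0.799 |
| | Step | 145.581 | <0.001 | - | - | - | - | - |
| | Block | 145.581 | <0.001 | - | - | - | - | - |
| | Model | 145.581 | <0.001 | - | - | - | - | - |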

The omnibus tests of model coefficients columns in the above table contain the model chi-square, a statistical test of the null hypothesis that all the coefficients are zero. The model chi-square value is 145.581, which is the difference in ‒2LL between the intercept- (constant)-only model and the complete model (with both the constant and independent or predictor variables included). In this case, the null hypothesis is rejected because the significance associated with the chi-square value is less than 0.05. It is therefore possible to conclude that the set of independent variables significantly improves the prediction of the outcome (dependent) variable (sex; male or female). The complete model summary statistics presented in Table 3.4O(ii) above provide three measures of how well the logistic regression model fits the data. With all the variables in the model, the goodness-of-fit ‒2 Log likelihood (‒2LL) statistic is 75.00. This fit statistic is not usually interpreted directly, but may be useful when comparing different logistic models (Meyers et al, 2006: 249). The pseudo R² measures in logistic regression are defined as: 1 – (Lfull ÷ Lreduced), where Lreduced represents the log likelihood for the intercept-only model and Lfull is the log likelihood for the complete model with both the constant and predictors. Of the two R² statistics presented above, the Nagelkerke R² is usually preferred because, unlike the Cox & Snell R², it can achieve a maximum value of one. In addition, the Nagelkerke R² may be interpreted as the proportion of variance of the dependent variable that is accounted for by the regression model (Kinnear & Gray, 2009: 552). For the cranial equation (complete sample) presented above, the model accounts for around 80% (79.9%) of the variance.
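The two tabulated R² statistics can be recovered from the ‒2LL of the complete model, the model chi-square, and the number of valid cases. The Python sketch below uses the usual textbook definitions of the Cox & Snell and Nagelkerke statistics, which are assumed here rather than taken from the text; the function name is hypothetical.

```python
import math


def pseudo_r2(neg2ll_full, model_chi_square, n_cases):
    """Cox & Snell and Nagelkerke R-squared from -2LL, model chi-square, and n."""
    # -2LL of the intercept-only model = -2LL of the full model + model chi-square
    neg2ll_null = neg2ll_full + model_chi_square
    cox_snell = 1 - math.exp(-model_chi_square / n_cases)
    # Maximum value Cox & Snell can attain; Nagelkerke rescales by it
    cox_snell_max = 1 - math.exp(-neg2ll_null / n_cases)
    nagelkerke = cox_snell / cox_snell_max
    return cox_snell, nagelkerke


# Values from Tables 3.4O(i) and 3.4O(ii): -2LL = 75.00, chi-square = 145.581, n = 160
cs, nk = pseudo_r2(75.00, 145.581, 160)
print(round(cs, 3), round(nk, 3))
```

The rescaling step shows why the Nagelkerke statistic can reach one while the Cox & Snell statistic cannot: the latter is bounded above by a value that depends on the intercept-only likelihood.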

The Hosmer–Lemeshow test provides a formal test assessing whether the predicted probabilities match the observed probabilities. The null hypothesis is that the predictions and observed values do not differ. As such, researchers are actually seeking a non-significant P-value associated with this test because the goal is to derive predictors that will accurately predict the actual probabilities (Meyers et al, 2006: 249). In other words, the null hypothesis should not be rejected. In Table 3.4O(ii) above, the Hosmer–Lemeshow goodness-of-fit statistic is 1.802 and the associated P-value is 0.987, indicating an acceptable match between predicted and observed probabilities. Table 3.4O(iii) provides the classification results of the regression model; that is, how well the model classifies cases into the two categories of the dependent variable (male or female).
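The P-value attached to the Hosmer–Lemeshow statistic is the upper-tail probability of a chi-square distribution. The Python sketch below assumes eight degrees of freedom (ten risk groups minus two), which is a common software default but is not stated in the text. It uses the closed-form tail probability that holds for even degrees of freedom, and the function name is hypothetical.

```python
import math


def chi_square_upper_tail(x, df):
    """Upper-tail probability of a chi-square variate (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Hosmer-Lemeshow chi-square from Table 3.4O(ii); df = 8 is an assumption
p_value = chi_square_upper_tail(1.802, 8)
print(round(p_value, 3))
```

Under this assumption the computed probability agrees with the tabulated significance of 0.987.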

Table 3.4O(iii): Classification table for cranial equation 1 (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 1 | Female | 64 | 9 | 87.7 |
| | Male | 9 | 78 | 89.7 |
| | Overall % | | | 88.8 |

The cut value is 0.50.

Overall, the sex of 88.8% of the sample was correctly classified using the regression model. The final output of interest relates to the variables in the equation, the statistics for which are shown in Table 3.4O(iv).

Table 3.4O(iv): Variables in cranial equation 1 (complete sample).

| Variables (step 1) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| GO | 0.217 | 7.561 | 0.006 | 1.242 | 1.064 | 1.450 |
| MW | -0.081 | 1.024 | 0.312 | 0.923 | 0.789 | 1.078 |
| BB | 0.112 | 1.563 | 0.211 | 1.119 | 0.938 | 1.333 |
| DB | 0.294 | 11.567 | 0.001 | 1.342 | 1.133 | 1.591 |
| PN | 0.157 | 2.327 | 0.127 | 1.169 | 0.956 | 1.430 |
| BN | 0.148 | 0.979 | 0.322 | 1.159 | 0.865 | 1.554 |
| BP | 0.058 | 0.359 | 0.549 | 1.060 | 0.876 | 1.282 |
| NB | -0.013 | 0.004 | 0.949 | 0.987 | 0.669 | 1.457 |
| PB | -0.048 | 0.216 | 0.642 | 0.953 | 0.780 | 1.166 |
| Constant | -107.710 | 31.001 | <0.001 | <0.001 | - | - |

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

Within this table, the equation coefficients indicate the amount of change expected in the log odds when there is a 1-unit change in the predictor variable with all other variables in the model held constant. A coefficient close to zero suggests that a change in the predictor variable has little effect on the log odds.

The Wald statistic tests the regression coefficients and constant for significance under the null hypothesis that the value of the coefficient does not differ from zero. The ‘Sig.’ column represents the P-value for the Wald statistic. A significant value (<0.05) indicates that the predictor or independent variable (in this case a skeletal dimension) is significantly associated with the dependent variable (sex). As can be seen in Table 3.4O(iv), only two variables, GO and DB, are significantly associated with the outcome variable. The odds ratio column in the above table gives the odds ratio for each variable. Because the independent or predictor variables are quantitative, these values may be interpreted as the multiplicative change in the odds of being classified as male associated with an increase of one unit in the skeletal dimension. For example, the odds ratio for GO is 1.242, suggesting that an increase of one unit in the measured value of GO multiplies the odds of being classified as male by 1.242 (an increase of 24.2%), with all other variables held constant.
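The odds ratio and its confidence interval follow from the equation coefficient and the Wald statistic. The Python sketch below assumes that the Wald statistic is the squared ratio of the coefficient to its standard error and that the interval is a normal-approximation 95% interval on the log-odds scale; the function name is hypothetical.

```python
import math


def odds_ratio_with_ci(coefficient, wald, z=1.96):
    """Odds ratio and approximate 95% CI from a coefficient and its Wald statistic."""
    # Recover the standard error from Wald = (coefficient / SE) ** 2
    se = abs(coefficient) / math.sqrt(wald)
    odds_ratio = math.exp(coefficient)
    lower = math.exp(coefficient - z * se)
    upper = math.exp(coefficient + z * se)
    return odds_ratio, lower, upper


# GO in Table 3.4O(iv): coefficient 0.217, Wald 7.561
or_go, lower_go, upper_go = odds_ratio_with_ci(0.217, 7.561)
print(round(or_go, 3), round(lower_go, 3), round(upper_go, 3))
```

Because the tabulated coefficient and Wald statistic are rounded, results computed this way can differ from the table in the third decimal place.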

A sex prediction equation that requires nine measurements of the cranium, seven of which are not significantly associated with the dependent variable, is unlikely to be useful in practice. Therefore, variables were selected using a stepwise procedure. This may be performed in either a forward or backward direction; at each stage the significance of inclusion or elimination of the variable is tested. The tests are based on the change in likelihood from including or excluding the variable (Bewick et al, 2005). In this study, backward stepwise elimination, also known as the backward likelihood ratio method, was used. Backward elimination begins with a full model consisting of all candidate predictor variables. Variables are sequentially eliminated from the model until a prespecified stopping rule is satisfied. In contrast, forward selection begins with the empty (intercept-only) model. Variables are added sequentially to the model until a predefined stopping rule is satisfied (Austin & Tu, 2004). One disadvantage of forward stepwise methods is the possible exclusion of variables involved in suppressor effects. The term ‘suppressor effects’ refers to a phenomenon in which a variable may appear to have a statistically significant effect only when another variable is controlled or held constant.

With backward elimination, since all variables will already be in the model, there is less risk of failing to find a relationship when one exists. Usually, backward elimination and forward inclusion methods of stepwise logistic regression produce the same results; however, when they differ, backward elimination may uncover relationships missed by forward inclusion (Menard, 2010: 117).

Re-running logistic regression using the backward likelihood ratio method produced a model with a Nagelkerke R² for the final step of 0.782, suggesting that the model accounts for 78% of the variance of the dependent variable. The Hosmer–Lemeshow test for the final step produced a P-value of 0.993, demonstrating that there is acceptable agreement between the predicted and observed probabilities. The c-statistic associated with this test (equivalent to the area under the Receiver Operating Characteristics, ROC, curve) was 0.962, which indicates outstanding discrimination of the model (Hosmer & Lemeshow, 2000: 162). In total, the sex of 87.5% of the sample was correctly classified by the model; the accuracy rates in males and females were 89.7% and 84.9%, respectively. Table 3.4O(v) shows the variables that are in the equation, as well as their associated statistics.

Table 3.4O(v): Variables in cranial equation 1 (complete sample); backward likelihood ratio method.

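| Variables (step 7 of 7) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| GO | 0.232 | 10.241 | 0.001 | 1.261 | 1.094 | 1.454 |
| DB | 0.287 | 18.234 | <0.001 | 1.333 | 1.168 | 1.521 |
| BN | 0.248 | 5.333 | 0.021 | 1.281 | 1.038 | 1.581 |
| Constant | -102.814 | 35.616 | <0.001 | <0.001 | - | - |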

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

To use this information to predict the sex of an individual from their cranial measurements (described in Section 2.5.5.2), a logit or log-odds value is calculated, as follows:

Li = (GO x 0.232) + (DB x 0.287) + (BN x 0.248) – 102.814.

The probability that the individual is male is then calculated as shown:

Pm = 1 ÷ {1 + e^–[(GO x 0.232) + (DB x 0.287) + (BN x 0.248) – 102.814]}.

For example, supposing an individual had the following cranial measurements: GO, 184.6 mm; DB, 132.1 mm; BN, 100.4 mm. The logit would be calculated as:

Li = (184.6 x 0.232) + (132.1 x 0.287) + (100.4 x 0.248) – 102.814 = 2.8251.

Pm = 1 ÷ (1 + e^–2.8251) = 1 ÷ (1 + 0.059302726) = 0.944. The cut point is 0.5.

Therefore, this result indicates that there is a 94.4% chance that the individual is male and only a 5.6% chance (1 – 0.944) that the individual is female.
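The worked example above can be reproduced with a few lines of Python. This is a minimal sketch using the coefficients of cranial equation 1 from Table 3.4O(v); the dictionary layout and function name are hypothetical choices made for illustration.

```python
import math

# Coefficients and constant of cranial equation 1 (backward likelihood ratio method)
COEFFICIENTS = {"GO": 0.232, "DB": 0.287, "BN": 0.248}
CONSTANT = -102.814
CUT_POINT = 0.5


def probability_male(measurements_mm):
    """Probability of being male from cranial measurements given in millimetres."""
    logit = sum(COEFFICIENTS[name] * measurements_mm[name] for name in COEFFICIENTS)
    logit += CONSTANT
    return 1 / (1 + math.exp(-logit))


# Measurements from the worked example in the text
p_male = probability_male({"GO": 184.6, "DB": 132.1, "BN": 100.4})
estimated_sex = "male" if p_male > CUT_POINT else "female"
print(round(p_male, 3), estimated_sex)
```

Keeping the coefficients in a single mapping means the same function could be reused for another equation by swapping in a different set of coefficients and constant.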

To produce a logistic regression equation that could be tested on the sample of skeletal remains from Saqqara-West, LRA with backward elimination was performed using only the cranial dimensions that were measured by both the present author and IK-O (GO, MW, DB). The statistics associated with this equation, cranial equation 2, are presented in Tables 3.4P(i–iv).

Table 3.4P(i): Intercept model statistics for cranial equation 2 (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 109 | 54.8 | 1.809 | 0.179 |
| F: 90 | | | |
| T: 199 (62.6) | | | |

Constant is included in the model. The cut value is 0.50.

Table 3.4P(ii): Complete model statistics for cranial equation 2 (complete sample).

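| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 of 2 | | | | 5.880 | 0.661 | 0.953 | 110.16 | 0.561 | 0.751 |
| | Step | -0.238* | 0.626 | - | - | - | - | - | - |
| | Block | 163.899 | <0.001 | - | - | - | - | - | - |
| | Model | 163.899 | <0.001 | - | - | - | - | - | - |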

AUC, area under the ROC curve. *A negative chi-square indicates that the chi-square value has decreased from the previous step.


Table 3.4P(iii): Classification table for cranial equation 2 (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 2 of 2 | Female | 78 | 12 | 86.7 |
| | Male | 15 | 94 | 86.2 |
| | Overall % | | | 86.4 |

The cut value is 0.50.

Table 3.4P(iv): Variables in cranial equation 2 (complete sample).

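| Variables (step 2 of 2) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| GO | 0.276 | 22.837 | <0.001 | 1.317 | 1.177 | 1.475 |
| DB | 0.313 | 26.357 | <0.001 | 1.368 | 1.214 | 1.542 |
| Constant | -89.212 | 50.252 | <0.001 | <0.001 | - | - |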

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

3.4.3.1.2 The femur

A total of six variables were available for this analysis. The statistics associated with the femoral equation are presented in Tables 3.4Q(i–iv).

Table 3.4Q(i): Intercept model statistics for femoral equation (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 48 | 58.5 | 2.367 | 0.124 |
| F: 34 | | | |
| T: 82 (25.8) | | | |

Constant is included in the model. The cut value is 0.50.


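Table 3.4Q(ii): Complete model statistics for the femoral equation (complete sample).

| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 of 3 | | | | 2.783 | 0.947 | 0.983 | 28.398 | 0.636 | 0.857 |
| | Step | -0.323* | 0.570 | - | - | - | - | - | - |
| | Block | 82.876 | <0.001 | - | - | - | - | - | - |
| | Model | 82.876 | <0.001 | - | - | - | - | - | - |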

AUC, area under the ROC curve. *A negative chi-square indicates that the chi-square value has decreased from the previous step.

Table 3.4Q(iii): Classification table for the femoral equation (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 3 of 3 | Female | 30 | 4 | 88.2 |
| | Male | 3 | 45 | 93.8 |
| | Overall % | | | 91.5 |

The cut value is 0.50.

Table 3.4Q(iv): Variables in the femoral equation (complete sample).

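| Variables (step 3 of 3) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| FND | 1.064 | 6.344 | 0.012 | 2.899 | 1.266 | 6.636 |
| FSC | 0.586 | 6.767 | 0.009 | 1.797 | 1.156 | 2.796 |
| FTD | -0.890 | 2.761 | 0.097 | 0.411 | 0.144 | 1.173 |
| EBF | 0.278 | 3.392 | 0.066 | 1.320 | 0.982 | 1.773 |
| Constant | -78.667 | 13.516 | <0.001 | <0.001 | - | - |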

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.


3.4.3.1.3 The tibia

A total of seven variables were available for this analysis. The statistics associated with the tibial equation are presented in Tables 3.4R(i–iv).

Table 3.4R(i): Intercept model statistics for tibial equation (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 50 | 59.5 | 3.010 | 0.083 |
| F: 34 | | | |
| T: 84 (26.4) | | | |

Constant is included in the model. The cut value is 0.50.

Table 3.4R(ii): Complete model statistics for the tibial equation (complete sample).

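| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 of 5 | | | | 4.475 | 0.812 | 0.980 | 31.016 | 0.625 | 0.844 |
| | Step | -0.644* | 0.422 | - | - | - | - | - | - |
| | Block | 82.367 | <0.001 | - | - | - | - | - | - |
| | Model | 82.367 | <0.001 | - | - | - | - | - | - |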

AUC, area under the ROC curve. *A negative chi-square indicates that the chi-square value has decreased from the previous step.

Table 3.4R(iii): Classification table for the tibial equation (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 5 of 5 | Female | 31 | 3 | 91.2 |
| | Male | 2 | 48 | 96.0 |
| | Overall % | | | 94.0 |

The cut value is 0.50.


Table 3.4R(iv): Variables in the tibial equation (complete sample).

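| Variables (step 5 of 5) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| MSC | 0.521 | 6.980 | 0.008 | 1.683 | 1.144 | 2.477 |
| PEB | 0.785 | 12.065 | 0.001 | 2.192 | 1.408 | 3.412 |
| DEB | -0.739 | 5.253 | 0.022 | 0.477 | 0.254 | 0.898 |
| Constant | -59.232 | 16.572 | <0.001 | <0.001 | - | - |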

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

3.4.3.1.4 The first metacarpal (MC1)

A total of five variables were available for this analysis (BAP was excluded based on the results of the intra-observer error test). The statistics associated with the MC1 equation are presented in Tables 3.4S(i–iv).

Table 3.4S(i): Intercept model statistics for the MC1 equation (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 57 | 58.8 | 2.948 | 0.086 |
| F: 40 | | | |
| T: 97 (30.5) | | | |

Constant is included in the model. The cut value is 0.50.

Table 3.4S(ii): Complete model statistics for the MC1 equation (complete sample).

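| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 of 3 | | | | 7.881 | 0.445 | 0.975 | 40.634 | 0.608 | 0.819 |
| | Step | -1.520* | 0.218 | - | - | - | - | - | - |
| | Block | 90.842 | <0.001 | - | - | - | - | - | - |
| | Model | 90.842 | <0.001 | - | - | - | - | - | - |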

AUC, area under the ROC curve. *A negative chi-square indicates that the chi-square value has decreased from the previous step.


Table 3.4S(iii): Classification table for the MC1 equation (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 3 of 3 | Female | 37 | 3 | 92.5 |
| | Male | 4 | 53 | 93.0 |
| | Overall % | | | 92.8 |

The cut value is 0.50.

Table 3.4S(iv): Variables in the MC1 equation (complete sample).

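| Variables (step 3 of 3) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| IAL | 0.648 | 6.121 | 0.013 | 1.912 | 1.144 | 3.194 |
| HML | 1.658 | 8.992 | 0.003 | 5.251 | 1.776 | 15.522 |
| MS | 1.675 | 8.739 | 0.003 | 5.341 | 1.759 | 16.220 |
| Constant | -68.419 | 18.304 | <0.001 | <0.001 | - | - |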

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

3.4.3.1.5 The lower limb

An analysis using all 13 dimensions of the femur and tibia produced a model that failed to meet the criteria specified above. LRA was therefore performed using just the dimensions of the femur and tibia that were measured by both the present author and IK-O. The statistics associated with this lower limb equation are presented in Tables 3.4T(i–iv). This equation is available for testing on the Saqqara-West sample.

Table 3.4T(i): Intercept model statistics for the lower limb equation (complete sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 51 | 59.3 | 2.942 | 0.086 |
| F: 35 | | | |
| T: 86 (27.0) | | | |

Constant is included in the model. The cut value is 0.50.


Table 3.4T(ii): Complete model statistics for the lower limb equation (complete sample).

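| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | 11.623 | 0.169 | 0.961 | 37.436 | 0.600 | 0.809 |
| | Step | 78.791 | <0.001 | - | - | - | - | - | - |
| | Block | 78.791 | <0.001 | - | - | - | - | - | - |
| | Model | 78.791 | <0.001 | - | - | - | - | - | - |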

AUC, area under the ROC curve.

Table 3.4T(iii): Classification table for the lower limb equation (complete sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 1 | Female | 32 | 3 | 91.4 |
| | Male | 3 | 48 | 94.1 |
| | Overall % | | | 93.0 |

The cut value is 0.50.

Table 3.4T(iv): Variables in the lower limb equation (complete sample).

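| Variables (step 1) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| FHD | 0.600 | 5.502 | 0.019 | 1.822 | 1.104 | 3.009 |
| PEB | 0.385 | 6.060 | 0.014 | 1.469 | 1.082 | 1.996 |
| Constant | -51.173 | 21.836 | <0.001 | <0.001 | - | - |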

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

Of the new logistic regression equations presented above, the greatest accuracy rate was obtained for the equation using measurements of the tibia alone (94.0%), followed by that which used measurements of the lower limb (femur and tibia; 93.0%). High rates of accuracy were additionally obtained by the equation using measurements of MC1 (92.8%) and the femur alone (91.5%).


3.4.3.2 Late Period sample

3.4.3.2.1 Cranial equation 1

A total of nine variables were available for this analysis (OF and ML were excluded based on the results of the intra-observer error test). The statistics associated with cranial equation 1 are presented in Tables 3.4U(i–iv).

Table 3.4U(i): Intercept model statistics for cranial equation 1 (Late Period sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 68 | 55.7 | 1.599 | 0.206 |
| F: 54 | | | |
| T: 122 (79.2) | | | |

Constant is included in the model. The cut value is 0.50.

Table 3.4U(ii): Complete model statistics for cranial equation 1 (Late Period sample).

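| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 6 of 6 | | | | 2.512 | 0.961 | 0.984 | 40.286 | 0.648 | 0.867 |
| | Step | -1.198* | 0.274 | - | - | - | - | - | - |
| | Block | 127.232 | <0.001 | - | - | - | - | - | - |
| | Model | 127.232 | <0.001 | - | - | - | - | - | - |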

AUC, area under the ROC curve. *A negative chi-square indicates that the chi-square value has decreased from the previous step.

Table 3.4U(iii): Classification table for cranial equation 1 (Late Period sample).

| Step | Observed sex | Predicted female | Predicted male | Percentage correct, % |
| --- | --- | --- | --- | --- |
| 6 of 6 | Female | 49 | 5 | 90.7 |
| | Male | 5 | 63 | 92.6 |
| | Overall % | | | 91.8 |

The cut value is 0.50.


Table 3.4U(iv): Variables in cranial equation 1 (Late Period sample).

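| Variables (step 6 of 6) | Equation coefficients | Wald | Sig. | Odds ratio | 95% CI for odds ratio: lower | 95% CI for odds ratio: upper |
| --- | --- | --- | --- | --- | --- | --- |
| GO | 0.522 | 10.645 | 0.001 | 1.686 | 1.232 | 2.308 |
| MW | -0.213 | 3.115 | 0.078 | 0.808 | 0.637 | 1.024 |
| DB | 0.392 | 13.407 | <0.001 | 1.480 | 1.200 | 1.826 |
| BP | 0.160 | 2.934 | 0.087 | 1.174 | 0.977 | 1.410 |
| Constant | -129.591 | 18.107 | <0.001 | <0.001 | - | - |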

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.

3.4.3.2.2 Cranial equation 2

To produce an equation that could be tested on the skeletal remains from the Saqqara-West sample, LRA was performed using just the cranial dimensions that were measured by both the present author and IK-O. The statistics associated with cranial equation 2 are presented in Tables 3.4V(i–iv).

Table 3.4V(i): Intercept model statistics for cranial equation 2 (Late Period sample).

| Valid cases, n (%) | Percentage correct, % | Wald statistic | Significance |
| --- | --- | --- | --- |
| M: 81 | 55.1 | 1.525 | 0.217 |
| F: 66 | | | |
| T: 147 (95.5) | | | |

Constant is included in the model. The cut value is 0.50.


Table 3.4V(ii): Complete model statistics for cranial equation 2 (Late Period sample).

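| Step | | Omnibus tests of model coefficients: chi-square | Omnibus tests of model coefficients: Sig. | Hosmer–Lemeshow test: chi-square | Hosmer–Lemeshow test: Sig. | Hosmer–Lemeshow test: C-statistic (AUC) | Model summary: ‒2LL | Model summary: Cox & Snell R² | Model summary: Nagelkerke R² |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | | | | 3.592 | 0.892 | 0.984 | 49.755 | 0.646 | 0.864 |
| | Step | 152.497 | <0.001 | - | - | - | - | - | - |
| | Block | 152.497 | <0.001 | - | - | - | - | - | - |
| | Model | 152.497 | <0.001 | - | - | - | - | - | - |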

AUC, area under the ROC curve.

Table 3.4V(iii): Classification table for cranial equation 2 (Late Period sample).

Step 1
Observed sex    Predicted female    Predicted male    Percentage correct, %
Female          59                  7                 89.4
Male            5                   76                93.8
Overall %                                             91.8

The cut value is 0.50.
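The percentages in Table 3.4V(iii) follow directly from the cell counts. The short sketch below recomputes them as a check; the function name and the dictionary layout are illustrative only and are not taken from the SPSS output.

```python
# Counts copied from Table 3.4V(iii): observed sex -> predicted sex.
TABLE_3_4V_III = {
    "female": {"female": 59, "male": 7},
    "male": {"female": 5, "male": 76},
}


def percent_correct(table):
    """Return per-sex and overall percentage correct, rounded to 1 d.p."""
    result = {}
    correct = 0
    total = 0
    for sex, row in table.items():
        n = sum(row.values())          # all individuals observed as this sex
        result[sex] = round(100.0 * row[sex] / n, 1)
        correct += row[sex]            # diagonal cell = correctly classified
        total += n
    result["overall"] = round(100.0 * correct / total, 1)
    return result


print(percent_correct(TABLE_3_4V_III))
```

The overall figure is a weighted mean of the two sex-specific rates, which is why it sits closer to the male rate (the larger group).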

Table 3.4V(iv): Variables in cranial equation 2 (Late Period sample).

Step 1
Variable    Equation coefficient    Wald      Sig.      Odds ratio    95% CI lower    95% CI upper
GO          0.522                   16.905    <0.000    1.685         1.314           2.161
MW          -0.193                  3.446     0.063     0.825         0.673           1.011
DB          0.443                   17.368    <0.000    1.558         1.265           1.919
Constant    -124.078                22.853    <0.000    <0.000        -               -

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. CI, confidence interval.
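As an illustration of how the coefficients of cranial equation 2 could be applied to a new cranium, the sketch below evaluates the standard logistic form p = 1 / (1 + e^(-z)), where z is the constant plus the sum of each coefficient multiplied by its measurement. Two assumptions are made that are not stated in the tables above: measurements are entered in millimetres, and male is the category coded 1, so that p ≥ 0.50 is read as male. The function name and the example measurements are hypothetical.

```python
import math

# Coefficients of cranial equation 2, copied from Table 3.4V(iv).
COEFFICIENTS = {"GO": 0.522, "MW": -0.193, "DB": 0.443}
CONSTANT = -124.078
CUT_VALUE = 0.50


def probability_male(measurements_mm):
    """Predicted probability, ASSUMING male is the category coded 1."""
    z = CONSTANT + sum(
        coef * measurements_mm[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical cranium (mm); not an individual from the study sample.
example = {"GO": 187.0, "MW": 140.0, "DB": 130.0}
p = probability_male(example)
print(round(p, 3), "male" if p >= CUT_VALUE else "female")

# Each odds ratio in the table is exp(coefficient).
print({name: round(math.exp(coef), 3) for name, coef in COEFFICIENTS.items()})
```

The odds ratios recomputed in this way agree with the tabulated values to within rounding of the published coefficients.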

3.4.4 Comparison of samples: two-factor ANOVA

3.4.4.1 Effect of sex and time period on skeletal dimensions

3.4.4.1.1 Raw data

In total, 59 separate two-factor ANOVA tests were performed, one for each of the 59 skeletal dimensions (variables) included in the study (four of the original 63 variables were excluded based on the results of the intra-observer error test). The Bonferroni correction was used to adjust for the familywise error rate and the multiple comparisons being made within each test (Section 2.5.3). A further adjustment of the significance level was required for the number of tests being performed (n=59); the alpha level of 0.05 was therefore divided by 59 to give an adjusted significance level of P≤0.0008. Table 3.4W provides a summary of the two-factor ANOVA results for the 59 dimensions. In each test, the dimension is the dependent variable, and sex and time period are the independent variables. The results obtained when the ANOVA tests were performed using z-scores instead of the raw data were exactly the same as those presented in Table 3.4W below.
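The adjustment for the number of tests is a simple division, sketched below. The helper name is illustrative; the threshold of 0.0008 is the value used in the text, which is the exact quotient truncated to four decimal places.

```python
ALPHA = 0.05
N_TESTS = 59  # one two-factor ANOVA per skeletal dimension

adjusted_alpha = ALPHA / N_TESTS
print(f"{adjusted_alpha:.6f}")  # 0.000847


def meets_adjusted_level(p_value, threshold=0.0008):
    """True if a P-value meets the adjusted significance level used in the text."""
    return p_value <= threshold


# A sex effect reported as P=0.003 is significant at 0.05 but not after adjustment.
print(meets_adjusted_level(0.003))
```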

Table 3.4W: Results of the two-factor ANOVA tests examining the effect of sex and time period on skeletal dimensions.

For each variable, the F and Sig. rows give, in order: Levene’s test†, Sex, Time, Sex*Time. The remaining rows give the Bonferroni test in the order: Time 1, Time 2, Male sig., Female sig.

GO (260)
F 1.815 112.43 4.36 3.52
Sig. 0.110 <0.0001 0.014 0.031
Pre P OK 1.000 0.056
Pre P LP 0.189 0.239
OK Pre P 1.000 0.056
OK LP 0.005* 0.742
LP Pre P 0.189 0.239
LP OK 0.005* 0.742

MW (256)
F 2.352 36.16 20.56 0.64
Sig. 0.041 <0.0001 <0.001 0.528
Pre P OK 0.012* <0.001*
Pre P LP 0.002* <0.001*
OK Pre P 0.012* <0.001*
OK LP 1.000 1.000
LP Pre P 0.002* <0.001*
LP OK 1.000 1.000

BB (230)
F 2.258 30.71 2.823 1.389
Sig. 0.050 <0.0001 0.062 0.251
Pre P OK 0.808 0.074
Pre P LP 0.403 0.887
OK Pre P 0.808 0.074
OK LP 1.000 0.200
LP Pre P 0.403 0.887
LP OK 1.000 0.200

DB (188)
F 1.193 54.32 1.61 1.44
Sig. 0.314 <0.0001 0.203 0.241
Pre P OK 1.000 0.742
Pre P LP 1.000 0.289
OK Pre P 1.000 0.742
OK LP 0.012* 1.000
LP Pre P 1.000 0.289
LP OK 0.012* 1.000

PN (209)
F 0.439 30.65 1.39 0.19
Sig. 0.821 <0.0001 0.253 0.831
Pre P OK 1.000 1.000
Pre P LP 1.000 0.634
OK Pre P 1.000 1.000
OK LP 1.000 0.820
LP Pre P 1.000 0.634
LP OK 1.000 0.820

BN (227)
F 0.453 63.47 0.89 2.12
Sig. 0.811 <0.0001 0.413 0.123
Pre P OK 1.000 1.000
Pre P LP 0.955 0.945
OK Pre P 1.000 1.000
OK LP 0.036* 1.000
LP Pre P 0.955 0.945
LP OK 0.036* 1.000

BP (210)
F 0.782 20.79 3.46 1.04
Sig. 0.563 <0.0001 0.033 0.354
Pre P OK 0.183 0.428
Pre P LP 0.768 0.025*
OK Pre P 0.183 0.428
OK LP 0.341 1.000
LP Pre P 0.768 0.025*
LP OK 0.341 1.000

NB (215)
F 3.093 8.91 5.99 0.106
Sig. 0.010 0.003 0.003 0.899
Pre P OK 0.782 1.000
Pre P LP 0.065 0.112
OK Pre P 0.782 1.000
OK LP 0.354 0.561
LP Pre P 0.065 0.112
LP OK 0.354 0.561

PB (191)
F 1.42 7.90 1.93 1.23
Sig. 0.218 0.005 0.149 0.294
Pre P OK 1.000 1.000
Pre P LP 1.000 0.090
OK Pre P 1.000 1.000
OK LP 1.000 0.335
LP Pre P 1.000 0.090
LP OK 1.000 0.335

XSL (73) Pre P OK 0.214 0.790

F 0.345 21.81 0.44 1.13

Sig. 0.793 <0.0001 0.508 0.292

XDH (93) Pre P OK 0.887 0.525

F 1.48 22.13 0.07 0.23

Sig. 0.225 <0.0001 0.797 0.634

DSD (103) Pre P OK 0.021* 0.371

F 3.17 9.64 1.27 5.48

Sig. 0.028 0.002 0.263 0.021

DTD (103) Pre P OK 0.211 0.677

F 1.10 7.37 1.51 0.55

Sig. 0.352 0.008 0.222 0.462

LVF (100) Pre P OK 0.377 0.063

F 0.72 0.60 0.25 3.45

Sig. 0.541 0.442 0.622 0.066

SFB (78) Pre P OK 0.607 0.600

F 0.26 26.37 0.01 0.56

Sig. 0.857 <0.0001 0.937 0.457

SFS (89) Pre P OK 0.698 0.213

F 0.18 11.08 1.36 0.39

Sig. 0.911 0.001 0.247 0.536

SFT (94) Pre P OK 0.155 0.280

F 0.69 22.29 3.22 0.03

Sig. 0.559 <0.0001 0.076 0.866

FHD (121) Pre P OK 0.172 0.270

F 1.05 122.48 3.09 0.03

Sig. 0.376 <0.0001 0.081 0.875

FND (128) Pre P OK 0.342 0.567

F 0.69 84.68 0.10 1.14

Sig. 0.561 <0.0001 0.754 0.288

FSC (126) Pre P OK 0.416 0.301

F 0.86 119.92 0.03 1.71

Sig. 0.462 <0.0001 0.858 0.193

XFL (97) Pre P OK 0.148 0.340

F 1.64 47.34 2.91 0.15

Sig. 0.186 <0.0001 0.092 0.700

FTD (127) Pre P OK 0.104 0.621

F 1.40 48.92 0.50 2.16

Sig. 0.245 <0.0001 0.479 0.145

EBF (84) Pre P OK 0.163 0.488

F 2.32 83.52 0.14 2.10

Sig. 0.082 <0.0001 0.707 0.151

TL (99) Pre P OK 0.912 0.528

F 0.85 59.28 0.30 0.161

Sig. 0.473 <0.0001 0.584 0.690

CNF (126) Pre P OK 0.044 0.588

F 0.42 77.79 3.29 1.13

Sig. 0.741 <0.0001 0.072 0.291

MSC (120) Pre P OK 0.092 0.928

F 2.68 93.77 1.23 1.54

Sig. 0.050 <0.0001 0.269 0.218

APD (126) Pre P OK 0.259 0.962

F 0.59 97.81 0.69 0.59

Sig. 0.621 <0.0001 0.407 0.445

TB (123) Pre P OK 0.004* 0.389

F 0.56 39.21 7.23 2.27

Sig. 0.641 <0.0001 0.008 0.134

PEB (80) Pre P OK 0.578 0.781

F 2.12 108.14 0.33 0.01

Sig. 0.104 <0.0001 0.568 0.939

DEB (110) Pre P OK 0.995 0.391

F 1.45 67.65 0.48 0.49

Sig. 0.232 <0.0001 0.490 0.486

HHD (117) Pre P OK 0.016* 0.565

F 0.73 102.97 4.16 1.25

Sig. 0.537 <0.0001 0.044 0.266

XHL (97) Pre P OK 0.186 0.134

F 2.29 58.88 4.25 0.12

Sig. 0.083 <0.0001 0.042 0.734

EWH (130) Pre P OK 0.404 0.501

F 0.47 93.18 1.12 0.04

Sig. 0.703 <0.0001 0.293 0.845

XRL (97) Pre P OK 0.695 0.637

F 1.26 83.55 0.01 0.38

Sig. 0.291 <0.0001 0.950 0.541

RSBB (105) Pre P OK 0.047* 0.025*

F 1.27 74.89 9.02 0.01

Sig. 0.287 <0.0001 0.003 0.952

MAXD (57) Pre P OK 0.015* 0.094

F 2.14 80.68 9.28 0.14

Sig. 0.106 <0.0001 0.004 0.713

MIND (52) Pre P OK 0.064 0.140

F 1.78 76.10 6.06 0.11

Sig. 0.164 <0.0001 0.017 0.741

XUL (81) Pre P OK 0.836 0.904

F 1.42 60.81 0.06 0.006

Sig. 0.243 <0.0001 0.815 0.938

USBB (88) Pre P OK 0.754 0.824

F 0.98 31.27 0.01 0.14

Sig. 0.408 <0.0001 0.931 0.708

IAL (93) Pre P OK 0.362 0.740

F 1.64 60.29 0.22 0.83

Sig. 0.186 <0.0001 0.639 0.366

BML (93) Pre P OK 0.493 0.677

F 0.10 55.71 0.63 0.06

Sig. 0.960 <0.0001 0.430 0.808

HML (93) Pre P OK 0.042* 0.536

F 4.13 72.92 1.05 3.69

Sig. 0.009 <0.0001 0.308 0.058

HAP (91) Pre P OK 0.325 0.924

F 1.40 28.71 0.66 0.47

Sig. 0.248 <0.0001 0.418 0.493

MS (94) Pre P OK 0.977 0.030*

F 4.73 36.42 1.77 1.65

Sig. 0.004 <0.0001 0.187 0.203

L (105) Pre P OK 0.898 0.979

F 1.05 50.17 0.01 0.01

Sig. 0.375 <0.0001 0.945 0.916

SIH (105) Pre P OK 0.577 0.936

F 0.93 35.75 0.11 0.19

Sig. 0.428 <0.0001 0.745 0.662

MLH (102) Pre P OK 0.708 0.442

F 0.09 51.09 0.13 0.71

Sig. 0.964 <0.0001 0.722 0.403

SIB (103) Pre P OK 0.765 0.994

F 0.39 51.36 0.04 0.04

Sig. 0.760 <0.0001 0.839 0.848

MLB (80) Pre P OK 0.999 0.577

F 0.69 25.31 0.15 0.15

Sig. 0.559 <0.0001 0.699 0.698

MSD (105) Pre P OK 0.002* 0.505

F 0.22 28.19 6.71 2.47

Sig. 0.886 <0.0001 0.011 0.119

IL (48) Pre P OK 0.119 0.858

F 0.45 22.21 0.59 1.19

Sig. 0.719 <0.0001 0.445 0.282

PL (43) Pre P OK 0.485 0.364

F 1.44 2.12 0.15 1.41

Sig. 0.247 0.154 0.706 0.242

HSN (107) Pre P OK 0.642 0.136

F 0.02 11.48 1.80 0.49

Sig. 0.995 0.001 0.183 0.487

ASB (110) Pre P OK 0.238 0.942

F 0.88 43.23 0.43 0.61

Sig. 0.453 <0.0001 0.514 0.437

XCL (83) Pre P OK 0.399 0.005*

F 0.01 40.81 8.31 3.13

Sig. 0.999 <0.0001 0.005 0.081

XHS (18) Pre P OK 0.785 0.843

F 2.54 17.48 0.12 <0.01

Sig. 0.098 0.001 0.740 0.969

XLS (39) Pre P OK 0.345 0.099

F 0.26 35.63 0.55 3.62

Sig. 0.855 <0.0001 0.462 0.065

HAX (104) Pre P OK 0.497 0.856

F 0.40 68.94 0.11 0.36

Sig. 0.754 <0.0001 0.737 0.547

BCB (105) Pre P OK 0.308 0.451

F 1.09 77.09 1.58 0.04

Sig. 0.359 <0.0001 0.212 0.849

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. *The mean difference is significant at the 0.05 level. †Levene’s test of the equality of error variances; tests the null hypothesis that the error variance is equal across groups. Design: intercept + sex + time + sex*time. Pre P, Predynastic Period; OK, Old Kingdom; LP, Late Period.

Table 3.4W demonstrates that there were statistically significant sex differences in the mean measurements of all 59 skeletal dimensions included in the analysis, excluding LVF (P=0.442) and PL (P=0.154); seven other skeletal dimensions failed to meet the more stringent significance level of P≤0.0008, which was used to adjust for multiple testing. These dimensions were NB (P=0.003), PB (P=0.005), DSD (P=0.002), DTD (P=0.008), SFS (P=0.001), HSN (P=0.001), and XHS (P=0.001). Statistically significant differences in mean measurements between the different time periods, assessed using the Bonferroni correction, were additionally identified for GO, MW, BP, TB, HHD, RSBB, MAXD, MSD, and XCL. The results in Table 3.4W demonstrate whether these differences were the result of variation over time in the male means only, the female means only, or both. As can be seen, for five of the nine skeletal dimensions listed above, the statistically significant time period differences were the result of changes in the mean male measurement over time. With a significance level of 0.0008, there were no statistically significant interactions between sex and time period.

3.4.5 Degree of sexual dimorphism

Table 3.4X provides a summary of the degree of sexual dimorphism exhibited by each of the 63 dimensions included in the study for the complete study sample, represented by the per cent dimorphism (%D). The results are additionally displayed in an alternative format, a sexual dimorphism index, which is simply 1 + (%D/100) and is therefore equivalent to the ratio of the male mean to the female mean. Male and female means for each dimension are also given.
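The two measures can be recomputed from the male and female means. The sketch below assumes that per cent dimorphism is calculated as (male mean - female mean) / female mean × 100; this formula is inferred from the tabulated means rather than quoted from the Methods, and the function names are illustrative.

```python
def percent_dimorphism(male_mean, female_mean):
    """Per cent dimorphism, ASSUMING %D = (M - F) / F * 100."""
    return (male_mean - female_mean) / female_mean * 100.0


def dimorphism_index(male_mean, female_mean):
    """Index = 1 + %D / 100, i.e. the ratio of male mean to female mean."""
    return 1.0 + percent_dimorphism(male_mean, female_mean) / 100.0


# Means (mm) for FHD and PL as listed in Table 3.4X.
for name, male, female in [("FHD", 44.86, 39.48), ("PL", 85.32, 86.77)]:
    print(
        name,
        round(percent_dimorphism(male, female), 2),
        round(dimorphism_index(male, female), 2),
    )
```

Under this assumption, a dimension in which females are larger on average, such as PL, yields a negative %D and an index below one.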

Table 3.4X: Per cent dimorphism and sexual dimorphism index scores for each of the 63 skeletal dimensions included in the study (complete study sample).

Dimension N (male) N (female) Mean, mm (male) Mean, mm (female) Per cent dimorphism, % Index

GO 156 119 186.67 176.78 5.59 1.05

MW 153 118 140.47 135.58 3.61 1.04

BB 137 108 135.39 129.94 4.19 1.04

DB 109 90 129.80 121.05 7.23 1.07

PN 122 98 71.32 66.95 6.53 1.07

BN 136 106 102.67 96.87 5.99 1.06

BP 123 98 95.14 90.72 4.87 1.05

NB 129 97 25.25 24.21 4.30 1.04

PB 114 88 62.08 59.48 4.37 1.04

OF 132 102 147.80 141.39 4.53 1.05

ML 165 123 33.62 28.46 18.13 1.18

XSL 46 37 48.12 45.15 6.58 1.07

XDH 60 43 37.11 34.44 7.75 1.08

DSD 65 48 11.33 10.49 8.01 1.08

DTD 65 48 10.20 9.66 5.59 1.06

LVF 63 47 15.45 15.39 0.39 >1.00

SFB 52 36 45.06 42.51 6.00 1.06

SFS 59 40 18.16 17.18 5.70 1.06

SFT 60 44 17.11 15.75 8.63 1.09

FHD 78 58 44.86 39.48 13.63 1.14

FND 85 58 31.80 27.47 15.80 1.16

FSC 85 56 89.00 78.21 13.80 1.14

XFL 64 48 454.33 417.56 8.81 1.09

FTD 86 56 25.61 23.11 10.82 1.11

EBF 59 38 77.33 69.69 10.96 1.11

TL 68 46 383.13 351.15 9.11 1.09

CNF 86 55 95.83 83.29 15.06 1.15

MSC 83 52 76.23 66.33 14.93 1.15

APD 86 55 34.17 28.80 18.65 1.19

TB 83 55 22.54 20.01 12.64 1.13

PEB 56 36 72.39 63.97 13.16 1.13

DEB 78 47 44.14 39.60 11.46 1.11

HHD 76 56 43.83 38.15 14.89 1.15

XHL 65 47 319.58 293.02 9.06 1.09

EWH 83 62 61.17 54.18 12.90 1.13

XRL 66 46 251.64 226.52 11.09 1.11

RSBB 71 48 30.00 26.81 11.90 1.12

MAXD 42 26 22.79 19.47 17.05 1.17

MIND 37 26 21.37 18.40 16.14 1.16

XUL 56 40 270.82 247.23 9.54 1.10

USBB 61 42 16.82 15.07 11.61 1.12

IAL 62 40 45.60 41.51 9.85 1.10

BML 62 40 15.73 13.77 14.23 1.14

BAP 62 38 15.14 13.53 11.90 1.12

HML 62 40 14.51 12.56 15.53 1.16

HAP 60 40 13.29 11.91 11.59 1.12

MS 63 40 11.80 10.25 15.12 1.15

L 74 41 62.50 57.64 8.43 1.08

SIH 74 41 18.84 16.89 11.55 1.12

MLH 72 40 21.15 18.50 8.92 1.14

SIB 72 41 28.39 25.52 11.25 1.11

MLB 58 30 19.10 17.09 11.76 1.12

MSD 74 41 13.58 11.83 14.79 1.15

IL 37 25 78.53 71.41 9.97 1.10

PL 35 19 85.32 86.77 -1.67 0.98

HSN 77 45 52.87 49.22 7.42 1.07

ASB 79 46 37.13 32.81 13.17 1.13

XCL 56 39 150.29 136.56 10.05 1.10

XHS 15 12 150.79 130.76 15.32 1.15

XLS 31 17 139.79 123.92 12.81 1.13

BXB 36 19 97.07 85.15 14.00 1.14

HAX 73 45 38.97 34.20 13.95 1.14

BCB 74 45 27.58 23.78 15.98 1.16

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1.

A total of 10 skeletal dimensions exhibited a degree of sexual dimorphism in excess of 15% (Figure 3.4D).

[Bar chart. Vertical axis: per cent sexual dimorphism, % (0.0 to 20.0). Bars: ML, 18.1; FND, 15.8; CNF, 15.1; APD, 18.7; MAXD, 17.1; MIND, 16.1; HML, 15.5; MS, 15.1; XHS, 15.3; BCB, 16.0.]

Figure 3.4D: Skeletal dimensions showing per cent sexual dimorphism >15%.

ML, mastoid length of cranium; FND, femoral neck diameter; CNF, circumference of the tibia at nutrient foramen; APD, antero-posterior diameter of tibia; MAXD, maximum diameter of radial head; MIND, minimum diameter of radial head; HML, medio-lateral breadth of head of MC1; MS, midshaft diameter of MC1; XHS, maximum length of scapula; BCB, breadth of glenoid prominence of scapula. For n numbers, see Table 3.4X above.

The greatest degree of sexual dimorphism was exhibited by the antero-posterior diameter of the tibia (APD; 18.65%, index score: 1.19), followed by mastoid length (ML; 18.13%, index score: 1.18). The skeletal dimension exhibiting the lowest degree of sexual dimorphism was the length of the vertebral foramen of the second cervical vertebra (LVF; 0.39%, index score: 1.00); indeed, the male and female means for this dimension were virtually the same (15.45 mm and 15.39 mm, respectively). Of note, pubic length was the only variable to exhibit a negative per cent dimorphism (-1.67%), as well as an index score of less than one (0.98), highlighting that for this dimension, the female individuals from the population sample are marginally larger, on average, than the males.

Appendix 7.5 provides a summary of the male and female means and per cent sexual dimorphism for each of the 63 study dimensions (or 11 cranial dimensions), broken down by principal time period (Predynastic Period, Old Kingdom, and Late Period). The negative %D values for PB and LVF for Old Kingdom individuals but not Predynastic Period individuals demonstrate that for these dimensions, females were larger than males in the Old Kingdom, but males were larger than females in the Predynastic Period. Using the statistical method developed by Relethford and Hodges (1985) to test the significance of the difference in sexual dimorphism between the two samples, none of the 13 dimensions of the femur and tibia were found to exhibit statistically significant differences in the degree of sexual dimorphism over time.

3.4.5.1 Adjustment for size effects

To examine sex-related differences further and the relative importance of shape, each skeletal dimension was evaluated after size effects had been removed. Exploring body size effects when direct body size data are not available requires a surrogate that is highly correlated with body size (Sylvester & Organ, 2010). A number of researchers have suggested that the most biomechanically relevant estimator of body size is overall body mass (Jungers, 1985: 345–346; Sylvester & Organ, 2010). Unfortunately, it is highly unlikely that recorded body masses will be available for archaeological collections of skeletal remains. Therefore, femoral head diameter measured in a supero-inferior direction was used as a surrogate for body size because it has been shown to be highly correlated with body mass in modern humans (Grine et al, 1995; McHenry, 1992; Ruff et al, 1991 & 1997). Furthermore, a number of authors have demonstrated that femoral head size has a much greater correlation with body mass than does femur length (Lieberman et al, 2001; Ruff et al, 1991; Wescott, 2006). For every individual included in the study sample, each skeletal measurement was divided by the femoral head diameter (FHD) for that individual to produce a new data set in which each measurement is expressed as a proportion of overall body size for the individual. Exploratory and descriptive statistical analyses were subsequently conducted using SPSS 20.0. Table 3.4Y presents the descriptive statistics for the size-adjusted data.
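The size adjustment described above amounts to a per-individual ratio. A minimal sketch is given below; the function name, the dictionary layout, and the example measurements are hypothetical, and the handling of individuals without a recorded FHD (returning no adjusted values) is an assumption about how such cases would be excluded.

```python
def size_adjust(measurements_mm, size_key="FHD"):
    """Express each measurement as a proportion of femoral head diameter.

    Returns None when no FHD is recorded, because the individual cannot
    then be size-adjusted (an assumed exclusion rule).
    """
    fhd = measurements_mm.get(size_key)
    if not fhd:
        return None
    return {name: value / fhd for name, value in measurements_mm.items()}


# Hypothetical individual (mm); not taken from the study sample.
individual = {"FHD": 44.0, "XFL": 451.0, "TL": 380.0}
adjusted = size_adjust(individual)
print({name: round(ratio, 2) for name, ratio in adjusted.items()})
```

Because FHD is divided by itself, its adjusted value is always 1.00 with zero variance, and the number of individuals available for each adjusted dimension is limited to those with a measurable femoral head.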

Table 3.4Y: Descriptive statistics of skeletal measurements after adjustment for size.

Dimension N Range Min. Max. Mean* M mean F mean

GO 98 1.51 3.64 5.15 4.26 4.11 4.45

MW 95 1.18 2.74 3.92 3.22 3.10 3.37

BB 75 1.16 2.54 3.70 3.14 3.02 3.29

DB 42 0.95 2.56 3.51 2.93 2.83 3.02

PN 54 0.48 1.38 1.86 1.62 1.59 1.66

BN 71 0.79 1.92 2.71 2.35 2.27 2.45

BP 54 0.72 1.85 2.57 2.23 2.13 2.33

NB 61 0.28 0.44 0.72 0.60 0.59 0.63

PB 60 1.14 0.51 1.65 1.43 1.38 1.50

OF 72 1.16 2.91 4.07 3.46 3.33 3.63

ML 108 0.60 0.45 1.05 0.73 0.74 0.72

XSL 69 0.39 0.92 1.31 1.11 1.08 1.14

XDH 85 0.32 0.70 1.02 0.85 0.83 0.88

DSD 93 0.09 0.21 0.30 0.26 0.25 0.27

DTD 93 0.09 0.19 0.28 0.23 0.23 0.25

LVF 90 0.16 0.29 0.45 0.37 0.35 0.39

SFB 73 0.26 0.91 1.17 1.04 1.01 1.08

SFS 82 0.17 0.33 0.50 0.42 0.41 0.44

SFT 88 0.14 0.33 0.47 0.39 0.38 0.40

FHD 135 0.00 1.00 1.00 1.00 1.00 1.00

FND 134 0.22 0.60 0.82 0.70 0.71 0.70

FSC 123 0.54 1.73 2.27 1.98 1.98 1.99

XFL 107 2.95 8.95 11.90 10.34 10.15 10.58

FTD 125 0.24 0.45 0.69 0.58 0.57 0.59

EBF 91 0.39 1.59 1.98 1.75 1.73 1.78

TL 103 2.62 7.41 10.03 8.69 8.50 8.94

CNF 123 0.67 1.78 2.45 2.11 2.11 2.11

MSC 118 0.44 1.47 1.91 1.69 1.69 1.68

APD 123 0.38 0.50 0.88 0.74 0.75 0.73

TB 122 0.20 0.40 0.60 0.50 0.50 0.51

PEB 86 0.37 1.42 1.79 1.62 1.62 1.63

DEB 112 0.24 0.87 1.11 0.99 0.98 1.00

HHD 114 0.21 0.86 1.07 0.97 0.98 0.96

XHL 96 1.96 6.27 8.23 7.24 7.12 7.39

EWH 124 0.58 0.94 1.52 1.36 1.36 1.37

XRL 96 1.62 4.90 6.52 5.66 5.60 5.75

RSBB 104 0.17 0.59 0.76 0.67 0.67 0.68

MAXD 59 0.12 0.44 0.56 0.50 0.50 0.50

MIND 57 0.09 0.43 0.52 0.47 0.48 0.47

XUL 83 1.50 5.51 7.01 6.14 6.04 6.27

USBB 90 0.16 0.29 0.45 0.38 0.37 0.38

IAL 88 0.31 0.88 1.19 1.04 1.01 1.06

BML 89 0.11 0.30 0.41 0.35 0.35 0.35

BAP 86 0.16 0.25 0.41 0.34 0.34 0.35

HML 88 0.13 0.28 0.41 0.32 0.32 0.32

HAP 86 0.12 0.25 0.37 0.30 0.30 0.31

MS 88 0.11 0.22 0.33 0.26 0.26 0.31

L 99 0.47 1.23 1.70 1.42 1.39 1.47

SIH 99 0.14 0.36 0.50 0.42 0.42 0.43

MLH 97 0.17 0.41 0.58 0.47 0.47 0.47

SIB 99 0.15 0.56 0.71 0.64 0.63 0.65

MLB 82 0.14 0.37 0.51 0.43 0.43 0.44

MSD 99 0.13 0.24 0.37 0.30 0.30 0.30

IL 58 0.42 1.57 1.99 1.77 1.75 1.82

PL 52 0.89 1.66 2.55 2.01 1.90 2.21

HSN 108 0.78 0.65 1.43 1.21 1.17 1.25

ASB 112 0.32 0.70 1.02 0.83 0.83 0.84

XCL 87 1.16 2.86 4.02 3.41 3.37 3.47

XHS 26 0.89 2.88 3.77 3.37 3.31 3.45

XLS 44 0.71 2.80 3.51 3.13 3.11 3.17

BXB 51 0.50 1.92 2.42 2.17 2.16 2.20

HAX 100 0.19 0.75 0.94 0.87 0.87 0.87

BCB 102 0.13 0.54 0.67 0.61 0.61 0.61

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. *Overall mean for variable. Min., minimum; Max., maximum; M mean, male mean; F mean, female mean.

Table 3.4Z presents the variability statistics of the size-adjusted data for each of the 63 skeletal dimensions included in the study.

Table 3.4Z: Variability statistics of skeletal dimensions after adjustment for size.

Dimension N Mean SE of mean SD Variance 95% CI (lower) 95% CI (upper)

GO 98 4.26 0.030 0.30 0.090 4.20 4.32

MW 95 3.22 0.025 0.24 0.059 3.17 3.27

BB 75 3.14 0.028 0.24 0.060 3.08 3.19

DB 42 2.93 0.028 0.18 0.032 2.88 2.99

PN 54 1.62 0.017 0.13 0.016 1.59 1.66

BN 71 2.35 0.021 0.18 0.030 2.31 2.39

BP 54 2.23 0.026 0.19 0.036 2.18 2.28

NB 61 0.60 0.008 0.07 0.004 0.58 0.62

PB 60 1.43 0.021 0.16 0.026 1.39 1.47

OF 72 3.46 0.030 0.26 0.066 3.40 3.52

ML 108 0.73 0.010 0.11 0.011 0.71 0.75

XSL 69 1.11 0.009 0.07 0.005 1.09 1.13

XDH 85 0.85 0.006 0.05 0.003 0.84 0.87

DSD 93 0.26 0.002 0.02 <0.001 0.25 0.26

DTD 93 0.23 0.002 0.02 <0.001 0.23 0.24

LVF 90 0.37 0.004 0.04 0.002 0.36 0.37

SFB 73 1.04 0.008 0.06 0.004 1.03 1.06

SFS 82 0.42 0.004 0.03 0.001 0.41 0.43

SFT 88 0.39 0.004 0.03 0.001 0.38 0.40

FHD 135 1.00 0.000 0.00 0.000 - -

FND 134 0.70 0.004 0.04 0.002 0.70 0.71

FSC 123 1.98 0.009 0.10 0.011 1.96 2.00

XFL 107 10.34 0.057 0.59 0.353 10.23 10.46

FTD 125 0.58 0.004 0.04 0.002 0.57 0.58

EBF 91 1.75 0.007 0.07 0.005 1.73 1.76

TL 103 8.69 0.052 0.53 0.281 8.58 8.79

CNF 123 2.11 0.013 0.14 0.020 2.09 2.14

MSC 118 1.69 0.009 0.10 0.010 1.67 1.71

APD 123 0.74 0.005 0.06 0.003 0.73 0.75

TB 122 0.50 0.004 0.04 0.002 0.50 0.51

PEB 86 1.62 0.008 0.07 0.005 1.61 1.64

DEB 112 0.99 0.005 0.05 0.002 0.98 1.00

HHD 114 0.97 0.004 0.04 0.002 0.96 0.98

XHL 96 7.24 0.040 0.39 0.154 7.16 7.32

EWH 124 1.36 0.007 0.08 0.006 1.35 1.38

XRL 96 5.66 0.032 0.32 0.100 5.60 5.73

RSBB 104 0.67 0.003 0.03 0.001 0.67 0.68

MAXD 59 0.50 0.003 0.02 0.001 0.49 0.51

MIND 57 0.47 0.003 0.02 <0.001 0.47 0.48

XUL 83 6.14 0.036 0.32 0.105 6.07 6.21

USBB 90 0.38 0.004 0.03 0.001 0.37 0.38

IAL 88 1.04 0.007 0.07 0.004 1.02 1.05

BML 89 0.35 0.003 0.02 0.001 0.35 0.36

BAP 86 0.34 0.004 0.03 0.001 0.33 0.35

HML 88 0.32 0.003 0.02 0.001 0.32 0.33

HAP 86 0.30 0.003 0.03 0.001 0.30 0.31

MS 88 0.26 0.002 0.02 0.001 0.26 0.27

L 99 1.42 0.009 0.09 0.008 1.40 1.44

SIH 99 0.42 0.003 0.03 0.001 0.42 0.43

MLH 97 0.47 0.003 0.03 0.001 0.46 0.48

SIB 99 0.64 0.003 0.03 0.001 0.63 0.64

MLB 82 0.43 0.004 0.04 0.001 0.42 0.44

MSD 99 0.30 0.003 0.03 0.001 0.29 0.31

IL 58 1.77 0.011 0.09 0.007 1.75 1.80

PL 52 2.01 0.027 0.19 0.037 1.95 2.06

HSN 108 1.21 0.010 0.11 0.011 1.19 1.23

ASB 112 0.83 0.006 0.07 0.004 0.82 0.85

XCL 87 3.41 0.026 0.25 0.060 3.36 3.46

XHS 26 3.37 0.048 0.25 0.061 3.27 3.47

XLS 44 3.13 0.026 0.17 0.029 3.08 3.18

BXB 51 2.17 0.017 0.12 0.016 2.14 2.21

HAX 100 0.87 0.004 0.04 0.002 0.86 0.87

BCB 102 0.61 0.003 0.03 0.001 0.60 0.62

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

The independent samples t-test was used to determine whether the size-adjusted skeletal measurement means demonstrated statistically significant differences between males and females. The results of this analysis are presented in Table 3.4AA.

Table 3.4AA: Results of the independent samples t-test comparing male and female size-adjusted means for all 63 skeletal dimensions used in the study.

Dimension N (male) N (female) Mean (male) Mean (female) Levene’s Test* F Levene’s Test* Sig. T-value P-value

GO 55 43 4.11 4.45 0.446 0.506 -6.494 <0.0001

MW 53 42 3.10 3.37 0.733 0.394 -6.471 <0.0001

BB 42 33 3.02 3.29 0.385 0.537 -5.776 <0.0001

DB 20 22 2.83 3.02 1.403 0.243 -3.976 0.0003

PN 29 25 1.59 1.66 0.580 0.450 -2.121 0.039

BN 40 31 2.27 2.45 0.315 0.577 -4.847 <0.0001

BP 29 25 2.13 2.33 0.025 0.875 -4.556 <0.0001

NB 35 26 0.59 0.63 0.111 0.740 -2.493 0.015

PB 35 25 1.38 1.50 0.032 0.859 -2.958 0.004

OF 40 32 3.33 3.63 0.422 0.518 -6.268 <0.0001

ML 62 46 0.74 0.72 2.877 0.093 1.103 0.273

XSL 36 33 1.08 1.14 0.062 0.804 -3.975 0.0002

XDH 46 39 0.83 0.88 0.252 0.617 -3.750 0.0003

DSD 50 43 0.25 0.27 2.755 0.100 -3.494 0.001

DTD 50 43 0.23 0.25 0.145 0.705 -5.229 <0.0001

LVF 48 42 0.35 0.39 1.807 0.182 -5.254 <0.0001

SFB 41 32 1.01 1.08 1.105 0.297 -4.727 <0.0001

SFS 46 36 0.41 0.44 5.559 0.021 -3.586 0.001

SFT 48 40 0.38 0.40 2.345 0.129 -2.111 0.038

FHD 77 58 1.00 1.00 N/A N/A N/A N/A

FND 77 57 0.71 0.70 0.065 0.800 1.595 0.113

FSC 69 54 1.98 1.99 0.200 0.656 -0.751 0.454

XFL 59 48 10.15 10.58 0.812 0.370 -3.924 0.0002

FTD 71 54 0.57 0.59 0.110 0.741 -2.643 0.009

EBF 53 38 1.73 1.78 0.692 0.408 -3.470 0.001

TL 59 44 8.50 8.94 0.033 0.857 -4.580 <0.0001

CNF 71 52 2.11 2.11 0.381 0.538 0.119 0.905

MSC 69 49 1.69 1.68 0.195 0.660 0.264 0.792

APD 71 52 0.75 0.73 0.011 0.917 2.060 0.042

TB 70 52 0.50 0.51 0.039 0.844 -0.876 0.383

PEB 51 35 1.62 1.63 2.652 0.107 -0.712 0.478

DEB 67 45 0.98 1.00 4.367 0.039 -2.525 0.013

HHD 63 51 0.98 0.96 2.942 0.089 1.593 0.114

XHL 54 42 7.12 7.39 0.169 0.682 -3.538 0.001

EWH 68 56 1.36 1.37 0.555 0.458 -0.651 0.516

XRL 54 42 5.60 5.75 0.579 0.449 -2.337 0.022

RSBB 59 45 0.67 0.68 1.267 0.263 -1.635 0.105

MAXD 33 26 0.50 0.50 0.809 0.372 1.255 0.215

MIND 31 26 0.48 0.47 0.075 0.786 1.055 0.296

XUL 47 36 6.04 6.27 1.332 0.252 -3.352 0.001

USBB 52 38 0.37 0.38 4.209 0.043 -1.009 0.316

IAL 51 37 1.01 1.06 2.555 0.114 -3.805 0.0003

BML 52 37 0.35 0.35 2.164 0.145 -0.553 0.582

BAP 51 35 0.34 0.35 0.057 0.812 -1.508 0.135

HML 51 37 0.32 0.32 0.000 0.992 0.013 0.989

HAP 49 37 0.30 0.31 0.086 0.770 -2.183 0.032

MS 51 37 0.26 0.31 0.599 0.441 -0.434 0.665

L 61 38 1.39 1.47 0.008 0.930 -5.181 <0.0001

SIH 61 38 0.42 0.43 0.105 0.746 -2.410 0.018

MLH 60 37 0.47 0.47 0.000 0.995 -0.343 0.732

SIB 61 38 0.63 0.65 0.207 0.650 -3.782 0.0003

MLB 54 28 0.43 0.44 0.076 0.784 -2.043 0.044

MSD 61 38 0.30 0.30 0.06 0.941 -0.147 0.883

IL 36 22 1.75 1.82 0.327 0.570 -3.309 0.002

PL 34 18 1.90 2.21 0.519 0.475 -8.914 <0.0001

HSN 66 42 1.17 1.25 0.092 0.762 -4.144 <0.0001

ASB 69 43 0.83 0.84 1.668 0.199 -1.553 0.123

XCL 50 37 3.37 3.47 1.473 0.228 -2.092 0.039

XHS 14 12 3.31 3.45 0.340 0.565 -1.543 0.136

XLS 28 16 3.11 3.17 0.900 0.348 -1.119 0.270

BXB 33 18 2.16 2.20 0.559 0.458 -0.978 0.333

HAX 59 41 0.87 0.87 0.033 0.855 -0.578 0.5655

BCB 61 41 0.61 0.61 3.127 0.080 -0.939 0.350

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. *Levene’s Test for Equality of Variances. N, number of individuals for whom the dimension could be measured.

Using a significance level of P≤0.0008 to adjust for multiple testing, a total of 19 skeletal dimensions (GO, MW, BB, DB, BN, BP, and OF of the cranium; XSL, XDH, DTD, LVF, and SFB of C2; XFL; TL; IAL; L and SIB of MT1; PL; and HSN) were found to demonstrate statistically significant differences between males and females, even after removal of size effects. This may suggest that for these skeletal dimensions, sex differences result from significant differences in shape as well as size. These findings are particularly pertinent in the case of LVF and PL, which were not found to demonstrate statistically significant sex differences when size effects were not controlled for (see Table 3.1G). However, it should also be considered that femoral head diameter may not be a perfect surrogate for body size. As shown in Table 3.4AA, the T-values for all except nine of the skeletal dimensions are negative, reflecting the fact that after size correction, most of the female mean values exceed the male mean values.
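The sign convention of the T-values can be illustrated with a hand-computed independent samples t statistic. The sketch below uses the pooled-variance (equal variances assumed) form with the male group entered first, so that a negative value indicates a larger female mean; whether the equal-variance or unequal-variance row of the SPSS output was reported for each dimension is not stated, so this choice is an assumption. The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(male, female):
    """Student's t (equal variances assumed) for male mean minus female mean.

    Returns the t statistic and its degrees of freedom.
    """
    n1, n2 = len(male), len(female)
    pooled_var = (
        (n1 - 1) * variance(male) + (n2 - 1) * variance(female)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(male) - mean(female)) / standard_error, n1 + n2 - 2


# Hypothetical size-adjusted ratios; not values from the study sample.
male_ratios = [4.05, 4.10, 4.15, 4.12, 4.08]
female_ratios = [4.40, 4.48, 4.44, 4.50, 4.43]
t_value, df = pooled_t(male_ratios, female_ratios)
print(round(t_value, 2), df)
```

With the groups entered in this order, swapping them changes only the sign of the statistic, not its magnitude or the associated P-value.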

The size-adjusted skeletal measurement means for two of the principal time periods included in the study (Predynastic Period and Old Kingdom; the Late Period sample could not be included because it consists of cranial material only) are given in Table 3.4BB. This table additionally presents the P-value associated with the t-test statistic (after independent samples t-tests were performed separately in each subsample) to demonstrate whether the differences in male and female values were statistically significant.

Table 3.4BB: Mean values and significance of skeletal dimensions after adjustment for size effects, broken down by principal time period.

Predynastic Period Old Kingdom

Dimension N M mean F mean P-value N M mean F Mean P-value

GO 26 4.19 4.52 0.018 57 4.10 4.37 <0.0001

MW 25 3.14 3.30 0.108 55 3.10 3.42 <0.0001

BB 27 3.03 3.27 0.019 33 3.00 3.29 0.0004

DB 14 2.87 3.04 0.272 17 2.80 3.00 0.015

PN 19 1.62 1.68 0.448 24 1.57 1.62 0.385

BN 25 2.32 2.47 0.037 31 2.25 2.39 0.043

BP 20 2.22 2.37 0.061 23 2.08 2.23 0.080

NB 24 0.60 0.64 0.216 26 0.57 0.61 0.219

PB 24 1.43 1.52 0.018 25 1.33 1.49 0.099

OF 24 3.38 3.66 0.014 34 3.30 3.60 0.0002

ML 29 0.79 0.74 0.252 64 0.72 0.71 0.724

XSL 21 1.14 1.16 0.517 38 1.07 1.14 0.0004

XDH 23 0.86 0.88 0.262 52 0.83 0.87 0.006

DSD 26 0.25 0.27 0.001 57 0.25 0.26 0.118

DTD 26 0.23 0.25 0.020 57 0.23 0.24 0.001

LVF 26 0.37 0.39 0.228 54 0.34 0.39 <0.0001

SFB 22 1.05 1.11 0.017 41 1.00 1.07 0.0003

SFS 24 0.42 0.45 0.020 48 0.41 0.43 0.045

SFT 25 0.40 0.42 0.355 53 0.38 0.39 0.235

FHD 41 N/A N/A N/A 79 N/A N/A N/A

FND 41 0.71 0.71 0.996 78 0.71 0.69 0.037

FSC 49 2.00 2.03 0.472 68 1.98 1.96 0.642

XFL 38 10.16 10.64 0.017 54 10.16 10.52 0.033

FTD 40 0.57 0.60 0.066 70 0.57 0.58 0.565

EBF 35 1.73 1.79 0.063 43 1.73 1.75 0.161

TL 35 8.59 8.96 0.049 53 8.46 8.93 0.002

CNF 37 2.09 2.12 0.495 71 2.13 2.10 0.525

MSC 37 1.68 1.71 0.442 66 1.70 1.67 0.190

APD 37 0.75 0.74 0.576 71 0.75 0.72 0.045

TB 37 0.49 0.51 0.121 70 0.51 0.51 0.752

PEB 29 1.63 1.64 0.888 45 1.62 1.64 0.326

DEB 35 0.99 1.00 0.644 62 0.98 1.00 0.176

HHD 37 0.96 0.97 0.757 62 0.99 1.00 0.015

XHL 36 7.14 7.43 0.042 45 7.14 7.40 0.021

EWH 38 1.40 1.39 0.640 71 1.34 1.34 0.957

XRL 30 5.74 5.82 0.459 51 5.58 5.72 0.111

RSBB 31 0.67 0.67 0.889 59 0.67 0.68 0.335

MAXD 23 0.51 0.49 0.181 25 0.50 0.50 0.848

MIND 23 0.48 0.47 0.321 23 0.48 0.47 0.714

XUL 25 6.27 6.38 0.399 43 5.99 6.16 0.061

USBB 29 0.39 0.39 0.986 46 0.37 0.37 0.942

IAL 20 1.06 1.09 0.373 59 1.01 1.06 0.003

BML 20 0.37 0.36 0.747 60 0.35 0.35 0.995

BAP 20 0.35 0.36 0.705 57 0.33 0.34 0.719

HML 20 0.35 0.32 0.041 59 0.32 0.32 0.271

HAP 19 0.31 0.31 0.932 58 0.29 0.31 0.022

MS 20 0.27 0.28 0.595 59 0.26 0.25 0.209

L 26 1.42 1.50 0.011 63 1.38 1.45 0.001

SIH 26 0.42 0.44 0.007 63 0.42 0.43 0.286

MLH 25 0.47 0.49 0.178 62 0.47 0.46 0.616

SIB 26 0.63 0.67 0.033 63 0.63 0.64 0.065

MLB 25 0.43 0.45 0.172 49 0.42 0.43 0.497

MSD 26 0.28 0.30 0.097 63 0.30 0.30 0.304

IL 16 1.76 1.87 0.025 28 1.75 1.80 0.114

PL 14 1.93 2.31 0.001 27 1.90 2.16 <0.0001

HSN 28 1.19 1.26 0.044 65 1.17 1.27 0.001

ASB 28 0.82 0.85 0.393 69 0.82 0.83 0.571

XCL 32 3.36 3.40 0.642 43 3.37 3.58 0.007

XHS 9 3.46 3.56 0.423 8 3.28 3.36 0.812

XLS 16 3.17 3.15 0.772 19 3.08 3.15 0.522

BXB 18 2.22 2.23 0.794 23 2.15 2.21 0.302

HAX 20 0.87 0.90 0.101 66 0.86 0.85 0.509

BCB 20 0.61 0.61 0.706 68 0.61 0.60 0.065

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. M, male; F, female.

The results in Table 3.4BB above demonstrate that in the Predynastic Period sample, sexual dimorphism is primarily a result of size differences between the sexes, given that after adjustment for size effects (by converting skeletal measurements to a proportion of overall body size) there were no statistically significant differences between the male and female means at the adjusted significance level of P≤0.0008. In comparison, eight skeletal dimensions demonstrated statistically significant sex differences at this level in the Old Kingdom subsample (GO, MW, BB, OF, XSL, LVF, SFB, and PL), suggesting that shape plays an important role in sexual dimorphism of these dimensions in the Old Kingdom time period. The relevance and importance of these findings are considered in Chapter 4, Discussion.

3.5 Test of discriminant functions and logistic equations on the Saqqara-West sample

3.5.1 Test of discriminant functions

Table 3.5A provides a summary of the three discriminant functions (created using the complete study sample or Late Period Giza subsample) that were blind tested on the Old Kingdom and Ptolemaic Period skeletons excavated from the Saqqara-West necropolis by IK-O. The table additionally shows the cross-validated accuracy rates associated with each function, in total, and in males and females separately.

Table 3.5A: Discriminant functions that were blind tested on skeletons from the Saqqara-West necropolis.

Parameter | Function 1 | Function 2 | Function 3
GO | 0.108 | | 0.160
MW | | | -0.074
DB | 0.146 | | 0.166
FHD | | 0.247 |
PEB | | 0.178 |
Constant | -38.163 | -22.718 | -39.949
Sectioning point | -0.005 | -0.004 | -0.0105
Overall accuracy, % | 86.4 | 93.0 | 91.8
Male accuracy, % | 86.2 | 94.1 | 93.8
Female accuracy, % | 86.7 | 91.4 | 89.4

GO, glabello-occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic diameter of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia. A score greater than the sectioning point indicates male.


Function 1 was created using cranial dimensions of the complete study sample, Function 2 using measurements from the femur and tibia of the complete study sample, and Function 3 using cranial measurements from the Late Period Giza subsample.
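The way the functions in Table 3.5A are applied can be sketched as follows. This is a minimal illustration rather than the procedure used in the thesis: the coefficients, constants and sectioning points are copied from Table 3.5A, while the function names, the dictionary layout, the assumption that measurements are entered in millimetres, and the example measurements are all hypothetical.

```python
# Coefficients, constants and sectioning points copied from Table 3.5A.
# Assumption: measurements are supplied in millimetres.
DISCRIMINANT_FUNCTIONS = {
    1: {"coef": {"GO": 0.108, "DB": 0.146},
        "constant": -38.163, "section": -0.005},
    2: {"coef": {"FHD": 0.247, "PEB": 0.178},
        "constant": -22.718, "section": -0.004},
    3: {"coef": {"GO": 0.160, "MW": -0.074, "DB": 0.166},
        "constant": -39.949, "section": -0.0105},
}


def discriminant_score(function_id, measurements):
    """Sum of coefficient x measurement for each dimension, plus the constant."""
    fn = DISCRIMINANT_FUNCTIONS[function_id]
    weighted = sum(c * measurements[dim] for dim, c in fn["coef"].items())
    return weighted + fn["constant"]


def estimate_sex(function_id, measurements):
    """Scores above the sectioning point are classified as male."""
    score = discriminant_score(function_id, measurements)
    section = DISCRIMINANT_FUNCTIONS[function_id]["section"]
    return "male" if score > section else "female"


# Hypothetical individuals (not taken from the study sample).
larger = {"FHD": 45.0, "PEB": 72.0}
smaller = {"FHD": 40.0, "PEB": 65.0}
print(estimate_sex(2, larger), estimate_sex(2, smaller))
```

Any individual lacking one of the required dimensions cannot be scored, which is why N differs between functions when they are applied to fragmentary remains.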

3.5.2 Accuracy of discriminant functions

Functions 1 to 3 were tested on a total sample size of 119 skeletons. Calculated accuracy rates associated with each function are shown in Table 3.5B, and for Function 1 in Figure 3.5A for the complete population sample from the Saqqara-West assemblage. In Table 3.5B, n denotes the number of correct sex estimates, and N the number of individuals to whom the function could be applied. Accuracy rates are given for the total sample (weighted mean), and are additionally broken down according to sex (male and female) and time period (Old Kingdom and Ptolemaic Period).

Table 3.5B: Accuracy rates, broken down by sex and time period, associated with the application of discriminant functions to the complete sample of skeletons from the Saqqara-West necropolis.

Correct sex estimates
Function | Sample | Total, % | Male, n/N (%) | Female, n/N (%)
1 | Total | 81.0 | 28/35 (80.0) | 19/23 (82.6)
1 | OK | 88.2 | 8/10 (80.0) | 7/7 (100)
1 | PP | 78.0 | 20/25 (80.0) | 12/16 (75.0)
2 | Total | 89.0 | 58/58 (100) | 23/33 (69.7)
2 | OK | 94.7 | 12/12 (100) | 6/7 (85.7)
2 | PP | 87.5 | 46/46 (100) | 17/26 (65.4)
3 | Total | 79.3 | 27/35 (77.1) | 19/23 (82.6)
3 | OK | 82.4 | 7/10 (70.0) | 7/7 (100)
3 | PP | 78.0 | 20/25 (80.0) | 12/16 (75.0)

OK, Old Kingdom; PP, Ptolemaic Period.
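The accuracy rates in Table 3.5B follow directly from the n/N counts, and the "Total" column is the mean of the male and female rates weighted by N. The sketch below reproduces that arithmetic for the Function 1 and Function 2 totals; the function names are illustrative, and the 80% cut-off is the threshold used throughout this thesis.

```python
def accuracy_rates(male_correct, male_n, female_correct, female_n):
    """Return per-sex and pooled accuracy (%) from n/N counts."""
    male = 100.0 * male_correct / male_n
    female = 100.0 * female_correct / female_n
    # Pooling the counts is the same as weighting each sex's rate by its N.
    total = 100.0 * (male_correct + female_correct) / (male_n + female_n)
    return {
        "total": round(total, 1),
        "male": round(male, 1),
        "female": round(female, 1),
    }


def meets_cutoff(rates, cutoff=80.0):
    """A method is acceptable only if BOTH sexes reach the cut-off."""
    return rates["male"] >= cutoff and rates["female"] >= cutoff


function_1 = accuracy_rates(28, 35, 19, 23)  # counts from Table 3.5B
function_2 = accuracy_rates(58, 58, 23, 33)
print(function_1, meets_cutoff(function_1))
print(function_2, meets_cutoff(function_2))
```

Function 2 shows why the pooled rate alone is not sufficient: its total is high, yet the female rate falls below the cut-off.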

Of the three discriminant functions tested, only Function 1, requiring measurement of GO and DB, was able to produce acceptable accuracy levels in both males and females. The accuracy rate for Ptolemaic Period females (75.0%) was a little below the 80% cut-off point; however, this could potentially be increased with a larger test sample size (Figure 3.5A).

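[Figure 3.5A: horizontal bar chart of accuracy rate, % (axis from 0.0 to 100.0), with Female, Male and Total bars for each group. PP: female 75.0, male 80.0, total 78.0. OK: female 100.0, male 80.0, total 88.2. Total: female 82.6, male 80.0, total 81.0.]

Figure 3.5A: Accuracy rates associated with Function 1 when applied to the test sample from the Saqqara-West necropolis.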

OK, Old Kingdom; PP, Ptolemaic Period. Function 1 requires measurement of GO and DB.

Considering the known-sex subsample only, it was possible to apply Functions 1 and 3 (requiring measurements of the cranium) to five individuals. Of these, one individual was misclassified by both functions. In comparison, Function 2, requiring measurements of the femur and tibia, could be applied to seven individuals, all of whom were correctly sexed using the function. These results are promising and support the accuracy of the newly created discriminant functions when applied to a different test sample. However, it is unfortunate that all the individuals in the known-sex subsample are male. Further tests on known-sex females would be highly beneficial in the future if such a test sample can be obtained.

3.5.3 Test of logistic regression equations

Table 3.5C provides a summary of the three logistic regression equations (created using the complete study sample or Late Period Giza subsample) that were blind tested on the Old Kingdom and Ptolemaic Period skeletons excavated from the Saqqara-West necropolis by IK-O. These equations are used to calculate the logit, as described previously. The table additionally shows the accuracy rates associated with each equation, both in total, and in males and females separately.

Table 3.5C: Logistic regression equations that were blind tested on skeletons from the Saqqara-West necropolis.

Parameter | Equation 1 | Equation 2 | Equation 3
GO | 0.276 | | 0.522
MW | | | -0.193
DB | 0.313 | | 0.443
FHD | | 0.600 |
PEB | | 0.385 |
Constant | -89.212 | -51.173 | -124.078
Sectioning point | 0.5 | 0.5 | 0.5

Overall accuracy, % 86.4 93.0

Male accuracy, % 86.2 94.1

Female accuracy, % 86.7 91.4

GO, glabello-occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic diameter of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia. A value greater than the sectioning point indicates male.

Equation 1 was created using cranial dimensions of the complete study sample, Equation 2 using measurements from the femur and tibia of the complete study sample, and Equation 3 using cranial measurements from the Late Period Giza subsample.
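As a rough illustration of how the equations in Table 3.5C might be applied, the sketch below computes the logit from the tabulated coefficients and converts it to a probability with the standard logistic transform. Several points are assumptions rather than statements of the thesis method: that measurements are in millimetres, that the resulting probability is the probability of being male (consistent with values above the 0.5 sectioning point indicating male), and all function names and example measurements.

```python
import math

# Coefficients and constants copied from Table 3.5C.
# Assumption: measurements are supplied in millimetres.
LOGISTIC_EQUATIONS = {
    1: {"coef": {"GO": 0.276, "DB": 0.313}, "constant": -89.212},
    2: {"coef": {"FHD": 0.600, "PEB": 0.385}, "constant": -51.173},
    3: {"coef": {"GO": 0.522, "MW": -0.193, "DB": 0.443}, "constant": -124.078},
}
SECTIONING_POINT = 0.5  # same for all three equations in Table 3.5C


def logit(equation_id, measurements):
    """Linear predictor: coefficient x measurement for each dimension, plus constant."""
    eq = LOGISTIC_EQUATIONS[equation_id]
    return sum(c * measurements[d] for d, c in eq["coef"].items()) + eq["constant"]


def probability_male(equation_id, measurements):
    """Standard logistic transform of the logit (assumed to give P(male))."""
    return 1.0 / (1.0 + math.exp(-logit(equation_id, measurements)))


def estimate_sex(equation_id, measurements):
    p = probability_male(equation_id, measurements)
    return "male" if p > SECTIONING_POINT else "female"


# Hypothetical individuals (not taken from the study sample).
print(estimate_sex(2, {"FHD": 45.0, "PEB": 72.0}))
print(estimate_sex(2, {"FHD": 40.0, "PEB": 65.0}))
```

A probability of 0.5 corresponds to a logit of zero, so classifying on the probability and classifying on the sign of the logit give the same result.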

3.5.4 Accuracy of logistic regression equations

Equations 1 to 3 were tested on a total sample size of 119 skeletons. Calculated accuracy rates associated with each equation are shown in Table 3.5D. In this table, n denotes the number of correct sex estimates, and N the number of individuals to whom the equation could be applied. Accuracy rates are given for the total sample (weighted mean), and are additionally broken down according to sex (male and female) and time period (Old Kingdom and Ptolemaic Period).

Table 3.5D: Accuracy rates, broken down by sex and time period, associated with the application of logistic regression equations to the complete sample of skeletons from the Saqqara-West necropolis.

Correct sex estimates
Equation | Sample | Total, % | Male, n/N (%) | Female, n/N (%)
1 | Total | 77.6 | 29/35 (82.9) | 16/23 (69.6)
1 | OK | 76.5 | 7/10 (70.0) | 6/7 (85.7)
1 | PP | 78.0 | 22/25 (88.0) | 10/16 (62.5)
2 | Total | 87.9 | 58/58 (100) | 22/33 (66.7)
2 | OK | 94.7 | 12/12 (100) | 6/7 (85.7)
2 | PP | 86.1 | 46/46 (100) | 16/26 (61.5)
3 | Total | 81.0 | 29/35 (82.9) | 18/23 (78.3)
3 | OK | 82.4 | 7/10 (70.0) | 7/7 (100)
3 | PP | 80.5 | 22/25 (88.0) | 11/16 (68.8)

OK, Old Kingdom; PP, Ptolemaic Period.

Of the three logistic regression equations tested, none was able to reach the 80% cut-off mark in males and females separately, although for Equation 3, the total female accuracy rate was only marginally below the 80% threshold. However, it can be seen that the accuracy of the equation was very dependent on time period. Considering the known-sex subsample only, the results obtained were exactly the same as with the discriminant functions; of the five individuals to whom Equations 1 and 3 could be applied, one individual was misclassified, whereas all seven individuals to whom Equation 2 could be applied were correctly sexed.

3.5.5 Comparison of discriminant functions vs. logistic regression equations

Tables 3.5A and 3.5C demonstrate that the discriminant functions and logistic regression equations are equivalent in terms of the dimensions they use and the population used to create them. For example, Function 1 and Equation 1 both require measurement of GO and DB and were created using the complete study sample. Function 2 and Equation 2 both require measurement of FHD and PEB and were created using the complete study sample, and Function 3 and Equation 3 both require measurement of GO, MW and DB, and were created using the Late Period Giza subsample. The accuracy rates for the three discriminant functions and three logistic regression equations when applied to the skeletal material from the Saqqara-West assemblage may therefore be compared, as shown in Table 3.5E.

Table 3.5E: Comparison of accuracy rates associated with the discriminant functions and logistic regression equations when tested on the Saqqara-West sample.

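Sample | Discriminant function 1, % | Discriminant function 2, % | Discriminant function 3, % | Logistic regression equation 1, % | Logistic regression equation 2, % | Logistic regression equation 3, %
Total | 81.0 | 89.0 | 79.3 | 77.6 | 87.9 | 81.0
Male | 80.0 | 100 | 77.1 | 82.9 | 100 | 82.9
Female | 82.6 | 69.7 | 82.6 | 69.6 | 66.7 | 78.3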

As can be seen in Table 3.5E, similar results were obtained with the discriminant functions and logistic regression equations. Using discriminant function 1, the sex of a total of 11 individuals was misclassified. Of these 11 individuals, nine were also misclassified using logistic regression equation 1. For Function 2 and Equation 2, all of the individuals whose sex was misclassified using the discriminant function were also misclassified using the logistic regression equation; an additional individual was misclassified using the equation. Similarly, all the individuals whose sex was misclassified using logistic regression equation 3 were also misclassified using discriminant function 3; a further individual was misclassified using the function. This suggests that the misclassification of these particular individuals was the result of their specific skeletal dimensions, which deviated from the group means, and not the statistical method used to create the classification formulae.
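The overlap between the two sets of misclassified individuals can be expressed as simple set operations. The sketch below is illustrative only: the skeleton identifiers are hypothetical placeholders, chosen so that the counts match those reported for Function 1 and Equation 1 (11 misclassified by the function, 9 of them shared, and 13 misclassified by the equation according to Table 3.5D).

```python
def compare_errors(first, second):
    """Split two sets of misclassified IDs into shared and method-specific errors."""
    return {
        "shared": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Hypothetical skeleton IDs; only the counts reflect the reported results.
function_1_errors = set(range(1, 12))                      # 11 individuals
equation_1_errors = set(range(1, 10)) | {20, 21, 22, 23}   # 9 shared + 4 others

overlap = compare_errors(function_1_errors, equation_1_errors)
print(len(overlap["shared"]), len(overlap["only_first"]), len(overlap["only_second"]))
```

A large shared set relative to the method-specific sets is what points to the individuals' dimensions, rather than the statistical procedure, as the source of error.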

3.5.6 Analysis of metric data from the Saqqara-West sample

Appendix 7.6 provides a summary of the analysis of metric data from the Saqqara-West sample, including the distribution of data, outliers/extreme scores, comparison of means, and ANOVA test.


4 DISCUSSION

The primary aim of this research was to test the accuracy of “modern” metric sex estimation methods (c. nineteenth and twentieth century population samples) when applied to human skeletal remains from ancient Egypt. Publications and reports written by osteologists contributing to ongoing excavation projects in Egypt highlight the use of such methods as part of the standard skeletal analysis procedures, despite a lack of studies validating this practice (Lehner et al, 2008: 51; Rose, 2006; Zabecki & Dabbs, 2010; Zabecki et al, 2012). The findings of the present research indicate that many of the commonly cited and recommended “modern” metric sex estimation methods produce unacceptably low accuracy rates (<80%), or have no real sex discriminatory power, when applied to ancient Egyptian skeletal remains and should not be used in this population.

A total of 12 “modern” methods were tested in the present study. Of these methods, only three were able to meet or exceed the required 80% accuracy cut-off point in both males and females separately and hence overall. These were the Giles and Elliot (1963) cranium method, the Scheuer and Elkington (1993) MC1 method, and Functions 1 and 2 of the Stewart (1979) multiple bones method, which produced total accuracy rates (calculated as the weighted mean across both sexes) of 91.9%, 84.2%, and 90.1% and 89.6% (Functions 1 and 2), respectively. These results largely contradict those of other researchers who tested the accuracy of these three methods in dissimilar population samples. For example, Kajanoja (1966) reported an accuracy rate of 65% when Function 19 developed by Giles and Elliot (1963) was tested on a known-sex sample of Finnish crania, obtained from dissection room cadavers held within the Department of Anatomy at the University of Helsinki. Franklin and colleagues (2005) obtained an accuracy rate of 70% using Function 2 of Giles and Elliot’s (1963) method to estimate sex in a modern dissection room sample of crania held at the University of the Witwatersrand in Johannesburg. By contrast, Function 1 of the Giles and Elliot (1963) method produced an accuracy rate of 88.5% when tested on known-sex crania submitted as forensic cases to the University of Oklahoma in the South Central United States (Snow et al, 1979). A search of the literature revealed no previous studies in which Functions 1 and 2 of the multiple bones method of Stewart (1979) were tested on other population samples. The metacarpals sex estimation method developed by Scheuer and Elkington (1993) was tested by Burrows et al. (2003) on a modern, known-sex dissection room sample held at Slippery Rock University School of Physical Therapy, Pennsylvania. They reported an accuracy rate of 65.9% using the logistic regression sex estimation equation for MC1. The present study is the first to test the Giles and Elliot (1963), Scheuer and Elkington (1993), and Stewart (1979; Functions 1 and 2) methods on a sample of ancient Egyptian skeletal remains. Based on the findings presented herein, it is possible to recommend these methods to other researchers requiring equations of high and tested accuracy to estimate sex in fragmentary human skeletal remains from ancient Egypt.

Of the nine “modern” methods tested that failed to reach the 80% accuracy cut-off point in the present skeletal sample, the lowest accuracy rates were obtained using the radial head diameter method of Berrizbeitia (1989), the MT1 method of Robling and Ubelaker (1997), and Function 6 of Stewart’s (1979) multiple bones method. In fact, the weighted total accuracy rates obtained with these methods are worse than would have been expected using simple guesswork (45.6% [maximum radial head diameter], 44.4% [minimum radial head diameter], 41.4% [MT1], and 34.7% [Function 6]). In each case, these findings are the result of very low accuracy rates in males (19.0%, 10.8%, 12.1%, and 0%, respectively). In comparison, the female “accuracy rate” associated with these methods ranged from 88.5% to 100%. However, this does not indicate that the methods are very accurate at correctly classifying the sex of females. Given that to be classified as male the specific sectioning point for a function or equation must be exceeded, ‘female’ may be viewed as the “default” sex assignment for all individuals failing to reach the sectioning point, regardless of their actual sex. In other words, these methods have no actual power of discrimination between the sexes and should not be applied to ancient Egyptian skeletal remains.
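The "default female" effect described above can be demonstrated with a toy example. In the sketch below, every value is hypothetical (the measurements, the two sectioning points and the function names); it simply shows that when a sectioning point lies above every measurement in the target sample, female accuracy is 100% and male accuracy is 0%, irrespective of any real difference between the sexes.

```python
def classify(value, sectioning_point):
    """Male only if the sectioning point is exceeded; otherwise female by default."""
    return "male" if value > sectioning_point else "female"


def sex_specific_accuracy(males, females, sectioning_point):
    """Return (male %, female %, pooled %) for a univariate sectioning point."""
    male_hits = sum(classify(v, sectioning_point) == "male" for v in males)
    female_hits = sum(classify(v, sectioning_point) == "female" for v in females)
    pooled = 100.0 * (male_hits + female_hits) / (len(males) + len(females))
    return (100.0 * male_hits / len(males),
            100.0 * female_hits / len(females),
            pooled)


# Hypothetical measurements (mm) from a small-bodied target sample.
males = [21.0, 22.0, 22.5, 23.0]
females = [19.0, 19.5, 20.0, 20.5]

# Hypothetical sectioning point derived from a larger-bodied reference sample.
borrowed = sex_specific_accuracy(males, females, 24.0)
# Hypothetical sectioning point suited to the target sample.
specific = sex_specific_accuracy(males, females, 20.75)
print(borrowed, specific)
```

With the borrowed sectioning point the pooled rate merely equals the proportion of females in the sample, which is why a high female rate cannot be read as evidence of discriminatory power.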

It is a common trend among physical anthropologists to test a newly developed metric sex estimation method on a different population sample. However, for the most part, this has been restricted to modern population samples used in forensic contexts, where there is a strict requirement for testing and peer review of all techniques and theories presented evidentially in a court of law (Rogers, 2005). The present research is therefore unique in that it represents the first attempt to validate the use of a range of “modern” metric sex estimation methods in ancient Egyptian skeletal samples. Typically, tests of this type result in lower rates of accuracy than were reported in the original investigation (Alunni-Perret et al, 2003; Burrows et al, 2003; Cowal & Pastor, 2008; Marlow & Pastor, 2011; Ríos Frutos, 2003). This pattern is, for the most part, corroborated by the findings of the present research. There are several explanations that could account for the difference in original and tested sex estimation accuracy rates observed in these studies. These include high rates of intra- or inter-observer error (Liu, 1988; Weinberg et al, 2005), bias in sampling methods during the creation of documented skeletal reference collections (Eriksen, 1982; Hunt & Albanese, 2005; Komar & Grivas, 2008), or secular trends in growth (Jantz & Meadows Jantz, 2000; Meadows & Jantz, 1995). However, the most likely explanation for this lack of agreement is population differences in overall body size, and in skeletal size, shape, and proportions (Asala, 2001; Çöloğlu et al, 1998; Kemkes & Göbel, 2006; MacLaughlin & Bruce, 1986; Patriquin et al, 2003; Ruff, 2002; Tanner, 1976; Walker, 2006), which have been found to vary considerably among extant human populations (Ruff, 2002).

Within each population, the phenotypic expression of adult body size and shape results from a synergistic interaction between hereditary (genetic) factors and environmental conditions experienced during growth, including climatic conditions (temperature, altitude), diet, subsistence strategy, activity patterns, disease, and access to resources (Vercellotti et al, 2011). Metric methods of sex estimation, which rely on absolute differences in measured skeletal dimensions and population-specific sectioning points, are therefore prone to error when generalised on a global level.

As a result, numerous previous researchers have proposed that population-specific standards are required for all estimates of sex based on osteometric data (Bidmos & Asala, 2003; Bidmos & Dayal, 2004; Çöloğlu et al, 1998; King et al, 1998; Mall et al, 2000; Özer & Katayama, 2008; Šlaus & Tomičić, 2005; Steyn & İşcan, 1999; Trancho et al, 1997, amongst others). To date, only two previous studies have attempted to create metric sex estimation methods that are specific to the ancient Egyptians (Dabbs, 2010; Raxter, 2007), and neither of these methods has been tested on a different Egyptian sample. A further aim of the present research was therefore to test the accuracy of these two previously created population-specific methods, under the hypothesis that they would produce high rates of correct sex classification.

The results of this test presented herein largely support this hypothesis. Of the three sectioning points developed by Raxter (2007), only one, requiring measurement of the circumference at the nutrient foramen of the tibia, failed to meet the 80% cut-off point in both males and females, producing an accuracy rate of 65.5% in females. Of the five discriminant functions created by Dabbs (2010), two, Functions 1 and 2, also produced unacceptably low accuracy rates in females (75.6% and 71.1%, respectively).

There are several explanations that may account for these unexpected findings. The first relates to differences in measuring technique and hence inter-observer error between the present author and Drs. Raxter and Dabbs. However, this is unlikely for two reasons: 1) in the inter-observer error test between the present author and Dr Raxter, mean per cent absolute error and per cent TEM did not exceed 2% and 1%, respectively for any of the skeletal dimensions included in the test. Furthermore, while the result of the paired samples t-test suggested a systematic difference in measuring technique between the two observers for FHD (P<0.001), this did not appear to affect the accuracy rate when Raxter’s sectioning point for this method was applied to the present study sample (92.3% and 87.9% accuracy in males and females, respectively); and 2) the definitions of the anatomical measurements presented by Dabbs (2010) were followed closely in the present study when measuring scapulae to ensure a fair test of accuracy. Despite this, one of the dimensions proposed by Dabbs (BXB) was found to demonstrate intra-observer error that did not reach the acceptable level of R (0.95) and was excluded from subsequent statistical analyses. However, this dimension is not included in Functions 1 and 2.

The second explanation that may account for the unexpected findings relates to differences in the composition of samples used by Drs. Raxter and Dabbs, and the present author. Previous research has demonstrated that skeletal size and/or proportions of ancient Egyptians changed over time (Masali, 1972; Raxter, 2011; Zakrzewski, 2003). Dr Raxter and the present author both included skeletons from Predynastic Period Keneh, Old Kingdom Giza and Middle Kingdom Sheikh Farag in their studies; however, the present study sample includes a far greater proportion of Old Kingdom Giza individuals than did Raxter’s sample. Similarly, the method created by Dabbs was based on a sample of skeletons from New Kingdom Tell El-Amarna, whereas the present sample contains only two individuals dated to the New Kingdom and interred at Thebes. As such, the reference samples used in these three pieces of research may not be comparable in terms of skeletal size and proportions despite being derived from the same general population. Indeed, the sectioning points required to separate the sexes based on the length of MC1, which are reported in two recent studies (El Morsi & Al Hawary, 2013; Eshak et al, 2011), varied considerably despite both being derived from the living Egyptian population.

There are, however, some important differences between the samples used in these two studies that may account for this variation. For example, the sample of patients included in the study conducted by Eshak et al. (2011) ranged in age from 18 to 30 years, while the sample included by El Morsi & Al Hawary (2013) spanned a much wider age range, from 17 to 65 years. Previous research has indicated that the age at death distribution of reference and target samples has an important influence on the accuracy of sex estimation (Case & Ross, 2007; Lazenby, 1998), and that the rate of ageing (bone change) per unit of time is significantly higher in females than males (Kobyliansky et al, 1995). The researchers additionally differed in the imaging procedure used to view the metacarpals and the subsequent medium from which measurements were taken; Eshak and colleagues (2011) took CT scans and used a computerised measure-distance tool, while El Morsi & Al Hawary (2013) took plain X-rays and measured bone length manually with a caliper. Waitzman and colleagues (1992) reported an excellent level of agreement between measurements taken directly on dry skulls and those made indirectly using CT-generated images. Furthermore, Moshiri et al. (2007) found that measurements taken from CT images of dry skulls were more accurate than those taken from conventional cephalograms; however, other authors have found the opposite to be true (Chidiac et al, 2002). Finally, the hospitals that the patients attended for their imaging procedures were in different parts of Egypt; at Minya in Upper Egypt in the Eshak et al. (2011) study and at Mansoura in the Delta (Lower Egypt) in the El Morsi & Al Hawary (2013) study.

This could mean that the two patient groups were not equal in terms of socioeconomic status, or dietary and lifestyle factors. According to Habitat for Humanity (2013), Upper Egypt has only 40% of the country’s population yet it is where 80% of the country’s severe poverty is concentrated. In Minya, around 179,000 families are thought to live in severely overcrowded housing, which is often made of mud bricks, with earth floors, cracked walls prone to insect infestation, dilapidated palm roofs prone to rodent infestation, and either unhygienic shared sanitation facilities or no sanitation facilities at all. By comparison, Mansoura is an important industrial and commercial city, and the capital of the Dakahlia Governorate (TourEgypt, 2013), and might therefore be expected to possess greater economic wealth and resources. A number of different studies have demonstrated a relationship between socioeconomic status and growth and development, with individuals from lower socioeconomic groups often experiencing reduced stature, delayed maturation, and reduced body size compared with individuals from higher socioeconomic groups (Dreizen et al, 1967; Graitcer & Gentry, 1981; Habicht et al, 1974; Martorell et al, 1979).

The low rates of correct sex classification in ancient Egyptian females using the method of Eshak and colleagues (2011; 67.5%) and in ancient Egyptian males using the method of El Morsi and Al Hawary (2013; 32.8%) may be explained by the findings of genetic studies, which suggest that modern Egyptians have close biological affinities with the Near East and Europe (Bosch et al, 1997; Manni et al, 2002; Terreros et al, 2005). Alternatively, the low accuracy rates obtained when these “living Egyptian” methods are applied to ancient Egyptian skeletons may indicate that MC1 length alone is not a useful discriminator of sex. As such, width, breadth or height dimensions of the articular ends, as used by Scheuer and Elkington (1993), may additionally be required to accurately separate the sexes in the ancient Egyptian population. Indeed, increased accuracy of discriminant functions that use both length and width measurements of the metacarpals compared with length alone has been reported by other authors (Case & Ross, 2007; Falsetti, 1995).

Creation of population-specific metric sex estimation methods

A further key aim of this research was to create population-specific metric sex estimation methods for a wide range of skeletal elements that would be of value to other researchers analysing both complete and highly fragmentary human skeletal remains from ancient Egypt.

This aim was considered particularly important in light of the methodological procedures employed by osteologists involved in current excavations in Egypt, which necessarily include metric sex estimation methods because of the poor level of preservation of some of the remains (Lehner et al, 2008: 51). The present research extends the findings of Raxter (2007) and Dabbs (2010) by presenting a total of 11 discriminant functions and eight logistic regression equations using measurements of the cranium, femur alone, tibia alone, upper limb (radius and ulna), MC1, and lower limb (femur and tibia), with comparable or higher rates of cross-validated accuracy than were reported in the previous studies (86.4–96.1% in the present study vs. 89% [Raxter, 2007] and 84–88% [Dabbs, 2010]). These accuracy rates are additionally comparable to those reported in other studies in which metric sex estimation equations were created for specific populations, including modern North American (Harris & Case, 2012; Holland, 1991), modern Indian (Purkait, 2001), modern South African (Bidmos & Asala, 2003), modern Turkish (Çöloğlu et al, 1998), prehistoric Scottish (MacLaughlin & Bruce, 1985), prehistoric Californian (Dittrick & Suchey, 1986), ancient Japanese (Özer & Katayama, 2008), and medieval Croatian (Šlaus & Tomičić, 2005), amongst others. Furthermore, they are higher than those reported by a number of previous authors who created sex estimation methods using measurements of the cranium (Gapert et al, 2009; Kalmey & Rathbun, 1996), dentition (Angadi et al, 2013; Khamis et al, 2014; Macaluso, 2010; Thapar et al, 2012), or long bones (Saunders & Hoppa, 1997).

Regardless of whether discriminant function analysis or logistic regression was used to create the new population-specific metric sex estimation methods presented herein, the highest cross-validated accuracy rates (>90%) were obtained for functions or equations using the tibia alone (c. 94%), the femur alone (89–92%), or the femur and tibia combined (93–96%). This finding is consistent with the current literature. The results of numerous previous studies – summarised in Table 4.1A – suggest that the tibia and/or femur can separate the sexes with a high degree of accuracy both in modern and ancient human populations.

Table 4.1A: Summary of previous studies reporting metric sex estimation methods for the tibia and femur.

Bone | Study | Population | Measurement(s) | Statistical procedure | Accuracy, %
Tibia | González-Reimers et al, 2000 | Prehispanic Canary Islanders | Length, breadth, diameter, circumference | Discriminant function analysis | 94.9–98.3
Tibia | Holland, 1991 | Modern North American | Width/breadth of proximal tibia | Linear regression | 86.0–97.0*
Tibia | İşcan & Miller-Shaivitz, 1984a, 1984b | Modern North American | Length, breadth, circumference | Discriminant function analysis | 82.1–89.0*
Tibia | İşcan et al, 1994 | Modern Japanese | Length, breadth, circumference | Discriminant function analysis | 80.0–89.0
Tibia | Kieser et al, 1992 | Modern South African | Width/breadth of proximal tibia | Discriminant function analysis | 84.6–92.0
Tibia | Saunders & Hoppa, 1997 | 19th C Canadian | Length, breadth, diameter, circumference | Logistic regression | 76.0–92.7
Tibia | Šlaus & Tomičić, 2005 | Medieval Croatian | Length, breadth, diameter, circumference | Discriminant function analysis | c. 92% using 6 dimensions
Tibia | Šlaus et al, 2013 | Modern Croatian | Length, breadth, diameter, circumference | Discriminant function analysis | 84.4–91.1
Femur | Asala, 2001 | Modern South African | FHD | Univariate sectioning point | NR
Femur | Black, 1978a | Late Woodland Period North American | Circumference | Discriminant function analysis | 85.0
Femur | DiBennardo & Taylor, 1979 | Modern North American | Circumference | Discriminant function analysis | 82.0
Femur | İşcan & Shihai, 1995 | 19th/20th C Chinese | Distal epiphyseal breadth | Discriminant function analysis | 94.9
Femur | Mall et al, 2000 | Modern German | Midshaft diameter + head circumference | Discriminant function analysis | 91.7
Femur | Özer & Katayama, 2008 | Ancient Japanese | Length, breadth, diameter | Discriminant function analysis | 66.9–100
Femur | Purkait, 2003 | Modern Indian | FHD | Discriminant function analysis | 92.1
Femur | Ríos Frutos, 2003 | Modern Guatemalan | Supero-inferior FND | Discriminant function analysis | 89.5
Femur | Seidemann et al, 1998 | 19th/20th C North American | Supero-inferior FND | Discriminant function analysis | 90.0
Femur | Soni et al, 2010 | Modern Indian | Length, breadth, diameter, circumference | Discriminant function analysis | 87.5
Tibia + femur | Steyn & İşcan, 1997 | Modern South African | Diameter, breadth | Discriminant function analysis | 85.9–91.4

*Average of multiple functions. FHD, femoral head diameter; FND, femoral neck diameter; NR, not reported.

Further findings from the present study indicate that the high level of cross-validated accuracy associated with the tibial and femoral metric methods created herein is the result of a high degree of sexual dimorphism of these skeletal elements. As shown in Figure 3.4G, Table 3.4X and Appendix 7.5, dimensions of the femur and tibia are among those exhibiting the greatest per cent dimorphism in the ancient Egyptian populations sampled. A discussion of these findings is provided later in this section.

For metric methods of sex estimation to be of value, they must be based on skeletal dimensions that are replicable and can be precisely measured. In this study, measurement replicability (reliability) and precision were assessed in tests of intra- and inter-observer error.

The results of these tests indicate that the majority of skeletal dimensions included in the study can be reliably measured. In the intra-observer error test, the average mean per cent absolute error across all 63 dimensions was 1.65%. In comparison, four dimensions demonstrated relative or per cent TEM in excess of 5%. These were: OF (5.81%), ML (5.35%), BAP (9.00%), and BXB (5.36%). The dimensions OF and BAP additionally failed to meet the critical level for both the Pearson correlation coefficient (r) and coefficient of reliability (R); six additional dimensions (ML, XDH, BML, MLB, HSN, and BXB) failed to meet the critical value of R. Based on these results, four dimensions (OF, ML, BAP, and BXB) were excluded from subsequent statistical analyses. Using a significance level of P<0.0008 to adjust for multiple testing, the results of the paired samples t-test were not statistically significant for any of the skeletal variables included in the study. This suggests that there were no systematic errors in the measurement technique of the author. In the inter-observer error test between the present author and IK-O, none of the 43 dimensions included exceeded 15% absolute error.

Two dimensions, HML and PL, demonstrated %TEM in excess of 7.5%. Eight dimensions failed to meet the critical level of R; however, neither HML nor PL was among them. Owing to the small sample size, r and the paired samples t-test were not performed. Therefore, based on these results, no dimensions were excluded from subsequent analyses. It must be noted, however, that the sample sizes of some skeletal elements were very small, which may suggest that the results are not scientifically robust. Any further research performed in collaboration with IK-O will require inter-observer error tests using a large sample size. All dimensions included in the inter-observer error test between the present author and MR demonstrated relatively low levels of measurement error. However, statistically significant results of the paired samples t-test were obtained for femoral head diameter (FHD) and tibial length (TL), suggesting systematic differences in the techniques used to measure these dimensions between the two observers. For example, for each of the 43 individuals for whom TL was measured by both observers, the measurement produced by EJM was greater than that produced by MR.
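The error statistics referred to above can be illustrated with a minimal sketch. It assumes the definitions commonly used in anthropometry for two measurement rounds: absolute TEM as the square root of the summed squared differences divided by 2n, %TEM as TEM relative to the grand mean, and R as one minus the ratio of the squared TEM to the total variance. The formulas applied in this thesis are given in the methods chapter and may differ in detail; the function names and the data below are hypothetical.

```python
import math
from statistics import mean, variance


def tem(first, second):
    """Absolute technical error of measurement for two measurement rounds."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series of at least two values")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def relative_tem(first, second):
    """TEM expressed as a percentage of the grand mean (%TEM)."""
    grand_mean = mean(list(first) + list(second))
    return 100 * tem(first, second) / grand_mean


def coefficient_of_reliability(first, second):
    """R = 1 - TEM^2 / SD^2, where SD^2 is the variance of all measurements."""
    total_variance = variance(list(first) + list(second))
    return 1 - tem(first, second) ** 2 / total_variance


# Hypothetical repeated measurements (mm) of one dimension on four individuals.
round_one = [40.0, 42.0, 44.0, 46.0]
round_two = [40.2, 41.8, 44.2, 45.8]

print(round(tem(round_one, round_two), 3))
print(round(relative_tem(round_one, round_two), 2))
print(round(coefficient_of_reliability(round_one, round_two), 3))
```

In this invented example the differences between rounds are small relative to the variation between individuals, so %TEM is well below the 5% level discussed above and R is close to 1.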

The ability to reliably replicate measurements is an essential component of anthropometric- or osteometric-based studies. Measurement error in anthropometry arises from numerous sources, including instrument precision (assembly, use, reading) and data recording, as well as through human error, the inconsistent execution of the measuring protocol either by the same (intra-) or different (inter-) observers as a result of imprecision in skeletal landmark location, observer positioning, and instrument application (Gordon & Bradtmiller, 1992). There are, of course, ways to limit the degree of intra- or inter-observer error, including using exact and unambiguous definitions of landmarks, enhancing observer experience and hence consistency, ensuring a consistent body position when taking measurements, taking measurements multiple times and then finding the median, and avoiding fatigue; however, total elimination of this source of variation is difficult if not impossible (Harris & Smith, 2009).

Nevertheless, consideration and understanding of the sources of error may help to minimise its expression in anthropometric-based studies. More importantly, quantification of the extent of measurement error allows researchers to make comparisons between different skeletal dimensions and even to reach decisions about whether to include a particular dimension within their data set.

Early studies of inter-observer error reported surprising amounts of disagreement between experienced craniometrists (Utermohle & Zegura, 1982), even in studies where observers had been trained by the same individual (Bennett & Osborne, 1986). In the present study, statistically significant systematic differences were noted in the measuring techniques used by two different observers (the present author and MR) to measure FHD and TL. This finding is consistent with the results of previous research. For example, Gordon and Bradtmiller (1992) reported statistically significant bias between two observers for 17 of 30 standard anthropometric dimensions, while Flohr and colleagues (2010) found statistically significant differences in eight of 42 dimensions of the incus and malleus measured by two different observers and analysed using the paired samples t-test with Bonferroni adjustment of the P-value. Although these findings indicate a directional bias in the measuring style or technique of different researchers, this is not considered to be a significant limitation for a number of reasons. Firstly, the difference in how FHD was measured by the present author compared with Dr Raxter did not appear to reduce the accuracy of the FHD sex estimation method created by the latter researcher when tested on the present study sample. Secondly, for each of the 55 individuals for whom FHD was measured by both the present author and Dr Raxter, the two measurements differed by less than 1 mm. A similar finding was reported by Gordon and Bradtmiller (1992), who suggested that it may be impossible to completely eliminate such subtle stylistic measuring differences because the difference in mean measurements may be smaller than instrument precision levels. Finally, these findings further highlight that a series of paired measurements taken on the same specimens can show both a high degree of agreement (high coefficient of reliability) and a significant difference between observers in a paired t-test due to a small but systematic inter-observer difference (Flohr et al, 2010).
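The last point can be made concrete with a short sketch. The data are invented for illustration and are not taken from the study sample: observer B reads consistently lower than observer A by a fraction of a millimetre. The paired t statistic is computed by hand, and the coefficient of reliability uses the same assumed definition (one minus squared TEM over total variance) that is standard in anthropometry.

```python
import math
from statistics import mean, stdev, variance


def paired_t(first, second):
    """Paired-samples t statistic (df = n - 1) for two sets of measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def reliability(first, second):
    """Coefficient of reliability, R = 1 - TEM^2 / total variance."""
    tem_squared = sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * len(first))
    return 1 - tem_squared / variance(list(first) + list(second))


# Hypothetical FHD-like values (mm); observer B is 0.2-0.5 mm lower throughout.
observer_a = [40.1, 41.5, 42.8, 43.9, 45.2, 46.4, 47.7, 48.9]
offsets = [0.3, 0.4, 0.2, 0.5, 0.3, 0.4, 0.2, 0.5]
observer_b = [round(a - d, 1) for a, d in zip(observer_a, offsets)]

t_value = paired_t(observer_a, observer_b)
r_value = reliability(observer_a, observer_b)
largest_gap = max(abs(a - b) for a, b in zip(observer_a, observer_b))

# With df = 7 the two-tailed 5% critical value of t is about 2.365.
print(t_value > 2.365, r_value > 0.99, largest_gap < 1.0)
```

Every pair differs by less than 1 mm and R is very high, yet the t statistic far exceeds the critical value because the small difference always lies in the same direction.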

Test of new population-specific methods

To test the hypothesis that population-specific methods of metric sex estimation would produce high rates of correct classification when applied to an independent sample, three of the discriminant functions and three of the logistic regression equations created herein were tested on a sample of Old Kingdom and Ptolemaic Period skeletons from the Saqqara-West necropolis. This study is unique because it is the first to both create and test metric sex estimation methods that are specific to the ancient Egyptians. The findings of the test demonstrate that Functions 1 and 2 and Equation 2 are accurate enough to be used to estimate the sex of skeletal remains from Old Kingdom (but not Ptolemaic Period) contexts at Saqqara; Function 1 could additionally be considered in individuals for whom the time period is not known.

None of the functions or equations tested performed particularly well in the Ptolemaic Period subsample. This indicates that the skeletal proportions of the individuals included in this subsample differed from those of the individuals in the reference sample. There are several reasons why this might be the case. The first is that the reference and target populations experienced different conditions during the crucial periods of growth and development. Given the composition of the reference (predominantly Predynastic Period and Old Kingdom) and target samples (Ptolemaic Period), these two groups could be considered to differ with respect to subsistence strategy, socioeconomic status, and access to resources. Previous research has demonstrated that growth in body size during infancy and early childhood is relatively similar in populations growing under optimal environmental conditions. This suggests that any deviations in growth during this period are a reflection of environmental and socioeconomic differences rather than genetic differences (Graitcer & Gentry, 1981; Habicht et al, 1974). Differences in the growth of children from low socioeconomic backgrounds compared to those from high socioeconomic backgrounds reported in the literature include reduced stature and body weight, delayed skeletal maturation, and prolongation of the growth period (Dreizen et al, 1967; Graitcer & Gentry, 1981; Habicht et al, 1974; Martorell et al, 1979). Other researchers have documented changes in stature in response to changing subsistence strategies (Cardoso & Gomes, 2009; Mummert et al, 2011; Taylor, 2010; Tobias, 1962; Larsen, 1982), notably the switch to agriculture, which, according to several authors, results in an initial decrease in food quality and quantity and a concomitant decrease in stature and health status (Cohen & Armelagos, 1984; Larsen, 1995; Nickens, 1976; Starling & Stock, 2007; Stini, 1971; Stock et al, 2011: 347–367; Zakrzewski, 2003).

As we have seen, the switch from a hunter–gatherer to an agricultural subsistence economy in ancient Egypt developed hand-in-hand with the formation of the State and the rise of hierarchical society (Hassan, 1988; Marcus, 2008; Wenke, 1989, 1991). One theory linking the adoption of agriculture with the development of social stratification is that the elite class developed from groups of individuals with the desire to control agricultural surplus (Bard, 1992;
Castillos, 2007; Frood, 2010). Ultimately, this would lead to a society divided in terms of the quality and quantity of resources at the disposal of members of different social groups. For example, evidence from CT scans of mummies suggests that individuals of high socioeconomic status had access to ‘luxury’ foods that were high in saturated fats (Allam et al, 2011; David et al, 2010; Thompson et al, 2013). Thus, individuals growing in an environment of optimal, or at least plentiful, nutritional resources may be expected to exhibit different adult body size or proportions to individuals who may have suffered protein-calorie malnutrition at some point during childhood given the known relationship between stature, social status and access to resources (Komlos, 1990; Steckel, 1995).

Mortuary evidence suggests that the Ptolemaic Period individuals buried at Saqqara-West belonged to the non-elite social class. These individuals were predominantly deposited in shallow pit graves dug in the desert sand that contained minimal or no burial goods (Personal Communication: IK-O, 2013). In comparison, the Old Kingdom burials, which were located beneath the sand and colluvial deposits into which the former graves were dug, consisted of rock-hewn shaft tombs equipped with subterranean burial chambers and grave goods (Personal Communication: IK-O, 2013). The individuals buried in these tombs at Saqqara-West were therefore considered to belong to the upper or governing classes. Similarly, the Old Kingdom Giza skeletal remains, which form a large proportion of the reference sample used in this project, were excavated from the Western Mastaba Field near the Great Pyramid of Khufu, which was thought to hold the tombs of the governing classes and high officials (Der Manuelian, 2009: 23; Reisner, 1942). The presence of a class division at Giza is additionally supported by the findings of more recent excavations at the site. For example, in the end of season report for the 1997 Koch-Ludwig Giza Plateau Mapping Project, Lehner (1997) notes that a large number of young male cattle bones were found in what could be a Fourth Dynasty ‘rubbish dump’. Cattle are generally viewed as a costly source of meat; therefore, Lehner (1997) suggests that his team “…could [have been] digging the discarded remains of an expensive way of life”. Despite this, the remains of fish, a cheap and common source of meat protein, were additionally found in abundance. This may suggest that a ‘working class’ also lived in close proximity to the ‘elites’ and that the two social groups were not completely segregated in this early period (Lehner, 1997). Although the evidence appears to support the theory that the reference and Ptolemaic Period samples were not comparable in terms of social status and access to resources, and therefore may not have experienced similar conditions during growth and development, this is only one interpretation.

A second explanation for the differences in skeletal size and proportions of the reference sample and Ptolemaic Period subsample is that they exhibited different levels of genetic diversity. A number of studies examining cranial and dental traits for time-successive series of ancient Egyptian skeletal remains suggest that the basic population demonstrated a high degree of uniformity throughout the entire Dynastic Period (Berry & Berry, 1972; Irish, 2006). In other words, the size, shape and proportions of ancient Egyptian samples changed little over time, despite the numerous infiltrations, contacts and colonisations that Egypt was subject to throughout its history (Berry et al, 1967). By contrast, the Ptolemaic Period is thought to mark a significant change in the Egyptian population (Berry et al, 1967). During this time Memphis was a flourishing centre for trade and commerce, distinguished from other principal cities by its large and multiracial population, the result, no doubt, of its geographical centrality and role as capital city for much of the long history of Egypt (Thompson, 1988: 82). The initial immigrants to Memphis are thought to have originated from Persia, Macedonia and the Levant; however, it was the arrival of Alexander the Great in 332 BC and the large-scale settlement of Greeks in Egypt that was thought to have had a major impact on the population of Memphis. Indeed, according to Angel (1972), who examined craniometric traits of time-successive Egyptian, African, and European populations, Thirtieth Dynasty Egyptians were “practically Hellenistic Greeks” in terms of their cranial proportions (Angel, 1972: 310). In Memphis, the foreign communities initially settled in separate quarters of the city; however, with the passage of time the foreigners, particularly the Greek immigrants, intermarried with Egyptians. According to Keita (1992) this migration is likely to have had a “major genetic impact” on the Egyptian population that probably occurred immediately prior to and during the Ptolemaic Period.

Examining 36 dental morphological variants in 15 time-successive Egyptian samples spanning the Neolithic to the Roman Period, Irish (2006) noted the heterogeneity of the Ptolemaic Period sample included in the study, and suggested that if it is at all representative of the people living in Ptolemaic times, it may provide evidence of foreign admixture. This is supported by the findings of Schillaci and colleagues (2009), who found that the highest level of group diversity, as assessed using non-metric craniodental data and measures of biological distance, occurred during the Ptolemaic and Roman Periods. An alternative explanation, however, is that the Ptolemaic Period sample used by Irish (2006) and the Saqqara sample used in the present study represent samples of actual Greeks, which is certainly plausible given the separation of the different ethnic groups both in terms of living quarters and burial sites (Thompson, 1988: 88). In addition, the Roman Period samples used by Irish (2006) were found to share close biological affinity with the Dynastic Period samples. Overall, the results of the study did not support significant biological differentiation in the Egyptians of the Ptolemaic and Roman Periods relative to their dynastic predecessors (Irish, 2006).

Neither Function 3 nor Equation 3, both of which were created using data from the ‘Gizeh “E” series’ cranial sample, produced acceptable accuracy rates in both males and females when applied to the test sample, even when broken down by time period. Several of the points discussed previously may also be relevant here in explaining this finding. In addition, a number of authors have suggested that the Gizeh “E” series is an atypical Egyptian sample, the crania possessing morphological features that are distinct from other Predynastic and early Dynastic Period samples from the Egyptian Nile Valley (Zakrzewski, 2004). Additional unusual findings were reported by Nikita and colleagues (2012b), who identified four distinct clusters among cranial series from nine North African sites. The Gizeh “E” series was found to cluster with crania from Kerma in Nubia, a result that the authors suggested was difficult to explain, particularly given that this cluster was distinct from another which consisted of other Nile Valley cranial series from Naqada and Badari. Keita (1990) found that of the cranial series from 10 different sites, seven of which were in the Nile Valley, those from Sedment and Gizeh (the “E” series) were the most distinct morphologically. The cranial series from northern Africa were additionally noted to have multivariate metric patterns (by centroid values) that were intermediate to those from tropical Africa and Europe, which, Keita (1990) suggests, may be secondary to hybridisation of peoples with different craniometric values from adjacent regions.

This is supported by Howells (1973) who found that the “E” series grouped with European crania in one cluster analysis and with tropical African series in another. However, it is possible that Near Eastern and European crania may be present in the “E” series, which contains crania from the final periods of dynastic Egypt, periods that were characterised by increased foreign rule and immigration (Keita, 1990).

The presence of anomalous individuals within the Gizeh “E” collection is noted by Brace and colleagues (1993). For example, upon examining specimen E597 (which incidentally was not included in the present study sample), they reported:

“…[being] immediately suspicious that a mistake had been made and a patently non-Egyptian skull had been inadvertently incorporated into the collection. So strong was the impression that it did not belong that he [the senior author] wrote at the bottom of the data sheet, ‘But this one walked straight out of the German Neolithic!’” (Brace et al, 1993: 6 & 8).

The authors continued that “…the heavy, double-arched brow ridges, the shelf-like horizontal ridge at inion, and the massive mastoid processes flaring laterally at the bottom were utterly unlike anything else in that 664–341 BC series or in the earlier Pre-dynastic Egyptian material” (Brace et al, 1993: 8). Despite this, Brace and colleagues (1993) reported that the specimen in question demonstrated clear evidence that the brain had been extracted via the nasal aperture and that the individual had been mummified in the manner most often afforded to the socially prominent and wealthy Egyptians. Based on the results of a discriminatory analysis procedure, the authors concluded that specimen E597 was unlikely to have been a native Egyptian (Brace et al, 1993). One explanation that accounts for these various lines of evidence is that the entire series represents a non-indigenous, immigrant population, the individuals of which were buried together in a separate and discrete cemetery; such practices are attested in other parts of the Memphite necropolis (Thompson, 1988: 88). Alternatively, as other authors have suggested, the Gizeh “E” series may represent a hybrid population that is neither typical nor representative of the ancient Egyptian population more generally. Either way, it may be fair to suggest that the development of metric sex estimation methods based on measurements of this series is not appropriate, unless use of the resulting functions or equations is restricted to other specimens from the same series.

The results of the blind test additionally demonstrate that comparable rates of accuracy may be achieved with discriminant functions or logistic regression equations, which is consistent with previous studies (Singh & Pathak, 2013; Pohar et al, 2004). For example, Pohar and colleagues (2004) used simulations to demonstrate that the difference in results between logistic regression and discriminant analysis is negligible when sample sizes are over 50. In contrast, Acharya and colleagues (2011a) reported higher levels of accuracy using logistic regression compared with discriminant analysis in odontometric sex prediction. In fact, a perfect fit of the logistic regression model to the odontometric data was derived using the entire dentition, although there was a tendency for allocation accuracy to reduce when maxillary/mandibular teeth were assessed separately and when teeth were missing (Acharya et al, 2011a). Although discriminant analysis has traditionally been the preferred and most widely used statistical method for the creation of new metric sex estimation methods (see Table 2.5A), the utility of logistic regression analysis in this context is becoming increasingly recognised.

Some authors even suggest that it is a better approach than discriminant analysis because it is more flexible in its assumptions (Acharya et al, 2011b; Albanese, 2003; Saunders & Hoppa, 1997). Unlike discriminant analysis, logistic regression analysis does not require that data are normally distributed, linearly related or of equal variance within each group (Acharya et al, 2011b; Walker, 2008). Furthermore, logistic regression can handle both discrete and continuous variables, is designed to accommodate dependent variables that have only two values (in this case male and female), and produces predicted values that can be interpreted as probabilities of group membership (Acharya et al, 2011a; Walker, 2008). Choice of statistical analysis procedure should therefore be based on the nature of the data, sample size, and outcome requirements.
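The practical difference between the two outputs can be sketched as follows. The coefficients, the constant, the sectioning point, and the coding of male as the positive class are all hypothetical values chosen for illustration; they are not the functions or equations derived in this thesis.

```python
import math


def logistic_probability(intercept, coefficients, measurements):
    """Probability of the positive class (assumed here to be male)."""
    logit = intercept + sum(b * x for b, x in zip(coefficients, measurements))
    return 1 / (1 + math.exp(-logit))


def discriminant_sex(constant, coefficients, measurements, sectioning_point=0.0):
    """Classify by comparing a linear discriminant score with a sectioning point."""
    score = constant + sum(b * x for b, x in zip(coefficients, measurements))
    return "male" if score > sectioning_point else "female"


# Hypothetical single-variable model based on femoral head diameter (mm).
INTERCEPT = -43.0
COEFFICIENTS = [1.0]

for fhd in (40.0, 43.0, 46.0):
    probability = logistic_probability(INTERCEPT, COEFFICIENTS, [fhd])
    label = discriminant_sex(INTERCEPT, COEFFICIENTS, [fhd])
    print(fhd, round(probability, 3), label)
```

The discriminant function returns only a group label, whereas the logistic equation returns a probability, which makes borderline cases (here, a value lying exactly at the hypothetical cut-off) explicit.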

A significant advantage of the test sample used in the present study is that a small proportion comprised individuals whose sex was known thanks to the preservation of soft tissue and facial hair. It is expected that ongoing excavations at Saqqara-West may reveal additional individuals of known sex; combined with the named individuals from Thebes included in the reference sample of the present study, these may in future provide a known-sex reference sample of sufficient size for researchers exploring questions and issues related to sex estimation in ancient Egyptian skeletal remains.

Sexual dimorphism in ancient Egyptians

The final key aim of this research was to quantify the level of sexual dimorphism exhibited by the ancient Egyptian sample included in the study, to examine size and shape differences between males and females, and to explore changes in the economic and political structure of the country that may account for differences between subsamples. Previous research has demonstrated changes over time in sexual size or stature dimorphism (Masali, 1972; Zakrzewski, 2003). Masali (1972) noted an increase in sexual stature dimorphism from the Predynastic to the Dynastic Period. Broadly similar results were obtained by Zakrzewski (2003). The lowest degree of sexual dimorphism of stature, as calculated from the maximum lengths of the femur or tibia, was observed in the Badarian sample (c. 4000–3500 BC) where of all the samples studied females were found to be tallest relative to males. The greatest degree of sexual stature dimorphism was found in the Late Predynastic Period sample; sexual stature dimorphism then decreased to the Early Dynastic period and again to the Old Kingdom, before increasing in the Middle Kingdom (Zakrzewski, 2003). The results of the present study make a further contribution to this area of research by presenting sexual dimorphism indices for a large number of skeletal dimensions, both for the complete study sample, and broken down by principal time period. In the present study, the differences in sexual dimorphism of the femur and tibia exhibited by two temporally distinct subsamples of ancient Egyptian skeletons (from the Predynastic Period and Old Kingdom) were tested for statistical significance using the method proposed by Relethford and Hodges (1985). Compared with other methods, this one has the advantage of being relatively simple to compute and applicable to summary statistics. Using this method, none of the 13 dimensions of the femur and tibia were found to exhibit statistically significant differences, suggesting that the degree of sexual dimorphism exhibited by these bones did not change significantly over time.
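A test of this kind can be sketched from summary statistics alone. The sketch below assumes that per cent dimorphism is defined as the male-female difference relative to the female mean, and it implements the comparison as an interaction-style t-test with a variance pooled across all four sex-by-period groups (df = N - 4). This is offered in the spirit of the Relethford and Hodges approach; whether it matches the exact computation used in this thesis is an assumption, and all means, standard deviations and sample sizes below are invented.

```python
import math


def percent_dimorphism(male_mean, female_mean):
    """Per cent dimorphism, assumed here to be 100 * (M - F) / F."""
    return 100 * (male_mean - female_mean) / female_mean


def dimorphism_difference_t(group_one, group_two):
    """t statistic and df for a difference in dimorphism between two groups.

    Each group is ((male_mean, male_sd, male_n), (female_mean, female_sd, female_n)).
    """
    cells = [group_one[0], group_one[1], group_two[0], group_two[1]]
    total_n = sum(n for _, _, n in cells)
    df = total_n - 4
    pooled_variance = sum((n - 1) * sd ** 2 for _, sd, n in cells) / df
    contrast = (group_one[0][0] - group_one[1][0]) - (group_two[0][0] - group_two[1][0])
    standard_error = math.sqrt(pooled_variance * sum(1 / n for _, _, n in cells))
    return contrast / standard_error, df


# Hypothetical femoral length summaries (mm): (mean, sd, n) for males then females.
earlier_period = ((450.0, 20.0, 30), (420.0, 18.0, 30))
later_period = ((455.0, 21.0, 25), (422.0, 19.0, 25))

t_value, df = dimorphism_difference_t(earlier_period, later_period)
print(round(percent_dimorphism(450.0, 420.0), 2), round(t_value, 2), df)
```

In this invented example the male-female difference is 30 mm in one period and 33 mm in the other, and the resulting t statistic is far too small to indicate a change in dimorphism.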

A huge body of evidence from experimental and observational studies suggests that bone demonstrates functional adaptation to mechanical loading (Burr et al, 1996, 2002; Carlson et al, 2007; Gross et al, 1997; Holt, 2003; Marchi, 2008; Petit et al, 2004; Ruff et al, 2006; Shaw & Stock, 2009, 2011, 2013; Sládek et al, 2006; Stock, 2006; Umemura et al, 1997). As a result, structural analysis of skeletal morphology, notably cross-sectional geometric properties of the femur and tibia, may be used to investigate changes in mobility or activity patterns in past human populations as a result of changes in subsistence patterns (Carlson et al, 2007; Holt, 2003; Marchi, 2008; Shaw & Stock, 2011; Sládek et al, 2006; see also Section 1.4.2.4). In the simplest terms, mobility patterns affect the loads placed on the lower limbs during locomotion and may therefore influence variations in lower limb diaphyseal robusticity and shape (Shaw & Stock, 2009). For example, long bone diaphyses respond to increased forces by augmenting their mass in the principal planes of deformation (Lanyon, 1992; Rubin et al, 1990; Shaw & Stock, 2009). In other words, increased strain, for example through an increase in activity level, leads to deposition of more bone tissue (Gross et al, 1997), while decreased strain, as occurs through inactivity, leads to resorption of bone tissue, both of which act to restore or maintain optimal strain levels (Ruff et al, 2006). A number of previous studies have demonstrated a decrease in mobility (Bridges, 1989; Brock & Ruff, 1988; Holt, 2003; Ruff, 1987) and sexual dimorphism (Ruff, 1987) occurring concomitantly with the switch from a hunter–gatherer to an agricultural subsistence pattern, although this finding is not universal (Carlson et al, 2007).

Previous authors have further suggested that differences in mobility level can be most clearly discerned in the distal bones of the lower limb (Holliday & Ruff, 2001; Marchi, 2008; Meadows-Jantz & Jantz, 1999; Shaw & Stock, 2009; Stock, 2006). However, the results of this study provide no support for the theory that the switch to agriculture in ancient Egypt, which occurred during the Predynastic Period, resulted in changes in mobility or less distinct male and female roles. This is consistent with previous studies that have suggested that the transition to agriculture does not necessarily result in a decrease in sexual dimorphism (Marchi et al, 2006).

Indeed, some authors have suggested that the switch to agriculture may have actually increased workloads (Bridges, 1989; Marchi et al, 2006), and resulted in more physically distinct roles between males and females. It is possible, however, that the Predynastic Period and Old Kingdom subsamples included in this study do not represent populations that practised very different subsistence strategies. The finding that sexual dimorphism of the femur and tibia did not change significantly from the Predynastic Period to the Old Kingdom additionally provides no support for the theory that the development of complex social organisation and hierarchy resulted in differences in growth as a result of unequal access to resources. Given that the Old Kingdom in particular represented the height of social inequality and unequal distribution of wealth and power (McGuire, 1983), an increase in sexual dimorphism may be expected in a society transitioning from egalitarianism to a situation characterised by hierarchy, inequality and unequal access to resources. In such a society, males may have been afforded preferential treatment in terms of resource allocation, a practice that has been demonstrated in parts of Asia (Behrman, 1988; Chen et al, 1981; Das Gupta, 1987; Khera et al, 2013) and Africa (Hadley et al, 2008). Although the results of sexual dimorphism analyses do not support these theories, the results of the ANOVA test do. For most of the skeletal dimensions that differed significantly between the Predynastic Period and the Old Kingdom, the difference was caused by an increase in the male dimensions.

The results of this study further show that some skeletal dimensions are sexually dimorphic with regards to both size and shape. As previous authors have noted, the separation of size and shape components of skeletal dimensions is very important (Andrews & Williams, 1973; Betti, 2014). For example, traits that appear to be virtually identical in size in males and females may appear notably dimorphic when size is taken into account. This suggests that sexual dimorphism in size and in shape follows different patterns, as was demonstrated in the case of growth trajectories during the development of pelvic sexual dimorphism in the rat (Berdnikovs et al, 2007). This is clearly supported by some of the findings of the present study.
For example, the results of the independent samples t-test demonstrated that pubic length (PL) was not statistically significantly different between males and females. However, after adjustment for body size, this dimension was found to be significantly different between the sexes, presumably as a result of differences in shape. In accordance with the methods of previous researchers, femoral head diameter was used as a surrogate for body size to control for size effects (Grine et al, 1995; McHenry, 1992; Ruff et al, 1997; Wescott, 2006). According to Ruff and colleagues (1997), articular dimensions are better body-size indicators than diaphyseal breadths because they are much less environmentally sensitive. Furthermore, a number of previous authors have presented evidence of a relationship between femoral head diameter and body mass in modern humans (Grine et al, 1995; McHenry, 1992).
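How a dimension can be similar in raw size yet differ after size adjustment is shown in the sketch below. It assumes the simplest form of adjustment, a ratio of each dimension to the same individual's femoral head diameter; the adjustment actually applied in this thesis is described in the methods chapter and may differ. The measurements are invented.

```python
from statistics import mean


def size_adjust(values, body_size_proxy):
    """Divide each measurement by the same individual's body-size proxy."""
    return [value / proxy for value, proxy in zip(values, body_size_proxy)]


# Hypothetical pubic length (PL) and femoral head diameter (FHD) values in mm.
male_pl, male_fhd = [70.0, 72.0, 71.0], [46.0, 47.0, 45.0]
female_pl, female_fhd = [70.0, 72.0, 71.0], [40.0, 41.0, 39.0]

raw_difference = mean(male_pl) - mean(female_pl)
male_ratio = mean(size_adjust(male_pl, male_fhd))
female_ratio = mean(size_adjust(female_pl, female_fhd))

print(raw_difference, round(male_ratio, 3), round(female_ratio, 3))
```

In this invented example the raw means are identical, but relative to body size the female pubis is longer, which is the kind of shape difference that a raw comparison would miss.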

A further 18 dimensions included in the present study were also found to show statistically significant shape differences between males and females using a P-value of 0.0008 to account for multiple testing. When the Predynastic Period and Old Kingdom subsamples were analysed separately, shape was found to contribute to sexual dimorphism of certain skeletal dimensions (primarily of the cranium and second cervical vertebra) in the Old Kingdom sample only. In the Predynastic Period subsample, there were no statistically significant differences between males and females after adjustment for body size.

These results may provide evidence of intra-population differences in growth trajectories that lead to sex differences in size and shape. As discussed in the introduction to this project, the phenotypic expression of adult human body size and shape results from synergistic interactions between hereditary factors and environmental conditions experienced during growth (Vercellotti et al, 2011). In comparison to inter-population variation in growth, intra-population differences have not been particularly well studied (Pinhasi et al, 2011: 178–179; Saunders, 2000: 135–161). Yet, as previous authors have noted, even genetically homogeneous groups such as laboratory animals or population groups likely to share the same maximum genetic potential show variation in body size and shape (Bogin et al, 1994; Hunter & Clegg, 1973; Karlberg et al, 1994; Warren & Bedi, 1985), most likely as a result of overall life conditions such as the combination of subsistence strategy, access to resources, social organisation, sanitary conditions, and activity levels, as discussed previously. Vercellotti and colleagues (2011) used PCA to explore intra-population biological variation in body size and shape attributable to sex and social status in a medieval Italian population. They found that the male subsamples exhibited significant post-cranial variation in body size, while female subsamples were found to express smaller, non-significant differences. The analysis of segmental proportions highlighted differences in trunk/lower limb proportions between different status samples, and PCA indicated that in terms of purely morphological variation high status males were distinct from all other groups (Vercellotti et al, 2011). Thus, given that significant differences in skeletal dimensions over time were primarily the result of increases in male proportions (as demonstrated by the two-factor ANOVA test) the hypothesis of intra-population differences in growth is not unreasonable.

In the present study, the unrotated first principal component loadings for the cranial dimensions demonstrated that overall size accounts for the most variation; later PCs were shown to define aspects of shape, in this case the shape of the face and the forward extension of the facial skeleton. The results further showed that different structural components make different contributions to the overall size and shape of the cranium in males and females. For example, in males, the greatest contribution to overall size is made by dimensions that represent the forward extension of the facial skeleton, whereas in females the greatest contribution is made by width measurements. These results contribute to the already large body of literature in which PCA has been used to investigate cranial size either between populations or between males and females from the same population (Bulygina et al, 2006; Evteev et al, 2014; Gordon et al, 2008b; Harvati, 2003; Howells, 1957; Kennedy et al, 1984; Van Gerven et al, 1977; Viðarsdóttir et al, 2002, amongst others).

When PCA was performed using z-scores to reduce the weighting associated with large measurements, the main source of variation between males and females was found to relate to diameters of the proximal or distal ends of long bones. This may reflect differences between the sexes in functional mobility and is therefore consistent with other findings arising from this study. For example, the third PC in males, tibial shape, was found to account for 10.6% of the total variation, whereas in females the tibial shape PC accounted for only 8.4% of the variance. Furthermore, when the dimensions with the highest loadings on the first PC were used to generate a discriminant function to separate the sexes, not only was proximal epiphyseal breadth of the tibia (PEB) one of only two dimensions included in the function, but the function also demonstrated a high cross-validated accuracy rate of 93.3%.
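The role of z-scores in this kind of analysis can be illustrated with a minimal sketch. It is not the analysis performed in this study: the three measurement columns are invented values, the function names are hypothetical, and power iteration is used only as a simple stand-in for a full eigendecomposition. Standardising each dimension means the first component is extracted from the correlation matrix, so a long measurement such as femur length cannot dominate simply because its raw variance is larger.

```python
import math
import statistics


def z_scores(column):
    """Standardise one measurement to mean 0 and sample SD 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(value - mean) / sd for value in column]


def first_component(columns, iterations=200):
    """Return (loadings, share of total variance) for PC1 of z-scored data."""
    z = [z_scores(col) for col in columns]
    n, p = len(z[0]), len(z)
    # Covariance of z-scored columns, i.e. the correlation matrix.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    vec = [1.0] * p
    for _ in range(iterations):  # power iteration towards the leading eigenvector
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return vec, eigenvalue / p  # the trace of a correlation matrix equals p


# Hypothetical measurements in mm (illustrative only, not study data).
femur_length = [420, 455, 430, 470, 410, 445]
tibia_peb = [68, 76, 70, 78, 66, 74]
femoral_head = [41, 46, 43, 47, 40, 45]

loadings, share = first_component([femur_length, tibia_peb, femoral_head])
```

In this toy data set all three dimensions rise together, so the first component behaves as a general size axis on which each standardised measurement loads with the same sign, whatever its absolute magnitude.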

These findings are consistent with the results of other studies performing PCA of long bone measurements, which have shown a tendency for lengths to cluster on one PC, diaphysis breadths on a second PC, and articular dimensions on a third PC. Van Gerven (1972) used PCA to investigate the relative contribution of size and shape to patterns of sexual dimorphism of the femur. He found that the first PC reflected overall size, while the second reflected femoral shape and consisted primarily of angle and width measurements of the proximal end. It was suggested that the third PC described variation in femoral morphology due to muscular action, which possibly lends support to the theory of the differential effect of mobility on lower limb morphology. Van Gerven (1972) additionally reported high levels of correct sex classification using dimensions with the highest component loadings, and suggested that the contribution of shape-related femoral features to the separation of the sexes is largely due to the functional relationships between the femur and pelvis, and the differential requirement in males and females for efficient locomotion versus weight distribution. Similar results were obtained by Frelat and colleagues (2012), who demonstrated that the first PC of their analysis of the tibia accounted for 40% of the variation and corresponded to locomotor-related differences between apes and humans, such as the size of the medial condyle, shape of the tuberosity, the pattern of muscular attachments on the shaft, and the curvature of the shaft.

4.1 Limitations of study

This research suffers from a number of limitations. At present, a large reference sample of ancient Egyptian skeletons of known (documented) sex does not exist. As such, the study reference sample consists of individuals whose sex was estimated using standard morphological procedures; from this sample, population-specific metric sex estimation methods were derived. A review of the literature indicates that this research methodology is not uncommon (Dabbs, 2010; Dittrick & Suchey, 1986; MacLaughlin & Bruce, 1985; Murphy, 2005; Özer & Katayama, 2008; Raxter, 2007; Ríos Frutos, 2005; Šlaus & Tomičić, 2005). However, an often overlooked issue in studies using reference samples of unknown sex is the accuracy rate associated with the initial morphological sex estimates. As discussed in Section 2.2.1.1, osteologists often start at a baseline accuracy rate for sex estimation of 50%, provided there is no additional evidence for them to assume that the majority of individuals in the sample will be one sex or the other. For some skeletal elements, for example the pubic bones, training, experience, and use of appropriate methods will allow for correct estimates in excess of 80% or even 90% (Bruzek, 2002; Ubelaker & Volk, 2002; White & Folkens, 2000: 362). Although such accuracy rates on their own may be considered acceptable, problems arise when metric sex estimation functions or equations are created from samples in which a significant proportion of the sample (c. 10–20%) may have been incorrectly sexed. Assuming, for example, that 80% of a study reference sample was correctly sexed using morphological methods and that a newly created discriminant function has a cross-validated accuracy rate of 90%, only 72% (0.8 × 0.9) of the target sample will in reality be correctly sexed using the new function.
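The arithmetic behind the 72% figure can be sketched as follows. The function name and the second quantity are illustrative assumptions rather than part of the study: the first value is the product used in the text, while the second adds the cases in which a wrong reference label and a "wrong" function call cancel out, which can only be computed if the two error sources are assumed to be independent and only two classes exist.

```python
def compounded_accuracy(reference_accuracy, function_accuracy):
    """Combine reference-sample accuracy with a function's agreement rate."""
    # Cases where the reference label is right and the function reproduces it.
    both_right = reference_accuracy * function_accuracy
    # With two classes, disagreeing with a wrong label yields the true sex
    # (assumes independent errors, an assumption not made in the text).
    both_wrong = (1 - reference_accuracy) * (1 - function_accuracy)
    return {
        "conservative": both_right,
        "two_class_independent": both_right + both_wrong,
    }


result = compounded_accuracy(0.8, 0.9)
```

Under either reading, the effective accuracy in the target sample falls well below the 90% cross-validated rate, which is the point made above.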

Acknowledgement and discussion of this limitation are not common among the numerous studies that have used unknown-sex reference samples to create population-specific metric sex estimation methods. Furthermore, the extent to which use of an estimated-sex reference sample is considered a limiting factor to the research design varies between investigators. Some authors acknowledge the problem and suggest that a possible solution is to include in the study sample only individuals whose sex was considered to be unambiguous based on morphological indicators of the bony pelvis and skull (Dabbs, 2010; MacLaughlin & Bruce, 1985; Šlaus & Tomičić, 2005); other authors do not mention it as a limitation at all (Özer & Katayama, 2008; Ríos Frutos, 2005). According to Murphy (2005), research methodology of this type is appropriate because no other options are available: “Undertaking research of this nature in New Zealand necessitates dealing with such less than ideal situations because there are no skeletal collections derived from remains of known identity or those which offer a larger sample size” (Murphy, 2005). In contrast, Dittrick and Suchey (1986) provide justification for their research design by citing the results of a self-test of morphological sex estimation accuracy that the second author undertook using a large sample (n=1,300) of paired pubic bones removed from modern, known-sex Americans autopsied at the Office of the Chief Medical Examiner-Coroner in the County of Los Angeles. This test resulted in a high rate of accuracy (99% for individuals aged over 16 years); however, a lower rate might be expected when the same morphological sex estimation standards are applied to prehistoric Central Californian skeletal samples owing to population or temporal differences in morphology.


In the present research, a number of steps were taken in an effort to reduce the impact of the limitations associated with deriving metric sex estimation methods from an unknown-sex reference sample. In accordance with previous researchers (Dabbs, 2010; MacLaughlin & Bruce, 1985; Šlaus & Tomičić, 2005), only individuals whose morphological sex was considered to be unambiguous were included in the study sample. Individuals for whom the pubic bones were available were preferentially selected for inclusion over individuals for whom the pubic bones were missing. Furthermore, the results of the inter-observer error test for morphological sex estimation do not indicate the presence of systematic bias in the author’s ability to assign sex to unknown individuals using the methodology set out in Chapter 2. It may therefore be fair to assume that the sex assessments of the study sample are accurate.

An alternative strategy that could be employed in a future study of this type is the use of aDNA analysis to confirm the sex of the individuals included in the study sample. However, such an approach raises a new set of problems, in no small part related to the issues of contamination and authenticity (Gilbert et al, 2005; Malmström et al, 2005; Pilli et al, 2013; Yang & Watt, 2005). In addition, some groups of researchers have suggested that archaeological DNA from ancient Egypt is unlikely to survive for more than 1,000 years, given the environmental conditions of the country, as well as half-life calculations based on the rates of DNA depurination and decay, the latter being measured by aspartic acid racemisation (Marota et al, 2002; Poinar et al, 1996). Furthermore, such molecular techniques are expensive and time consuming, and not all museums or institutions allow destructive sampling of the skeletons in their collections. They were additionally considered to be beyond the scope of the current project and were therefore not performed.

A further limitation of the study relates to the composition of the reference sample in general, as well as the composition of the Late Period cranial sample from Giza (the Gizeh “E” series) specifically. For example, the study reference sample, although of reasonable size, is not representative of the entire population of ancient Egypt because it does not include individuals dating to the complete historical period. This, however, appears to be a problem with skeletal reference collections in general, even those constructed from named decedents. Several sets of authors, for example, have demonstrated that such collections may be biased towards a particular sex or socioeconomic background, and may not be representative of either the living or decedent population from which they were derived (Hunt & Albanese, 2005; Komar & Grivas, 2008). With regard to the Gizeh “E” series: as mentioned previously, only individuals whose cranial morphology exhibited unambiguous male or female features were included in the study sample. However, sex estimation is much more difficult when indicators of the bony pelvis are not available for analysis. As such, this sample represents only a selected subset of individuals demonstrating hyper-male and hyper-female morphological features.

There are several implications of constructing a study sample that consists only of the most sexually dimorphic or morphologically unambiguous individuals. For example, individuals exhibiting intermediate morphology (scored as 2 or 4 using the scoring system of Buikstra & Ubelaker, 1994: 21) will often be excluded from the sample. This may remove variation from the middle of the morphology range and may therefore render the distribution more platykurtic or bimodal. This, however, does not necessarily restrict the discriminating ability of the resulting discriminant functions or logistic regression equations. Although the centroids of the male and female groups will tend to be slightly more separated, the discriminant functions or equations may be unchanged, and the probability of correct classification of intermediate-sex individuals may also be unchanged.
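The effect of excluding intermediate individuals on the shape of a pooled distribution can be sketched numerically. Everything in the example is an assumption made for illustration: the male and female means and standard deviations, the width of the excluded band, and the use of evenly spaced quantiles in place of real measurements. Kurtosis is computed in the Pearson form, in which a normal distribution scores 3 and lower values indicate a flatter or more bimodal distribution.

```python
from statistics import NormalDist, mean


def quantile_sample(dist, n):
    """Deterministic sample made of n evenly spaced quantiles of dist."""
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def kurtosis(values):
    """Pearson kurtosis: fourth central moment over squared variance."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m4 = mean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2


# Hypothetical pooled sample of one dimension (mm); not study data.
pooled = (quantile_sample(NormalDist(46.0, 2.5), 200)
          + quantile_sample(NormalDist(41.0, 2.5), 200))

# Drop individuals within 1 mm of the midpoint between the two group means,
# mimicking the exclusion of morphologically intermediate skeletons.
trimmed = [v for v in pooled if abs(v - 43.5) > 1.0]

kurtosis_pooled = kurtosis(pooled)
kurtosis_trimmed = kurtosis(trimmed)
```

In this sketch the trimmed sample has the lower kurtosis, in line with the expectation that removing the middle of the range flattens or splits the distribution.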

The findings of other studies suggest that use of cranial morphological indicators alone to estimate sex may result in a skewed or biased sex ratio of the target sample. For example, Meindl and colleagues (1985) found that skulls of elderly female decedents are 2–5% larger in most linear dimensions than those of younger females, and may additionally present other qualitative male characteristics. By contrast, when the pubic bones are available, female skeletons are rarely misclassified as male (Meindl et al, 1985). The sex ratio of the reference sample used in the present study is consistent with other reports which have suggested there are nearly always more males than females in skeletal collections from archaeological sites (Walker, 1995: 36; Weiss, 1972). This may be due to differences in preservation. For example, Walker (1995: 36) demonstrated that in the St. Bride's documented skeletal collection significantly more pubic bones of females than of males in the >44 years age category could not be used for sex estimation because of poor preservation. Other researchers have noted different patterns of preservation between subadult males and females (Bello et al, 2006). By contrast, Walker and colleagues (1988) found that age biases were more important than sex biases in both the Purisima Mission cemetery in California, which was used as a burial ground between 1813 and 1849, and a prehistoric cemetery known as Ca-Ven-110 in Ventura County, California.

Alternatively, biased sex ratios of archaeological skeletal samples may be the result of mimicry of the reference sample used to create the identification methods, a problem that has been particularly well explored with respect to age at death distributions. For example, according to Bocquet-Appel & Masset (1982) palaeopopulation mortality profiles do not vary significantly from the reference samples used to generate them. Other researchers have additionally shown that in cases where age is estimated rather than known, the traditional method of assigning individuals to age classes will produce biased estimates of age structure (Konigsberg & Frankenberg, 1992). Although the suggestion of Bocquet-Appel and Masset (1982) is not supported by the results of several studies in which the mortality profile of a target sample was compared with that of the reference sample used to create the age estimation method and found to differ significantly (Buikstra & Konigsberg, 1985; Mensforth, 1990; Van Gerven & Armelagos, 1983), it is nevertheless viewed as an important issue in studies of this type (Bullock et al, 2013; DeWitte, 2009; Komar & Grivas, 2008; Langley-Shirley & Jantz, 2010; Meindl & Russell, 1998; Prince & Konigsberg, 2008). Techniques suggested to circumvent the issue of age mimicry include transition analysis (Bullock et al, 2013) or the application of Bayes’ Theorem (DeWitte, 2009; Konigsberg & Frankenberg, 2002; Langley-Shirley & Jantz, 2010; Prince & Konigsberg, 2008). As succinctly summarised by Milner et al (2000: 476), Bayes’ Theorem provides a way to estimate the unknown probability that a skeleton has a trait that cannot be observed directly (for example sex or age at death) given that it has a trait that can be observed (such as a greater sciatic notch angle). In Bayesian analysis, the conditional probability of having the unobservable trait given the presence of an observable trait is known as the posterior probability (Milner et al, 2000: 476). This methodology involves estimation of the probability density function [p(a|θi)] for each skeleton, where p(a|θi) is the probability that the skeleton died at age a given that it has characteristics θi, and θi is the set of skeletal traits observed in the i-th skeleton in the sample (Konigsberg & Frankenberg, 1992; Milner et al, 2000: 477). As Konigsberg and Hens (1998) point out, sex estimation can additionally be approached using Bayesian analysis and inverse probability. If the proper inversion is not done, the sex composition of the sample will be biased toward that of the reference sample. Unfortunately, this method requires access to an appropriate, well-characterised skeletal reference sample of known sex (or age at death) (Milner et al, 2000: 476–477), which is not currently available for ancient Egyptians.
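The Bayesian logic described above can be sketched for a single observable trait. This is a minimal illustration under stated assumptions, not a published method: the normal likelihoods for greater sciatic notch angle, their means and standard deviations, and the function names are all hypothetical. The posterior probability that a skeleton is female is the likelihood of the observed angle among females multiplied by the prior proportion of females, divided by the same quantity summed over both sexes.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior_female(angle, prior_female, female=(75.0, 8.0), male=(55.0, 8.0)):
    """P(female | angle) by Bayes' Theorem.

    The (mean, sd) pairs are invented values for illustration only.
    """
    weight_f = normal_pdf(angle, *female) * prior_female
    weight_m = normal_pdf(angle, *male) * (1.0 - prior_female)
    return weight_f / (weight_f + weight_m)


# An angle midway between the two hypothetical means carries no information,
# so the posterior simply returns whatever prior was supplied.
ambiguous_equal_prior = posterior_female(65.0, 0.5)
ambiguous_male_heavy_prior = posterior_female(65.0, 0.3)
clear_case = posterior_female(80.0, 0.5)
```

The ambiguous case shows why the choice of prior matters: when the observed trait is uninformative, the estimated sex composition is driven entirely by the assumed proportion of females, which is how the composition of a reference sample can carry over to a target sample.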

Although an important component of this type of research project, the test of newly created population-specific equations on a temporally and geographically distinct sample additionally suffered from several limitations. For example, this test was an unplanned collaboration and is therefore based on a retrospective methodological design. As a result, only a narrow range of skeletal dimensions that were common to the two independent pieces of research could be included, and the equations used for testing had to be selected based on the dimensions they contained rather than their cross-validated accuracy rates. Indeed, some equations had to be specially created based on the available skeletal dimensions and as such are not as accurate as other equations using different dimensions. This further meant that the majority of metric sex estimation equations presented in this study remain untested, and may therefore be of limited use to other researchers.

5 CONCLUSIONS

In this study, a number of different mathematical and statistical approaches were used to investigate sexual dimorphism in ancient Egyptian human skeletal remains and the effect that this has on metric methods of sex estimation. A key consideration of this study was why males and females demonstrate differences in skeletal size and proportions and thus why it is possible to separate male and female skeletal remains based on morphological or metric methods. The comprehensive review of the literature presented in the introductory section of this thesis demonstrated that adult sexual size dimorphism is attained through numerous interlinking mechanisms, including sexual selection, diet/nutritional status, and sexual division of labour, as well as sex-related differences in growth and development that are apparent during the earliest stages of embryogenesis, in childhood, and during puberty and adolescence, and that clearly have both genetic and hormonal components.

A review of the different methods available to estimate sex from skeletal remains highlighted a number of important problems, limitations and considerations with respect to the ancient Egyptians. Examination of morphological indicators of the bony pelvis is a highly accurate way of estimating sex; however, population differences in morphology must be considered and future work in the field of physical anthropology as a whole may be best directed at adapting the standard morphological methods for use in specific populations. The lack of ancient Egyptian-specific metric sex estimation equations of tested accuracy is a considerable limiting factor for researchers working with disarticulated, fragmented or damaged skeletal remains from ancient Egyptian cemetery sites. However, metric techniques in general have several notable advantages over other methods of sex estimation, and the refinement of metric equations for use in different population samples is therefore an important undertaking.

This study was the first to test the applicability of “modern” metric sex estimation equations to ancient Egyptian skeletal remains. These tests of accuracy of previously created metric sex estimation methods demonstrated that the following techniques, functions, or equations may be used by other researchers studying ancient Egyptian skeletal remains:

• Cranial method of Giles and Elliot (1963)
• MC1 method of Scheuer and Elkington (1993)
• Functions 1 and 2 of the multiple bones method of Stewart (1979)
• The HHD and FHD sectioning points of Raxter (2007)
• Functions 3–5 of Dabbs (2010).

The failure of other metric sex estimation methods to accurately separate the sexes, including the nine remaining “modern” methods, the CNF sectioning point of Raxter (2007), Functions 1 and 2 of Dabbs (2010), and the “living Egyptian” MC1 methods, is most reasonably explained by differences in the skeletal size and proportions of the reference and target samples.

New population-specific metric methods of sex estimation were created using both discriminant function analysis and logistic regression analysis. The functions and equations created were comparable in terms of accuracy; therefore, choice of technique should be dictated by the nature of the data under analysis. This was the first study to both create and test the accuracy of new sex estimation methods using reference and test samples from different ancient Egyptian subpopulations. The results of the test demonstrated that Functions 1 and 2 and Equation 2 may be used by other researchers to estimate sex in ancient Egyptian skeletal remains, provided the target sample is similar in composition (in terms of historical period) to the reference and test samples used herein.
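The idea of a cross-validated accuracy rate can be illustrated with the simplest kind of metric method, a univariate sectioning point. The sketch below is a simplified stand-in and not one of the functions or equations derived in this study: the measurements are invented, the sectioning point is taken as the midpoint between the male and female means, and leave-one-out cross-validation is assumed as the validation scheme. Each individual is classified using a sectioning point calculated without that individual, so that no skeleton contributes to its own classification.

```python
import statistics


def sectioning_point(males, females):
    """Midpoint between the male and female sample means."""
    return (statistics.mean(males) + statistics.mean(females)) / 2


def loo_accuracy(males, females):
    """Leave-one-out accuracy of the sectioning point (larger = male)."""
    correct = 0
    for i, value in enumerate(males):
        point = sectioning_point(males[:i] + males[i + 1:], females)
        correct += value > point
    for i, value in enumerate(females):
        point = sectioning_point(males, females[:i] + females[i + 1:])
        correct += value <= point
    return correct / (len(males) + len(females))


# Hypothetical femoral head diameters in mm (illustrative only).
males = [46.1, 44.8, 47.3, 45.5, 43.9, 42.0]
females = [40.2, 41.5, 39.8, 42.6, 40.9, 43.5]

point = sectioning_point(males, females)
accuracy = loo_accuracy(males, females)
```

With these invented values, the smallest male and the largest female fall on the wrong side of the sectioning point once they are left out of its calculation, which is the kind of overlap that a cross-validated rate is designed to expose.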


Sexual dimorphism was investigated using a sexual dimorphism index, principal components analysis, and mathematical adjustment for body size. The results of these various analyses demonstrated that sexual dimorphism was present in ancient Egyptians, and that some skeletal dimensions are sexually dimorphic with regard to both size and shape. Some of these findings may be indicative of changes in mobility, subsistence strategy, or social organisation that occurred during the Predynastic Period and into the Old Kingdom.

The results of this study are relevant to all researchers working with ancient Egyptian skeletal remains who require metric methods of sex estimation of high and tested accuracy that may be applied to even highly fragmented skeletons or isolated bones. The ability to accurately estimate sex has a key role in all studies examining health and disease, stature, body treatment, diet and social status/organisation, given that many of the conclusions reached would be meaningless were it not possible to establish the demographic profile of the samples used. It is hoped that in the future, researchers will be more selective about the metric equations they use to estimate sex, relying only on those equations that have been tested and are accurate in skeletal remains from the populations in question. It is important, however, to recognise, acknowledge and attempt to rectify the limitations of any study, and those related to the present research have been discussed and justified where necessary. Other limitations and key problems integral to this project may be eliminated through additional research. Thus, key areas for future work are summarised as follows:

• To create an ancient Egyptian skeletal reference sample of known sex. This will most likely involve identifying skeletal populations from cemetery sites where the sex of individuals may be positively identified as a result of the preservation of soft tissue or from documentary evidence, and is of vital importance for this type of research.
   ◦ This may also be achieved using CT scans of mummies, which often reveal soft tissue indicators of sex. The mummies themselves may also be accompanied by inscriptional information. As the resolution of CT images improves it should be possible to acquire good osteometric data in this way.
   ◦ Until such a time as this becomes a reality, reference samples used in future research should include a large number of individuals and allow comparisons between different time periods only or different locations only.
• To explore the expression of morphological sex indicators that are unique to the ancient Egyptians in order to refine morphological sex estimation standards for this population.
• To pursue further collaborations with IK-O, as well as other researchers excavating human remains from other cemetery sites such as Amarna. Studies resulting from such collaborations will need to be based on scientifically rigorous and prospectively-designed methodologies, and will hopefully allow additional testing of the metric sex estimation equations presented herein.
• To explore the possibility of creating metric sex estimation methods that are not population-specific by basing them on a large and diverse reference sample.
• To expand the research methodology to other populations such as the ancient Nubians, to allow the creation of population-specific metric sex estimation methods for other groups of peoples. The research methodology could also be expanded to include alternative methods to metric sex estimation such as geometric morphometric analysis, which is based on 3D landmark and semi-landmark data.

Although the results of this project go some way towards addressing a key limitation in the study of ancient Egyptian skeletal remains, there is still a great deal of research that needs to be undertaken in this field, both by the present author and other investigators. Fortunately, it is doubtful that the fascination of ancient Egypt will ever wane, and with the advance in technology, the future will likely witness an abundance of sophisticated techniques to study one of the world’s oldest and most beloved civilisations.


6 REFERENCES

Abd-elaleem SA, Abd-elhameed M, Ewis AA. 2012. Talus measurements as a diagnostic tool for sexual dimorphism in Egyptian population. Journal of Forensic and Legal Medicine 19: 70–76.

Acharya AB, Prabhu S, Muddapur MV. 2011a. Odontometric sex assessment from logistic regression analysis. International Journal of Legal Medicine 125: 199–204.

Acharya AB, Angadi PV, Prabhu S, Nagnur S. 2011b. Validity of the mandibular canine index (MCI) in sex prediction: reassessment in an Indian sample. Forensic Science International 204: 207.e1–207.e4.

Acsádi G, Nemeskéri J. 1970. History of Human Life Span and Mortality. Akadémiai Kiadó: Budapest.

Aggarwal CC, Hinneburg A, Keim DA. 2001. On the surprising behavior of distance metrics in high dimensional space. In: Van den Bussche J, Vianu V (eds). Database Theory – ICDT 2001. Proceedings of the 8th International Conference; London, 4–6 January 2001. Springer: Berlin.

Aggarwal CC, Reddy CK. 2014. Data Clustering: Algorithms and Applications. CRC Press: Boca Raton, FL.

Albanese J. 2003. A metric method for sex determination using the hipbone and femur. Journal of Forensic Sciences 48: 263–273.

Albanese J. 2013. A method for estimating sex using the clavicle, humerus, radius, and ulna. Journal of Forensic Sciences 58: 1413–1419.

Albanese J, Eklics G, Tuck A. 2008. A metric method for sex determination using the proximal femur and fragmentary hipbone. Journal of Forensic Sciences 53: 1283–1288.

Allam AH, Thompson RC, Wann LS, Miyamoto MI, el-Din A, el Maksoud G, Al-Tohamy Soliman M, Badr I, el-Rahman Amer H, Sutherland ML, Sutherland JD, Thomas GS. 2011. Atherosclerosis in ancient Egyptian mummies. Journal of the American College of Cardiology Imaging 4: 315–327.

Allen DM. 1968. The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16: 125–127.

Allen RC. 1997. Agriculture and the origins of the state in ancient Egypt. Explorations in Economic History 34: 135–154.

Allison MJ. 1984. Chapter 20: Paleopathology in Peruvian and Chilean populations. In: Cohen MN, Armelagos GJ (eds.) Paleopathology at the Origins of Agriculture. Academic Press, Inc: Orlando, FL.

Altman DG, Bland JM. 1996. Comparing several groups using analysis of variance. BMJ 312: 1472–1473.

Alunni-Perret V, Staccini P, Quatrehomme G. 2003. Reexamination of a measurement for sexual determination using the supero-inferior femoral neck diameter in a modern European population. Journal of Forensic Sciences 48: 517–520.

Alvrus A. 1999. Fracture patterns among the Nubians of Semna South, Sudanese Nubia. International Journal of Osteoarchaeology 9: 417–429.

American Anthropological Association. 2007. Early Classification of Nature. Available online: http://www.understandingrace.org/history/science/early_class.html. Accessed January 2015.


Amin MF, Hassan EI. 2012. Sex identification in Egyptian population using multidetector computed tomography of the maxillary sinus. Journal of Forensic and Legal Medicine 19: 65–69.

Anderson RT. 1983. Angulation of the basiocciput in three cranial series. Current Anthropology 24: 226–228.

Anderson W. 1992. Badarian burials: evidence of social inequality in Middle Egypt during the early Predynastic era. Journal of the American Research Center in Egypt 29: 51–66.

Andrews P, Williams DB. 1973. The use of principal components analysis in physical anthropology. American Journal of Physical Anthropology 39: 291–304.

Angadi PV, Hemani S, Prabhu S, Acharya AB. 2013. Analysis of odontometric sexual dimorphism and sex assessment accuracy on a large sample. Journal of Forensic and Legal Medicine 20: 673–677.

Angel JL. 1972. Biological relations of Egyptian and Eastern Mediterranean populations during Pre-dynastic and Dynastic times. Journal of Human Evolution 1: 307–313.

Arlot A, Celisse A. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys 4: 40–79.

Armelagos GJ. 1969. Disease in ancient Nubia. Science 163: 255–259.

Armelagos GJ, Van Gerven DP. 1980. Sexual dimorphism and human evolution: an overview. Journal of Human Evolution 9: 437–446.

Arnay-de-la-Rosa M, González-Reimers E, Fregal R, Velasco-Vázquez J, Delgado-Darias T, González AM, Larruga JM. 2007. Canary islands aborigin sex determination based on mandible parameters contrasted by amelogenin analysis. Journal of Archaeological Science 34: 1515–1522.

Arnold AP. 2012. The end of gonad-centric sex determination in mammals. Trends in Genetics 28: 55–61.

Arnold AP, Chen X. 2009. What does the “four core genotypes” mouse model tell us about sex differences in the brain and other tissues? Frontiers in Neuroendocrinology 30: 1–9.

Arnold AP, Chen X, Itoh Y. 2012. What a difference an X or Y makes: sex chromosomes, gene dose, and epigenetics in sexual differentiation. In: Regitz-Zagrosek V (ed.) Sex and Gender differences in Pharmacology. Handbook of Experimental Pharmacology. Volume 214. Springer-Verlag: Berlin.

Arredi B, Poloni ES, Paracchini S, Zerjal T, Fathallah DM, Makrelouf M, Pascali VL, Novelletto A, Tyler-Smith C. 2004. A predominantly Neolithic origin for Y-chromosomal DNA variation in North Africa. American Journal of Human Genetics 75: 338–345.

Asala SA. 2001. Sex determination from the head of the femur of South African whites and blacks. Forensic Science International 117: 16–22.

Austin PC, Tu JV. 2004. Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. Journal of Clinical Epidemiology 57: 1138–1146.

Badawi-Fayad J, Cabanis E-A. 2007. Three-dimensional Procrustes analysis of modern human craniofacial form. The Anatomical Record 290: 268–276.

Badro DA, Douaihy B, Haber M, Youhanna SC, Salloum A, Ghassibe-Sabbagh M, Johnsrud B, Khazen G, Matisoo-Smith E, Soria-Hernanz DF, Wells RS, Tyler-Smith C, Platt DE, Zalloua PA, The Genographic Consortium. 2013. Y-chromosome and mtDNA genetics reveal significant contrasts in affinities of modern Middle Eastern populations with European and African populations. PLoS ONE 8: e54616.


Bagley SC, White H, Golomb BA. 2001. Logistic regression in the medical literature: standards for use and reporting, with particular attention to one medical domain. Journal of Clinical Epidemiology 54: 979–985.

Bailey SE, Benazzi S, Hublin JJ. 2014. Allometry, merism, and tooth shape of the upper deciduous M2 and permanent M1. American Journal of Physical Anthropology; ePub ahead of print.

Baker PT. 1984. The adaptive limits of human populations. Man 19: 1–14.

Baker BJ, Judd MA. 2012. Development of paleopathology in the Nile Valley. In: Buikstra J, Roberts C (eds.) The Global History of Paleopathology: Pioneers and Prospects. Oxford University Press Inc: New York.

Bard KA. 1988. A quantitative analysis of the Predynastic burials in Armant Cemetery 1400–1500. The Journal of Egyptian Archaeology 74: 39–55.

Bard KA. 1992. Toward an interpretation of the role of ideology in the evolution of complex society in Egypt. Journal of Anthropological Archaeology 11: 1–24.

Bard KA. 1994. The Egyptian Predynastic: a review of the evidence. Journal of Field Archaeology 21: 265–288.

Bard KA. 1999. Encyclopedia of the Archaeology of Ancient Egypt. Routledge: New York.

Barkley MS. 1978. Vertebral arch defects in ancient Egyptian populations. Journal of Human Evolution 7: 553–557.

Barrier ILO, L’Abbé EN. 2008. Sex determination from the radius and ulna in a modern South African sample. Forensic Science International 179: 85.e1–85.e7.

Bass WM. 2005. Human Osteology. A Laboratory and Field Manual. 5th edn. Missouri Archaeological Society Inc: USA.

Batrawi A. 1946. The racial history of Egypt and Nubia: part II. The racial relationships of the ancient and modern populations of Egypt and Nubia. The Journal of the Royal Anthropological Institute of Great Britain and Ireland 76: 131–156.

Bauer CM, Niederstätter H, McGlynn G, Stadler H, Parson W. 2013. Comparison of morphological and molecular genetic sex-typing on mediaeval human skeletal remains. Forensic Science International: Genetics 7: 581–586.

Baxter MJ. 1994. Stepwise discriminant analysis in archaeometry: a critique. Journal of Archaeological Science 21: 659–666.

Beall CM, Baker PT, Baker TS, Haas JD. 1977. The effects of high altitude on adolescent growth in Southern Peruvian Amerindians. Human Biology 49: 109–124.

Beckelmann J, Budik S, Bartel C, Aurich C. 2012. Evaluation of Xist expression in preattachment equine embryos. Theriogenology 78: 1429–1436.

Becker NSA, Touraille P, Froment A, Heyer E, Courtiol A. 2012. Short stature in African pygmies is not explained by sexual selection. Evolution and Human Behavior 33: 615–622.

Beckett S, Lovell NC. 1994. Dental disease evidence of agricultural intensification in the Nubian C-Group. International Journal of Osteoarchaeology 4: 223–240.

Bedford ME, Russell KF, Lovejoy CO, Meindl RS, Simpson SW, Stuart-Macadam PL. 1993. Test of the multifactorial aging method using skeletons with known ages-at-death from the Grant Collection. American Journal of Physical Anthropology 91: 287–297.

Behrman JR. 1988. Intrahousehold allocation of nutrients in rural India: are boys favoured? Do parents exhibit inequality aversion? Oxford Economic Papers 40: 32–54.

Behrman JR. 1993. The economic rationale for investing in nutrition in developing countries. World Development 21: 1749–1771.

Bejarano IF, Dipierri JE, Andrade A, Alfaro EL. 2009. Geographic altitude, surnames, and height variation of Jujuy (Argentina) conscripts. American Journal of Physical Anthropology 138: 158–163.

Bello SM, Thomann A, Signoli M, Dutour O, Andrews P. 2006. Age and sex bias in the reconstruction of past population structures. American Journal of Physical Anthropology 129: 24–38.

Bender R, Grouven U. 1996. Logistic regression models used in medical research are poorly presented. BMJ 313: 628.

Bender R, Lange S. 2001. Adjusting for multiple testing – when and how? Journal of Clinical Epidemiology 54: 343–349.

Bennett KA. 1981. On the expression of sex dimorphism. American Journal of Physical Anthropology 56: 59–62.

Bennett KA, Osborne RH. 1986. Interobserver measurement reliability in anthropometry. Human Biology 58: 751–759.

Berdnikovs S, Bernstein M, Metzler A, German RZ. 2007. Pelvic growth: ontogeny of size and shape sexual dimorphism in rat pelves. Journal of Morphology 268: 12–22.

Berge C, Penin X. 2004. Ontogenetic allometry, heterochrony, and interspecific differences in the skull of African apes, using tridimensional Procrustes analysis. American Journal of Physical Anthropology 124: 124–138.

Bermejo-Álvarez P, Rizos D, Rath D, Lonergan P, Gutierrez-Adan A. 2008. Epigenetic differences between male and female bovine blastocysts produced in vitro. Physiological Genomics 32: 264–272.

Bernstein RM. 2010. The big and small of it: how body size evolves. Yearbook of Physical Anthropology 53: 46–62.

Berrizbeitia EL. 1989. Sex determination with the head of the radius. Journal of Forensic Sciences 34: 1206–1213.

Berry AC, Berry RJ, Ucko PJ. 1967. Genetical change in ancient Egypt. Man 2: 551–568.

Berry AC, Berry RJ. 1972. Origins and relationships of the ancient Egyptians. Based on a study of non-metrical variations in the skull. Journal of Human Evolution 1: 199–208.

Betti L. 2014. Sexual dimorphism in the size and shape of the os coxae and the effects of microevolutionary processes. American Journal of Physical Anthropology 153: 167–177.

Bewick V, Cheek L, Ball J. 2005. Statistics review 14: logistic regression. Critical Care 9: 112–118.

Bianucci R, Mattutino G, Lallo R, Charlier P, Jouin-Spriet H, Peluso A, Higham T, Torre C, Massa ER. 2008. Immunological evidence of Plasmodium falciparum infection in an Egyptian child mummy from the early Dynastic period. Journal of Archaeological Science 35: 1880–1885.

Bidmos MA, Asala SA. 2003. Discriminant function sexing of the calcaneus of the South African Whites. Journal of Forensic Sciences 48: 1213–1218.

Bidmos MA, Dayal MR. 2003. Sex determination from the talus of South African Whites by discriminant function analysis. American Journal of Forensic Medicine and Pathology 24: 322–328.

Bidmos MA, Dayal MR. 2004. Further evidence to show population specificity of discriminant function equations for sex determination using the talus of South African blacks. Journal of Forensic Sciences 49: 1165–1170.

Bierry G, Le Minor J-M, Schmittbuhl M. 2010. Oval in males and triangular in females? A quantitative evaluation of sexual dimorphism in the human obturator foramen. American Journal of Physical Anthropology 141: 626–631.

Bigoni L, Velemínská J, Brůžek J. 2010. Three-dimensional geometric morphometric analysis of cranio-facial sexual dimorphism in a Central European sample of known sex. HOMO – Journal of Comparative Human Biology 61: 16–32.

Bilfeld MF, Dedouit F, Sans N, Rousseau H, Rougé D, Telmon N. 2013. Ontogeny of size and shape sexual dimorphism in the ilium: a multislice computed tomography study by geometric morphometry. Journal of Forensic Sciences 58: 303–310.

Bijlsma R, Loeschcke V. 2005. Environmental stress, adaptation and evolution: an overview. Journal of Evolutionary Biology 18: 744–749.

Binder M, Roberts CA. 2014. Calcified structures associated with human skeletal remains: possible atherosclerosis affecting the population buried at Amara West, Sudan (1300–800 BC). International Journal of Paleopathology 6: 20–29.

Binford LR. 1971. Mortuary practices: their study and their potential. Memoirs of the Society for American Archaeology 25: 6–29.

Black TK. 1978a. A new method for assessing the sex of fragmentary skeletal remains: femoral shaft circumference. American Journal of Physical Anthropology 48: 227–232.

Black TK. 1978b. Sexual dimorphism in the tooth-crown diameters of the deciduous teeth. American Journal of Physical Anthropology 48: 77–82.

Blanckenhorn WU. 2005. Behavioral causes and consequences of sexual size dimorphism. Ethology 111: 977–1016.

Blanckenhorn WU, Stillwell RC, Young KA, Fox CW, Ashton KG. 2006. When Rensch meets Bergmann: does sexual size dimorphism change systematically with latitude? Evolution 60: 2004–2011.

Blanckenhorn WU, Dixon AFG, Fairbairn DJ, Foellmer MW, Gibert P, van der Linde K, Meier R, Nylin S, Pitnick S, Schoff C, Signorelli M, Teder T, Wiklund C. 2007. Proximate causes of Rensch’s rule: does sexual size dimorphism in arthropods result from sex differences in development time? The American Naturalist 169: 245–257.

Blankers M, Koeter MW, Schippers GM. 2010. Missing data approaches in eHealth research: simulation study and a tutorial for nonmathematically inclined researchers. Journal of Medical Internet Research 12: e54.

Blighe K. 2013. Haplotype classification using copy number variation and principal components analysis. The Open Bioinformatics Journal 7: 19–24.

Boano R, Fulcheri E, Martina MC, Ferraris A, Grilletto R, Cremo R, Cesarani F, Gandini G, Massa ER. 2009. Neural tube defect in a 4000-year-old Egyptian infant mummy: a case of meningocele from the museum of anthropology and ethnography of Turin (Italy). European Journal of Paediatric Neurology 13: 481–487.

Boattini A, Castri L, Sarno S, Useli A, Cioffi M, Sazzini M, Garagnani P, De Fanti S, Pettener D, Luiselli D. 2013. mtDNA variation in East Africa unravels the history of Afro-Asiatic groups. American Journal of Physical Anthropology 150: 375–385.

Bocquet-Appel JP, Masset C. 1982. Farewell to palaeodemography. Journal of Human Evolution 11: 321–333.

Bogin BA. 1978. Seasonal pattern in the rate of growth in height of children living in Guatemala. American Journal of Physical Anthropology 49: 205–210.

Bogin B, Rios L. 2003. Rapid morphological change in living humans: implications for modern human origins. Comparative Biochemistry and Physiology Part A 136: 71–84.

Bogin B, Wall M, MacVean RB. 1992. Longitudinal analysis of adolescent growth of Ladino and Mayan school children in Guatemala: effects of environment and sex. American Journal of Physical Anthropology 89: 447–457.

Bogin B, Smith P, Orden AB, Varela Silva MI, Loucky J. 2002. Rapid change in height and body proportions of Maya American children. American Journal of Human Biology 14: 753–761.

Bosch E, Calafell F, Pérez-Lezaun A, Comas D, Mateu E, Bertranpetit J. 1997. Population history of North Africa: evidence from classical genetic markers. Human Biology 69: 295–311.

Bongiovanni R, Spradley MK. 2012. Estimating sex of the human skeleton based on metrics of the sternum. Forensic Science International 219: 290.e1–290.e7.

Bourke JB. 1971. The palaeopathology of the vertebral column in ancient Egypt and Nubia. Medical History 15: 363–375.

Brace CL, Tracer DP, Yaroch A, Robb J, Brandt K, Nelson AR. 1993. Clines and clusters versus “race”: a test in ancient Egypt and the case of a death on the Nile. Yearbook of Physical Anthropology 36: 1–31.

Brickley M. 2004. Compiling a skeletal inventory: articulated inhumed bone. In: Brickley M, McKinley JI (eds). Guidelines to the Standards for Recording Human Remains. IFA Paper No. 7. BABAO & IFA.

Bridges PS. 1989. Changes in activities with the shift to agriculture in the Southeastern United States. Current Anthropology 30: 385–394.

Brock SL, Ruff CB. 1988. Diachronic patterns of changes in structural properties of the femur in the prehistoric American southwest. American Journal of Physical Anthropology 75: 113–127.

Brook FA, Estibeiro JP, Copp AJ. 1994. Female predisposition to cranial neural tube defects is not because of a difference between the sexes in the rate of embryonic growth or development during neurulation. Journal of Medical Genetics 31: 383–387.

Brookfield M. 2011. Chapter 6: The desertification of the Egyptian Sahara during the Holocene (the last 10,000 years) and its influence on the rise of Egyptian civilization. In: Martini IP, Chesworth W (eds.) Landscapes and Societies. Selected Cases. Springer: London.

Brooks S, Suchey JM. 1990. Skeletal age determination based on the os pubis: a comparison of the Acsádi-Nemeskéri and Suchey-Brooks methods. Human Evolution 5: 227–238.

Brothwell DR. 1981. Digging up Bones. 3rd edtn. Cornell University Press: Ithaca, New York.

Brown CM, Arbour JH, Jackson DA. 2012. Testing the effect of missing data estimation and distribution in morphometric multivariate data analyses. Systematic Biology 61: 941–954.

Browne MW. 2000. Cross-validation methods. Journal of Mathematical Psychology 44: 108–132.

Bruzek J. 2002. A method for visual determination of sex, using the human hip bone. American Journal of Physical Anthropology 117: 157–168.

Bruzek J, Murail P. 2006. Methodology and reliability of sex determination from the skeleton. In: Schmitt A, Cunha E, Pinheiro J (eds). Forensic Anthropology and Medicine. Complementary Sciences from Recovery to Cause of Death. Humana Press: Totowa, New Jersey.

Buckberry JL, Chamberlain AT. 2002. Age estimation from the auricular surface of the ilium: a revised method. American Journal of Physical Anthropology 119: 231–239.

Budaev SV. 2010. Using principal components and factor analysis in animal behaviour research: caveats and guidelines. Ethology 116: 472–480.

Budge EAW. 1890. The Nile. Notes for Travellers in Egypt. Thos. Cook & Son: London.

Buikstra JE, Konigsberg LW. 1985. Paleodemography: critiques and controversies. American Anthropologist 87: 316–333.

Buikstra JE, Ubelaker DH. 1994. Standards for Data Collection from Human Skeletal Remains. Arkansas Archaeological Survey Research Series No. 44: Fayetteville, Arkansas.

Bullock M, Márquez L, Hernández P, Ruíz F. 2013. Paleodemographic age-at-death distributions of two Mexican skeletal collections: a comparison of transition analysis and traditional aging methods. American Journal of Physical Anthropology 152: 67–78.

Bulygina E, Mitteroecker P, Aiello L. 2006. Ontogeny of facial dimorphism and patterns of individual development within one human population. American Journal of Physical Anthropology 131: 432–443.

Burgoyne PS, Thornhill AR, Kalmus Boudrean S, Darling SM, Bishop CE, Evans EP, Capel B, Mittwoch U. 1995. The genetic basis of XX-XY differences present before gonadal sex differentiation in the mouse. Philosophical Transactions of the Royal Society of Biological Sciences 350: 253–261.

Burns RP, Burns R. 2009. Business Research Methods and Statistics using SPSS. Sage Publications Ltd: California. Advanced chapters available from companion website: http://www.uk.sagepub.com/burns/plecture.htm. Accessed February 2014.

Burr DB, Milgrom C, Fyhrie D, Forwood M, Nyska M, Finestone A, Hoshaw S, Saiag E, Simkin A. 1996. In vivo measurement of human tibial strains during vigorous activity. Bone 18: 405–410.

Burr DB, Robling AG, Turner CH. 2002. Effects of biomechanical stress on bones in animals. Bone 30: 781–786.

Burris BG, Harris EF. 1998. Identification of race and sex from palate dimensions. Journal of Forensic Sciences 43: 959–963.

Burrows AM, Zanella VP, Brown TM. 2003. Testing the validity of metacarpal use in sex assessment of human skeletal remains. Journal of Forensic Sciences 48: 17–20.

Butzer KW. 1974. Modern Egyptian pottery clays and Predynastic buff ware. Journal of Near Eastern Studies 33: 377–382.

Buzon MR. 2005. Two cases of pelvic osteochondroma in New Kingdom Nubia. International Journal of Osteoarchaeology 15: 377–382.

Buzon MR. 2006. Health of the non-elites at Tombos: nutritional and disease stress in New Kingdom Nubia. American Journal of Physical Anthropology 130: 26–37.

Buzon MR, Bombak A. 2010. Dental disease in the Nile Valley during the New Kingdom. International Journal of Osteoarchaeology 20: 371–387.

Buzon MR, Richman R. 2007. Traumatic injuries and imperialism: the effects of Egyptian colonial strategies at Tombos in Upper Nubia. American Journal of Physical Anthropology 133: 783–791.

Byers SN. 2008. Introduction to Forensic Anthropology. 3rd edtn. Pearson Education Inc: Boston.

Cabral HJ. 2008. Multiple comparisons procedures. Circulation 117: 698–701.

Callewaert F, Sinnesael M, Gielen E, Boonen S, Vanderschueren D. 2010a. Skeletal sexual dimorphism: relative contribution of sex steroids, GH-IGF1, and mechanical loading. Journal of Endocrinology 207: 127–134.

Callewaert F, Venken K, Kopchick J, Torcasio A, van Lenthe GH, Boonen S, Vanderschueren D. 2010b. Sexual dimorphism in cortical bone size and strength but not density is determined by independent and time-specific actions of sex steroids and IGF-I: evidence from pubertal mouse models. Journal of Bone and Mineral Research 25: 617–626.

Capelli C, Redhead N, Romano V, Cali F, Lefranc G, Delague V, Megarbane A, Felice AE, Pascali VL, Neophytou PI, Pouli Z, Novelletto A, Malaspina P, Terrenato L, Berebbi A, Fellous M, Thomas MG, Goldstein DB. 2006. Population structure in the Mediterranean Basin: a Y chromosome perspective. Annals of Human Genetics 70: 207–225.

Carani C, Qin K, Simoni M, Faustini-Fustini M, Serpente S, Boyd J, Korach KS, Simpson ER. 1997. Effect of testosterone and estradiol in a man with aromatase deficiency. New England Journal of Medicine 337: 91–95.

Cardoso HFV. 2007a. Environmental effects on skeletal versus dental development: using a documented subadult skeletal sample to test a basic assumption in human osteological research. American Journal of Physical Anthropology 132: 223–233.

Cardoso HFV. 2007b. Differential sensitivity in growth and development of dental and skeletal tissue to environmental quality. Arquivos de Medicina 21: 19–23.

Cardoso HFV. 2010. Testing discriminant functions for sex determination from deciduous teeth. Journal of Forensic Sciences 55: 1557–1560.

Cardoso HFV, Gomes JEA. 2009. Trends in adult stature of peoples who inhabited the modern Portuguese territory from the Mesolithic to the late 20th century. International Journal of Osteoarchaeology 19: 711–725.

Cardoso HFV, Garcia S. 2009. The not-so-dark ages: ecology for human growth in medieval and early twentieth century Portugal as inferred from skeletal growth profiles. American Journal of Physical Anthropology 138: 136–147.

Cardoso FA, Henderson CY. 2010. Enthesopathy formation in the humerus: data from known age-at-death and known occupation skeletal collections. American Journal of Physical Anthropology 141: 550–560.

Carlson KJ, Grine FE, Pearson OM. 2007. Robusticity and sexual dimorphism in the postcranium of modern hunter-gatherers from Australia. American Journal of Physical Anthropology 134: 9–23.

Carr C. 1995. Mortuary practices: their social, philosophical–religious, circumstantial, and physical determinants. Journal of Archaeological Method and Theory 2: 105–200.

Cartmill M. 1998. The status of the race concept in physical anthropology. American Anthropologist 100: 651–660.

Case DT, Ross AH. 2007. Sex determination from hand and foot bone lengths. Journal of Forensic Sciences 52: 264–270.

Cashdan EA. 1980. Egalitarianism among hunters and gatherers. American Anthropologist 82: 116–120.

Castillos JJ. 2006. Social stratification in early Egypt. Göttinger Miszellen 210: 13–17.

Castillos JJ. 2007. The beginning of class stratification in early Egypt. Göttinger Miszellen 215: 9–25.

Cave AJE. 1939. The evidence for the incidence of tuberculosis in ancient Egypt. British Journal of Tuberculosis 33: 142–152.

Cawthon PM. 2011. Gender differences in osteoporosis and fractures. Clinical Orthopaedics and Related Research 469: 1900–1905.

Center for Academic Research & Training in Anthropogeny. 2014. Skeletal Robusticity. Available online: http://carta.anthropogeny.org/moca/topics/skeletal-robusticity. Accessed April 2014.

Ceppellini R, Siniscalco M, Smith CA. 1955. The estimation of gene frequencies in a random-mating population. Annals of Human Genetics 20: 97–115.

Chakraborty R, Majumder PP. 1982. On Bennett’s measure of sex dimorphism. American Journal of Physical Anthropology 59: 295–298.

Chapman T, Lefevre P, Semal P, Moiseev F, Sholukha V, Louryan S, Rooze M, Van Sint Jan S. 2014. Sex determination using the Probabilistic Sex Diagnosis (DSP: Diagnose Sexuelle Probabiliste) tool in a virtual environment. Forensic Science International 234: 189.e1–189.e8.

Chen LC, Huq E, D’Souza S. 1981. Sex bias in the family allocation of food and health care in rural Bangladesh. Population and Development Review 7: 55–70.

Chen X, Grisham W, Arnold AP. 2009. X chromosome number causes sex differences in gene expression in adult mouse striatum. European Journal of Neuroscience 29: 768–776.

Chidiac JJ, Shofer FS, Al-Kutoub A, Laster LL, Ghafari J. 2002. Comparison of CT scanograms and cephalometric radiographs in craniofacial imaging. Orthodontics & Craniofacial Research 5: 104–113.

Clarke R, Ressom HW, Wang A, Xuan J, Liu MC, Gehan EA, Wang Y. 2008. The properties of high-dimensional data spaces: implications for exploring gene and protein expression data. Nature Reviews Cancer 8: 37–49.

Clavel J, Merceron G, Escarguel G. 2014. Missing data estimation in morphometrics: how much is too much? Systematic Biology 63: 203–218.

Clegg EJ, Pawson IG, Ashton EH, Flinn RM. 1972. The growth of children at different altitudes in Ethiopia. Philosophical Transactions of the Royal Society of London Biological Sciences 264: 403–437.

Clutton-Brock TH, Huchard E. 2013. Social competition and selection in males and females. Philosophical Transactions of the Royal Society Biological Sciences 368: 20130074.

Cobb SN, O’Higgins P. 2007. The ontogeny of sexual dimorphism in the facial skeleton of the African apes. Journal of Human Evolution 53: 176–190.

Cohen MN, Armelagos GJ (eds.) 1984. Paleopathology at the Origins of Agriculture. Academic Press Inc: Orlando, FL.

Cole TJ, Faith MS, Pietrobelli A, Heo M. 2005. What is the best measure of adiposity change in growing children: BMI, BMI %, BMI z-score or BMI centile? European Journal of Clinical Nutrition 59: 419–425.

Collins MJ, Penkman KEH, Rohland N, Shapiro B, Dobberstein RC, Ritz-Timme S, Hofreiter M. 2009. Is amino acid racemisation a useful tool for screening for ancient DNA in bone? Proceedings of the Royal Society for Biological Sciences 276: 2971–2977.

Çöloğlu AS, İşcan MY, Yavuz MF, Sari H. 1998. Sex determination from the ribs of contemporary Turks. Journal of Forensic Sciences 43: 273–276.

Colvard DS, Eriksen EF, Keeling PE, Wilson EM, Lubahn DB, French FS, Riggs BL, Spelsberg TC. 1989. Identification of androgen receptors in normal human osteoblast-like cells. Proceedings of the National Academy of Science of the USA 86: 854–857.

Conceição ELN, Cardoso HFV. 2011. Environmental effects on skeletal versus dental development II: further testing of a basic assumption in human osteological research. American Journal of Physical Anthropology 144: 463–470.

Cooney KM. 2012. Coffin reuse in the Twenty-First Dynasty. The demands of ritual transformation. Backdirt: Annual Review of the Cotsen Institute of Archaeology at UCLA 22–33.

Cope DJ, Dupras TL. 2011. Osteogenesis imperfecta in the archaeological record: an example from the Dakhleh Oasis, Egypt. International Journal of Paleopathology 1: 188–199.

Corruccini RS. 1983. Principal components for allometric analysis. American Journal of Physical Anthropology 60: 451–453.

Corruccini RS. 1987. Shape in morphometrics: comparative analyses. American Journal of Physical Anthropology 73: 289–303.

Courtland H-W, Sun H, Beth-On M, Wu Y, Elis S, Rosen CJ, Yakar S. 2011. Growth hormone mediates pubertal skeletal development independent of hepatic IGF-1 production. Journal of Bone and Mineral Research 26: 761–768.

Cowal LS, Pastor RF. 2008. Dimensional variation in the proximal ulna: evaluation of a metric method of sex assessment. American Journal of Physical Anthropology 135: 469–478.

Cowgill LW, Eleazer CD, Auerbach BM, Temple DH, Okazaki K. 2012. Developmental variation in ecogeographic body proportions. American Journal of Physical Anthropology 148: 557–570.

Crimmins EM, Finch CE. 2006. Infection, inflammation, height, and longevity. Proceedings of the National Academy of Sciences USA 103: 498–503.

Crubézy É, Ludes B, Poveda J-D, Clayton J, Crouau-Roy B, Montagnon D. 1998. Identification of Mycobacterium DNA in an Egyptian Pott’s disease of 5,400 years old. Comptes Rendus de L’Académie des Sciences 321: 941–951.

Cruciani F, La Fratta R, Trombetta B, Santolamazza P, Sellitto D, Colomb EB, Dugoujon J-M, Crivellaro F, Benincasa T, Pascone R, Moral P, Watson E, Melegh B, Barbujani G, Fuselli S, Vona G, Zagradisnik B, Assum G, Brdicka R, Kozlov AI, Efremov GD, Coppa A, Novelletto A, Scozzari R. 2007. Tracing past human male movements in Northern/Eastern Africa and Western Eurasia: new clues from Y-chromosomal haplogroups E-M78 and J-M12. Molecular Biology and Evolution 24: 1300–1311.

Dabbs G. 2010. Sex determination using the scapula in New Kingdom skeletons from Tell El-Amarna. HOMO – Journal of Comparative Human Biology 61: 413–420.

Dabbs GR, Moore-Jansen PH. 2010. A method for estimating sex using metric analysis of the scapula. Journal of Forensic Sciences 55: 149–152.

Dancey CP, Reidy J. 2004. Statistics without Maths for Psychology: Using SPSS for Windows. 3rd edtn. Pearson Education Ltd: Harlow, UK.

D’Angelo GM, Luo J, Xiong C. 2012. Missing data methods for partial correlations. Journal of Biometrics & Biostatistics 3: 155.

Darwin C. 1871. The Descent of Man and Selection in Relation to Sex. John Murray: London.

Das Gupta M. 1987. Selective discrimination against female children in rural Punjab, India. Population and Development Review 13: 77–100.

Daskalaki E, Anderung C, Humphrey L, Götherström A. 2011. Further developments in molecular sex assignment: a blind test of 18th and 19th century human skeletons. Journal of Archaeological Science 38: 1326–1330.

David AR, Kershaw A, Heagerty A. 2010. Atherosclerosis and diet in ancient Egypt. Lancet 375: 718–719.

Davide D. 1972. Survey of the skeletal and mummy remains of ancient Egyptians available in research collections. Journal of Human Evolution 1: 155–159.

Decker SJ, Davy-Jow SL, Ford JM, Hilbelink DR. 2011. Virtual determination of sex: metric and nonmetric traits of the adult pelvis from 3D computed tomography models. Journal of Forensic Sciences 56: 1107–1114.

De La Fuente R, Hahnel A, Basrur PK, King WA. 1999. X inactive-specific transcript (Xist) expression and X chromosome inactivation in the preattachment bovine embryo. Biology of Reproduction 60: 769–775.

Delrue P. 2001. The Predynastic cemetery at Nag’a ed-Dêr. A re-evaluation. In: Willems H (ed.) Social Aspects of Funerary Culture in the Egyptian Old and Middle Kingdoms: Proceedings of the International Symposium Held at Leiden University, 6–7 June, 1996. Peeters Publishers: Leuven, Belgium.

De Meer K, Bergman R, Kusner JS, Voorhoeve HWA. 1993. Differences in physical growth of Aymara and Quechua children living at high altitude in Peru. American Journal of Physical Anthropology 90: 59–75.

Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39: 1–38.

Derevenski JRS. 2000. Sex differences in activity-related osseous changes in the spine and the gendered division of labor at Ensay and Wharram Percy, UK. American Journal of Physical Anthropology 111: 333–354.

Der Manuelian P. 1992. George Andrew Reisner on archaeological photography. Journal of the American Research Center in Egypt XXIX: 1–34.

Der Manuelian P. 2009. Mastabas of Nucleus Cemetery G2100. Part I: Major Mastabas of G2100–2200. Museum of Fine Arts: Boston, US.

Dequeker J, Ortner DJ, Stix AI, Cheng XG, Brys P, Boonen S. 1997. Hip fracture and osteoporosis in a XIIth Dynasty female skeleton from , Upper Egypt. Journal of Bone and Mineral Research 12: 881–888.

Derry DE. 1907. Notes on predynastic Egyptian tibiae. Journal of Anatomy and Physiology 41: 123–130.

Derry DE. 1909. Notes on the innominate bone as a factor in the determination of sex: with special reference to the sulcus præauriculus. Journal of Anatomy and Physiology 43: 266–276.

Derry DE. 1913. A case of hydrocephalus in an Egyptian of the Roman Period. Journal of Anatomy and Physiology 47: 436–458.

Derry DE. 1914. Parietal perforation accompanied with flattening of the skull in an ancient Egyptian. Journal of Anatomy and Physiology 48: 417–429.

Der Sarkissian C, Balanovsky O, Brandt G, Khartanovich V, Buzhilova A, Koshel S, Zaporozhchenko V, Gronenborn D, Moiseyev V, Kolpakov E, Shumkin V, Alt KW, Balanovska E, Cooper A, Haak W. 2013. Ancient DNA reveals prehistoric gene-flow from Siberia in the complex human population history of north east Europe. PLoS Genetics 9: e1003296.

DeWitte SN. 2009. The effect of sex on risk of mortality during the Black Death in London, A.D. 1349–1350. American Journal of Physical Anthropology 139: 222–234.

Diamond JM. 1991. Why are pygmies small? Nature 354: 111–112.

Díaz-Muñoz SL, DuVal EH, Krakauer AH, Lacey EA. 2014. Cooperating to compete: altruism, sexual selection and causes of male reproductive cooperation. Animal Behaviour 88: 67–78.

Dibley MJ, Staehling N, Nieburg P, Trowbridge FL. 1987. Interpretation of Z-score anthropometric indicators derived from the international growth reference. American Journal of Clinical Nutrition 46: 749–762.

DiBennardo R, Taylor JV. 1979. Sex assessment of the femur: a test of a new method. American Journal of Physical Anthropology 50: 635–638.

Dittrick J, Suchey JM. 1986. Sex determination of prehistoric central Californian skeletal remains using discriminant analysis of the femur and humerus. American Journal of Physical Anthropology 70: 3–9.

Donoghue HD, Lee OY-C, Minnikin DE, Besra GS, Taylor JH, Spigelman M. 2010. Tuberculosis in Dr Granville’s mummy: a molecular re-examination of the earliest known Egyptian mummy to be scientifically examined and given a medical diagnosis. Proceedings of the Royal Society of Biological Sciences 277: 51–56.

Dong Y, Peng C-YJ. 2013. Principled missing data methods for researchers. SpringerPlus 2: 222.

Dreizen S, Spirakis CN, Stone RE. 1967. A comparison of skeletal growth and maturation in undernourished and well-nourished girls before and after menarche. The Journal of Pediatrics 70: 256–263.

Drevenstedt GK, Crimmins EM, Vasunilashorn S, Finch CE. 2008. The rise and fall of excess male infant mortality. Proceedings of the National Academy of Sciences USA 105: 5016–5021.

Dumond DE. 1961. Swidden agriculture and the rise of Maya civilization. Southwestern Journal of Anthropology 17: 301–316.

Dumond DE. 1975. The limitation of human population: a natural history. Science 187: 713–721.

Dunham AE, Maitner BS, Razafindratsima OH, Simmons MC, Roy CL. 2013. Body size and sexual size dimorphism in primates: influences of climates and net primary productivity. Journal of Evolutionary Biology 26: 2312–2320.

Dupras TL, Schwarcz HP. 2001. Strangers in a strange land: stable isotope evidence for human migration in the Dakhleh Oasis, Egypt. Journal of Archaeological Science 28: 1199–1208.

Dupras TL, Schwarcz HP, Fairgrieve SI. 2001. Infant feeding and weaning practices in . American Journal of Physical Anthropology 115: 204–212.

Dupras TL, Williams LJ, Willems H, Peeters C. 2010. Pathological skeletal remains from ancient Egypt: the earliest case of diabetes mellitus? Practical Diabetes International 27: 358–363.

Duren DL, Seselj M, Froehle AW, Nahhas RW, Sherwood RJ. 2013. Skeletal growth and the changing genetic landscape during childhood and adulthood. American Journal of Physical Anthropology 150: 48–57.

Đurić M, Rakočević Z, Đonić D. 2005. The reliability of sex determination of skeletons from forensic context in the Balkans. Forensic Science International 147: 159–164.

Dzierzykray-Rogalski T. 1980. Paleopathology of the Ptolemaic inhabitants of Dakhleh Oasis (Egypt). Journal of Human Evolution 9: 71–74.

Dziuban CD, Shirkey EC. 1974. When is a correlation matrix appropriate for factor analysis? Some decision rules. Psychological Bulletin 81: 358–361.

Ehret C. 2002. The Civilizations of Africa: A History to 1800. University Press of Virginia: USA.

Ehrlich PR, Ehrlich AH. 2013. Can a collapse of global civilization be avoided? Proceedings of the Royal Society Biological Sciences 280: 20122845.

Elliott M, Collard M. 2009. FORDISC and the determination of ancestry from cranial measurements. Biology Letters Phylogeny 5: 849–852.

El Morsi DA, Al Hawary AA. 2013. Sex determination by the length of metacarpals and phalanges: X-ray study on Egyptian population. Journal of Forensic and Legal Medicine 20: 6–13.

El-Sherbeney SAA, Ahmed EA, Ewis AA. 2012. Estimation of sex of Egyptian population by 3D computerized tomography of the pars petrosa ossis temporalis. Egyptian Journal of Forensic Sciences 2: 29–32.

Emery WB. 1972. Archaic Egypt. Penguin Books: Aylesbury, UK.

Endo A, Omoe K, Ishikawa H. 1993. Ecological factors affecting body size of Japanese adolescents. American Journal of Physical Anthropology 91: 299–303.

Erfan M, El-Sawaf A, Soliman MA-T, El-Din AS, Kandeel WA, El-Banna R, Azab A. 2009. Cranial trauma in ancient Egyptians from the Bahriyah Oasis, Greco-Roman Period. Research Journal of Medicine and Medical Sciences 4: 78–84.

Ericksen MF. 1982. How “representative” is the Terry Collection? Evidence from the proximal femur. American Journal of Physical Anthropology 59: 345–350.

Eriksen EF, Colvard DS, Berg NJ, Graham ML, Mann KG, Spelsberg TC, Riggs BL. 1988. Evidence of estrogen receptors in normal human osteoblast-like cells. Science 241: 84–86.

Eshak GA, Ahmed HM, Abdel EAM. 2011. Gender determination from hand bone length and volume using multidetector computed tomography: a study in Egyptian people. Journal of Forensic and Legal Medicine 18: 246–252.

Eshed V, Gopher A, Galili E, Hershkovitz I. 2004. Musculoskeletal stress markers in Natufian hunter-gatherers and Neolithic farmers in the Levant: the upper limb. American Journal of Physical Anthropology 123: 303–315.

Eshed V, Gopher A, Hershkovitz I. 2006. Tooth wear and dental pathology at the advent of agriculture: new evidence from the Levant. American Journal of Physical Anthropology 130: 145–159.

Eubanks JD, Cheruvu VK. 2009. Prevalence of sacral spina bifida occulta and its relationship to age, sex, race, and the sacral table angle: an anatomic, osteologic study of three thousand one hundred specimens. Spine 34: 1539–1543.

Evteev A, Cardini AL, Morozova I, O’Higgins P. 2014. Extreme climate, rather than population history, explains mid-facial morphology of northern Asians. American Journal of Physical Anthropology 153: 449–462.

Fadhlaoui-Zid K, Rodríguez-Botigué L, Naoui N, Benammar-Elgaaied A, Calafell F, Comas D. 2011. Mitochondrial DNA structure in North Africa reveals a genetic discontinuity in the Nile Valley. American Journal of Physical Anthropology 145: 107–117.

Faerman M, Filon D, Kahila G, Greenblatt CL, Smith P, Oppenheim A. 1995. Sex identification of archaeological human remains based on amplification of the X and Y amelogenin alleles. Gene 167: 327–332.

Faerman M, Kahila Bar-Gal G, Filon D, Greenblatt CL, Stager L, Oppenheim A, Smith P. 1998. Determining the sex of infanticide victims from the Late Roman era through ancient DNA analysis. Journal of Archaeological Science 25: 861–865.

Fairbairn DJ. 1997. Allometry for sexual size dimorphism: pattern and process in the coevolution of body size in males and females. Annual Review of Ecology, Evolution & Systematics 28: 659–687.

Fairgrieve SI, Molto JE. 2000. Cribra orbitalia in two temporally disjunct population samples from the Dakhleh Oasis, Egypt. American Journal of Physical Anthropology 111: 319–331.

Falsetti AB. 1995. Sex assessment from metacarpals of the human hand. Journal of Forensic Sciences 40: 774–776.

Farid H, Farid S. 2001. Unfolding Sennedjem’s tomb. KMT: A Modern Journal of Ancient Egypt 12(1): 46–59.

Feldtkeller E, Lemmel E-M, Russell AS. 2003. Ankylosing spondylitis in the pharaohs of ancient Egypt. Rheumatology International 23: 1–5.

Ferembach D, Schwidetzky I, Stloukal M. 1980. Recommendations for age and sex diagnoses of skeletons. Journal of Human Evolution 9: 517–549.

Fernanda Laus M, Ferreira Vales LDM, Braga Costa TM, Sousa Almeida S. 2011. Early and postnatal protein-calorie malnutrition and cognition: a review of human and animal studies. International Journal of Environmental Research and Public Health 8: 590–612.

Fidas A, MacDonald HL, Elton , Wild SR, Chisholm GD, Scott R. 1987. Prevalence and patterns of spina bifida occulta in 2707 normal adults. Clinical Radiology 38: 537–542.

Fields SJ, Spiers M, Hershkovitz I, Livshits G. 1995. Reliability of reliability coefficients in the estimation of asymmetry. American Journal of Physical Anthropology 96: 83–87.

Fienberg SE. 1997. Statistics. In: Freiman MJ, Berenblut ML (eds.) The Litigator’s Guide to Expert Witnesses. Canadian Cataloguing in Publication Data: Canada.

Filce Leek F. 1966. Observations on the dental pathology seen in ancient Egyptian skulls. Journal of Egyptian Archaeology 52: 59–64.

Filce Leek F. 1980. Observations on a collection of crania from the mastabas of the reign of Cheops at Giza. Journal of Egyptian Archaeology 66: 36–47.

Filce Leek F. 1984. Reisner’s collection of human remains from the mastaba tombs at Giza. Zeitschrift für Ägyptische Sprache und Altertumskunde 111: 11–18.

Filer JM. 1992. Head injuries in Giza and Nubia: a comparison of skulls from Giza and Kerma. Journal of Egyptian Archaeology 78: 281–285.

Finch CE. 2011. Atherosclerosis is an old disease: summary of the Ruffer Centenary Symposium, the Paleocardiology of ancient Egypt, a meeting report of the Study team. Experimental Gerontology 46: 843–846.

Finkelstein JS, Klibanski A, Neer RM, Greenspan SL, Rosenthal DI, Crowley WF. 1987. Osteoporosis in men with idiopathic hypogonadotropic hypogonadism. Annals of Internal Medicine 106: 354–361.

Finkelstein JS, Klibanski A, Neer RM, Doppelt SH, Rosenthal DI, Segre GV, Crowley WF. 1989. Increase in bone density during treatment of men with idiopathic hypogonadotropic hypogonadism. Journal of Clinical Endocrinology & Metabolism 69: 776–793.

Fisher RA. 1936. Statistical Methods for Research Workers. 6th edition. Oliver and Boyd: Edinburgh.

Flander LB. 1978. Univariate and multivariate methods for sexing the sacrum. American Journal of Physical Anthropology 49: 103–110.

Flohr S, Leckelt J, Kierdorf U, Kierdorf H. 2010. How reproducibly can human ear ossicles be measured? A study of inter-observer error. The Anatomical Record 293: 2094–2106.

Foster HJ. 1974. The ethnicity of the ancient Egyptians. Journal of Black Studies 5: 175–191.

Fox CL. 1997. mtDNA analysis in ancient Nubians supports the existence of gene flow between sub-Sahara and North Africa in the Nile Valley. Annals of Human Biology 24: 217–227.

Franklin D, Freedman L, Milne N. 2005. Sexual dimorphism and discriminant function sexing in indigenous South African crania. HOMO – Journal of Comparative Human Biology 55: 213–228.

Franklin D, Cardini A, Flavel A, Kuliukas A. 2012. The application of traditional and geometric morphometric analyses for forensic quantification of sexual dimorphism: preliminary investigations in a Western Australian population. International Journal of Legal Medicine 126: 549–558.

Franklin D, Cardini A, Flavel A, Kuliukas A. 2013. Estimation of sex from cranial measurements in a Western Australian population. Forensic Science International 229: 158.e1–158.e8.

Frayer DW. 1980. Sexual dimorphism and cultural evolution in the late Pleistocene and Holocene of Europe. Journal of Human Evolution 9: 399–415.

Frayer DW, Wolpoff MH. 1985. Sexual dimorphism. Annual Review of Anthropology 14: 429–473.

Frelat MA, Katina S, Weber GW, Bookstein FL. 2012. Technical note: a novel geometric morphometric approach to the study of long bone shape variation. American Journal of Physical Anthropology 149: 628–638.

Friedman JH, Tukey JW. 1974. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers 23: 881–890.

Frieß M, Baylac M. 2003. Exploring artificial cranial deformation using elliptic Fourier analysis of Procrustes aligned outlines. American Journal of Physical Anthropology 122: 11–22.

Frisancho AR. 1969. Human growth and pulmonary function of a high altitude Peruvian Quechua population. Human Biology 41: 365–379.

Frisancho AR. 1970. Developmental responses to high altitude hypoxia. American Journal of Physical Anthropology 32: 401–408.

Frisancho AR, Baker PT. 1970. Altitude and growth: a study of the patterns of physical growth of a high altitude Peruvian Quechua population. American Journal of Physical Anthropology 32: 279–292.

Fritzsche K, Arnqvist G. 2013. Homage to Bateman: sex roles predict sex differences in sexual selection. Evolution 67: 1926–1936.

Frood E. 2010. Chapter 25: social structure and daily life: pharaonic. In: Lloyd AB (ed.) A Companion to Ancient Egypt. Wiley-Blackwell: Oxford.

Frost HM. 1990. Skeletal structural adaptations to mechanical usage (SATMU): 2. Redefining Wolff’s Law: the remodelling problem. The Anatomical Record 226: 414–422.

Fukase H, Wakebe T, Tsurumoto T, Saiki K, Fujita M, Ishida H. 2012. Geographic variation in body form of prehistoric Jomon males in the Japanese Archipelago: its ecogeographic implications. American Journal of Physical Anthropology 149: 125–135.

Galarza A, Hidalgo J, Ocio G, Rodriguez P. 2008. Sexual dimorphism and determination of sex in Atlantic yellow-legged gulls Larus michahellis lusitanius from northern Spain. Ardeola 55: 41–47.

Gamza T, Irish J. 2012. A comparison of archaeological and dental evidence to determine diet at a Predynastic Egyptian site. International Journal of Osteoarchaeology 22: 398–408.

Gapert R, Black S, Last J. 2009. Sex determination from the occipital condyle: discriminant function analysis in an Eighteenth and Nineteenth century British sample. American Journal of Physical Anthropology 138: 384–394.

Garn SM, Burdi AR, Babler WJ. 1974. Male advancement in prenatal hand development. American Journal of Physical Anthropology 41: 353–360.

Garvin HM, Passalacqua NV. 2012. Current practices by forensic anthropologists in adult skeletal age estimation. Journal of Forensic Sciences 57: 427–433.

Gaulin JC, Sailer LD. 1984. Sexual dimorphism in weight among the primates: the relative impact of allometry and sexual selection. International Journal of Primatology 5: 515–535.

Gavan JA. 1952. “Growth of Guamanian children” – some methodological considerations. American Journal of Physical Anthropology 10: 132–135.

Geary DC, Winegard B, Winegard B. 2014. Reflections on the evolution of human sex differences: social selection and the evolution of competition among women. In: Weekes-Shackelford VA, Shackelford TK (eds.) Evolutionary Perspectives on Human Sexual Psychology and Behavior. Springer: New York.

Geisser S. 1975. The predictive sample reuse method with applications. Journal of the American Statistical Association 70: 320–328.

Genovés ST. 1970. Anthropometry of Late Prehistoric human remains. In: Stewart TD (ed.) Handbook of Middle American Indians: Physical Anthropology. University of Texas Press: Austin.

Gerloni A, Cavalli F, Costantinides F, Costantinides F, Bonetti S, Paganelli C. 2009. Dental status of three Egyptian mummies: radiological investigation by multislice computerized tomography. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontics 107: e58–e64.

Gibbon V, Paximadis M, Štrkalj G, Ruff P, Penny C. 2009. Novel methods of molecular sex identification from skeletal tissue using the amelogenin gene. Forensic Science International: Genetics 3: 74–79.

Gilbert MTP, Bandelt H-J, Hofreiter M, Barnes I. 2005. Assessing ancient DNA studies. Trends in Ecology and Evolution 20: 541–544.

Gilchrist R. 1999. Gender and Archaeology: Contesting the Past. Routledge: London.

Giles E. 1964. Sex determination by discriminant function analysis of the mandible. American Journal of Physical Anthropology 22: 129–136.

Giles E, Elliot O. 1963. Sex determination by discriminant function analysis of crania. American Journal of Physical Anthropology 21: 53–68.

Gillet CR. 1898. The Metropolitan Museum of Art. Hand-Book No. 4. Catalogue of the Egyptian Antiquities in Halls 3 & 4. Metropolitan Museum of Art: New York.

Godde K. 2009. An examination of Nubian and Egyptian biological distances: support for biological diffusion or in situ development. HOMO – Journal of Comparative Human Biology 60: 389–404.

Gómez-Valdés JA, Quinto-Sánchez M, Menéndez Garmendia A, Veleminska J, Sánchez-Mejorada G, Bruzek J. 2012. Comparison of methods to determine sex by evaluating the greater sciatic notch: visual, angular and geometric morphometrics. Forensic Science International 221: 156.e1–156.e7.

Gonzales G, Crespo-Retes I, Guerra-Garcia R. 1982. Secular change in growth of native children and adolescents at high altitude I. Puno, Peru (3800 meters). American Journal of Physical Anthropology 58: 191–195.

Gonzalez RA. 2012. Determination of sex from juvenile crania by means of discriminant function analysis. Journal of Forensic Sciences 57: 24–34.

Gonzales GF, Valera J, Rodriguez L, Vega A, Guerra-Garcia R. 1984. Secular change in growth of native children and adolescents at high altitude Huancayo, Peru (3,280 meters). American Journal of Physical Anthropology 64: 47–51.

González PN, Bernal V, Perez SI, Barrientos G. 2007. Analysis of dimorphic structures of the human pelvis: its implications for sex estimation in samples without reference collections. Journal of Archaeological Science 34: 1720–1730.

González-Reimers E, Velasco-Vázquez J, Arnay-de-la-Rosa M, Santolaria-Fernández F. 2000. Sex determination by discriminant function analysis of the right tibia in the prehispanic population of the Canary Islands. Forensic Science International 108: 165–172.

Good PI. 2001. Applying Statistics in the Courtroom: A New Approach for Attorneys and Expert Witnesses. Chapman & Hall/CRC: USA.

Goodman SN. 1999. Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine 130: 995–1004.

Goodman MJ, Griffin PB, Estioko-Griffin AA, Grove JS. 1985. The compatibility of hunting and mothering among the Agta hunter-gatherers of the Philippines. Sex Roles 12: 1199–1209.

Gopalan C. 1983. ‘Measurement of undernutrition’: biological considerations. Economic and Political Weekly 18: 591–595.

Gordon AD. 2006. Scaling of size and dimorphism in primates II: macroevolution. International Journal of Primatology 27: 63–105.

Gordon CC, Bradtmiller B. 1992. Interobserver error in a large scale anthropometric survey. American Journal of Human Biology 4: 253–263.

Gordon AD, Green DJ, Richmond BG. 2008a. Strong postcranial size dimorphism in Australopithecus afarensis: results from two new resampling methods for multivariate data sets with missing data. American Journal of Physical Anthropology 135: 311–328.

Gordon AD, Nevell L, Wood B. 2008b. The Homo floresiensis cranium (LB1): size, scaling, and early Homo affinities. Proceedings of the National Academy of Sciences of the USA 105: 4650–4655.

Götherström A, Lidén K, Ahlström T, Källersjö M, Brown TA. 1997. Osteology, DNA and sex identification: morphological and molecular sex identification of five Neolithic individuals from Ajvide, Gotland. International Journal of Osteoarchaeology 7: 71–81.

Goto R, Mascie-Taylor CGN. 2007. Precision of measurement as a component of human variation. Journal of Physiological Anthropology 26: 253–256.

Grabherr S, Cooper C, Ulrich-Bochsler S, Uldin T, Ross S, Oesterhelweg L, Bolliger S, Christie A, Schnyder P, Mangin P, Thali MJ. 2009. Estimation of sex and age of “virtual skeletons” – a feasibility study. European Radiology 19: 419–429.

Graitcer PL, Gentry EM. 1981. Measuring children: one reference for all. Lancet 2: 297–299.

Graw M, Czarnetzki A, Haffner H-T. 1999. The form of the supraorbital margin as a criterion in identification of sex from the skull: investigations based on modern human skulls. American Journal of Physical Anthropology 108: 91–96.

Gray P. 2013. Hunter-gatherer egalitarianism as a force for decline in sexual dimorphism. Psychological Inquiry 24: 192–194.

Gray JP, Wolfe LD. 1980. Height and sexual dimorphism of stature among human societies. American Journal of Physical Anthropology 53: 441–456.

Green H, Curnoe D. 2009. Sexual dimorphism in Southeast Asian crania: a geometric morphometric approach. HOMO – Journal of Comparative Human Biology 60: 517–534.

Greene TR, Kuba CL, Irish JD. 2005. Quantifying calculus: a suggested new approach for recording an important indicator of diet and dental health. HOMO – Journal of Comparative Human Biology 56: 119–132.

Greksa LP, Spielvogel H, Paredes-Fernandez L, Paz-Zamora M, Caceres E. 1984. The physical growth of urban children at high altitude. American Journal of Physical Anthropology 65: 315–322.

Greulich WW. 1951. The growth and developmental status of Guamanian school children in 1947. American Journal of Physical Anthropology 9: 55–70.

Grine FE, Jungers WL, Tobias PV, Pearson OM. 1995. Fossil Homo femur from Berg Aukas, northern Namibia. American Journal of Physical Anthropology 97: 151–185.

Gross TS, Edwards JL, McLeod KJ, Rubin CT. 1997. Strain gradients correlate with sites of periosteal bone formation. Journal of Bone and Mineral Research 12: 982–988.

Grosvenor AE, Laws ER. 2008. The evolution of extracranial approaches to the pituitary and anterior skull base. Pituitary 11: 337–345.

Guld S. 1995. Das Altägyptische Gräberfeld von Gizeh. Anthropologische Untersuchungen mit dem Schwerpunkt der Paläopathologie. Diplomarbeit; Universität Wien.

Gülekon IN, Turgut HB. 2003. The external occipital protuberance: can it be used as a criterion in the determination of sex? Journal of Forensic Sciences 48: 513–516.

Gupta KB, Gupta R, Atreja A, Verma M, Vishvkarma S. 2009. Tuberculosis and nutrition. Lung India 26: 9–16.

Gustafsson A, Lindenfors P. 2009. Latitudinal patterns in human stature and sexual stature dimorphism. Annals of Human Biology 36: 74–87.

Guyomarc’h P, Bruzek J. 2011. Accuracy and reliability in sex determination from skulls: a comparison of Fordisc® 3.0 and the discriminant function analysis. Forensic Science International 208: 180.e1–180.e6.

Haas JD, Frongillo EA, Stepick CD, Beard JL, Hurtado L. 1980. Altitude, ethnic and sex difference in birth weight and length in Bolivia. Human Biology 52: 459–477.

Haas JD, Moreno-Black G, Frongillo EA, Pabon JA, Pareja GL, Ybarnegaray JU, Hurtado LG. 1982. Altitude and infant growth in Bolivia: a longitudinal study. American Journal of Physical Anthropology 59: 251–262.

Haas CJ, Zink A, Pálfi G, Szeimies U, Nerlich AG. 2000. Detection of leprosy in ancient human skeletal remains by molecular identification of Mycobacterium leprae. American Journal of Clinical Pathology 114: 428–436.

Habicht JP, Martorell R, Yarbrough C, Malina RM, Klein RE. 1974. Height and weight standards for preschool children. How relevant are ethnic differences in growth potential? Lancet 1: 611–614.

Habitat for Humanity. 2013. Health and Housing in Rural Egypt. Available online: http://www.habitatforhumanity.org.uk/projects/egypt-health-housing. Accessed May 2014.

Hadley C, Lindstrom D, Tessema F, Belachew T. 2008. Gender bias in the food insecurity experience of Ethiopian adolescents. Social Science & Medicine 66: 427–438.

Hagelberg E, Clegg JB. 1991. Isolation and characterization of DNA from archaeological bone. Proceedings of the Royal Society B: Biological Sciences 244: 45–50.

Hammer O. 2002. Morphometrics – brief notes. Available online: http://folk.uio.no/ohammer/past/morphometry.pdf. Accessed February 2015.

Hanihara T. 2000. Frontal and facial flatness of major human populations. American Journal of Physical Anthropology 111: 105–134.

Harrell Jr. FE, Lee KL, Mark DB. 1996. Multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15: 361–387.

Harris EF, Bailit HL. 1988. A principal components analysis of human odontometrics. American Journal of Physical Anthropology 75: 87–99.

Harris EF, Lease LR. 2005. Mesiodistal tooth crown dimensions of the primary dentition: a worldwide survey. American Journal of Physical Anthropology 128: 593–607.

Harris EF, Smith N. 2009. Accounting for measurement error: a critical but often overlooked process. Archives of Oral Biology 54s: s107–s117.

Harris SM, Case DT. 2012. Sexual dimorphism in the tarsal bones: implications for sex determination. Journal of Forensic Sciences 57: 295–305.

Harvati K. 2003. Quantitative analysis of Neanderthal temporal bone morphology using three-dimensional geometric morphometrics. American Journal of Physical Anthropology 120: 323–338.

Harvey PH, Kavanagh M, Clutton-Brock TH. 1978. Sexual dimorphism in primate teeth. Journal of Zoology 186: 475–485.

Hassan FA. 1988. The Predynastic of Egypt. Journal of World Prehistory 2: 135–185.

Hassan FA. 1997. The dynamics of a riverine civilization: a geoarchaeological perspective on the Nile Valley, Egypt. World Archaeology 29: 51–74.

Hassett B. 2011. Technical note: estimating sex using cervical canine odontometrics: a test using a known sex sample. American Journal of Physical Anthropology 146: 486–489.

Havelková P, Villotte S, Velemínský P, Poláček L, Dobisíková M. 2011. Enthesopathies and activity patterns in the early Medieval Great Moravian population: evidence of division of labour. International Journal of Osteoarchaeology 21: 487–504.

Hawkey DE, Merbs CF. 1995. Activity-induced musculoskeletal stress markers (MSM) and subsistence strategy changes among ancient Hudson Bay Eskimos. International Journal of Osteoarchaeology 5: 324–338.

Hawkins DM, Basak SC, Mills D. 2003. Assessing model fit by cross-validation. Journal of Chemical Information and Modeling 43: 579–586.

Hayek L-AC, Heyer WR, Gascon C. 2001. Frog morphometrics: a cautionary tale. Alytes 18: 153–177.

He J, Rosen CJ, Adams DJ, Kream BE. 2006. Postnatal growth and bone mass in mice with IGF-1 haploinsufficiency. Bone 38: 826–835.

Hendrickx S, Vermeersch P. 2000. Prehistory: from the Palaeolithic to the Badarian Culture. In: Shaw I (ed.) The Oxford History of Ancient Egypt. Oxford University Press: Oxford.

Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, Byrnes JK, Fadhlaoui-Zid K, Zalloua PA, Moreno-Estrada A, Bertranpetit J, Bustamante CD, Comas D. 2012. Genomic ancestry of North Africans supports back-to-Africa migrations. PLoS Genetics 8: e1001297.

Hershkovitz I, Ring B, Kobyliansky E. 1990. Efficiency of cranial bilateral measurements in separating human populations. American Journal of Physical Anthropology 83: 307–319.

Higuchi R, Bowman B, Freiberger M, Ryder OA, Wilson AC. 1984. DNA sequence from the Quagga, an extinct member of the horse family. Nature 312: 282–284.

Hillson SW. 1979. Diet and dental disease. World Archaeology 11: 147–162.

Hillson S. 1996. Dental Anthropology. Cambridge University Press: Cambridge, UK.

Hincak Z, Drmić-Hofman I, Mihelić D. 2007. Anthropological analysis of Neolithic and early bronze age skeletons – a classical and molecular approach (East Slavonia, Croatia). Collegium Antropologicum 31: 1135–1141.

Hoffman H, Hudgins PA. 2002. Head and skull base features of nine Egyptian mummies: evaluation with high-resolution CT and reformation techniques. American Journal of Roentgenology 178: 1367–1376.

Hoffman MA, Hamroush HA, Allen RO. 1986. A model of urban development for the Hierakonpolis region from Predynastic through Old Kingdom times. Journal of the American Research Center in Egypt 23: 175–187.

Holden C, Mace R. 1999. Sexual dimorphism in stature and women’s work: a phylogenetic cross-cultural analysis. American Journal of Physical Anthropology 110: 27–45.

Holland TD. 1991. Sex assessment using the proximal tibia. American Journal of Physical Anthropology 85: 221–227.

Holliday TW. 1997. Postcranial evidence of cold adaptation in European Neandertals. American Journal of Physical Anthropology 104: 245–258.

Holliday TW, Ruff CB. 2001. Relative variation in human proximal and distal limb segment lengths. American Journal of Physical Anthropology 116: 26–33.

Holman DJ, Bennett KA. 1991. Determination of sex from arm bone measurements. American Journal of Physical Anthropology 84: 421–426.

Holroyd C, Cooper C, Dennison E. 2008. Epidemiology of osteoporosis. Best Practice & Research Clinical Endocrinology & Metabolism 22: 671–685.

Holt BM. 2003. Mobility in upper Paleolithic and Mesolithic Europe. Evidence from the lower limb. American Journal of Physical Anthropology 122: 200–215.

Hosken DJ, House CM. 2011. Sexual selection. Current Biology 21: R62–65.

Hosmer DW, Lemeshow S. 2000. Applied Logistic Regression. 2nd edition. John Wiley & Sons, Inc: New York.

Hosmer DW, Taber S, Lemeshow S. 1991. The importance of assessing the fit of logistic regression models: a case study. American Journal of Public Health 81: 1630–1635.

Howells WW. 1957. The cranial vault: factors of size and shape. American Journal of Physical Anthropology 15: 159–192.

Howells WW. 1973. Cranial Variation in Man. A Study by Multivariate Analysis of Patterns of Differences among Recent Human Populations. Papers of the Peabody Museum of Archeology and Ethnology, Volume 67. Peabody Museum: Cambridge, MA.

Hubbe M, Neves WA. 2007. On the misclassification of human crania: are there any implications for assumptions about human variation. Current Anthropology 48: 285–288.

Hubbe M, Hanihara T, Harvati K. 2009. Climate signatures in the morphological differentiation of worldwide modern human populations. The Anatomical Record 292: 1720–1733.

Huber PJ. 1985. Projection pursuit. The Annals of Statistics 13: 435–475.

Huberty CJ. 1984. Issues in the use and interpretation of discriminant analysis. Psychological Bulletin 95: 156–171.

Huiskes R, Ruimerman R, van Lenthe GH, Janssen JD. 2000. Effects of mechanical forces on maintenance and adaptation of form in trabecular bone. Nature 405: 704–706.

Human Relations Area Files. 2014. Cultural Information for Education and Research. Available online: http://hraf.yale.edu/. Accessed May 2014.

Humphrey LT. 1998. Growth patterns in the modern human skeleton. American Journal of Physical Anthropology 105: 57–72.

Humphrey L, Belo S, Rousham E. 2012. Sex differences in infant mortality in Spitalfields, London, 1750–1839. Journal of Biosocial Science 44: 95–119.

Hunt DR, Albanese J. 2005. History and demographic composition of the Robert J. Terry anatomical collection. American Journal of Physical Anthropology 127: 406–417.

Hunter C, Clegg EJ. 1973. Changes in skeletal proportions of the rat in response to hypoxic stress. Journal of Anatomy 114: 201–219.

Hurtado AM, Hawkes K, Hill K, Kaplan H. 1985. Female subsistence strategies among Ache hunter-gatherers of eastern Paraguay. Human Ecology 13: 1–28.

Hussien FH, Sarry El-Din AM, El Samie Kandeel WA, El Banna RAE-S. 2009. Spinal pathological findings in ancient Egyptians of the Greco-Roman Period living in Bahriyah Oasis. International Journal of Osteoarchaeology 19: 613–627.

Hussien K, Matin E, Nerlich AG. 2013. Paleopathology of the juvenile Tutankhamun – 90th anniversary of discovery. Virchows Archiv 463: 475–479.

Iacumin P, Bocherens H, Chaix L, Marioth A. 1998. Stable carbon and nitrogen isotopes as dietary indicators of ancient Nubian populations (Northern Sudan). Journal of Archaeological Science 25: 293–301.

Ingemarsson I. 2003. Gender aspects of preterm birth. British Journal of Obstetrics & Gynaecology 110: 34–38.

Inoue M. 1990. Fourier analysis of the forehead shape of skull and sex determination by use of computer. Forensic Science International 47: 101–112.

Irish JD. 2006. Who were the ancient Egyptians? Dental affinities among Neolithic through Postdynastic peoples. American Journal of Physical Anthropology 129: 529–543.

Irish JD, Friedman R. 2010. Dental affinities of the C-group inhabitants of Hierakonpolis, Egypt: Nubian, Egyptian, or both? HOMO – Journal of Comparative Human Biology 61: 81–101.

Irish JD, Konigsberg L. 2007. The ancient inhabitants of Jebel Moya Redux: measures of population affinity based on dental morphology. International Journal of Osteoarchaeology 17: 138–156.

Isaac JL. 2005. Potential causes and life-history consequences of sexual size dimorphism in mammals. Mammal Review 35: 101–115.

İşcan MY. 1985. Osteometric analysis of sexual dimorphism in the sternal end of the rib. Journal of Forensic Sciences 30: 1090–1099.

İşcan MY, Kedici PS. 2003. Sexual variation in bucco-lingual dimensions in Turkish dentition. Forensic Science International 137: 160–164.

İşcan MY, Miller-Shaivitz P. 1984a. Determination of sex from the femur in blacks and whites. Collegium Antropologicum 8: 169–175.

İşcan MY, Miller-Shaivitz P. 1984b. Discriminant function sexing of the tibia. Journal of Forensic Sciences 29: 1087–1093.

İşcan MY, Shihai D. 1995. Sexual dimorphism in the Chinese femur. Forensic Science International 74: 79–87.

İşcan MY, Loth SR, Wright RK. 1984a. Age estimation from the rib by phase analysis: white males. Journal of Forensic Sciences 29: 1094–1104.

İşcan MY, Loth SR, Wright RK. 1984b. Metamorphosis at the sternal rib end: a new method to estimate age at death in white males. American Journal of Physical Anthropology 65: 147–156.

İşcan MY, Loth SR, Wright RK. 1985. Age estimation from the rib by phase analysis: white females. Journal of Forensic Sciences 30: 853–863.

İşcan MY, Loth SR, Wright RK. 1987. Racial variation in the sternal extremity of the rib and its effect on age determination. Journal of Forensic Sciences 32: 452–466.

İşcan MY, Yoshino M, Kato S. 1994. Sex determination from the tibia: standards for contemporary Japan. Journal of Forensic Sciences 39: 785–792.

Jantz RL, Meadows Jantz L. 2000. Secular change in craniofacial morphology. American Journal of Human Biology 12: 327–338.

Jeong K-S, Park J-H, Lee S. 2005. The analysis of X-chromosome inactivation-related gene expression from single mouse embryo with sex-determination. Biochemical and Biophysical Research Communications 333: 803–807.

Johnson AL. 2014. Exploring adaptive variation among hunter-gatherers with Binford’s frames of reference. Journal of Archaeological Research 22: 1–42.

Johnson AL, Lovell NC. 1994. Biological differentiation at Predynastic Naqada, Egypt: an analysis of dental morphological traits. American Journal of Physical Anthropology 93: 427–433.

Johnston FE, Laughlin WS, Harper AB, Ensroth AE. 1982. Physical growth of St. Lawrence Island Eskimos: body size, proportion, and composition. American Journal of Physical Anthropology 58: 397–401.

Joint Committee for Guides in Metrology Working Group. 2008. International Vocabulary of Metrology – Basic and General Concepts and Associated Terms. 3rd edition. JCGM 200: 2012.

Jolliffe IT. 2002. Principal Component Analysis. 2nd edition. Springer: New York.

Judd M. 2004. Trauma in the city of Kerma: ancient versus modern injury patterns. International Journal of Osteoarchaeology 14: 34–51.

Judd MA. 2006. Continuity of interpersonal violence between Nubian communities. American Journal of Physical Anthropology 131: 324–333.

Julian CG, Wilson MJ, Moore LG. 2009. Evolutionary adaptation to high altitude: a view from in utero. American Journal of Human Biology 21: 614–622.

Jungers WL. 1984. Aspects of size and scaling in primate biology with special reference to the locomotor skeleton. Yearbook of Physical Anthropology 27: 73–97.

Jungers WL. 1985. Body size and scaling of limb proportions in primates. In: Jungers WL (ed.). Size and Scaling in Primate Biology. Plenum Press: New York.

Jungers WL, Falsetti AB, Wall CE. 1995. Shape, relative size, and size-adjustments in morphometrics. Yearbook of Physical Anthropology 38: 137–161.

Junker H. 1914. The Austrian excavations, 1914. Excavations of the Vienna Imperial Academy of Sciences at the pyramids of Gizah. The Journal of Egyptian Archaeology 1: 250–253.

Kaestle FA, Horsburgh KA. 2002. Ancient DNA in anthropology: methods, applications and ethics. Yearbook of Physical Anthropology 45: 92–130.

Kaiser HF. 1960. The application of electronic computers to factor analysis. Educational and Psychological Measurement 20: 141–151.

Kajanoja P. 1966. Sex determination of Finnish crania by discriminant function analysis. American Journal of Physical Anthropology 24: 29–34.

Kalmey JK, Rathbun TA. 1996. Sex determination by discriminant function analysis of the petrous portion of the temporal bone. Journal of Forensic Sciences 41: 865–867.

Kang H. 2013. The prevention and handling of the missing data. Korean Journal of Anesthesiology 64: 402–406.

Kanis JA, Pitt FA. 1992. Epidemiology of osteoporosis. Bone 13: S7–S15.

Kappeler PM. 1990. The evolution of sexual size dimorphism in prosimian primates. American Journal of Primatology 21: 201–214.

Karakas HM, Harma A, Alicioglu B. 2013. The subpubic angle in sex determination: anthropometric measurements and analysis on Anatolian Caucasians using multidetector computed tomography datasets. Journal of Forensic and Legal Medicine 20: 1004–1009.

Karlberg J, Jalil F, Lam B, Low L, Yeung CY. 1994. Linear growth retardation in relation to the three phases of growth. European Journal of Clinical Nutrition 48: S25–S44.

Katona P, Katona-Apte J. 2008. The interaction between nutrition and infection. Clinical Infectious Diseases 46: 1582–1588.

Katsavrias EG, Halazonetis DJ. 2005. Condyle and fossa shape in class II and class III skeletal patterns: a morphometric tomographic study. American Journal of Orthodontics and Dentofacial Orthopedics 128: 337–346.

Katz HG. 1980. The influence of under nutrition on learning performance in rodents. Nutrition Research Reviews 50: 767–783.

Katzmarzyk PT, Leonard WR. 1998. Climatic influences on human body size and proportions: ecological adaptations and secular trends. American Journal of Physical Anthropology 106: 483–503.

Kaushal S, Patnaik VVG, Agnihotri G. 2003. Mandibular canines in sex determination. Journal of the Anatomical Society of India 52: 119–124.

Keita SOY. 1988. An analysis of crania from Tell-Duweir using multiple discriminant functions. American Journal of Physical Anthropology 75: 375–390.

Keita SOY. 1990. Studies of ancient crania from Northern Africa. American Journal of Physical Anthropology 83: 35–48.

Keita SOY. 1992. Further studies of crania from ancient northern Africa: an analysis of crania from first Dynasty Egyptian tombs, using multiple discriminant functions. American Journal of Physical Anthropology 87: 245–254.

Keita SOY. 1993. Studies and comments on ancient Egyptian biological relationships. History in Africa 20: 129–154.

Keita SOY. 2003. A study of vault porosities in early Upper Egypt from the Badarian through Dynasty I. World Archaeology 35: 210–222.

Keita SOY. 2004. Exploring northeast African metric craniofacial variation at the individual level: a comparative study using principal components analysis. American Journal of Human Biology 16: 679–689.

Keita SOY, Boyce AJ. 2001. Diachronic patterns of dental hypoplasias and vault porosities during the Predynastic in the Naqada region, Upper Egypt. American Journal of Human Biology 13: 733–743.

Keita SOY, Boyce AJ. 2006. Variation in porotic hyperostosis in the Royal Cemetery complex at Abydos, Upper Egypt: a social interpretation. Antiquity 80: 64–73.

Kelaita M, Dias PAD, Aguilar-Cucurachi MS, Canales-Espinosa D, Cortés-Ortiz L. 2011. Impact of intrasexual selection on sexual dimorphism and testes size in the Mexican howler monkeys Alouatta palliata and A. pigra. American Journal of Physical Anthropology 146: 179–187.

Kelley MA. 1978. Phenice’s visual sexing technique for the os pubis: a critique. American Journal of Physical Anthropology 48: 121–122.

Kelley MA. 1979. Sex determination with fragmentary skeletal remains. Journal of Forensic Sciences 24: 154–158.

Kemkes A, Göbel T. 2006. Metric assessment of the “mastoid triangle” for sex determination: a validation study. Journal of Forensic Sciences 51: 985–989.

Kemkes-Grottenthaler A, Löbig F, Stock F. 2002. Mandibular ramus flexure and gonial eversion as morphologic indicators of sex. HOMO – Journal of Comparative Human Biology 53: 97–111.

Kemp BJ. 2006. Ancient Egypt: Anatomy of a Civilization. 2nd edition. Routledge: Oxon.

Kennedy KAR. 1995. But Professor, why teach race identification if races don’t exist? Journal of Forensic Sciences 40: 797–800.

Kennedy KAR, Chiment J, Disotell T, Meyers D. 1984. Principal-components analysis of Prehistoric south Asian crania. American Journal of Physical Anthropology 64: 105–118.

Kenyhercz MW, Klales AR, Kenyhercz WE. 2014. Molar size and shape in the estimation of biological ancestry: a comparison of relative cusp location using geometric morphometrics and interlandmark distances. American Journal of Physical Anthropology 153: 269–279.

Khamis MF, Taylor JA, Malik SN, Townsend GC. 2014. Odontometric sex variation in Malaysians with application to sex prediction. Forensic Science International 234: 183.e1–183.e7.

Khangura RK, Sircar K, Singh S, Rastogi V. 2011. Sex determination using mesiodistal dimension of permanent maxillary incisors and canines. Journal of Forensic Dental Sciences 3: 81–85.

Kharoshah MA, Almadani O, Ghaleb SS, Zaki MK, Fattah YA. 2010. Sexual dimorphism of the mandible in a modern Egyptian population. Journal of Forensic and Legal Medicine 17: 213–215.

Khera R, Jain S, Lodha R, Ramakrishnan S. 2013. Gender bias in child care and child health: global patterns. Archives of Disease in Childhood: ePub ahead of print.

Khoury MJ, Marks JS, McCarthy BJ, Zaro SM. 1985. Factors affecting the sex differential in neonatal mortality: the role of respiratory distress syndrome. American Journal of Obstetrics & Gynecology 151: 777–782.

Kieser JA, Moggi-Cecchi J, Groeneveld HT. 1992. Sex allocation of skeletal material by analysis of the proximal tibia. Forensic Science International 56: 29–36.

Kim Y-S, Seok Oh C, Lee SJ, Bum Park J, Ju Kim M, Hoon Shin D. 2011. Sex determination of Joseon people skeletons based on anatomical, cultural and molecular biological clues. Annals of Anatomy 193: 539–543.

Kim H-Y. 2013. Statistical notes for clinical researchers: assessing normal distribution (2) using skewness and kurtosis. Restorative Dentistry & Endodontics 38: 52–54.

Kimmerle EH, Jantz RL, Konigsberg LW, Baraybar JP. 2008a. Skeletal estimation and identification in American and east European populations. Journal of Forensic Sciences 53: 524–532.

Kimmerle EH, Ross A, Slice D. 2008b. Sexual dimorphism in America: geometric morphometric analysis of the craniofacial region. Journal of Forensic Sciences 53: 54–57.

Kindschuh SC, Dupras TL, Cowgill LW. 2010. Determination of sex from the hyoid bone. American Journal of Physical Anthropology 143: 279–284.

King CA, İşcan MY, Loth SR. 1998. Metric and comparative analysis of sexual dimorphism in the Thai femur. Journal of Forensic Sciences 43: 954–958.

Kinnear PR, Gray CD. 2009. SPSS 16 Made Simple. Psychology Press Ltd: Sussex.

Kjellström A. 2004. Evaluations of sex assessment using weighted traits on incomplete skeletal remains. International Journal of Osteoarchaeology 14: 360–373.

Klales AR, Ousley SD, Vollner JM. 2012. A revised method of sexing the human innominate using Phenice’s nonmetric traits and statistical methods. American Journal of Physical Anthropology 149: 104–114.

Klemm DD, Klemm R. 2001. The building stones of ancient Egypt – a gift of its geology. African Earth Sciences 33: 631–642.

Kloos H, David R. 2002. The paleoepidemiology of schistosomiasis in ancient Egypt. Human Ecology Review 9: 14–25.

Knudson KJ, Stojanowski CM. 2008. New directions in bioarchaeology: recent contributions to the study of human social identities. Journal of Archaeological Research 16: 397–432.

Kobyliansky E, Livshits G, Pavlovsky O. 1995. Population biology of human aging: methods of assessment and sex variation. Human Biology 67: 87–109.

Komar DA, Grivas C. 2008. Manufactured populations: what do contemporary reference skeletal collections represent? A comparative study using the Maxwell Museum documented collection. American Journal of Physical Anthropology 137: 224–233.

Komlos J. 1990. Height and social status in Eighteenth-century Germany. The Journal of Interdisciplinary History 20: 607–621.

Konigsberg LW, Frankenberg SR. 1992. Estimation of age structure in anthropological demography. American Journal of Physical Anthropology 89: 235–256.

Konigsberg LW, Hens SM. 1998. Use of ordinal categorical variables in skeletal assessment of sex from the cranium. American Journal of Physical Anthropology 107: 97–112.

Kranioti EF, Michalodimitrakis M. 2009. Sexual dimorphism of the humerus in contemporary Cretans – a population-specific study and a review of the literature. Journal of Forensic Sciences 54: 996–1000.

Kranioti EF, Bastir M, Sánchez-Meseguer A, Rosas A. 2009. A geometric morphometric study of the Cretan humerus for sex identification. Forensic Science International 189: 111.e1–111.e8.

Krings M. 1996. Moleculargenetic analysis of contemporary and ancient Nile Valley populations. Bulletin de la Société Suisse d'Anthropologie 2: 23–29.

Krings M, Salem AH, Bauer K, Geisert H, Malek AK, Chaix L, Simon C, Welsby D, Di Rienzo A, Utermann G, Sajantila A, Pääbo S, Stoneking M. 1999. mtDNA analysis of Nile River Valley populations: a genetic corridor or a barrier to migration? American Journal of Human Genetics 64: 1166–1176.

Kroenke KR. 2010. The Provincial Cemeteries of Naga ed-Deir: A Comprehensive Study of Tomb Models Dating from the Late Old Kingdom to the Late Middle Kingdom. Thesis submitted for the degree of Doctor of Philosophy in the Graduate Division of the University of California, Berkeley.

Krogman WM, İşcan MY. 1986. The Human Skeleton in Forensic Medicine. 2nd ed. Charles Thomas: Springfield, IL.

Kronenberg HM. 2003. Developmental regulation of the growth plate. Nature 423: 332–336.

Kujanová M, Pereira L, Fernandes V, Pereira JB, Černy V. 2009. Near Eastern Neolithic genetic input in a small oasis of the Egyptian Western Desert. American Journal of Physical Anthropology 140: 336–346.

Kuraszkiewicz, K (ed.) In press. Saqqara V. Old Kingdom Structures between the Step Pyramid Complex and the Dry Moat, Part II: Studies. Centre of the Mediterranean Archaeology of the Polish Academy of Sciences: Warsaw.

Kurki HK. 2007. Protection of obstetric dimensions in a small-bodied human sample. American Journal of Physical Anthropology 133: 1152–1165.

Kuswandari S, Nishino M. 2004. The mesiodistal crown diameters of primary dentition in Indonesian Javanese children. Archives of Oral Biology 49: 217–222.

Kusuoka H, Hoffman JIE. 2002. Advice on statistical analysis for circulation research. Circulation Research 91: 662–671.

L’Abbé EN, Kenyhercz M, Stull KE, Keough N, Nawrocki S. 2013. Application of Fordisc 3.0 to explore differences among crania of North American and South African blacks and whites. Journal of Forensic Sciences 58: 1579–1583.

Lacey KA, Parkin JM, Steel GH. 1973. Relationship between bone age and dental development. Lancet 302(7831): 736–737.

Lachenbruch PA, Mickey MR. 1968. Estimation of error rates in discriminant analysis. Technometrics 10: 1–11.

Lalremruata A, Ball M, Bianucci R, Welte B, Nerlich AG, Kun JFJ, Pusch CM. 2013. Molecular identification of falciparum malaria and human tuberculosis co-infection in mummies from the Fayum Depression (Lower Egypt). PLoS ONE 8: e60307.

Landauer CA. 1962. A factor analysis of the facial skeleton. Human Biology 34: 239–253.

Landis JR, Koch GG. 1977. The measurement of observer agreement for categorical data. Biometrics 33: 159–174.

Langley-Shirley N, Jantz RL. 2010. A Bayesian approach to age estimation in modern Americans from the clavicle. Journal of Forensic Sciences 55: 571–583.

Lanyon LE. 1992. Control of bone architecture by functional load bearing. Journal of Bone and Mineral Research 7: S369–S375.

Larsen CS. 1982. The anthropology of St. Catherines Island 3. Prehistoric human biological adaptation. Anthropological Papers of the American Museum of Natural History 57: 157–276.

Larsen CS. 1995. Biological changes in human populations with agriculture. Annual Review of Anthropology 24: 185–213.

Larsen CS. 2006. The agricultural revolution as an environmental catastrophe: implications for health and lifestyle in the Holocene. Quaternary International 150: 12–20.

Larson SC. 1931. The shrinkage of the coefficient of multiple correlation. Journal of Educational Psychology 22: 45–55.

Larson MG. 2008. Analysis of variance. Circulation 117: 115–121.

LaValley MP. 2008. Logistic regression. Circulation 117: 2395–2399.

Lawing AM, Polly PD. 2010. Geometric morphometrics: recent applications to the study of evolution and development. Journal of Zoology 280: 1–7.

Lazenby RA. 1998. Second metacarpal midshaft geometry in an historic cemetery sample. American Journal of Physical Anthropology 106: 157–167.

Leach S, Lewis M, Chenery C, Müldner G, Eckardt H. 2009. Migration and diversity in Roman Britain: a multidisciplinary approach to the identification of immigrants in Roman York, England. American Journal of Physical Anthropology 140: 546–561.

Leatherman TL, Carey JW, Thomas RB. 1995. Socioeconomic change and patterns of growth in the Andes. American Journal of Physical Anthropology 97: 307–321.

Leathers A, Edwards J, Armelagos GJ. 2002. Assessment of classification of crania using Fordisc 2.0: Nubian X-Group test. American Journal of Physical Anthropology Supplement 34: 99–100.

Lee E-K, Cook D, Klinke S, Lumley T. 2005. Projection pursuit for exploratory supervised classification. SFB 649 Discussion Paper 2005-06. SFB: Berlin.

Lehner M. 1997. End of Field Season Full Report. Pyramids: The Inside Story. Available online: http://www.pbs.org/wgbh/nova/pyramid/excavation/fullreport.html. Accessed June 2014.

Lehner M, Kamel M, Tavares A. 2008. Giza Plateau Mapping Project. Season 2008 Preliminary Report. Giza Occasional Papers 4. Ancient Egypt Research Associates, Inc: Boston.

Leigh SR. 1992. Patterns of variation in the ontogeny of primate body size dimorphism. Journal of Human Evolution 23: 27–50.

Leigh SR. 1995. Socioecology and the ontogeny of sexual size dimorphism in anthropoid primates. American Journal of Physical Anthropology 97: 339–356.

Leigh SR, Shea BT. 1995. Ontogeny and the evolution of adult body size dimorphism in apes. American Journal of Primatology 36: 37–60.

Leigh SR, Shea BT. 1996. Ontogeny of body size variation in African apes. American Journal of Physical Anthropology 99: 43–65.

Leonard WR, DeWalt KM, Stansbury JP, McCaston MK. 1995. Growth differences between children of highland and coastal Ecuador. American Journal of Physical Anthropology 98: 47–57.

Leonard WR, Spencer GJ, Galloway VA, Osipova L. 2002. Declining growth status of indigenous Siberian children in post-Soviet Russia. Human Biology 74: 179–209.

Lesko BS. 1991. Women’s monumental mark on ancient Egypt. The Biblical Archaeologist 54: 4–15.

Lesko LH (ed). 1994. Pharaoh’s Workers: The Villagers of Deir El Medina. Cornell University Press: USA.

Leutenegger W, Cheverud J. 1982. Correlates of sexual dimorphism in primates: ecological and size variables. International Journal of Primatology 3: 387–402.

LeVeau BF, Bernhardt DB. 1984. Developmental biomechanics: effect of forces on the growth, development, and maintenance of the human body. Physical Therapy 64: 1874–1882.

Lewis AB. 1991. Comparisons between dental and skeletal ages. The Angle Orthodontist 61: 87–92.

Lewis AB, Garn SM. 1960. The relationship between tooth formation and other maturational factors. The Angle Orthodontist 30: 70–77.

Lieberman DE, Devlin MJ, Pearson OM. 2001. Articular area responses to mechanical loading: effects of exercise, age, and skeletal location. American Journal of Physical Anthropology 116: 266–277.

Lillie FR. 1917. Sex-determination and sex-differentiation in mammals. Proceedings of the National Academy of Sciences USA 3: 464–470.

Lin Z, Kondo T, Minamino T, Ohtsuji M, Nishigami J, Takayasu T, Sun R, Ohshima T. 1995. Sex determination by polymerase chain reaction on mummies discovered at Taklamakan desert in 1912. Forensic Science International 75: 197–205.

Lindenfors P. 2002. Sexually antagonistic selection on primate size. Journal of Evolutionary Biology 15: 595–607.

Lindenfors P, Tullberg BS. 1998. Phylogenetic analyses of primate size evolution: the consequences of sexual selection. Biological Journal of the Linnean Society 64: 413–447.

Ling JYK, Wong RWK. 2007. Tooth dimensions of Southern Chinese. HOMO – Journal of Comparative Human Biology 58: 67–73.

Little RJA. 1988. A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association 83: 1198–1202.

Little BB, Malina RM, Pena Reyes ME, Chavez GB. 2013. Altitude effects on growth of indigenous children in Oaxaca, Southern Mexico. American Journal of Physical Anthropology 152: 1–10.

Liu K. 1988. Measurement error and its impact on partial correlation and multiple linear regression analyses. American Journal of Epidemiology 127: 864–874.

Liu J-L, LeRoith D. 1999. Insulin-like growth factor I is essential for postnatal growth in response to growth hormone. Endocrinology 140: 5178–5184.

Liversidge HM, Molleson TI. 1999. Deciduous tooth size and morphogenetic fields in children from Christ Church, Spitalfields. Archives of Oral Biology 44: 7–13.

Livshits G, Roset A, Yakovenko K, Trofimov S, Kobyliansky E. 2002. Genetics of human body size and shape: body proportions and indices. Annals of Human Biology 29: 271–289.

Lockwood CA, Lynch JM, Kimbel WH. 2002. Quantifying temporal bone morphology of great apes and humans: an approach using geometric morphometrics. Journal of Anatomy 201: 447–464.

Lockwood CA, Menter CG, Moggi-Cecchi J, Keyser AW. 2007. Extended male growth in a fossil hominin species. Science 318: 1443–1446.

Lopez-Contreras ME, Farid-Coupal N, Landaeta de Jimenez M, Laxague G. 1983. Sex dimorphism of height in two Venezuelan populations. In: Borms J, Sand R, Susanne C, Hebbelinck M (eds.) Human Growth and Development. Plenum: New York.

López-Martin JM, Ruiz-Olmo J, Padró I. 2006. Comparison of skull measurements and sexual dimorphism between the Minorcan pine marten (Martes martes minoricensis) and the Iberian pine marten (M. m. martes): a case of insularity. Mammalian Biology 71: 13–24.

López Camelo JS, Campaña H, Santos R, Poletta FA. 2006. Effect of the interaction between high altitude and socioeconomic factors on birth weight in a large sample from South America. American Journal of Physical Anthropology 129: 305–310.

Loth SR, Henneberg M. 2001. Sexually dimorphic mandibular morphology in the first few years of life. American Journal of Physical Anthropology 115: 179–186.

Lovejoy CO. 1985. Dental wear in the Libben population: its functional pattern and role in the determination of adult skeletal age at death. American Journal of Physical Anthropology 68: 47–56.

Lovell NC. 1989. Test of Phenice’s technique for determining sex from the os pubis. American Journal of Physical Anthropology 79: 117–120.

Lovell NC, Whyte I. 1999. Patterns of dental enamel defects at ancient Mendes, Egypt. American Journal of Physical Anthropology 110: 69–80.

Lucas A, Morley E, Cole TJ. 1998. Randomised trial of early diet in preterm babies and later intelligence quotient. British Medical Journal 317: 1481–1487.

Lucotte G, Mercier G. 2003. Brief communication: Y-chromosome haplotypes in Egypt. American Journal of Physical Anthropology 121: 63–66.

Luis JR, Rowland DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA, Cavalli-Sforza LL, Herrera RJ. 2004. The Levant versus the Horn of Africa: evidence for bidirectional corridors of human migrations. American Journal of Human Genetics 74: 532–544.

Lukacs JR. 1996. Sex differences in dental caries rates with the origin of agriculture in South Asia. Current Anthropology 37: 147–153.

Luptáková L, Bábelová A, Omelka R, Kolena B, Vondrákova M, Bauerová M. 2011. Sex determination of early medieval individuals through nested PCR using a new primer set in the SRY gene. Forensic Science International 207: 1–5.

Lynnerup N, Kjeldsen H, Zweihoff R, Heegaard S, Jacobsen C, Heinemeier J. 2010. Ascertaining year of birth/age at death in forensic cases: a review of conventional methods and methods allowing for absolute chronology. Forensic Science International 201: 74–78.

Lyon BE, Montgomerie R. 2012. Sexual selection is a form of social selection. Philosophical Transactions of the Royal Society Biological Sciences 367: 22–66.

Maat GJR, Mastwijk RW, Van der Velde EA. 1997. On the reliability of non-metrical morphological sex determination of the skull compared with that of the pelvis in the Low Countries. International Journal of Osteoarchaeology 7: 575–580.

Macaluso PJ. 2010. Sex determination potential of permanent maxillary molar cusp diameters. Journal of Forensic Odonto-Stomatology 28: 22–31.

Macaluso PJ. 2011. Investigation on the utility of permanent maxillary molar cusp areas for sex estimation. Forensic Science Medicine and Pathology 7: 233–247.

Mace AC. 1909. The Early Dynastic Cemeteries of Naga-ed-Dêr. Part II. University of California Publications: Egyptian Archaeology, Volume III. J.C. Hinrichs: Leipzig.

MacLaughlin SM, Bruce MF. 1985. A simple univariate technique for determining sex from fragmentary femora: its application to a Scottish short cist population. American Journal of Physical Anthropology 67: 413–417.

MacLaughlin SM, Bruce MF. 1986. Population variation in sexual dimorphism in the human innominate. Human Evolution 1: 221–231.

MacLaughlin SM, Bruce MF. 1990. The accuracy of sex identification in European skeletal remains using the Phenice characters. Journal of Forensic Sciences 35: 1384–1392.

Mahmoud LA, Ibrahim AA, Ghonem HR, Jouvenceaux A. 1987. Human blood groups in Dakahlya, Egypt. Annals of Human Biology 14: 487–493.

Mahoney P. 2006. Dental microwear from Natufian hunter-gatherers and early Neolithic farmers: comparisons within and between samples. American Journal of Physical Anthropology 130: 308–319.

Majumder PP, Gupta R, Mukhopadhyay B, Bharati P, Roy SK, Masali M, Sloan AW, Basu A. 1986. Effects of altitude, ethnicity-religion, geographical distance, and occupation on adult anthropometric characters of Eastern Himalayan populations. American Journal of Physical Anthropology 70: 377–393.

Malik SL, Singh IP. 1978. Growth trends among male Bods of Ladakh – a high altitude population. American Journal of Physical Anthropology 48: 171–176.

Malina RM, Little BB, Buschang PH, DeMoss J, Selby HA. 1985. Socioeconomic variation in the growth status of children in a subsistence agricultural community. American Journal of Physical Anthropology 68: 385–391.

Mall G, Graw M, Gehring K, Hubig M. 2000. Determination of sex from femora. Forensic Science International 113: 315–321.

Mall G, Hubig M, Büttner A, Kuznik J, Penning R, Graw M. 2001. Sex determination and estimation of stature from the longbones of the arm. Forensic Science International 117: 23–30.

Malmström H, Storå J, Dalén L, Holmlund G, Götherström A. 2005. Extensive human DNA contamination in extracts from ancient dog bones and teeth. Molecular Biology and Evolution 22: 2040–2047.

Manni F, Leonardi P, Barakat A, Rouba H, Heyer E, Klintschar M, McElreavey K, Quintana-Murci L. 2002. Y-chromosome analysis in Egypt suggests a genetic regional continuity in Northeastern Africa. Human Biology 74: 645–658.

Marchi D. 2008. Relationships between lower limb cross-sectional geometry and mobility: the case of a Neolithic sample from Italy. American Journal of Physical Anthropology 137: 188–200.

Marchi D, Sparacello VS, Holt BM, Formicola V. 2006. Biomechanical approach to the reconstruction of activity patterns in Neolithic Western Liguria, Italy. American Journal of Physical Anthropology 131: 447–455.

Marcus J. 2008. The archaeological evidence for social evolution. Annual Review of Anthropology 37: 251–266.

Marino EA. 1995. Sex estimation using the first cervical vertebra. American Journal of Physical Anthropology 97: 127–133.

Marlow EJ, Pastor RF. 2011. Sex determination using the second cervical vertebra – a test of the method. Journal of Forensic Sciences 56: 165–169.

Marlowe F. 2000. Paternal investment and the human mating system. Behavioural Processes 51: 45–61.

Marota I, Basile C, Ubaldi M, Rollo F. 2002. DNA decay rate in papyri and human remains from Egyptian archaeological sites. American Journal of Physical Anthropology 117: 310–318.

Márquez-Grant N. 2005. The presence of African individuals in punic populations from the island of Ibiza (Spain): contributions from physical anthropology. Mayurqa 30: 611–637.

Martin ES. 1936. A study of an Egyptian series of mandibles with special reference to mathematical methods of sexing. Biometrika 28: 149–178.

Martin DL, Armelagos GJ, Goodman AH, Van Gerven DP. 1984. The effects of socioeconomic changes in prehistoric Africa: Sudanese Nubia as a case study. In: Cohen MN, Armelagos GJ (eds.) Paleopathology at the Origins of Agriculture. Academic Press Inc: Orlando, FL.

Martorell R, Yarbrough C, Klein RE, Lechtig A. 1979. Maturation, body size, and skeletal maturation: interrelationships and implications for catch-up growth. Human Biology 51: 371–389.

Masali M. 1972. Body size and proportions as revealed by bone measurements and their meaning in environmental adaptation. Journal of Human Evolution 1: 187–197.

Masali M, Chiarelli B. 1972. Demographic data on the remains of ancient Egyptians. Journal of Human Evolution 1: 161–169.

Masnicová S, Beňuš R. 2003. Developmental anomalies in skeletal remains from the Great Moravia and Middle Ages cemeteries at Devín (Slovakia). International Journal of Osteoarchaeology 13: 266–274.

Mastrangelo P, De Luca S, Sánchez-Mejorada G. 2011a. Sex assessment from carpals bones: discriminant function analysis in a contemporary Mexican sample. Forensic Science International 209: 196.e1–196.e15.

Mastrangelo P, De Luca S, Alemán I, Botella MC. 2011b. Sex assessment from the carpals bones: discriminant function analysis in a 20th century Spanish sample. Forensic Science International 206: 216.e1–216.e10.

Matheson CD, Loy TH. 2001. Genetic sex identification of 9,400-year-old human skull samples from Çayönü Tepesi, Turkey. Journal of Archaeological Science 28: 569–575.

Mauras N, Rogol AD, Haymond MW, Veldhuis JD. 1996. Sex steroids, growth hormone, insulin-like growth factor-1: neuroendocrine and metabolic regulation in puberty. Hormone Research 45: 74–80.

May R, Kim D, Mote-Watson D. 2013. Change in weight-for-length status during the first three months: relationships to birth weight and implications for metabolic risk. American Journal of Physical Anthropology 150: 5–9.

Mays S, Faerman M. 2001. Sex identification in some putative infanticide victims from Roman Britain using ancient DNA. Journal of Archaeological Science 28: 555–559.

McGuire RH. 1983. Breaking down cultural complexity: inequality and heterogeneity. Advances in Archaeological Method and Theory 6: 91–142.

McHenry HM. 1992. Body size and proportions in early hominids. American Journal of Physical Anthropology 87: 407–431.

McMillen MM. 1979. Differential mortality by sex in fetal and neonatal deaths. Science 204: 89–90.

McMullan B. 2013. Comparative study of metric sexing software using the os coxa. Poster presentation at the 82nd Annual Meeting of the American Association of Physical Anthropologists. 9–13 April, 2013; Knoxville, Tennessee.

McNulty KP, Vinyard CJ. 2015. Morphometry, geometry, function, and the future. The Anatomical Record 298: 328–333.

Meadows L, Jantz RL. 1995. Allometric secular change in the long bones from the 1800s to the present. Journal of Forensic Sciences 40: 762–767.

Meadows-Jantz L, Jantz RL. 1999. Secular change in long bone length and proportion in the United States, 1800–1970. American Journal of Physical Anthropology 110: 57–67.

Meggers BJ. 1954. Environmental limitation on the development of culture. American Anthropologist 56: 801–824.

Meindl RS, Lovejoy CO. 1985. Ectocranial suture closure: a revised method for the determination of skeletal age at death based on the lateral-anterior sutures. American Journal of Physical Anthropology 68: 57–66.

Meindl RS, Russell KF. 1998. Recent advances in method and theory in palaeodemography. Annual Review of Anthropology 27: 375–399.

Meindl RS, Lovejoy CO, Mensforth RP, Don Carlos L. 1985. Accuracy and direction of error in the sexing of the skeleton: implications for palaeodemography. American Journal of Physical Anthropology 68: 79–85.

Menard S. 2010. Logistic Regression: From Introductory to Advanced Concepts and Applications. Sage Publications Inc: Thousand Oaks, CA.

Mendes M, Pala A. 2003. Type I error rate and power of three normality tests. Pakistan Journal of Information and Technology 2: 135–139.

Mensforth RP. 1990. Paleodemography of the Carlston Annis (Bt-5) Late Archaic skeletal population. American Journal of Physical Anthropology 82: 81–99.

Meskell L. 1998. An archaeology of social relations in an Egyptian village. Journal of Archaeological Method and Theory 5: 209–243.

Meskell L. 1999. Archaeologies of life and death. American Journal of Archaeology 103: 181–199.

Messer E. 1986. The “small but healthy” hypothesis: historical, political, and ecological influences on nutritional standards. Human Ecology 14: 57–75.

Meyer E, Wiese M, Bruchhaus H, Claussen M, Klein A. 2000. Extraction and amplification of authentic DNA from ancient human remains. Forensic Science International 113: 87–90.

Meyer C, Nicklisch N, Held P, Fritsch B, Alt KW. 2011. Tracing patterns of activity in the human skeleton: an overview of methods, problems, and limits of interpretation. HOMO – Journal of Comparative Human Biology 62: 202–217.

Meyers LS, Gamst G, Guarino AJ. 2006. Applied Multivariate Research. Design and Interpretation. Sage Publications: Thousand Oaks, California.

Migliano AB, Vinicius L, Lahr MM. 2007. Life history trade-offs explain the evolution of human pygmies. Proceedings of the National Academy of Sciences USA 104: 20216–20219.

Milella M, Belcastro MG, Zollikofer CPE, Mariotti V. 2012. The effect of age, sex, and physical activity on entheseal morphology in a contemporary Italian skeletal collection. American Journal of Physical Anthropology 148: 379–388.

Miles J, Shevlin M. 2001. Applying Regression and Correlation. A Guide for Students. Sage Publications Ltd: London.

Miller J, Haden P. 2006. Statistical Analysis with the General Linear Model. Creative Commons: California.

Milner GR, Wood JW, Boldsen JL. 2000. Chapter 16: Paleodemography. In: Katzenberg MA, Saunders SR (eds.) Biological Anthropology of the Human Skeleton. Wiley-Liss Inc: New York.

Mitani JC, Gros-Louis J, Richards AF. 1996. Sexual dimorphism, the operational sex ratio, and the intensity of male competition in polygynous primates. The American Naturalist 147: 966–980.

Mitteroecker P, Gunz P, Windhager S, Schaefer K. 2013. A brief review of shape, form, and allometry in geometric morphometrics, with applications to human facial morphology. Hystrix: The Italian Journal of Mammalogy 24: 59–66.

Mohan S, Richman C, Guo R, Amaar Y, Donahue LR, Wergedal J, Baylink DJ. 2003. Insulin-like growth factor regulates peak bone mineral density in mice by both growth hormone-dependent and -independent mechanisms. Endocrinology 144: 929–936.

Molnar P. 2006. Tracing prehistoric activities: musculoskeletal stress marker analysis of a Stone-Age population on the island of Gotland in the Baltic Sea. American Journal of Physical Anthropology 129: 12–23.

Moneim AWA, Hady ARH, Maaboud ARM, Fathy HM, Hamed AM. 2008. Identification of sex depending on radiological examination of foot and patella. American Journal of Forensic Medicine and Pathology 29: 136–140.

Moorad JA. 2013. Multi-level sexual selection: individual and family-level selection for mating success in a historical human population. Evolution 67: 1635–1648.

Moore LG. 2000. Human genetic adaptation to high altitude. High Altitude Medicine & Biology 2: 257–279.

Moore LG, Niermeyer S, Zamudio S. 1998. Human adaptation to high altitude: regional and life-cycle perspectives. Yearbook of Physical Anthropology 41: 25–64.

Moore KL, Persaud TVN. 2008. The Developing Human: Clinically Oriented Embryology. 8th edition. Saunders Elsevier: Philadelphia, PA.

Morton SG. 1844. Crania Ægyptiaca. Transactions of the American Philosophical Society, volume 9. Madden & Co: London.

Moshiri M, Scarfe WC, Hilgers ML, Scheetz JP, Silveira AM, Farman AG. 2007. Accuracy of linear measurements from imaging plate and lateral cephalometric images derived from cone-beam computed tomography. American Journal of Orthodontics and Dentofacial Orthopedics 132: 550–560.

Mosier CI. 1951. Symposium: The need and means of cross-validation. I. Problems and designs of cross-validation. Educational and Psychological Measurement 11: 5–11.

Mueller U, Mazur A. 2001. Evidence of unconstrained directional selection for male tallness. Behavioral Ecology and Sociobiology 50: 302–311.

Mueller WH, Wohlleb JC. 1981. Anatomical distribution of subcutaneous fat and its description by multivariate methods: how valid are principal components? American Journal of Physical Anthropology 54: 25–35.

Muhe L, Lulseged S, Mason KE, Simoes EAF. 1997. Case-control study of the role of nutritional rickets in the risk of developing pneumonia in Ethiopian children. Lancet 349: 1801–1804.

Mummert A, Esche E, Robinson J, Armelagos GJ. 2011. Stature and robusticity during the agricultural transition: evidence from the bioarchaeological record. Economics and Human Biology 9: 284–301.

Murail P, Bruzek J, Houët H, Cunha E. 2005. DSP: a tool for probabilistic sex diagnosis using worldwide variability in hip-bone measurements. Bulletins et Mémoires de la Société d’Anthropologie de Paris 17: 167–176.

Murdock GP, White DR. 1969. Standard cross-cultural sample. Ethnology 8: 329–369.

Murphy AM. 2005. The femoral head: sex assessment of prehistoric New Zealand Polynesian skeletal remains. Forensic Science International 154: 210–213.

Murray MA. 1956. Burial customs and beliefs in the hereafter in Predynastic Egypt. The Journal of Egyptian Archaeology 42: 86–96.

Murray CM, Eberly LE, Pusey AE. 2006. Foraging strategies as a function of season and rank among wild female chimpanzees (Pan troglodytes). Behavioral Ecology 17: 1020–1028.

Musil CM, Warner CB, Yobas PK, Jones SL. 2002. A comparison of imputation techniques for handling missing data. Western Journal of Nursing Research 24: 815–829.

Musilová E, Fernandes V, Silva NM, Soares P, Alshamali F, Harich N, Cherni L, El Gaaied ABA, Al-Meeri A, Pereira L, Černý V. 2011. Population history of the Red Sea – genetic exchanges between the Arabian Peninsula and East Africa signalled in the mitochondrial DNA HV1 haplogroup. American Journal of Physical Anthropology 145: 592–598.

Myśliwiec K, Kuraszkiewicz K, Czerwik D, Rzeuska T, Kaczmarek M, Kowalska A, Radomska M, Godziejewski Z. 2004. Saqqara I. Tomb of Merefnebef. Centre of the Mediterranean Archaeology of the Polish Academy of Sciences: Warsaw.

Myśliwiec K (ed.) 2008. Saqqara III. The Upper Necropolis. Centre of the Mediterranean Archaeology of the Polish Academy of Sciences: Warsaw.

Myśliwiec K, Kuraszkiewicz K, Kowalska A, Radomska M, Rzeuska TI, Kaczmarek M, Kozieradzka I, Godziejewski Z, Ikram S, Zatorska A. 2010. Saqqara IV. The Funerary Complex of Nyankhnefertem. Centre of the Mediterranean Archaeology of the Polish Academy of Sciences: Warsaw.

Nagy A. 2008. An osteological analysis of ten human crania from Costa Rica. Annals of Carnegie Museum 76: 265–278.

Nakagawa S. 2004. A farewell to Bonferroni: the problems of low statistical power and publication bias. Behavioral Ecology 15: 1044–1045.

Nerlich AG, Haas CJ, Zink A, Szeimies U, Hagedorn HG. 1997. Molecular evidence for tuberculosis in an ancient Egyptian mummy. Lancet 350: 1404.

Nerlich AG, Schraut B, Dittrich S, Jelinek T, Zink AR. 2008. Plasmodium falciparum in ancient Egypt. Emerging Infectious Diseases 14: 1317–1318.

Nettle D. 2002. Women’s height, reproductive success and the evolution of sexual dimorphism in modern humans. Proceedings of the Royal Society Biological Sciences 269: 1919–1923.

Neves WA, Hubbe M, Correal G. 2007. Human skeletal remains from Sabana de Bogotá, Colombia: a case of paleoamerican morphology late survival in South America? American Journal of Physical Anthropology 133: 1080–1098.

Neves WA, Pucciarelli HM. 1991. Morphological affinities of the first Americans: an exploratory analysis based on early South American human remains. Journal of Human Evolution 21: 261–273.

Newman RW, Munro EH. 1955. The relation of climate and body size in U.S. males. American Journal of Physical Anthropology 13: 1–17.

Nickens PR. 1976. Stature reduction as an adaptive response to food production in Mesoamerica. Journal of Archaeological Science 3: 31–41.

Nielsen J. 2011. Does human sexual dimorphism influence fracture frequency, types and distribution? Anthropological Review 74: 13–23.

Nikita E, Siew YY, Stock J, Mattingly D, Mirazón Lahr M. 2011. Activity patterns in the Sahara Desert: an interpretation based on cross-sectional geometric properties. American Journal of Physical Anthropology 146: 423–434.

Nikita E, Mattingly D, Mirazón Lahr M. 2012a. Sahara: barrier or corridor? Nonmetric cranial traits and biological affinities of North African late Holocene populations. American Journal of Physical Anthropology 147: 280–292.

Nikita E, Mattingly D, Lahr MM. 2012b. Three-dimensional cranial shape analyses and gene flow in North Africa during the Middle to Late Holocene. Journal of Anthropological Archaeology 31: 564–572.

Nikitovic D, Bogin B. 2014. Ontogeny of sexual size dimorphism and environmental quality in Guatemalan children. American Journal of Human Biology 26: 117–123.

Noback ML, Harvati K, Spoor F. 2011. Climate-related variation of the human nasal cavity. American Journal of Physical Anthropology 145: 599–614.

Nowaczewska W, Dąbrowski P, Kużmiński L. 2011. Morphological adaptation to climate in modern Homo sapiens crania: the importance of basicranial breadth. Collegium Antropologicum 35: 625–636.

Oettlé AC, Pretorius E, Steyn M. 2009. Geometric morphometric analysis of the use of mandibular gonial eversion in sex determination. HOMO – Journal of Comparative Human Biology 60: 29–43.

Ogawa Y, Imaizumi K, Miyasaka S, Yoshino M. 2013. Discriminant functions for sex estimation of modern Japanese skulls. Journal of Forensic and Legal Medicine 20: 234–238.

O’Neil D. 2014. Human Biological Adaptability: an Introduction to Human Responses to Common Environmental Stresses. Available online: http://anthro.palomar.edu/adapt/Default.htm. Accessed April 2014.

O’Rourke DH, Hayes MG, Carlyle SW. 2000. Ancient DNA studies in physical anthropology. Annual Review of Anthropology 29: 217–242.

Ortner DJ. 2011. What skeletons tell us. The story of human paleopathology. Virchows Archiv 459: 247–254.

Özer I, Katayama K. 2008. Sex determination using the femur in an ancient Japanese population. Collegium Antropologicum 32: 67–72.

Pääbo S. 1985a. Molecular cloning of ancient Egyptian mummy DNA. Nature 314: 644–645.

Pääbo S. 1985b. Preservation of DNA in ancient Egyptian mummies. Journal of Archaeological Science 12: 411–417.

Pahl WM. 1986. Tumors of bone and soft tissue in ancient Egypt and Nubia: a synopsis of the detected cases. International Journal of Anthropology 1: 267–275.

Palomino H, Mueller WH, Schull WJ. 1979. Altitude, heredity and body proportions in Northern Chile. American Journal of Physical Anthropology 50: 39–50.

Panter-Brick C. 2002. Sexual division of labor: energetic and evolutionary scenarios. American Journal of Human Biology 14: 627–640.

Papaioannou VA, Kranioti EF, Joveneaux P, Nathena D, Michalodimitrakis M. 2012. Sexual dimorphism of the scapula and the clavicle in a contemporary Greek population: applications in forensic identification. Forensic Science International 217: 231.e1–231.e7.

Parsons KJ, Cooper WJ, Albertson RC. 2009. Limits of principal components analysis for producing a common trait space: implications for inferring selection, contingency, and chance in evolution. PLoS One 4: e7957.

Patil KR, Mody RN. 2005. Determination of sex by discriminant function analysis and stature by regression analysis: a lateral cephalometric study. Forensic Science International 147: 175–180.

Patriquin ML, Loth SR, Steyn M. 2003. Sexually dimorphic pelvic morphology in South African whites and blacks. HOMO – Journal of Comparative Human Biology 53: 255–262.

Paul P, Pennell ML, Lemeshow S. 2013. Standardizing the power of the Hosmer–Lemeshow goodness of fit test in large data sets. Statistics in Medicine 32: 67–80.

Pawlowski B, Dunbar RIM, Lipowicz A. 2000. Tall men have more reproductive success. Nature 403: 156.

Pawson IG. 1977. Growth characteristics of populations of Tibetan origin in Nepal. American Journal of Physical Anthropology 47: 473–482.

Paynter R. 1989. The archaeology of equality and inequality. Annual Review of Anthropology 18: 369–399.

Pearson K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2: 559–572.

Pearson K, Davin AG. 1924. On the biometric constants of the human skull. Biometrika 16: 328–363.

Pearson K, Stoessiger BN. 1927. On further formulae for the reconstruction of cranial capacity from external measurements of the skull. Biometrika 19: 211–214.

Pearson K, Woo TL. 1935. Further investigation of the morphometric characters of the individual bones of the human skull. Biometrika 27: 424–465.

Pederson D, Gore C. 1996. Chapter 3: Anthropometry measurement error. In: Norton K, Olds T (eds.) Anthropometrica: A Textbook of Body Measurement for Sports and Health Courses. University of New South Wales Press Ltd: Sydney.

Pelto GH, Pelto PJ. 1989. Small but healthy. Human Organization 48: 11–15.

Pensler JM, Radosevich JA, Higbee R, Langman CB. 1990. Osteoclasts isolated from membranous bone in children exhibit nuclear estrogen and progesterone receptors. Journal of Bone and Mineral Research 5: 797–802.

Pereira C, Pestana D, Costa Santos J, de Mendonça MC. 2010. Contribution of teeth in human forensic identification – discriminant function sexing odontometrical techniques in Portuguese population. Journal of Forensic and Legal Medicine 17: 105–110.

Perini TA, de Oliveira GL, dos Santos Ornellas J, de Oliveira FP. 2005. Technical error of measurement in anthropometry. Revista Brasileira de Medicina do Esporte 11: 86–90.

Perneger TV. 1998. What’s wrong with Bonferroni adjustments. BMJ 316: 1236–1238.

Petit MA, Beck TJ, Lin H-M, Bentley C, Legro RS, Lloyd T. 2004. Femoral bone structural geometry adapts to mechanical loading and is influenced by sex steroids: the Penn State Young Women’s Health Study. Bone 35: 750–759.

Petrie WMF. 1907. Gizeh and Rifeh. British School of Archaeology in Egypt and Egyptian Research Account: London.

Petrie WMF. 1939. The Making of Egypt. The Sheldon Press: London.

Pettenati-Soubayroux I, Signoli M, Dutour O. 2002. Sexual dimorphism in teeth: discriminatory effectiveness of permanent lower canine size observed in a XVIIIth century osteological series. Forensic Science International 126: 227–232.

Phenice TW. 1969. A newly developed visual method of sexing the os pubis. American Journal of Physical Anthropology 30: 297–302.

Pietrusewsky M. 2008. Metric analysis of skeletal remains: methods and applications. In: Katzenberg MA, Saunders SR (eds.) Biological Anthropology of the Human Skeleton. 2nd edition. Wiley-Liss Inc: New Jersey.

Pilli E, Modi A, Serpico C, Achilli A, Lancioni H, Lippi B, Bertoldi F, Gelichi S, Lari M, Caramelli D. 2013. Monitoring DNA contamination in handled vs. directly excavated ancient human skeletal remains. PLoS One 8: e52524.

Pinhasi R, Stefanović S, Papathanasiou A, Stock JT. 2011. Variability in long bone growth patterns and limb proportions within and amongst Mesolithic and Neolithic populations from southeast Europe. In: Pinhasi R, Stock JT (eds). Human Bioarchaeology of the Transition to Agriculture. Wiley-Blackwell: Oxford.

Plavcan MJ. 2001. Sexual dimorphism in primate evolution. Yearbook of Physical Anthropology 44: 25–53.

Plavcan MJ. 2011. Understanding dimorphism as a function of changes in male and female traits. Evolutionary Anthropology 20: 143–156.

Plavcan MJ. 2012a. Sexual size dimorphism, canine dimorphism and male-male competition in primates. Where do humans fit in? Human Nature 23: 45–67.

Plavcan MJ. 2012b. Body size, size variation, and sexual size dimorphism in early Homo. Current Anthropology 53: S409–S423.

Plavcan MJ, van Schaik CP. 1992. Intrasexual competition and canine dimorphism in anthropoid primates. American Journal of Physical Anthropology 87: 461–477.

Podzorski P. 1990. Their Bones Shall Not Perish: An Examination of Predynastic Human Skeletal Remains from Naga-ed-Dêr in Egypt. SIA Publishing: Surrey, Kent.

Pohar M, Blas M, Turk S. 2004. Comparison of logistic regression and linear discriminant analysis: a simulation study. Metodološki Zvezki 1: 143–161.

Poinar HN, Höss M, Bada JL, Pääbo S. 1996. Amino acid racemisation and the preservation of ancient DNA. Science 272: 864–866.

Pomeroy E. 2013. Biomechanical insights into activity and long distance trade in the south-central Andes (AD 500–1450). Journal of Archaeological Science 40: 3129–3140.

Pomeroy E, Zakrzewski SR. 2009. Sexual dimorphism in diaphyseal cross-sectional shape in the medieval Muslim population of Écija, Spain, and Anglo-Saxon Great Chesterford, UK. International Journal of Osteoarchaeology 19: 50–65.

Pomeroy E, Wells JCK, Stanojevic S, Miranda JJ, Cole TJ, Stock JT. 2014. Birth month associations with height, head circumference, and limb lengths among Peruvian children. American Journal of Physical Anthropology 154: 115–124.

Porter B, Moss RLB. 1927. Topographical Bibliography of Ancient Egyptian Hieroglyphic Texts, Reliefs, and Paintings. I. The Theban Necropolis. Clarendon Press: Oxford.

Prabhu S, Acharya AB. 2009. Odontometric sex assessment in Indians. Forensic Science International 192: 129.e1–129.e5.

Prentice A, Schoenmakers I, Laskey MA, de Bono S, Ginty F, Goldberg GR. 2006. Nutrition and bone growth and development. Proceedings of the Nutrition Society 65: 348–360.

Press J, Wilson S. 1978. Choosing between logistic regression and discriminant analysis. Journal of the American Statistical Association 73: 699–705.

Prince DA, Konigsberg LW. 2008. New formulae for estimating age-at-death in the Balkans utilizing Lamendin’s dental technique and Bayesian analysis. Journal of Forensic Sciences 53: 578–587.

Pryor JW. 1923. Differences in the time of development of centers of ossification in the male and female skeleton. The Anatomical Record 25: 257–273.

Purkait R. 2001. Measurements of the ulna – a new method for determination of sex. Journal of Forensic Sciences 46: 924–927.

Purkait R. 2003. Sex determination from femoral head measurements: a new approach. Legal Medicine 5: S347–S350.

Puts DA. 2010. Beauty and the beast: mechanisms of sexual selection in humans. Evolution and Human Behavior 31: 157–175.

Quinn GP, Keough MJ. 2002. Experimental Design and Data Analysis for Biologists. Cambridge University Press: Cambridge, UK.

Raff JA, Bolnick DA, Tackney J, O’Rourke DH. 2011. Ancient DNA perspectives on American colonization and population history. American Journal of Physical Anthropology 146: 503–514.

Ramsthaler F, Kreutz K, Verhoff MA. 2007. Accuracy of metric sex analysis of skeletal remains using Fordisc® based on a recent skull collection. International Journal of Legal Medicine 121: 477–482.

Ramsthaler F, Kettner M, Gehl A, Verhoff MA. 2010. Digital forensic osteology: morphological sexing of skeletal remains using volume-rendered cranial CT scans. Forensic Science International 195: 148–152.

Rausch R. 2013. Nutrition and academic performance in school-age children: the relation to obesity and food insufficiency. Journal of Nutrition and Food Science 3: 190.

Raxter MH. 2007. Metric sex estimation in an ancient Egyptian skeletal sample. SAS Bulletin: Newsletter of the Society for Archaeological Sciences 30(4): 9–12.

Raxter MH. 2011. Egyptian Body Size: A Regional and Worldwide Perspective. University of South Florida Graduate Theses and Dissertations. Available online: http://scholarcommons.usf.edu/etd/3305/. Accessed May 2014.

Raxter MH, Ruff CB, Azab A, Erfan M, Soliman M, El-Sawaf A. 2008. Stature estimation in ancient Egyptians: a new technique based on anatomical reconstruction of stature. American Journal of Physical Anthropology 136: 147–155.

Razali NM, Wah YB. 2011. Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. Journal of Statistical Modelling and Analytics 2: 21–33.

Redford DB (ed.) 2001. The Oxford Encyclopedia of Ancient Egypt. Volume 3. Oxford University Press Inc: Oxford.

Reeves N. 2000. Ancient Egypt: The Great Discoveries: A Year-by-Year Chronicle. Thames & Hudson: London.

Reeves N, Wilkinson RH. 1996. The Complete Valley of the Kings. Tombs and Treasures of Egypt’s Greatest Pharaohs. Thames & Hudson: London.

Reisner GA. 1901. Work of the University of California at El-Ahaiwah and Naga-ed-Dêr. In: Griffith F (ed.) Egypt Exploration Fund Archaeological Report 1900–1901. The Egypt Exploration Fund: London.

Reisner GA. 1905. The work of the Hearst Egyptian Expedition of the University of California in 1903–4. Records of the Past IV(V): 131–141.

Reisner GA. 1908. The Early Dynastic Cemeteries of Naga-ed-Dêr. Part I. J.C. Hinrichs: Leipzig.

Reisner GA. 1911. The Harvard University – Museum of Fine Arts Egyptian Expedition. Museum of Fine Arts Bulletin IX(50): 13–20.

Reisner GA. 1915. Accessions to the Egyptian Department during 1914. Museum of Fine Arts Bulletin XIII(76): 13–20.

Reisner GA. 1930. Chapter XIV: Egyptology. In: Morison SE (ed). The Development of Harvard University since the Inauguration of President Eliot 1869–1929. Harvard University Press: Cambridge, USA.

Reisner GA. 1942. The History of the Giza Necropolis, Volume 1. Harvard University Press: Cambridge, USA.

Reisner GA. 1955. A History of the Giza Necropolis. Volume II. The Tomb of Hetep-Heres the Mother of Cheops. Harvard University Press: Cambridge, MA.

Relethford JH, Hodges DC. 1985. A statistical test for differences in sexual dimorphism between populations. American Journal of Physical Anthropology 66: 55–61.

Renfree MB, Short RV, Lyon MF, Mittwoch U, Grocock A, Ferguson MWJ. 1988. Sex determination in marsupials: evidence for a marsupial-eutherian dichotomy. Philosophical Transactions of the Royal Society Biological Sciences 322: 41–53.

Rhoads JG, Trinkaus E. 1977. Morphometrics of the Neandertal talus. American Journal of Physical Anthropology 46: 29–44.

Ríos Frutos L. 2003. Brief communication: sex determination accuracy of the minimum supero-inferior femoral neck diameter in a contemporary rural Guatemala population. American Journal of Physical Anthropology 122: 123–126.

Ríos Frutos L. 2005. Metric determination of sex from the humerus in a Guatemalan forensic sample. Forensic Science International 147: 153–157.

Roberts DF. 1953. Body weight, race and climate. American Journal of Physical Anthropology 11: 533–558.

Robins G. 1994. Some principles of compositional dominance and gender hierarchy in Egyptian art. Journal of the American Research Center in Egypt 31: 33–40.

Robins G, Shute CCD. 1983. The physical proportions and living stature of New Kingdom Pharaohs. Journal of Human Evolution 12: 455–465.

Robins G, Shute CCD. 1986. Predynastic Egyptian stature and physical proportions. Human Evolution 1: 313–324.

Robinson C, Eisma R, Morgan B, Jeffery A, Graham EAM, Black S, Rutty GN. 2008. Anthropological measurement of lower limb and foot bones using multi-detector computed tomography. Journal of Forensic Sciences 53: 1289–1295.

Robling AG, Ubelaker DH. 1997. Sex estimation from the metatarsals. Journal of Forensic Sciences 42: 1062–1069.

Rodríguez L, Cervantes E, Ortiz R. 2011. Malnutrition and gastrointestinal and respiratory infections in children: a public health problem. International Journal of Environmental Research and Public Health 8: 1174–1205.

Rogers TL. 1999. A visual method of determining the sex of skeletal remains using the distal humerus. Journal of Forensic Sciences 44: 57–60.

Rogers TL. 2005. Determining the sex of human remains through cranial morphology. Journal of Forensic Sciences 50: 493–500.

Rogers TL, Saunders S. 1994. Accuracy of sex determination using morphological traits of the human pelvis. Journal of Forensic Sciences 39: 1047–1056.

Rogol AD, Roemmich JN, Clark PA. 2002. Growth at puberty. Journal of Adolescent Health 31: 192–200.

Rose JC. 2006. Paleopathology of the commoners at Tell Amarna, Egypt, Akhenaten’s capital city. Memórias do Instituto Oswaldo Cruz 101 (suppl. 2): 73–76.

Rösing FW, Graw M, Marré B, Ritz-Timme S, Rothschild MA, Rötzscher K, Schmeling A, Schröder I, Geserick G. 2007. Recommendations for the forensic diagnosis of sex and age from skeletons. HOMO – Journal of Comparative Human Biology 58: 75–89.

Rosvall KA. 2011. Intrasexual competition in females: evidence for sexual selection? Behavioral Ecology 22: 1131–1140.

Rothhammer F, Silva C. 1990. Craniometrical variation among South American prehistoric populations: climatic, altitudinal, chronological, and geographic contributions. American Journal of Physical Anthropology 82: 9–17.

Rothhammer F, Spielman RS. 1972. Anthropometric variation in the Aymará: genetic, geographic, and topographic contributions. American Journal of Human Genetics 24: 371–380.

Rothschild BM, Martin LD, Lev G, Bercovier H, Bar-Gal GK, Greenblatt C, Donoghue H, Spigelman M, Brittain D. 2001. Mycobacterium tuberculosis complex DNA from an extinct bison dated 17,000 years before the present. Clinical Infectious Diseases 33: 305–311.

Rowland DJ, Luis JR, Terreros MC, Herrera RJ. 2007. Mitochondrial DNA geneflow indicates preferred usage of the Levant corridor over the Horn of Africa passageway. Journal of Human Genetics 52: 436–447.

Rubin C, McLeod K, Basin S. 1990. Functional strains and cortical bone adaptation: epigenetic assurance of skeletal integrity. Journal of Biomechanics 23: 43–54.

Ruff C. 1987. Sexual dimorphism in human lower limb bone structure: relationship to subsistence strategy and sexual division of labor. Journal of Human Evolution 16: 391–416.

Ruff CB. 1991. Climate and body shape in hominid evolution. Journal of Human Evolution 21: 81–105.

Ruff CB. 1993. Climatic adaptation and hominid evolution: the thermoregulatory imperative. Evolutionary Anthropology 2: 53–60.

Ruff CB. 1994. Morphological adaptation to climate in modern and fossil hominids. Yearbook of Physical Anthropology 37: 65–107.

Ruff C. 2002. Variation in human body size and shape. Annual Review of Anthropology 31: 211–232.

Ruff CB, Scott WW, Liu AY-C. 1991. Articular and diaphyseal remodeling of the proximal femur with changes in body mass in adults. American Journal of Physical Anthropology 86: 397–413.

Ruff CB, Trinkaus E, Walker A, Larsen CS. 1993. Postcranial robusticity in Homo. I: temporal trends and mechanical interpretation. American Journal of Physical Anthropology 91: 21–53.

Ruff CB, Trinkaus E, Holliday TW. 1997. Body mass and encephalization in Pleistocene Homo. Nature 387: 173–176.

Ruff CB, Holt B, Trinkaus E. 2006. Who’s afraid of the big bad Wolff?: “Wolff’s law” and bone functional adaptation. American Journal of Physical Anthropology 129: 484–498.

Ruff CB, Garofalo E, Holmes MA. 2013. Interpreting skeletal growth in the past from a functional and physiological perspective. American Journal of Physical Anthropology 150: 29–37.

Ruffer MA. 1911. On arterial lesions found in Egyptian mummies (1580 BC – 525 AD). The Journal of Pathology and Bacteriology 15: 453–462.

Ruffer MA. 1913. On pathological lesions found in Coptic bodies (AD 400–500). The Journal of Pathology and Bacteriology 18: 149–162.

Ruffer MA. 1919. Arthritis deformans and spondylitis in ancient Egypt. The Journal of Pathology and Bacteriology 22: 152–196.

Ruffer MA. 1920. Study of abnormalities and pathology of ancient Egyptian teeth. American Journal of Physical Anthropology 3: 335–382.

Ruffer MA, Willmore JG. 1913. Note on a tumour of the pelvis dating from Roman times (250 AD), and found in Egypt. The Journal of Pathology and Bacteriology 18: 480–484.

Sadler TW. 2004. Langman’s Medical Embryology. 9th edtn. Lippincott Williams & Wilkins: Philadelphia, PA.

Saggese G, Bertelloni S, Baroncelli GI. 1997. Sex steroids and the acquisition of bone mass. Hormone Research 48(suppl. 5): 65–71.

Saini V, Srivastava R, Shamal SN, Singh TB, Pandey AK, Tripathi SK. 2011. Sex determination using mandibular ramus flexure: a preliminary study on Indian population. Journal of Forensic and Legal Medicine 18: 208–212.

Salem AH, Bahri R, Jarjanazi H, Chaabani H. 2014. Geographical and social influences on genetic diversity within the Egyptian population: analyses of Alu insertion polymorphisms. Annals of Human Biology 41: 61–66.

Salkind NJ. 2010. Encyclopedia of Research Design. Sage Publications Inc: Thousand Oaks, CA.

Satyanarayana K, Naidu AN, Chatterjee B, Rao BS. 1977. Body size and work output. American Journal of Clinical Nutrition 30: 322–325.

Saunders SR. 2000. Subadult skeletons and growth-related studies. In: Katzenberg MA, Saunders SR (eds.) Biological Anthropology of the Human Skeleton. Wiley-Liss: New York.

Saunders SR, Hoppa RD. 1997. Sex allocation from long bone measurements using logistic regression. Canadian Society of Forensic Science Journal 30: 49–60.

Savage SH. 1997. Descent group competition and economic strategies in Predynastic Egypt. Journal of Anthropological Archaeology 16: 226–268.

Savage SH. 1998. AMS radiocarbon dates from the Predynastic Egyptian Cemetery, N7000, at Naga-ed-Dêr. Journal of Archaeological Science 25: 235–249.

Savage SH. 2001. Some recent trends in the archaeology of Predynastic Egypt. Journal of Archaeological Research 9: 101–155.

Savory A. 1994. Will we be able to sustain civilization? Population and Environment: A Journal of Interdisciplinary Studies 16: 139–147.

Schäffer MR, Tantry U, Ahrendt GM, Wasserkrug HL, Barbul A. 1997. Acute protein-calorie malnutrition impairs wound healing: a possible role of decreased wound nitric oxide synthesis. Journal of the American College of Surgeons 184: 37–43.

Scheuer JL, Elkington NM. 1993. Sex determination from metacarpals and the first proximal phalanx. Journal of Forensic Sciences 38: 769–778.

Schillaci MA, Irish JD, Wood CCE. 2009. Further analysis of the population history of ancient Egyptians. American Journal of Physical Anthropology 139: 235–243.

Schlecht SH. 2012. Understanding entheses: bridging the gap between clinical and anthropological perspectives. The Anatomical Record 295: 1239–1251.

Scholtz Y, Steyn M, Pretorius E. 2010. A geometric morphometric study into the sexual dimorphism of the human scapula. HOMO – Journal of Comparative Human Biology 61: 253–270.

Schrader SA. 2012. Activity patterns in New Kingdom Nubia: an examination of entheseal remodelling and osteoarthritis at Tombos. American Journal of Physical Anthropology 149: 60–70.

Schreider E. 1975. Morphological variations and climatic differences. Journal of Human Evolution 4: 529–539.

Schulter-Ellis FP, Schmidt DJ, Hayek L-A, Craig J. 1983. Determination of sex with a discriminant analysis of new pelvic bone measurements: part I. Journal of Forensic Sciences 28: 169–179.

Schulter-Ellis FP, Schmidt DJ, Hayek L-A, Craig J. 1985. Determination of sex with a discriminant analysis of new pelvic bone measurements: part II. Journal of Forensic Sciences 30: 178–185.

Schuster I. 1983. Women’s aggression: an African case study. Aggressive Behavior 9: 319–331.

Schutkowski H. 1993. Sex determination of infant and juvenile skeletons: I. Morphognostic features. American Journal of Physical Anthropology 90: 199–205.

Schutte JE, Lilljeqvist RE, Johnson RL Jr. 1983. Growth of lowland native children of European ancestry during sojourn at high altitude (3,200 m). American Journal of Physical Anthropology 61: 221–226.

Schwartz GT, Dean MC. 2005. Sexual dimorphism in modern human permanent teeth. American Journal of Physical Anthropology 128: 312–317.

Schwartz JH. 1995. Skeleton Keys: An Introduction to Human Skeletal Morphology, Development, and Analysis. Oxford University Press: Oxford.

Seidemann RM, Stojanowski CM, Doran GH. 1998. The use of the supero-inferior femoral neck diameter as a sex assessor. American Journal of Physical Anthropology 107: 305–313.

Selander RK. 1966. Sexual dimorphism and differential niche utilization in birds. The Condor 68: 113–151.

Šešelj M. 2013. Relationship between dental development and skeletal growth in modern humans and its implications for interpreting ontogeny in fossil hominins. American Journal of Physical Anthropology 150: 38–47.

Shackelford LL. 2007. Regional variation in the postcranial robusticity of later Upper Paleolithic humans. American Journal of Physical Anthropology 133: 655–668.

Shahin AA, Alhoseiny S, Aldali M. 2014. Hyperostosis frontalis interna: an Egyptian case referred to the second dynasty (2890–2650 BC) from Tarkhan-Egypt. The Egyptian Rheumatologist 36: 41–45.

Shaw CN, Stock JT. 2009. Intensity, repetitiveness, and directionality of habitual adolescent mobility patterns influence the tibial diaphysis morphology of athletes. American Journal of Physical Anthropology 140: 149–159.

Shaw CN, Stock JT. 2011. The influence of body proportions on femoral and tibial midshaft shape in hunter-gatherers. American Journal of Physical Anthropology 144: 22–29.

Shaw CN, Stock JT. 2013. Extreme mobility in the Late Pleistocene? Comparing limb biomechanics among fossil Homo, varsity athletes and Holocene foragers. Journal of Human Evolution 64: 242–249.

Shaw I. 2000. The Oxford History of Ancient Egypt. Oxford University Press: Oxford.

Shea BT. 1983. Allometry and heterochrony in African apes. American Journal of Physical Anthropology 62: 275–290.

Shea BT. 1986. Ontogenetic approaches to sexual dimorphism in anthropoids. Human Evolution 1: 97–110.

Shea BT, Bailey RC. 1996. Allometry and adaptation of body proportions and stature in African pygmies. American Journal of Physical Anthropology 100: 311–340.

Shields BM, Knight BA, Powell RJ, Hattersley AT, Wright DE. 2006. Assessing newborn body composition using principal components analysis: differences in the determinants of fat and skeletal size. BMC Pediatrics 6: 24.

Shine R. 1989. Ecological causes for the evolution of sexual dimorphism: a review of the evidence. The Quarterly Review of Biology 64: 419–461.

Shveta S, Patnaik VVG, Kaushal S, Sharma D. 2010. Evaluation of zygomatic bone in sexual dimorphism – a study in 60 adult human skulls. International Journal of Medical Toxicology & Legal Medicine 13: 1–8.

Singh J, Pathak RK. 2013. Morphometric sexual dimorphism of human sternum in a north Indian autopsy sample: sexing efficacy of different statistical techniques and a comparison with other sexing methods. Forensic Science International 228: 174.e1–174.e10.

Sisk CL, Foster DL. 2004. The neural basis of puberty and adolescence. Nature Neuroscience 7: 1040–1047.

Skoglund P, Storå J, Götherström A, Jakobsson M. 2013. Accurate sex identification of ancient human remains using DNA shotgun sequencing. Journal of Archaeological Science 40: 4477–4482.

Sládek V, Berner M, Sailer R. 2006. Mobility in central European late Eneolithic and early Bronze Age: femoral cross-sectional geometry. American Journal of Physical Anthropology 130: 320–332.

Slatkin M. 1984. Ecological causes of sexual dimorphism. Evolution 38: 622–630.

Šlaus M, Tomičić Ž. 2005. Discriminant function sexing of fragmentary and complete tibiae from Medieval Croatian sites. Forensic Science International 147: 147–152.

Šlaus M, Bedić Ž, Strinović D, Petrovečki V. 2013. Sex determination by discriminant function analysis of the tibia for contemporary Croats. Forensic Science International 226: 302.e1–302.e4.

Smith GE. 1907. The alleged discovery of syphilis in prehistoric Egyptians. Lancet 170: 1788–1789.

Smith HD. 1912. A study of pygmy crania, based on skulls found in Egypt. Biometrika 8(3 & 4): 262–266.

Smith SL. 2004. Skeletal age, dental age, and the maturation of KNM-WT 15000. American Journal of Physical Anthropology 125: 105–120.

Smith GE, Jones FW. 1910. The Archaeological Survey of Nubia, Report for 1907–1908. Volume II. Report on the Human Remains. Government Press: Cairo.

Smith P, Bar-Yosef O, Sillen A. 1984. Archaeological and skeletal evidence for dietary changes during the Late Pleistocene/Early Holocene in the Levant. In: Cohen MN, Armelagos GJ (eds.) Paleopathology at the Origins of Agriculture. Academic Press Inc: Orlando, FL.

Smith EP, Boyd J, Frank GR, Takahashi H, Cohen RM, Specker B, Williams TC, Lubahn DB, Korach KS. 1994. Estrogen resistance caused by a mutation in the estrogen-receptor gene in a man. New England Journal of Medicine 331: 1056–1061.

Smith HF, Terhune CE, Lockwood CA. 2007. Genetic, geographic, and environmental correlates of human temporal bone variation. American Journal of Physical Anthropology 134: 312–322.

Smythe PM, Brereton-Stiles GG, Grace HJ, Mafoyane A, Schonland M, Coovadia AA, Loening WEK, Parent MA, Vos GH. 1971. Thymolymphatic deficiency and depression of cell-mediated immunity in protein-calorie malnutrition. Lancet 298(7731): 939–944.

Snape S. 2011. Chapter 16: Sennedjem. In: Ancient Egyptian Tombs. The Culture of Life and Death. Wiley-Blackwell: West Sussex.

Snow CC, Hartman S, Giles E, Young FA. 1979. Sex and race determinations of crania by calipers and computer: a test of the Giles and Elliot discriminant functions in 52 forensic science cases. Journal of Forensic Sciences 24: 448–460.

Sognnaes RF. 1956. Histological evidence of developmental lesions in teeth originating from paleolithic, prehistoric, and ancient man. American Journal of Pathology 32: 547–577.

Soni G, Dhall U, Chhabra S. 2010. Determination of sex from femur: discriminant analysis. Journal of Anatomical Society of India 59: 216–221.

Sonika V, Harshaminder K, Madhushankari GS, Sri Kennath JAA. 2011. Sexual dimorphism in the permanent molar: a study of the Haryana population (India). Journal of Forensic Odonto-Stomatology 29: 37–43.

Sparacello V, Marchi D. 2008. Mobility and subsistence economy: a diachronic comparison between two groups settled in the same geographical area (Liguria, Italy). American Journal of Physical Anthropology 136: 485–495.

Sparacello VS, Pearson OM, Coppa A, Marchi D. 2011. Changes in skeletal robusticity in an Iron Age agropastoral group: the Samnites from the Alfedena Necropolis (Abruzzo, central Italy). American Journal of Physical Anthropology 144: 119–130.

Spatz C. 2001. Basic Statistics: Tales of Distributions. Wadsworth/Thomson Learning: Belmont, CA.

Spradley MK, Jantz RL. 2011. Sex estimation in forensic anthropology: skull versus postcranial elements. Journal of Forensic Sciences 56: 289–296.

Spicer J. 2005. Making Sense of Multivariate Analysis. Sage Publications Inc: Thousand Oaks, CA.

Starling AP, Stock JT. 2007. Dental indicators of health and stress in early Egyptian and Nubian agriculturalists: a difficult transition and gradual recovery. American Journal of Physical Anthropology 134: 520–528.

Stata.com. 2015. PCA postestimation. Available online: http://www.stata.com/manuals13/mvpcapostestimation.pdf. Accessed February 2014.

Steckel RH. 1995. Stature and the standard of living. Journal of Economic Literature 33: 1903–1940.

Steegmann AT, Cerny FJ, Holliday TW. 2002. Neandertal cold adaptation: physiological and energetic factors. American Journal of Human Biology 14: 566–583.

Stein GJ. 1998. Heterogeneity, power, and political economy: some current research issues in the archaeology of Old World complex societies. Journal of Archaeological Research 6: 1–44.

Stevanovitch A, Gilles A, Bouzald E, Kefi R, Paris F, Gayraud RP, Spadoni JL, El-Chenawi F, Béraud-Colomb E. 2004. Mitochondrial DNA sequence diversity in a sedentary population from Egypt. Annals of Human Genetics 68: 23–39.

Stevenson A. 2009. Social relationships in predynastic burials. The Journal of Egyptian Archaeology 95: 175–192.

Stewart TD. 1979. Essentials of Forensic Anthropology. Charles C Thomas: Springfield, Illinois.

Steyn M, İşcan MY. 1997. Sex determination from the femur and tibia in South African whites. Forensic Science International 90: 111–119.

Steyn M, İşcan MY. 1999. Osteometric variation in the humerus: sexual dimorphism in South Africans. Forensic Science International 106: 77–85.

Steyn M, İşcan MY. 2008. Metric sex determination from the pelvis in modern Greeks. Forensic Science International 179: 86.e1–86.e6.

Steyn M, Pretorius E, Hutten L. 2004. Geometric morphometric analysis of the greater sciatic notch in South Africans. HOMO – Journal of Comparative Human Biology 54: 197–206.

Stini WA. 1969. Nutritional stress and growth: sex difference in adaptive response. American Journal of Physical Anthropology 31: 417–426.

Stini WA. 1971. Evolutionary implications of changing nutritional patterns in human populations. American Anthropologist 73: 1019–1030.

Stini WA. 1972. Reduced sexual dimorphism in upper arm muscle circumference associated with protein-deficient diet in a South American population. American Journal of Physical Anthropology 36: 341–352.

Stinson S. 1980. The physical growth of high altitude Bolivian Aymara children. American Journal of Physical Anthropology 52: 377–385.

Stinson S. 1982. The effect of high altitude on the growth of children of high socioeconomic status in Bolivia. American Journal of Physical Anthropology 59: 61–71.

Stinson S. 1985. Sex differences in environmental sensitivity during growth and development. Yearbook of Physical Anthropology 28: 123–147.

Stock JT. 2006. Hunter-gatherer postcranial robusticity relative to patterns of mobility, climatic adaptation, and selection for tissue economy. American Journal of Physical Anthropology 131: 194–204.

Stock JT, O’Neill MC, Ruff CB, Zabecki M, Shackelford L, Rose JC. 2011. Body size, skeletal biomechanics, mobility and habitual activity from the Late Palaeolithic to the mid-Dynastic Nile Valley. In: Pinhasi R, Stock JT (eds.) Human Bioarchaeology of the Transition to Agriculture. Wiley-Blackwell: Oxford.

Stone AC, Milner GR, Pääbo S, Stoneking M. 1996. Sex determination of ancient human skeletons using DNA. American Journal of Physical Anthropology 99: 231–238.

Strauss J. 1986. Does better nutrition raise farm productivity? Journal of Political Economy 94: 297–320.

Strouhal E. 1976. Tumors in the remains of ancient Egyptians. American Journal of Physical Anthropology 45: 613–620.

Strouhal E, Němečkova A. 2004. Paleopathological find of a sacral neurilemmoma from ancient Egypt. American Journal of Physical Anthropology 125: 320–328.

Suazo GIC, Zavando MDA, Smith RL. 2008. Accuracy of palate shape as sex indicator in human skull with maxillary teeth loss. International Journal of Morphology 26: 989–993.

Suazo GIC, Zavando MDA, Smith RL. 2009. Performance evaluation as a diagnostic test for traditional methods for forensic identification of sex. International Journal of Morphology 27: 381–386.

Sukhatme PV, Margen S. 1978. Models for protein deficiency. American Journal of Clinical Nutrition 31: 1237–1256.

Sukhatme PV, Margen S. 1982. Autoregulatory homeostatic nature of energy balance. American Journal of Clinical Nutrition 35: 355–365.

Sutherland LD, Suchey JM. 1991. Use of the ventral arc in pubic sex determination. Journal of Forensic Sciences 36: 501–511.

Swanson E. 2014. Validity, reliability, and the questionable role of psychometrics in plastic surgery. Plastic and Reconstructive Surgery Global Open 2(6): e161.

Sylvester AD, Organ JM. 2010. Curvature scaling in the medial tibial condyle of large bodied hominoids. The Anatomical Record 293: 671–679.

Tabachnick BG, Fidell LS. 1996. Using Multivariate Statistics. 3rd edition. Harper Collins College Publishers: New York.

Tainter JA. 1978. Mortuary practices and the study of prehistoric social systems. Advances in Archaeological Method and Theory 1: 105–141.

Tanner JM. 1976. Population differences in body size, shape, and growth rate. Archives of Disease in Childhood 51: 1–3.

Taylor AB. 1997. Relative growth, ontogeny, and sexual dimorphism in Gorilla (Gorilla gorilla gorilla and G. g. beringei): evolutionary and ecological considerations. American Journal of Primatology 43: 1–31.

Taylor MS. 2010. Adult stature and health among early foragers of the western Gulf coastal plains. Plains Anthropologist 55: 55–65.

Taylor AK, Cao W, Vora KP, De La Cruz J, Shieh W-J, Zaki SR, Katz JM, Sambhara S, Gangappa S. 2013. Protein energy malnutrition decreases immunity and increases susceptibility to influenza infection in mice. The Journal of Infectious Diseases 207: 501–510.

Taylor Fitz-Gibbon C, Lyons Morris L. 1987. How to Analyze Data. Sage Publications Inc: California.

Temple DH, Auerbach BM, Nakatsukasa M, Sciulli PW, Larsen CS. 2008. Variation in limb proportions between Jomon foragers and Yayoi agriculturalists from prehistoric Japan. American Journal of Physical Anthropology 137: 164–174.

Tena-Sempere M. 2012. Deciphering puberty: novel partners, novel mechanisms. European Journal of Endocrinology 167: 733–747.

Terreros MC, Martinez L, Herrera RJ. 2005. Polymorphic Alu insertions and genetic diversity among African populations. Human Biology 77: 675–704.

Testart A, Forbis RG, Hayden B, Ingold T, Perlman SM. 1982. The significance of food storage among hunter-gatherers: residence patterns, population densities, and social inequalities. Current Anthropology 23: 523–537.

Thali MJ, Jackowski C, Oesterhelweg L, Ross SG, Dirnhofer R. 2007. VIRTOPSY – the Swiss virtual autopsy approach. Legal Medicine 9: 100–104.

Thapar R, Angadi PV, Hallikerimath S, Kale AD. 2012. Sex assessment using odontometry and cranial anthropometry: evaluation in an Indian sample. Forensic Science, Medicine, and Pathology 8: 94–100.

Thomas A. 2014. Bioarchaeology of the middle Neolithic: evidence for archery among early European farmers. American Journal of Physical Anthropology; ePub ahead of print.

Thompson DJ. 1988. Memphis under the Ptolemies. Princeton University Press: Princeton, New Jersey.

Thompson AH, Richards MP, Shortland A, Zakrzewski SR. 2005. Isotopic palaeodiet of ancient Egyptian fauna and humans. Journal of Archaeological Science 32: 451–463.

Thompson AH, Chaix L, Richards MP. 2008. Stable isotopes and diet at ancient Kerma, Upper Nubia (Sudan). Journal of Archaeological Science 35: 376–387.

Thompson RC, Allam AH, Lombardi GP, Wann LS, Sutherland ML, Sutherland JD, Soliman M, Frohlich D, Mininberg DT, Monge JM, Vallodolid CM, Cox SL, el-Maksoud G, Badr I, Miyamoto M, Nur el-Din AH, Narula J, Finch CE, Thomas GS. 2013. Atherosclerosis across 4000 years of human history: the Horus study of four ancient populations. Lancet 381: 1211–1222.

Thomson A. 1905. Composite photographs of early Egyptian skulls. Man 5: 65–67.

Tilkens MJ, Wall-Scheffler C, Weaver TD, Steudel-Numbers K. 2007. The effects of body proportions on thermoregulation: an experimental assessment of Allen’s rule. Journal of Human Evolution 53: 286–291.

Tise ML, Spradley MK, Anderson BE. 2013. Postcranial sex estimation of individuals considered Hispanic. Journal of Forensic Sciences 58: S9–S14.

Tobias PV. 1962. On the increasing stature of the Bushmen. Anthropos 57: 801–810.

Tobias JA, Montgomerie R, Lyon BE. 2012. The evolution of female ornaments and weaponry: social selection, sexual selection and ecological competition. Philosophical Transactions of the Royal Society 367: 2274–2293.

Toivari-Viitala J. 2011. Deir el-Medina. UCLA Encyclopedia of Egyptology. Available online: http://escholarship.org/uc/item/6kt9m29r. Accessed February 2014.

Torre C, Giacobini G, Sicuro A. 1980. The skull and vertebral column pathology of ancient Egyptians. A study of the Marro Collection. Journal of Human Evolution 9: 41–44.

TourEgypt. 2013. City of El Mansoura in Egypt. Available online: http://www.touregypt.net/elmansur.htm. Accessed June 2014.

Touzeau A, Amiot R, Blichert-Toft J, Flandrois J-P, Fourel F, Grossi V, Martineau F, Richardin P, Lécuyer C. 2014. Diet of ancient Egyptians inferred from stable isotope systematics. Journal of Archaeological Science 46: 114–134.

Trancho GJ, Robledo B, López-Bueis I, Sánchez JA. 1997. Sexual determination of the femur using discriminant functions. Analysis of a Spanish population of known sex and age. Journal of Forensic Sciences 42: 181–185.

Trigger BG, Kemp BJ, O’Connor D, Lloyd AB. 1983. Ancient Egypt: A Social History. Cambridge University Press: Cambridge.

Trinkaus E. 1975. Squatting among the Neandertals: a problem in the behavioural interpretation of skeletal morphology. Journal of Archaeological Science 2: 327–351.

Turner CH. 1998. Three rules for bone adaptation to mechanical stimuli. Bone 23: 399–407.

Turner RT, Wakley GK, Hannon KS. 1990. Differential effects of androgens on cortical bone histomorphometry in gonadectomized male and female rats. Journal of Orthopaedic Research 8: 612–617.

Ubelaker DH. 1984. Chapter 19: Prehistoric human biology of Ecuador: possible temporal trends and cultural correlations. In: Cohen MN, Armelagos GJ (eds.) Paleopathology at the Origins of Agriculture. Academic Press, Inc: Orlando, FL.

Ubelaker DH, Volk CG. 2002. A test of the Phenice method for the estimation of sex. Journal of Forensic Sciences 47: 19–24.

Ubelaker DH, Ross AH, Graver SM. 2002. Application of forensic discriminant functions to a Spanish cranial sample. Forensic Science Communications 4(3): 1–6.

Uhl NM, Rainwater CW, Konigsberg LW. 2013. Testing for size and allometric differences in fossil hominin body mass estimation. American Journal of Physical Anthropology 151: 215–229.

Ulijaszek SJ, Kerr DA. 1999. Anthropometric measurement error and the assessment of nutritional status. British Journal of Nutrition 82: 165–177.

Ulijaszek SJ, Lourie JA. 1994. Chapter 3: Intra- and inter-observer error in anthropometric measurement. In: Ulijaszek SJ, Mascie-Taylor CGN (eds.) Anthropometry: The Individual and the Population. Cambridge University Press: Cambridge.

Umemura Y, Ishiko T, Yamauchi T, Kurono M, Mashiko S. 1997. Five jumps per day increase bone mass and breaking force in rats. Journal of Bone and Mineral Research 12: 1480–1485.

University of Cambridge. 2012. The Duckworth Laboratory. Available online: http://www.human-evol.cam.ac.uk/duckworth.html. Accessed January 2014.

Urbanová P, Hejna P, Zátopková L, Šafr M. 2013. What is the appropriate approach in sex determination of hyoid bones. Journal of Forensic and Legal Medicine 20: 996–1003.

Ursi WJS, Trotman C-A, McNamara Jr. JA, Behrents RG. 1993. Sexual dimorphism in normal craniofacial growth. The Angle Orthodontist 63: 47–56.

Utermohle CJ, Zegura SL. 1982. Intra- and interobserver error in craniometry: a cautionary tale. American Journal of Physical Anthropology 57: 303–310.

Vacca E, Di Vella G. 2012. Metric characterization of the human coxal bone on a recent Italian sample and multivariate discriminant analysis to determine sex. Forensic Science International 222: 410.e1–410.e9.

Vance VL, Steyn M. 2013. Geometric morphometric assessment of sexually dimorphic characteristics of the distal humerus. HOMO – Journal of Comparative Human Biology 64: 329–340.

Van Gerven DP. 1972. The contribution of size and shape variation to patterns of sexual dimorphism of the human femur. American Journal of Physical Anthropology 37: 49–60.

Van Gerven DP, Armelagos GJ. 1983. “Farewell to paleodemography?” Rumors of its death have been greatly exaggerated. Journal of Human Evolution 12: 353–360.

Van Gerven DP, Armelagos GJ, Rohr A. 1977. Continuity and change in cranial morphology of three Nubian archaeological populations. Man 12: 270–277.

Venken K, Callewaert F, Boonen S, Vanderschueren D. 2008. Sex hormones, their receptors and bone health. Osteoporosis International 19: 1517–1525.

Vercellotti G, Stout SD, Boano R, Sciulli PW. 2011. Intrapopulation variation in stature and body proportions: social status and sex differences in an Italian Medieval population (Trino Vercellese, VC). American Journal of Physical Anthropology 145: 203–214.

Verhoff MA, Ramsthaler F, Krähahn J, Deml U, Gille RJ, Grabherr S, Thali MJ, Kreutz K. 2008. Digital forensic osteology – possibilities in cooperation with the Virtopsy® project. Forensic Science International 174: 152–156.

Verweij KJH, Burri AV, Zietsch BP. 2012. Evidence for genetic variation in human mate preferences for sexually dimorphic physical traits. PLOS One 7(11): e49294.

Viciano J, López-Lázaro S, Alemán I. 2013. Sex estimation based on deciduous and permanent dentition in a contemporary Spanish population. American Journal of Physical Anthropology 152: 31–43.

Vico L, Vanacker J-M. 2010. Sex hormones and their receptors in bone homeostasis: insights from genetically modified mouse models. Osteoporosis International 21: 365–372.

Villemure I, Stokes IAF. 2009. Growth plate mechanics and mechanobiology. A survey of present understanding. Journal of Biomechanics 42: 1793–1803.

Villotte S, Churchill SE, Dutour OJ, Henry-Gambier D. 2010. Subsistence activities and the sexual division of labor in the European Upper Paleolithic and Mesolithic: evidence from upper limb enthesopathies. Journal of Human Evolution 59: 35–43.

Viðarsdóttir US, O’Higgins P, Stringer C. 2002. A geometric morphometric study of regional differences in the ontogeny of the modern human facial skeleton. Journal of Anatomy 201: 211–229.

Vodanović M, Demo Ž, Njemirovskij V, Keros J, Brkić H. 2007. Journal of Archaeological Science 34: 905–913.

Waber DP, Bryce CP, Girard JM, Zichlin M, Fitzmaurice GM, Galler JR. 2014. Impaired IQ and academic skills in adults who experienced moderate to severe infantile malnutrition: a 40-year study. Nutritional Neuroscience 17: 58–64.

Wade MJ, Shuster SM. 2004. Estimating the strength of sexual selection from Y-chromosome and mitochondrial DNA diversity. Evolution 58: 1613–1616.

Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S. 2002. Guidelines for human gene nomenclature. Genomics 79: 464–470.

Waitzman AA, Posnick JC, Armstrong DC, Pron GE. 1992. Craniofacial skeletal measurements based on computed tomography: part I. Accuracy and reproducibility. The Cleft Palate–Craniofacial Journal 29: 112–117.

Waldron I. 1976. Why do women live longer than men? Social Science & Medicine 10: 349–362.

Waldron I. 1983. Sex differences in human mortality: the role of genetic factors. Social Science & Medicine 17: 321–333.

Walker PL. 1995. Problems of preservation and sexism in sexing: some lessons from historical collections for palaeodemographers. In: Saunders SR, Herring A (eds.) Grave Reflections: Portraying the Past through Cemetery Studies. Canadian Scholars’ Press: Toronto.

Walker PL. 2006. Greater sciatic notch morphology: sex, age, and population differences. American Journal of Physical Anthropology 127: 385–391.

Walker PL. 2008. Sexing skulls using discriminant function analysis of visually assessed traits. American Journal of Physical Anthropology 136: 39–50.

Walker PL, Johnson JR, Lambert PM. 1988. Age and sex biases in the preservation of human skeletal remains. American Journal of Physical Anthropology 76: 183–188.

Wallace IJ, Demes B, Jungers WL, Alvero M, Su A. 2008. The bipedalism of the Dmanisi hominins: pigeon-toed early Homo? American Journal of Physical Anthropology 136: 375–378.

Walrath DE, Turner P, Bruzek J. 2004. Reliability test of the visual assessment of cranial traits for sex determination. American Journal of Physical Anthropology 125: 132–137.

Warren MA, Bedi KS. 1985. The effects of a lengthy period of undernutrition on the skeletal growth of rats. Journal of Anatomy 141: 53–64.

Waterlow JC. 1986. Metabolic adaptation to low intakes of energy and protein. Annual Review of Nutrition 6: 495–526.

Weaver TD. 2009. The meaning of Neandertal skeletal morphology. Proceedings of the National Academy of Sciences USA 106: 16028–16033.

Weaver TD, Steudel-Numbers K. 2005. Does climate or mobility explain the differences in body proportions between Neandertals and their Upper Paleolithic successors? Evolutionary Anthropology 14: 218–223.

Weinberg SM, Scott NM, Neiswanger K, Marazita ML. 2005. Intraobserver error associated with measurements of the hand. American Journal of Human Biology 17: 368–371.

Weinstein KJ. 2005. Body proportions in ancient Andeans from high and low altitudes. American Journal of Physical Anthropology 128: 569–585.

Weinstein KJ. 2007. Thoracic skeletal morphology and high-altitude hypoxia in Andean prehistory. American Journal of Physical Anthropology 134: 36–49.

Weinstein KJ. 2008. Thoracic morphology in Near Eastern Neandertals and early modern humans compared with recent modern humans from high and low altitudes. Journal of Human Evolution 54: 287–295.

Weiss KM. 1972. On the systematic bias in skeletal sexing. American Journal of Physical Anthropology 37: 239–250.

Weiss E. 2007. Muscle markers revisited: activity pattern reconstruction with controls in a central Californian Amerind population. American Journal of Physical Anthropology 133: 931–940.

Weiss E. 2009. Sex differences in humeral bilateral asymmetry in two hunter-gatherer populations: California Amerinds and British Columbian Amerinds. American Journal of Physical Anthropology 140: 19–24.

Weiss E, Corona L, Schultz B. 2012. Sex differences in musculoskeletal stress markers: problems with activity pattern reconstruction. International Journal of Osteoarchaeology 22: 70–80.

Weitz CA, Garruto RM, Chin C-T, Liu J-C, Liu R-L, He X. 2000. Growth of Qinghai Tibetans living at three different high altitudes. American Journal of Physical Anthropology 111: 69–88.

Wells JCK. 2012. Sexual dimorphism in body composition across human populations: associations with climate and proxies for short- and long-term energy supply. American Journal of Human Biology 24: 411–419.

Wenke RJ. 1989. Egypt: origins of complex societies. Annual Review of Anthropology 18: 129–155.

Wenke RJ. 1991. The evolution of early Egyptian civilization: issues and evidence. Journal of World Prehistory 5: 279–329.

Wescott DJ. 2000. Sex variation in the second cervical vertebra. Journal of Forensic Sciences 45: 462–466.

Wescott DJ. 2006. Effect of mobility on femur midshaft external shape and robusticity. American Journal of Physical Anthropology 130: 201–213.

White CD, Longstaffe FJ, Law KR. 1999. Seasonal stability and variation in diet as reflected in human mummy tissues from the Kharga Oasis and the Nile Valley. Palaeogeography, Palaeoclimatology, Palaeoecology 147: 209–222.

White TD, Folkens PA. 2000. Human Osteology. 2nd edition. Academic Press: San Diego.

White TD, Folkens PA. 2005. The Human Bone Manual. Academic Press: San Diego.

Wilczak CA. 1998. Consideration of sexual dimorphism, age, and asymmetry in quantitative measurements of muscle insertion sites. International Journal of Osteoarchaeology 8: 311–325.

Wiley AS. 1994. Neonatal size and infant mortality at high altitude in the Western Himalaya. American Journal of Physical Anthropology 94: 289–305.

Wilkinson TAH. 1999. Early Dynastic Egypt. Routledge: London.

Williams BA, Rogers TL. 2006. Evaluating the accuracy and precision of cranial morphological traits for sex determination. Journal of Forensic Sciences 51: 729–735.

Williams FL, Belcher RL, Armelagos GJ. 2005. Forensic misclassification of ancient Nubian crania: implications for assumptions about human variation. Current Anthropology 46: 340–346.

Wolánski W, Kasprzak E. 1976. Stature as a measure of the effects of environmental change. Current Anthropology 17: 548–552.

Wolfe LD, Gray JP. 1982. Subsistence practices and human sexual dimorphism of stature. Journal of Human Evolution 11: 575–580.

Wolff JO, Peterson JA. 1998. An offspring-defense hypothesis for territoriality in female mammals. Ethology Ecology & Evolution 10: 227–239.

Woo TL. 1931. On the asymmetry of the human skull. Biometrika 22: 324–352.

Woods KA, Camacho-Hübner C, Savage MO, Clark AJL. 1996. Intrauterine growth retardation and postnatal growth failure associated with deletion of the insulin-like growth factor I gene. New England Journal of Medicine 335: 1363–1367.

World Health Organization. 1995. Physical Status: The Use and Interpretation of Anthropometry. WHO Technical Report Series 854. Available online: http://www.who.int/childgrowth/publications/physical_status/en/. Accessed June 2013.

World Health Organization. 2003. WHO definition of health. Available online: http://www.who.int/about/definition/en/print.html. Accessed April 2014.

Wu X, Schepartz LA, Norton CJ. 2010. Morphological and morphometric analysis of variation in the Zhoukoudian Homo erectus brain endocasts. Quaternary International 211: 4–13.

Xu KP, Yadav BR, King WA, Betteridge KJ. 1992. Sex-related differences in developmental rates of bovine embryos produced and cultured in vitro. Molecular Reproduction and Development 31: 249–252.

Yadav BR, King WA, Betteridge KJ. 1993. Relationships between the completion of first cleavage and the chromosomal complement, sex, and developmental rates of bovine embryos generated in vitro. Molecular Reproduction and Development 36: 434–439.

Yang DY, Watt K. 2005. Contamination controls when preparing archaeological remains for ancient DNA analysis. Journal of Archaeological Science 32: 331–336.

Yoneda M, Tanaka A, Shibata Y, Morita M. 2002. Radiocarbon marine reservoir effect in human remains from the Kitakogane Site, Hokkaido, Japan. Journal of Archaeological Science 29: 529–536.

Yoneda M, Doi N, Dodo Y, Ishida H. 2011. The regional variation of maritime adaptation in prehistoric Japan. American Journal of Physical Anthropology 144: 316 (abstract).

Yuwanati M, Karia A, Yuwanati M. 2012. Canine tooth dimorphism: an adjunct for establishing sex identity. Journal of Forensic Dental Sciences 4: 80–83.

Zabecki M, Dabbs G. 2010. Bioarchaeological Report for the 2009 Amarna South Tombs Cemetery Skeletal Analysis. Available online: http://www.amarnaproject.com/pages/recent_projects/excavation/south_tombs_cemetery/. Accessed February 2014.

Zabecki M, Dabbs G, Montgomery T. 2012. Report on the 2011 Skeletal Analysis of the South Tombs Cemetery Project. Available online: http://www.amarnaproject.com/pages/recent_projects/excavation/south_tombs_cemetery/. Accessed February 2014.

Żądzińska E, Kerasińska M, Jedrychowska-Dańska K, Warala C, Witas HW. 2008. Sex diagnosis of subadult specimens from medieval Polish archaeological sites: metric analysis of deciduous dentition. HOMO – Journal of Comparative Human Biology 59: 175–187.

Zaki ME, Hussien FH, Abd El-Shafy El Banna R. 2009. Osteoporosis among ancient Egyptians. International Journal of Osteoarchaeology 19: 78–89.

Zakrzewski SR. 2003. Variation in ancient Egyptian stature and body proportions. American Journal of Physical Anthropology 121: 219–229.

Zakrzewski SR. 2004. Intra-population and temporal variation in ancient Egyptian crania. American Journal of Physical Anthropology 123: 215 [abstract].

Zakrzewski SR. 2007. Population continuity or population change: formation of the ancient Egyptian state. American Journal of Physical Anthropology 132: 501–509.

Zhao D, McBride D, Nandi S, McQueen HA, McGrew MJ, Hocking PM, Lewis PD, Sang HM, Clinton M. 2010. Somatic sex identity is cell autonomous in the chicken. Nature 464: 237–242.

Zheng WX, Cheng FB, Cheng KL, Tian Y, Lai Y, Zhang WS, Zheng YJ, Li YQ. 2012. Sex assessment using measurements of the first lumbar vertebra. Forensic Science International 219: 285.e1–285.e5.

Zink A, Rohrbach H, Szeimies U, Hagedorn HG, Haas CJ, Weyss C, Bachmeier B, Nerlich AG. 1999. Malignant tumors in an ancient Egyptian population. Anticancer Research 19: 4273–4277.

Zink A, Haas CJ, Reischl U, Szeimies U, Nerlich AG. 2001. Molecular analysis of skeletal tuberculosis in an ancient Egyptian population. Journal of Medical Microbiology 50: 355–366.

Zink AR, Grabner W, Reischl U, Wolf H, Nerlich AG. 2003a. Molecular study on human tuberculosis in three geographically distinct and time delineated populations from ancient Egypt. Epidemiology & Infection 130: 239–249.

Zink AR, Sola C, Reischl U, Grabner W, Rastogi N, Wolf H, Nerlich AG. 2003b. Characterization of Mycobacterium tuberculosis complex DNAs from Egyptian mummies by spoligotyping. Journal of Clinical Microbiology 41: 359–367.

Zink AR, Spigelman M, Schraut B, Greenblatt CL, Nerlich AG, Donoghue HD. 2006. Leishmaniasis in ancient Egypt and Upper Nubia. Emerging Infectious Diseases 12: 1616–1617.

Zorba E, Moraitis K, Manolis SK. 2011. Sexual dimorphism in permanent teeth of modern Greeks. Forensic Science International 210: 74–81.

Zorba E, Moraitis K, Eliopoulos C, Spiliopoulou C. 2012. Sex determination in modern Greeks using diagonal measurements of molar teeth. Forensic Science International 217: 19–26.

Zorba E, Vanna V, Moraitis K. 2013. Sexual dimorphism of root length on a Greek population sample. HOMO – Journal of Comparative Human Biology; ePub ahead of print.

7 APPENDICES

7.1 Sex estimation

7.1.1 Phenice characteristics recording form

Skeletal collection: ______

Region: ______

Period: ______

Skeleton ID: ______    Date: ______

Museum location: ______

Excavation date: ______    Grave no.: ______

Trait  Sex Ventral arc Absent Male

Present Female

Subpubic concavity Straight/convex Male

Concave Female

Medial aspect of Wide/dull Male ischiopubic ramus Narrow/sharp Female

Overall sex assessment: ______

Notes: ______
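
The form above leaves the overall assessment to the recorder. As a minimal sketch of how the three recorded expressions could be combined, the following assumes a simple majority rule across the observable traits; that rule, the function name and the dictionary keys are illustrative assumptions and are not specified on the form.

```python
from collections import Counter

# Trait expressions and the sex each indicates, as listed on the recording form.
# The key names are hypothetical labels chosen for this sketch.
PHENICE_TRAITS = {
    "ventral_arc": {"absent": "male", "present": "female"},
    "subpubic_concavity": {"straight/convex": "male", "concave": "female"},
    "ischiopubic_ramus": {"wide/dull": "male", "narrow/sharp": "female"},
}


def phenice_assessment(observations):
    """Combine observed trait expressions into an overall sex assessment.

    `observations` maps a trait name to its observed expression; traits that
    could not be observed are omitted or set to None. The majority rule used
    here is an assumption for illustration only.
    """
    votes = Counter()
    for trait, expression in observations.items():
        if expression is None:
            continue  # unobservable trait contributes no vote
        votes[PHENICE_TRAITS[trait][expression]] += 1
    if votes["male"] > votes["female"]:
        return "male"
    if votes["female"] > votes["male"]:
        return "female"
    return "indeterminate"  # tie, or no observable traits
```

For example, a pubis with a ventral arc present, a concave subpubic concavity and a wide/dull ischiopubic ramus would be returned as "female" under this rule, because two of the three traits indicate female.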

7.1.2 Pelvic morphology recording form

Skeletal collection: ______

Region: ______

Period: ______

Skeleton ID: ______    Date: ______

Museum location: ______

Excavation date: ______    Grave no.: ______

Scoring system: 0 = Indeterminate (insufficient data); 1 = Female; 2 = Probable female; 3 = Ambiguous sex; 4 = Probable male; 5 = Male.

Trait Description Score

Size

Ilium shape

Pelvic inlet

Pubic shape

Subpubic angle

Obturator foramen

Greater sciatic notch

Preauricular sulcus

Shape of sacrum

Overall sex assessment: ______

Notes: ______

7.1.3 Cranial morphology recording form

Skeletal collection: ______

Region: ______

Period: ______

Skeleton ID: ______    Date: ______

Museum location: ______

Excavation date: ______    Grave no.: ______

Scoring system: 0 = Indeterminate (insufficient data); 1 = Female; 2 = Probable female; 3 = Ambiguous sex; 4 = Probable male; 5 = Male.

Trait Description Score

General size

Architecture

Occipital area

Supraorbital ridges

Glabella

Mastoid process

Frontal eminences

Parietal eminences

Orbits

Forehead

Zygomatics

Palate

Occipital condyles

Mandible

Mental eminence

Gonial angle

Gonial flare

Overall sex assessment: ______

Notes: ______

7.1.4 Osteometric recording form

Skeletal collection: ______

Region: ______

Period: ______

Skeleton ID: ______    Date: ______

Museum location: ______

Excavation date: ______    Grave no.: ______

Bone / Dimension                                        Left (mm)    Right (mm)

Cranium
    Glabello-occipital length (GO)
    Maximum width (MW)
    Basion-bregma height (BB)
    Maximum diameter bizygomatic (DB)
    Prosthion-nasion height (PN)
    Basion-nasion (BN)
    Basion-prosthion (BP)
    Nasal breadth (NB)
    Palate – external breadth (PB)
    Opisthion-forehead length (OF)
    Mastoid length (ML)

C2
    Maximum sagittal length (XSL)
    Maximum height of dens (XDH)
    Dens sagittal diameter (DSD)
    Dens transverse diameter (DTD)
    Length of vertebral foramen (LVF)
    Maximum breadth across superior facets (SFB)
    Superior facet sagittal diameter (SFS)
    Superior facet transverse diameter (SFT)

Femur
    Maximum diameter of femoral head (FHD)*
    Supero-inferior femoral neck diameter (FND)
    Femoral shaft circumference (FSC)
    Maximum femoral length (XFL)*
    Minimum femoral transverse diameter (FTD)*
    Epicondylar breadth of femur (EBF)*

Tibia
    Tibial length (TL)
    Circumference at nutrient foramen (CNF)
    Minimum shaft circumference (MSC)
    Antero-posterior diameter (APD)
    Transverse breadth (TB)
    Proximal epiphyseal breadth (PEB)
    Distal breadth (DEB)

Humerus
    Vertical (maximum) humeral head diameter (HHD)
    Maximum length of humerus (XHL)
    Epicondylar width of humerus (EWH)

Radius
    Maximum length of radius (XRL)
    Radius SBB (RSBB)
    Maximum head diameter (MAXD)
    Minimum head diameter (MIND)

Ulna
    Maximum length of ulna (XUL)
    Ulna SBB (USBB)

Metacarpal 1
    Interarticular length (IAL)
    Base M/L (BML)
    Base A/P (BAP)
    Head M/L (HML)
    Head A/P (HAP)
    Midshaft (MS)

Metatarsal 1
    Length (L)
    Supero-inferior head height (SIH)
    Medio-lateral head width (MLH)
    Supero-inferior base height (SIB)
    Medio-lateral base width (MLB)
    Midshaft diameter (MSD)

Os coxa
    Ischial length (IL)
    Pubic length (PL)
    Height of sciatic notch (HSN)
    Acetabulo-sciatic breadth (ASB)

Clavicle
    Maximum length of clavicle (XCL)

Scapula
    Max. length of scapula (XHS)
    Max. length of spine (XLS)
    Breadth of infraspinous body (BXB)
    Height of glenoid prominence (HAX)
    Breadth of glenoid prominence (BCB)

*Record left and right side where possible (FHD, XFL, FTD, EBF). M/L = medio-lateral; A/P = antero-posterior.

Notes: ______

7.2 Age at death estimation

7.2.1 Age at death recording form

Skeletal collection: ______

Region: ______

Period: ______

Skeleton ID: ______ Date: ______

Museum location: ______

Excavation date: ______ Grave no.: ______

Pubic symphysis Brooks S & Suchey JM. 1990. Human Evolution 5: 227–238.

Phase

Age range (years)

Auricular surface Buckberry JL & Chamberlain AT. 2002. AJPA 119: 231–239.

Feature Score Transverse organisation Surface texture Microporosity Macroporosity Apical changes Composite score Age range (years)

Sternal rib end İşcan MY, Loth SR, Wright RK. 1984. JFS 29: 1094–1104. İşcan MY, Loth SR, Wright RK. 1985. JFS 30: 853–863.

Phase

Age range (years)

Ectocranial suture closure Meindl RS, Lovejoy CO. 1985. AJPA 68: 57–66.

Scoring system 0=open; no evidence of suture closure, 1=minimal closure, 2=significant closure, 3=completely obliterated.

System Suture (observation site) Score

Ectocranial vault: Midlambdoid; Lambda; Obelion; Anterior sagittal; Bregma; Midcoronal; Pterion; Composite score; Age range

Lateral-anterior: Midcoronal; Pterion; Sphenofrontal; Inferior sphenotemporal; Superior sphenotemporal; Composite score; Age range

Overall age assessment based on cranial sutures (years)

Dental wear Lovejoy CO. 1985. AJPA 68: 47–56.

Phase Age range (years) Maxillary Mandibular Combined age range (years)

Overall (multi-factorial) age assessment (years): ______

Notes: ______

7.3 Variability statistics

7.3.1 Variability statistics broken down by sex

7.3.1.1 Males

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 156 186.7 0.46 5.7 32.7 185.8 187.6

MW 153 140.5 0.43 5.3 28.6 139.6 141.3

BB 137 135.4 0.43 5.1 25.6 134.5 136.2

DB 109 129.8 0.41 4.3 18.1 129.0 130.6

PN 122 71.3 0.34 3.8 14.1 70.6 72.0

BN 136 102.7 0.32 3.7 13.9 102.0 103.3

BP 123 95.1 0.39 4.4 19.0 94.4 95.9

NB 129 25.3 0.17 2.0 3.9 24.9 25.6

PB 114 62.1 0.30 3.3 10.6 61.5 62.9

OF 132 147.8 0.36 4.1 17.2 147.1 148.5

ML 165 33.6 0.26 3.3 10.7 33.1 34.1

XSL 46 48.1 0.42 2.9 8.2 47.3 49.0

XDH 60 37.1 0.36 2.8 7.6 36.4 37.8

DSD 65 11.3 0.11 0.9 0.8 11.1 11.5

DTD 65 10.2 0.11 0.9 0.7 10.0 10.4

LVF 63 15.5 0.19 1.5 2.4 15.1 15.8

SFB 52 45.1 0.27 2.0 3.8 44.5 45.6

SFS 59 18.2 0.16 1.2 1.5 17.9 18.5

SFT 60 17.1 0.17 1.3 1.7 16.8 17.4

FHD 78 44.9 0.27 2.4 5.7 44.3 45.4

FND 85 31.8 0.27 2.5 6.1 31.3 32.3

FSC 85 89.0 0.54 5.0 25.0 87.9 90.1

XFL 64 454.3 2.85 22.8 521.2 448.6 460.0

FTD 86 25.6 0.20 1.9 3.5 25.2 26.0

EBF 59 77.3 0.45 3.5 12.0 76.4 78.2

TL 68 383.1 2.17 17.9 320.8 378.8 387.5

CNF 86 95.8 0.74 6.8 36.8 94.4 97.3

MSC 83 76.2 0.54 4.9 24.4 75.1 77.3

APD 86 34.2 0.29 2.7 7.4 33.6 34.7

TB 83 22.5 0.20 1.8 3.2 22.1 22.9

PEB 56 72.4 0.37 2.8 7.8 71.6 73.1

DEB 78 44.1 0.30 2.6 6.9 43.5 44.7

HHD 76 43.8 0.29 2.5 6.4 43.3 44.4

XHL 65 319.6 1.73 14.0 194.9 316.1 323.0

EWH 83 61.2 0.45 4.1 16.7 60.3 62.1

XRL 66 251.6 1.51 12.3 151.3 248.6 254.7

RSBB 71 30.0 0.21 1.8 3.3 29.6 30.4

MAXD 42 22.8 0.19 1.2 1.5 22.4 23.2

MIND 37 21.4 0.20 1.2 1.4 21.0 21.8

XUL 56 270.8 1.7 12.3 152.5 267.5 274.1

USBB 61 16.8 0.19 1.5 2.3 16.4 17.2

IAL 62 45.6 0.28 2.2 4.7 45.0 46.1

BML 62 15.7 0.14 1.1 1.1 15.5 16.0

BAP 62 15.1 0.19 1.5 2.3 14.8 15.5

HML 62 14.5 0.12 1.0 0.9 14.3 14.8

HAP 60 13.3 0.13 1.0 1.0 13.0 13.6

MS 63 11.8 0.13 1.0 1.0 11.5 12.0

L 74 62.5 0.37 3.2 10.0 61.8 63.2

SIH 74 18.8 0.16 1.4 1.8 18.5 19.2

MLH 72 21.2 0.2 1.4 2.0 20.8 21.5

SIB 72 28.4 0.21 1.8 3.2 28.0 28.8

MLB 58 19.1 0.21 1.6 2.6 18.7 19.5

MSD 74 13.6 0.15 1.3 1.6 13.3 13.9

IL 37 78.5 0.62 3.8 14.2 77.3 79.8

PL 35 85.3 1.0 6.0 36.4 83.2 87.4

HSN 77 52.9 0.52 4.5 20.4 51.8 53.9

ASB 79 37.1 0.29 2.6 6.9 36.5 37.7

XCL 56 150.3 1.1 8.0 63.5 148.2 152.4

XHS 15 150.8 2.5 9.8 96.2 145.4 156.2

XLS 31 139.7 1.4 7.5 56.8 137.0 142.6

BXB 36 97.1 0.95 5.7 32.3 95.1 99.0

HAX 73 39.0 0.25 2.1 4.6 38.5 39.5

BCB 74 27.6 0.20 1.7 2.9 27.2 28.0

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.
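The columns of these variability tables are linked by simple relationships: the SE of the mean is SD divided by the square root of N, the variance is the square of SD, and the confidence interval is the mean plus or minus a critical value times the SE. The following is a minimal sketch of those relationships, not the procedure used to generate the tables. The function name `variability_summary` is hypothetical, and the use of a normal critical value (about 1.96) is an assumption; statistical packages typically use a t critical value, which is slightly larger for small samples.

```python
import math
from statistics import NormalDist


def variability_summary(n, mean, sd, level=0.95):
    """Derive SE of the mean, variance and an approximate CI from N, mean and SD."""
    se = sd / math.sqrt(n)          # standard error of the mean
    variance = sd ** 2              # variance is the squared SD
    # Assumption: normal critical value (about 1.96 for 95%); a t critical
    # value would be slightly larger for small samples.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "se": se,
        "variance": variance,
        "lower": mean - z * se,
        "upper": mean + z * se,
    }


# Male glabello-occipital length (GO) row: N=156, mean=186.7, SD=5.7
go_males = variability_summary(n=156, mean=186.7, sd=5.7)
print({key: round(value, 2) for key, value in go_males.items()})
```

For the male GO row, this reproduces the tabulated SE (0.46) and confidence limits (185.8 and 187.6) after rounding. The variance computed from the rounded SD (32.49) differs slightly from the tabulated 32.7 because the tabulated SD is itself rounded to one decimal place.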

7.3.1.2 Females

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 119 176.8 0.44 4.8 23.4 175.9 177.7

MW 118 135.6 0.45 4.9 24.2 134.7 136.5

BB 108 129.9 0.45 4.6 21.6 129.1 130.8

DB 90 121.1 0.48 4.5 20.6 120.1 122.0

PN 98 67.0 0.36 3.6 12.8 66.2 67.7

BN 106 96.9 0.35 3.6 13.4 96.2 97.6

BP 98 90.7 0.42 4.2 17.5 89.9 91.6

NB 97 24.2 0.19 1.9 3.5 23.8 24.6

PB 88 59.5 0.35 3.3 10.7 58.8 60.2

OF 102 141.4 0.51 5.1 26.5 140.4 142.4

ML 123 28.5 0.31 3.5 12.0 27.8 29.1

XSL 37 45.2 0.46 2.8 7.8 44.2 46.1

XDH 43 34.4 0.28 1.9 3.4 33.9 35.0

DSD 48 10.5 0.12 0.8 0.7 10.3 10.7

DTD 48 9.7 0.09 0.6 0.4 9.5 9.8

LVF 47 15.4 0.19 1.3 1.7 15.0 15.8

SFB 36 42.5 0.34 2.0 4.2 41.8 43.2

SFS 40 17.2 0.17 1.1 1.2 16.8 17.5

SFT 44 15.7 0.20 1.3 1.7 15.4 16.1

FHD 58 39.5 0.30 2.23 5.15 38.9 40.1

FND 58 27.5 0.26 2.0 3.9 26.9 28.0

FSC 56 78.2 0.61 4.6 21.0 77.0 79.4

XFL 48 417.6 3.0 20.9 436.8 411.5 423.6

FTD 56 23.1 0.24 1.8 3.1 22.6 23.6

EBF 38 69.7 0.56 3.4 11.8 68.6 70.8

TL 46 351.2 2.73 18.5 341.6 345.7 356.6

CNF 55 83.3 0.80 5.9 34.8 81.7 84.9

MSC 52 66.3 0.62 4.5 20.0 65.1 67.6

APD 55 28.8 0.32 2.4 5.6 28.2 29.4

TB 55 20.0 0.21 1.5 2.4 19.6 20.4

PEB 36 64.0 0.63 3.8 14.4 62.7 65.3

DEB 47 39.6 0.42 2.9 8.2 38.8 40.4

HHD 56 38.2 0.37 2.8 7.7 37.4 38.9

XHL 47 293.0 2.34 16.1 258.3 288.3 297.7

EWH 62 54.2 0.41 3.3 10.6 53.4 55.0

XRL 46 226.5 1.8 12.3 152.2 222.9 230.2

RSBB 48 26.8 0.23 1.6 2.5 26.4 27.3

MAXD 26 19.5 0.32 1.6 2.6 18.8 20.1

MIND 26 18.4 0.27 1.4 1.9 17.8 19.0

XUL 40 247.2 1.84 11.7 135.9 243.5 251.0

USBB 42 15.1 0.18 1.2 1.3 14.7 15.4

IAL 40 41.5 0.31 2.0 4.0 40.9 42.1

BML 40 13.8 0.16 1.0 1.1 13.4 14.1

BAP 38 13.5 0.18 1.1 1.2 13.2 13.9

HML 40 12.6 0.16 1.0 1.0 12.2 12.9

HAP 40 11.9 0.15 1.0 0.9 11.6 12.2

MS 40 10.2 0.14 0.9 0.7 10.0 10.5

L 41 57.6 0.43 2.7 7.4 56.8 58.5

SIH 41 16.9 0.18 1.2 1.3 16.5 17.3

MLH 40 18.5 0.23 1.4 2.0 18.0 19.0

SIB 41 25.5 0.25 1.6 2.6 25.0 26.0

MLB 30 17.1 0.25 1.4 1.8 16.6 17.6

MSD 41 11.8 0.18 1.2 1.3 11.5 12.2

IL 25 71.4 0.88 4.4 19.6 69.6 73.2

PL 19 86.8 1.28 5.6 31.2 84.1 89.5

HSN 45 49.2 0.60 4.0 16.3 48.0 50.4

ASB 46 32.8 0.45 3.1 9.4 31.9 33.7

XCL 62 136.6 1.53 9.5 90.9 133.5 139.7

XHS 12 130.8 2.20 7.6 58.3 125.9 135.6

XLS 17 123.9 1.61 6.6 43.8 120.5 127.3

BXB 19 85.1 1.30 5.7 32.2 82.4 87.9

HAX 45 34.2 0.30 1.9 3.9 33.6 34.8

BCB 45 23.8 0.23 1.6 2.5 23.3 24.3

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.2 Variability statistics broken down by time period

7.3.2.1 Pre-dynastic Period

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 28 182.0 1.18 6.3 39.1 179.5 184.4

MW 27 132.1 0.98 5.1 26.0 130.0 134.1

BB 29 130.7 0.78 4.2 17.5 129.1 132.3

DB 14 121.4 1.46 5.5 30.0 118.3 124.6

PN 20 68.8 1.61 7.2 52.0 65.4 72.2

BN 27 100.0 0.78 4.0 16.2 98.1 101.3

BP 20 94.7 0.94 4.2 17.7 92.7 96.7

NB 24 25.6 0.48 2.3 5.4 24.6 26.6

PB 24 61.8 0.49 2.4 5.7 60.8 62.8

OF 25 148.0 0.78 3.9 15.4 146.4 149.7

ML 31 31.6 0.82 4.6 20.8 29.9 33.3

XSL 22 47.0 0.81 3.8 14.4 45.4 48.7

XDH 24 35.6 0.47 2.3 5.3 34.7 36.6

DSD 27 10.8 0.10 0.5 0.3 10.5 11.0

DTD 27 9.8 0.13 0.7 0.5 9.5 10.1

LVF 27 15.4 0.29 1.5 2.2 14.8 16.0

SFB 23 44.1 0.44 2.1 4.4 43.2 45.0

SFS 25 17.9 0.22 1.1 1.2 17.5 18.4

SFT 26 16.8 0.25 1.3 1.7 16.3 17.3

FHD 41 41.5 0.58 3.7 13.7 40.3 42.7

FND 42 29.4 0.43 2.8 7.6 28.6 30.3

FSC 41 83.5 1.00 6.4 40.8 81.5 85.6

XFL 38 429.9 4.01 24.7 610.7 421.8 438.0

FTD 41 24.2 0.29 1.9 3.5 23.6 24.8

EBF 36 72.9 0.83 5.0 24.9 71.2 74.5

TL 37 365.3 3.69 22.4 503.1 357.8 372.7

CNF 39 87.7 1.23 7.7 58.8 85.3 90.2

MSC 39 70.4 0.90 5.6 31.5 68.6 72.3

APD 39 31.1 0.54 3.4 11.2 30.0 32.2

TB 39 20.6 0.28 1.7 3.0 20.1 21.2

PEB 31 67.8 0.98 5.4 29.6 65.8 69.8

DEB 37 41.5 0.67 4.1 16.8 40.1 42.8

HHD 39 40.3 0.57 3.6 12.6 39.2 41.5

XHL 38 303.1 2.71 16.7 279.6 297.6 308.6

EWH 40 58.0 0.73 4.6 21.0 56.5 59.4

XRL 31 238.9 2.97 16.5 272.7 232.8 245.0

RSBB 32 27.6 0.36 2.1 4.2 26.9 28.3

MAXD 24 20.8 0.40 1.9 3.8 20.0 21.6

MIND 24 19.7 0.37 1.8 3.2 19.0 20.5

XUL 26 257.7 2.86 14.6 212.7 251.8 263.6

USBB 30 15.8 0.26 1.4 2.1 15.3 16.3

IAL 20 43.4 0.66 2.9 8.6 42.1 44.8

BML 20 14.7 0.32 1.5 2.1 14.0 15.4

BAP 20 14.4 0.35 1.6 2.4 13.7 15.2

HML 20 13.5 0.37 1.7 2.8 12.8 14.3

HAP 19 12.6 0.25 1.1 1.2 12.0 13.1

MS 20 11.1 0.29 1.3 1.7 10.5 11.7

L 27 60.3 0.71 3.7 13.7 58.8 61.8

SIH 27 17.8 0.26 1.3 1.8 17.3 18.4

MLH 26 20.0 0.33 1.7 2.9 19.3 20.7

SIB 27 27.0 0.42 2.2 4.9 26.1 27.8

MLB 26 18.1 0.32 1.6 2.7 17.4 18.8

MSD 27 12.2 0.23 1.2 1.4 11.7 12.6

IL 16 75.1 1.10 4.4 19.0 72.7 77.4

PL 14 86.3 2.03 7.6 57.5 82.0 90.7

HSN 28 50.4 0.87 4.6 21.0 48.7 52.2

ASB 28 34.6 0.60 3.2 10.1 33.4 35.8

XCL 34 139.8 2.03 11.9 140.7 135.6 143.9

XHS 9 140.0 4.77 14.3 205.1 129.0 151.0

XLS 17 131.2 3.13 12.9 166.9 124.6 137.9

BXB 19 92.1 2.01 8.7 76.5 87.9 96.3

HAX 21 36.3 0.62 2.8 8.0 35.0 37.5

BCB 21 25.2 0.54 2.5 6.1 24.1 26.3

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.2.2 Old Kingdom

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 78 181.6 0.84 7.5 55.5 180.0 183.3

MW 75 139.4 0.69 6.0 35.9 138.0 140.8

BB 48 134.2 0.75 5.2 27.3 132.7 135.7

DB 27 126.0 1.2 6.1 36.6 123.6 128.4

PN 36 70.0 0.75 4.5 20.4 68.2 71.2

BN 46 99.9 0.68 4.6 21.6 98.5 101.3

BP 36 92.9 0.72 4.3 18.7 91.5 94.4

NB 38 25.3 0.38 2.3 5.5 24.6 26.1

PB 38 60.5 1.2 7.3 53.4 58.1 62.9

OF 45 146.6 0.78 5.2 27.1 145.0 148.2

ML 89 31.2 0.48 4.5 20.6 30.3 32.2

XSL 51 47.1 0.42 3.0 9.2 46.2 47.9

XDH 69 36.3 0.35 2.9 8.6 35.6 37.0

DSD 76 11.1 0.12 1.0 1.1 10.9 11.4

DTD 76 10.1 0.10 0.9 0.7 9.9 10.3

LVF 73 15.6 0.17 1.4 2.0 15.2 15.9

SFB 55 44.4 0.30 2.2 4.9 43.8 45.0

SFS 64 17.8 0.16 1.3 1.6 17.5 18.1

SFT 68 16.5 0.18 1.5 2.3 16.2 16.9

FHD 80 43.2 0.37 3.3 10.9 42.5 43.9

FND 86 30.4 0.34 3.2 10.2 29.7 31.1

FSC 85 85.8 0.81 7.5 55.6 84.2 87.4

XFL 59 445.5 3.87 29.7 882.0 437.8 453.2

FTD 86 25.0 0.25 2.3 5.3 24.5 25.5

EBF 48 75.5 0.72 5.0 24.7 74.1 76.9

TL 62 374.5 3.08 24.3 589.0 368.3 380.6

CNF 87 92.9 0.97 9.1 82.0 91.0 94.8

MSC 81 74.0 0.79 7.1 50.1 72.4 75.5

APD 87 32.8 0.40 3.7 13.8 32.0 33.5

TB 84 22.1 0.23 2.1 4.3 21.6 22.5

PEB 49 70.5 0.65 4.6 20.7 69.2 71.9

DEB 73 43.0 0.37 3.1 9.8 42.3 43.8

HHD 78 42.2 0.44 3.9 15.3 41.3 43.1

XHL 59 313.9 2.6 19.9 395.8 308.7 319.1

EWH 90 58.2 0.58 5.5 29.9 57.1 59.4

XRL 66 244.2 2.11 17.2 294.8 240.0 248.5

RSBB 73 29.3 0.27 2.3 5.5 28.7 29.8

MAXD 33 22.1 0.36 2.1 4.3 21.4 22.9

MIND 28 20.6 0.37 1.9 3.7 19.8 21.3

XUL 55 264.1 2.24 16.6 274.8 259.6 268.5

USBB 58 16.3 0.22 1.7 2.8 15.9 16.7

IAL 73 44.3 0.34 2.9 8.3 43.7 45.0

BML 73 15.0 0.17 1.4 2.1 14.7 15.4

BAP 71 14.5 0.19 1.6 2.7 14.1 14.9

HML 73 13.8 0.15 1.3 1.7 13.5 14.1

HAP 72 12.8 0.14 1.2 1.5 12.5 13.1

MS 74 11.2 0.14 1.2 1.5 10.9 11.4

L 78 61.2 0.44 3.9 15.1 60.3 62.0

SIH 78 18.3 0.19 1.7 2.8 17.9 18.7

MLH 76 20.3 0.22 2.0 3.8 19.9 20.8

SIB 76 27.6 0.25 2.2 4.8 27.1 28.1

MLB 54 18.5 0.26 1.9 3.7 17.9 19.0

MSD 78 13.2 0.17 1.5 2.2 12.9 13.6

IL 32 76.9 0.98 5.5 30.6 74.9 78.9

PL 29 86.2 0.97 5.2 27.2 84.2 88.1

HSN 79 52.1 0.52 4.7 21.8 51.1 53.2

ASB 82 35.7 0.40 3.6 13.3 34.9 36.5

XCL 49 148.3 1.35 9.5 89.9 145.5 151.0

XHS 9 148.4 4.9 14.8 219.5 137.0 159.8

XLS 22 137.2 1.65 7.8 60.3 133.7 140.6

BXB 26 95.7 1.2 6.0 36.2 93.3 98.2

HAX 83 37.4 0.34 3.1 9.8 36.7 38.0

BCB 84 26.5 0.27 2.5 6.1 26.0 27.0

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.2.3 Late Period

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 154 183.1 0.59 7.3 54.0 181.9 184.3

MW 154 139.3 0.39 4.9 23.9 138.5 140.0

BB 153 133.0 0.47 5.8 33.6 132.1 134.0

DB 147 126.5 0.50 6.0 36.6 125.5 127.5

PN 154 69.7 0.34 4.2 17.7 69.0 70.3

BN 154 100.4 0.39 4.8 23.4 99.6 101.1

BP 154 92.9 0.40 5.0 24.7 92.2 93.7

NB 153 24.5 0.13 1.7 2.7 24.2 24.8

PB 130 60.7 0.29 3.3 11.0 60.2 61.3

OF 149 144.1 0.47 5.8 33.5 143.2 145.1

ML 153 31.5 0.30 3.7 13.7 30.9 32.1

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.3 Variability statistics broken down by cemetery site

7.3.3.1 Keneh

This sample is identical to the Pre-dynastic Period sample, for which the variability statistics are given in Section 7.3.2.1.

7.3.3.2 Sheikh Farag

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 13 178.6 1.62 5.9 34.3 175.0 182.1

MW 13 134.7 1.42 5.1 26.3 131.6 137.8

BB 13 132.9 1.46 5.9 35.1 129.3 136.4

DB 10 121.6 2.00 6.3 39.9 117.1 126.1

PN 9 68.0 1.46 4.4 19.1 64.6 71.3

BN 13 98.4 1.10 4.0 15.8 96.0 100.8

BP 9 94.6 1.73 5.2 27.0 90.6 98.6

NB 9 24.6 0.89 2.7 7.1 22.6 26.7

PB 11 59.6 1.72 5.7 32.6 55.8 63.5

OF 13 143.9 1.34 4.8 23.2 140.9 146.8

ML 13 30.6 1.81 6.5 42.5 26.7 34.6

XSL 9 44.9 0.60 1.8 3.2 43.5 46.3

XDH 9 34.7 0.75 2.2 5.0 33.0 36.4

DSD 9 10.3 0.29 0.9 0.7 9.6 11.0

DTD 9 9.4 0.11 0.3 0.1 9.2 9.7

LVF 9 14.4 0.37 1.1 1.2 13.5 15.2

SFB 9 42.0 0.80 2.4 5.8 40.1 43.8

SFS 9 17.2 0.51 1.5 2.3 16.0 18.3

SFT 9 15.8 0.42 1.3 1.6 14.8 16.8

FHD 13 41.7 0.97 3.5 12.2 39.6 43.8

FND 13 29.3 0.92 3.3 11.0 27.3 31.3

FSC 13 80.9 1.78 6.4 41.2 77.0 84.8

XFL 13 431.7 7.63 27.5 756.4 415.1 448.3

FTD 13 23.2 0.47 1.7 2.9 22.2 24.2

EBF 12 73.4 1.28 4.4 19.8 70.6 76.2

TL 13 364.7 7.00 25.3 637.9 349.4 380.0

CNF 13 86.8 2.28 8.2 67.6 81.9 91.8

MSC 13 68.5 1.61 5.8 33.8 65.0 72.1

APD 13 30.2 0.93 3.3 11.2 28.1 32.2

TB 13 20.7 0.62 2.2 5.0 19.3 22.0

PEB 11 66.7 1.82 6.0 36.3 62.6 70.7

DEB 13 41.1 0.67 2.4 5.8 39.7 42.6

HHD 13 40.2 1.01 3.7 13.3 38.0 42.4

XHL 13 301.0 5.84 21.1 443.3 288.3 313.7

EWH 13 58.1 1.22 4.4 19.4 55.4 60.7

XRL 13 234.0 4.91 17.7 314.5 223.3 244.7

RSBB 12 28.3 0.54 1.9 3.5 27.1 29.5

MAXD 9 21.0 0.68 2.0 4.2 19.4 22.5

MIND 9 19.6 0.57 1.7 2.9 18.2 20.9

XUL 13 255.5 5.22 18.8 3.5 244.2 266.9

USBB 13 15.8 0.49 1.8 3.1 14.7 16.9

IAL 7 42.5 0.83 2.2 4.8 40.5 44.5

BML 7 14.9 0.54 1.4 2.1 13.5 16.2

BAP 7 15.0 0.44 1.2 1.4 14.0 16.1

HML 7 13.4 0.48 1.3 1.6 12.2 14.6

HAP 7 13.0 0.52 1.4 1.9 11.7 14.2

MS 7 11.2 0.33 0.9 0.7 10.4 12.0

L 8 58.9 0.91 2.6 6.6 56.7 61.0

SIH 8 17.4 0.46 1.3 1.7 16.3 18.5

MLH 8 19.4 0.75 2.1 4.5 17.6 21.2

SIB 8 26.2 0.73 2.1 4.2 24.5 27.9

MLB 8 19.0 0.67 1.6 2.7 17.3 20.8

MSD 8 12.9 0.52 1.5 2.1 11.7 14.1

IL 12 73.3 1.52 5.3 27.8 69.9 76.6

PL 10 84.1 1.8 5.6 31.1 80.1 88.1

HSN 13 49.7 1.27 4.6 20.9 46.9 52.5

ASB 13 36.1 0.78 2.8 8.0 34.4 37.8

XCL 11 145.5 2.08 6.9 47.5 140.8 150.1

XHS 8 137.8 3.38 9.6 91.6 129.7 145.8

XLS 8 133.4 3.77 10.7 113.8 124.5 142.3

BXB 9 88.2 2.81 8.4 70.9 81.8 94.7

HAX 12 37.1 0.99 3.4 11.8 34.9 39.3

BCB 12 25.3 0.64 2.2 4.9 23.9 26.7

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.3.3 Giza

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 232 182.6 0.49 7.4 54.7 181.6 183.6

MW 229 139.3 0.35 5.3 27.7 138.6 140.0

BB 201 133.3 0.40 5.7 32.2 132.5 134.1

DB 174 126.5 0.46 6.0 36.4 125.5 127.4

PN 190 69.7 0.31 4.3 18.1 69.1 70.3

BN 200 100.3 0.34 4.8 22.9 99.6 100.9

BP 190 92.9 0.35 4.8 23.4 92.2 93.6

NB 191 24.7 0.13 1.8 3.4 24.4 24.9

PB 167 60.9 0.27 3.4 11.9 60.4 61.4

OF 194 144.7 0.41 5.7 32.9 143.9 145.5

ML 242 31.4 0.26 4.0 16.2 30.9 31.9

XSL 51 47.1 0.42 3.0 9.2 46.2 47.9

XDH 69 36.3 0.35 2.9 8.6 35.6 37.0

DSD 76 11.1 0.12 1.0 1.1 10.9 11.4

DTD 76 10.1 0.10 0.9 0.7 9.9 10.3

LVF 73 15.6 0.17 1.4 2.0 15.2 15.9

SFB 55 44.4 0.30 2.2 4.9 43.8 45.0

SFS 64 17.8 0.16 1.3 1.6 17.5 18.1

SFT 68 16.5 0.18 1.5 2.3 16.2 16.9

FHD 80 43.2 0.37 3.3 10.9 42.5 43.9

FND 86 30.4 0.34 3.2 10.2 29.7 31.1

FSC 85 85.8 0.81 7.5 55.6 84.2 87.4

XFL 59 445.5 3.87 29.7 882.0 437.8 453.2

FTD 86 25.0 0.25 2.3 5.3 24.5 25.5

EBF 48 75.5 0.72 5.0 24.7 74.1 76.9

TL 62 374.5 3.08 24.3 589.0 368.3 380.6

CNF 87 92.9 0.97 9.1 82.0 91.0 94.8

MSC 81 74.0 0.79 7.1 50.1 72.4 75.5

APD 87 32.8 0.40 3.7 13.8 32.0 33.5

TB 84 22.1 0.23 2.1 4.3 21.6 22.5

PEB 49 70.5 0.65 4.6 20.7 69.2 71.9

DEB 73 43.0 0.37 3.1 9.8 42.3 43.8

HHD 78 42.2 0.44 3.9 15.3 41.3 43.1

XHL 59 313.9 2.59 19.9 395.8 308.7 319.1

EWH 90 58.2 0.58 5.5 29.9 57.1 59.4

XRL 66 244.2 2.11 17.2 294.8 240.0 248.5

RSBB 73 29.3 0.27 2.3 5.5 28.7 29.8

MAXD 33 22.1 0.36 2.1 4.3 21.4 22.9

MIND 28 20.6 0.37 1.9 3.7 19.8 21.3

XUL 55 264.1 2.24 16.6 274.8 259.6 268.5

USBB 58 16.3 0.22 1.7 2.8 15.9 16.7

IAL 73 44.3 0.34 2.9 8.4 43.7 45.0

BML 73 15.0 0.17 1.4 2.1 14.7 15.4

BAP 71 14.5 0.19 1.6 2.7 14.1 14.9

HML 73 13.8 0.15 1.3 1.7 13.5 14.1

HAP 72 12.8 0.14 1.2 1.5 12.5 13.1

MS 74 11.2 0.14 1.2 1.5 10.9 11.5

L 78 61.2 0.44 3.9 15.1 60.3 62.0

SIH 78 18.3 0.19 1.7 2.8 17.9 18.7

MLH 76 20.3 0.22 2.0 3.8 19.9 20.8

SIB 76 27.6 0.25 2.2 4.8 27.1 28.1

MLB 54 18.5 0.26 1.9 3.7 17.9 19.0

MSD 78 13.2 0.17 1.5 2.2 12.9 13.6

IL 32 76.9 0.98 5.5 30.6 74.9 78.9

PL 29 86.2 0.97 5.2 27.2 84.2 88.2

HSN 79 52.1 0.52 4.7 21.8 51.1 53.2

ASB 82 35.7 0.40 3.6 13.3 34.9 36.5

XCL 49 148.3 1.35 9.5 89.9 145.5 151.0

XHS 9 148.4 4.9 14.8 219.5 137.0 159.8

XLS 22 137.2 1.65 7.8 60.3 133.7 140.6

BXB 26 95.7 1.18 6.0 36.2 93.3 98.2

HAX 83 37.4 0.34 3.1 9.8 36.9 38.0

BCB 84 26.5 0.27 2.5 6.1 26.0 27.0

Definitions of the bone dimension acronyms may be found in Table 2.2K in Section 2.2.2.1. N, the number of individuals for whom the dimension could be measured; SE of mean, standard error of the mean; SD, standard deviation.

7.3.3.4 Thebes

Variability statistics were not calculated because the sample consists of two individuals only.

7.4 Results of intra- and inter-observer error tests

7.4.1 Intra-observer error

Column groups: precision (TEM, %TEM, mean per cent error); reliability (R, Pearson correlation); paired samples t-test (T-value, P-value).

Dimen. N TEM (mm) %TEM Mean per cent error (%) R* Pearson correlation (r) T-value P-value
GO 20 0.61 0.34 0.36 0.993 0.992 -0.459 0.652
MW 18 0.17 0.12 0.13 0.999 0.999 -0.487 0.633
BB 16 0.42 0.32 0.25 0.994 0.995 0.365 0.720
DB 13 0.25 0.20 0.17 0.998 0.999 -1.594 0.137
PN 16 0.51 0.73 0.83 0.986 0.990 -0.134 0.895
BN 16 0.21 0.21 0.23 0.998 0.997 -0.249 0.807
BP 16 0.56 0.61 0.64 0.986 0.982 0.061 0.952
NB 17 0.14 0.57 0.63 0.995 0.994 -0.487 0.633
PB 12 0.23 0.37 0.35 0.985 0.992 0.604 0.558
OF 16 8.35 5.81 5.24 -1.231 0.416 -1.491 0.157
ML 21 1.69 5.35 5.92 0.839 0.812 -0.975 0.341
XSL 11 0.64 1.37 1.35 0.959 0.967 -0.244 0.812
XDH 12 0.87 2.40 1.20 0.900 0.944 1.044 0.319
DSD 13 0.16 1.46 1.21 0.972 0.969 1.387 0.191
DTD 13 0.14 1.38 1.32 0.969 0.934 -0.138 0.892
LVF 13 0.29 1.93 2.27 0.959 0.982 -2.316 0.039
SFB 10 0.09 0.21 0.18 0.999 0.999 -0.882 0.401
SFS 12 0.14 0.84 0.81 0.988 0.965 0.678 0.512
SFT 13 0.21 1.30 1.10 0.979 0.982 1.826 0.093
FHD 14 0.28 0.65 0.62 0.994 0.999 -2.639 0.020
FND 15 0.42 1.40 1.60 0.982 0.978 -0.047 0.963
FSC 14 0.76 0.90 0.85 0.989 0.988 -1.000 0.336
XFL 14 6.17 1.40 0.83 0.953 0.975 0.691 0.502
FTD 15 0.18 0.72 0.86 0.993 0.995 2.681 0.018
EBF 7 0.40 0.54 0.52 0.994 0.999 -1.103 0.312
TL 12 1.47 0.40 0.38 0.996 0.997 0.000 1.000
CNF 15 1.54 1.68 1.60 0.970 0.983 1.936 0.073
MSC 15 1.43 1.96 2.32 0.956 0.965 -1.461 0.166
APD 15 0.69 2.17 1.82 0.965 0.970 -0.279 0.785
TB 15 0.24 1.10 1.00 0.987 0.987 1.101 0.289
PEB 9 0.58 0.85 0.88 0.988 0.998 0.483 0.642
DEB 14 0.82 1.96 2.50 0.945 0.952 0.115 0.910
HHD 14 0.28 0.67 0.80 0.995 0.995 0.092 0.928
XHL 13 0.44 0.14 0.12 1.000 1.000 -2.739 0.018
EWH 13 0.38 0.65 0.56 0.994 0.997 0.805 0.437
XRL 12 2.14 0.90 0.67 0.985 0.991 1.264 0.232
RSBB 13 0.27 0.95 1.08 0.987 0.991 0.724 0.483
MAXD 9 0.13 0.63 0.77 0.996 0.997 -2.022 0.078
MIND 9 0.10 0.50 0.34 0.997 0.997 -0.892 0.398
XUL 11 0.71 0.27 0.18 0.998 0.999 -0.289 0.779
USBB 11 0.27 1.74 1.93 0.972 0.937 -0.721 0.488
IAL 6 0.17 0.39 0.49 0.997 0.999 -1.080 0.329
BML 7 0.70 4.73 4.53 0.757 0.807 -1.654 0.149
BAP 7 1.31 9.00 8.65 0.321 -0.153 0.541 0.608
HML 6 0.15 1.13 1.27 0.988 0.990 -0.672 0.531
HAP 6 0.18 1.37 1.58 0.978 0.995 1.566 0.178
MS 7 0.14 1.25 1.52 0.987 0.987 -1.608 0.159
L 10 0.14 0.23 0.29 0.999 0.999 -1.745 0.115
SIH 10 0.30 1.65 1.61 0.964 0.922 0.850 0.417
MLH 10 0.40 2.03 2.39 0.956 0.939 -1.572 0.150
SIB 10 0.31 1.13 1.29 0.980 0.987 -1.649 0.134
MLB 6 0.48 2.60 2.28 0.929 0.907 0.326 0.757
MSD 10 0.21 1.66 1.77 0.980 0.965 0.625 0.548
IL 7 1.13 1.46 1.86 0.955 0.973 -0.612 0.563
PL 6 1.28 1.39 1.59 0.952 0.885 -1.332 0.240
HSN 12 1.50 2.83 2.50 0.897 0.902 -0.331 0.747
ASB 12 0.28 0.76 0.77 0.994 0.993 1.712 0.115
XCL 10 0.63 0.44 0.42 0.997 0.998 -0.688 0.509
XHS 2 0.50 0.34 0.35 0.999 1.000 1.222 0.437
XLS 2 0.45 0.34 0.48 0.998 1.000 -0.062 0.960
BXB 3 5.14 5.36 6.43 0.589 0.947 -1.505 0.271
HAX 11 0.62 1.66 2.55 0.960 0.917 -1.126 0.286
BCB 10 0.20 0.75 0.86 0.993 0.997 -2.177 0.057

*SD for each dimension (required for calculation) given in Table 3.1F. Dimen., dimension.
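The precision and reliability columns can be illustrated with the conventional anthropometric formulas: TEM as the square root of the summed squared differences between paired measurements divided by 2N, %TEM as TEM relative to the grand mean, and R as one minus the ratio of the squared TEM to the squared SD. The sketch below assumes these standard definitions apply here; the function names and the three pairs of repeat measurements are hypothetical and are not data from the thesis.

```python
import math


def tem(first, second):
    """Technical error of measurement for two measurement sessions."""
    if len(first) != len(second) or not first:
        raise ValueError("sessions must be non-empty and of equal length")
    squared_diffs = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_diffs / (2 * len(first)))


def percent_tem(first, second):
    """TEM expressed as a percentage of the grand mean of both sessions."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return tem(first, second) / grand_mean * 100


def reliability(first, second, sd):
    """Coefficient of reliability R; sd is the sample SD of the dimension."""
    return 1 - tem(first, second) ** 2 / sd ** 2


# Hypothetical repeat measurements (mm) of one dimension in three individuals
session_1 = [180.0, 175.0, 190.0]
session_2 = [181.0, 175.0, 189.0]

print(tem(session_1, session_2))
print(percent_tem(session_1, session_2))
print(reliability(session_1, session_2, sd=5.0))
```

Under these definitions R becomes negative whenever TEM exceeds the sample SD, which would be consistent with the negative R reported for OF in the table above.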

7.4.2 Inter-observer error

7.4.2.1 EJM vs. IK-O

Column groups: precision (TEM, %TEM, mean per cent error); reliability (R, Pearson correlation); paired samples t-test (T-value, P-value).

Dimen. N TEM (mm) %TEM Mean per cent error (%) R* Pearson correlation (r) T-value P-value
GO 2 0.95 0.52 0.50 0.988 - - -
MW 2 0.63 0.47 0.64 0.965 - - -
BB 2 1.01 0.79 1.05 0.988 - - -
DB 2 0.07 0.06 0.08 >0.999 - - -
BP 2 0.86 0.90 1.02 0.957 - - -
NB 2 1.10 4.41 6.04 0.108 - - -
PB 2 0.90 1.51 2.07 -0.038 - - -
OF 2 3.96 2.68 2.99 0.847 - - -
ML 2 2.26 7.01 10.53 0.920 - - -
FHD 2 0.56 1.20 2.50 0.996 - - -
FND 2 0.50 1.53 1.73 0.993 - - -
FSC 2 1.12 1.21 1.56 0.997 - - -
XFL 2 1.50 0.33 0.31 0.998 - - -
FTD 2 0.00 0.00 0.00 1.000 - - -
EBF 2 0.56 0.68 0.95 0.998 - - -
TL 1 4.95 1.28 1.91 - - - -
CNF 2 5.22 5.18 5.87 0.933 - - -
MSC 2 1.50 1.88 1.69 0.990 - - -
APD 2 0.71 1.96 2.80 0.960 - - -
TB 2 0.56 2.12 2.25 0.991 - - -
PEB 2 0.43 0.55 0.79 0.997 - - -
DEB 2 1.41 3.13 4.37 0.956 - - -
HHD 2 0.50 1.07 2.30 0.983 - - -
XHL 2 2.50 0.73 0.77 0.992 - - -
EWH 2 0.00 0.00 0.00 1.000 - - -
XRL 1 1.41 0.53 0.75 - - - -
RSBB 1 0.35 0.96 1.35 - - - -
MAXD 1 0.99 3.78 5.49 - - - -
MIND 1 1.41 5.89 8.70 - - - -
IAL 1 0.00 0.00 0.00 - - - -
BML 1 0.35 2.32 3.33 - - - -
BAP 1 0.35 2.18 3.13 - - - -
HML 1 1.41 9.43 14.29 - - - -
HAP 1 0.35 2.48 3.45 - - - -
MS 1 0.35 2.77 4.00 - - - -
IL 3 1.57 2.00 2.72 0.986 - - -
PL 1 8.49 8.66 11.54 - - - -
HSN 3 1.47 2.64 3.31 0.975 - - -
ASB 3 0.41 1.17 1.01 0.992 - - -
XLS 2 0.75 0.50 0.56 0.999 - - -
BXB 2 3.40 3.09 4.47 0.833 - - -
HAX 2 1.03 2.51 2.88 0.936 - - -
BCB 2 0.79 3.04 3.71 0.445 - - -

Dimen., dimension.

7.4.2.2 EJM vs. MR

Column groups: precision (TEM, %TEM, mean per cent error); reliability (R, Pearson correlation); paired samples t-test (T-value, P-value).

Dimen. N TEM (mm) %TEM Mean per cent error (%) R* Pearson correlation (r) T-value P-value
FHD 55 0.34 0.81 1.01 0.991 0.993 -3.418 0.001
XFL 45 1.59 0.37 0.33 0.996 0.996 0.591 0.557
TL 43 3.34 0.92 1.17 0.974 0.996 14.231 <0.001
XHL 47 2.60 0.86 0.76 0.977 0.978 -1.095 0.279
XRL 5 0.32 0.14 0.09 >0.999 1.000 -1.000 0.374

*Standard deviation calculated using SPSS. Dimen., dimension.

7.5 Per cent sexual dimorphism by time period

Columns, left to right: Predynastic Period (M mean, F mean, %D); Old Kingdom (M mean, F mean, %D); Late Period (M mean, F mean, %D). Means are in mm.

Dim. M mean F mean %D M mean F mean %D M mean F mean %D
GO 185.04 179.29 3.21 184.99 175.64 5.32 188.14 176.88 6.37
MW 135.55 129.65 4.55 140.60 137.07 2.58 141.33 136.70 3.39
BB 133.39 128.49 3.81 135.24 132.21 2.29 135.67 129.83 4.50
DB 129.70 119.17 8.84 127.51 121.67 4.80 130.55 121.60 7.36
PN 73.84 66.09 11.73 71.07 66.12 7.49 71.48 67.45 5.97
BN 102.18 97.77 4.51 101.34 96.97 4.51 103.32 96.72 6.82
BP 97.21 93.37 4.11 93.72 90.86 3.17 95.27 90.08 5.76
NB 26.34 25.04 5.19 25.57 24.71 3.48 24.92 23.95 4.05
PB 62.45 53.05 17.72 60.41 60.77 -0.59 61.96 59.22 4.63
OF 150.68 145.58 3.50 147.80 144.22 2.48 147.55 139.99 5.40
ML 34.69 29.08 19.29 32.76 28.28 15.84 33.88 28.57 18.59
XSL 49.37 45.09 9.49 48.08 45.39 5.93 - - -
XDH 37.32 34.22 9.06 37.19 34.66 7.30 - - -
DSD 10.83 10.69 1.31 11.50 10.45 10.05 - - -
DTD 9.98 9.63 3.63 10.32 9.71 6.28 - - -
LVF 15.89 15.03 5.72 15.44 15.80 -2.28 - - -
SFB 45.44 42.64 6.57 45.12 43.03 4.86 - - -
SFS 18.32 17.58 4.21 18.17 17.09 6.32 - - -
SFT 17.63 16.14 9.23 17.03 15.64 8.89 - - -
FHD 44.19 39.15 12.87 44.21 39.89 10.83 - - -
FND 31.32 27.74 12.91 31.93 27.40 16.53 - - -
FSC 88.47 79.27 11.61 89.55 77.85 15.03 - - -
XFL 447.29 415.81 7.57 457.44 422.20 8.35 - - -
FTD 25.21 23.28 8.29 25.96 23.02 12.77 - - -
EBF 76.34 70.07 8.95 75.67 69.19 9.37 - - -
TL 382.82 350.35 9.27 383.42 354.16 8.26 - - -
CNF 93.17 83.10 12.12 96.90 84.07 15.26 - - -
MSC 74.83 66.67 12.24 77.11 66.54 15.89 - - -
APD 33.64 28.91 16.36 34.47 28.95 19.07 - - -
TB 21.49 19.92 7.88 22.87 20.31 12.60 - - -
PEB 72.19 64.22 12.41 72.69 64.61 12.51 - - -
DEB 44.24 39.10 13.15 44.24 39.90 10.88 - - -
HHD 42.75 38.00 12.50 44.41 38.48 15.41 - - -
XHL 316.33 291.15 8.65 321.74 298.70 7.71 - - -
EWH 61.83 54.50 13.45 60.90 53.85 13.09 - - -
XRL 253.50 226.89 11.73 252.00 228.73 10.17 - - -
RSBB 29.28 26.11 12.14 30.35 27.23 11.46 - - -
MAXD 22.20 18.85 17.77 23.16 20.07 15.40 - - -
MIND 20.99 17.93 17.07 21.70 18.87 15.00 - - -
XUL 272.00 248.81 9.32 271.08 248.35 9.15 - - -
USBB 16.75 15.09 11.00 16.89 15.00 12.60 - - -
IAL 46.36 41.50 11.71 45.59 41.75 9.20 - - -
BML 15.96 13.83 15.40 15.68 13.67 14.70 - - -
BAP 15.45 13.77 12.20 15.03 13.20 13.86 - - -
HML 15.18 12.45 21.93 14.41 12.68 13.64 - - -
HAP 13.60 11.96 13.71 13.19 11.92 10.65 - - -
MS 11.80 10.62 11.11 11.79 9.97 18.25 - - -
L 62.74 57.68 8.77 62.62 57.70 8.53 - - -
SIH 18.66 16.97 9.96 18.90 16.94 11.57 - - -
MLH 21.01 18.85 11.46 21.17 18.44 14.80 - - -
SIB 28.29 25.52 10.85 28.44 25.53 11.40 - - -
MLB 19.03 17.18 10.77 19.03 16.88 12.74 - - -
MSD 12.65 11.62 8.86 13.79 11.90 15.88 - - -
IL 77.10 72.43 6.45 79.50 72.02 10.39 - - -
PL 84.36 89.88 -6.14 86.04 86.60 -0.65 - - -
HSN 52.33 48.26 8.43 52.97 50.29 5.33 - - -
ASB 36.31 32.61 11.35 37.22 32.53 14.42 - - -
XCL 148.87 132.58 12.29 151.09 141.87 6.50 - - -
XHS 155.57 132.25 17.63 153.48 130.60 17.52 - - -
XLS 142.13 121.53 16.95 139.10 128.46 8.28 - - -
BXB 98.42 85.11 15.64 97.36 88.99 9.41 - - -
HAX 38.41 34.30 11.98 38.91 34.16 13.91 - - -
BCB 27.10 23.45 15.57 27.71 23.91 15.89 - - -

Dim.=dimension; M=male; F=female; %D=per cent sexual dimorphism.
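The tabulated %D values are consistent with the male-female difference expressed as a percentage of the female mean, (M - F) / F x 100. This formula is inferred from the figures in the table rather than quoted from it, and the function name `percent_dimorphism` is hypothetical. A minimal sketch:

```python
def percent_dimorphism(male_mean, female_mean):
    """Per cent sexual dimorphism relative to the female mean.

    Assumption: %D = (M - F) / F * 100, inferred from the tabulated values.
    Negative results indicate a larger female mean.
    """
    return (male_mean - female_mean) / female_mean * 100


# Predynastic GO, Old Kingdom PB and Predynastic PL means (mm) from the table
print(round(percent_dimorphism(185.04, 179.29), 2))
print(round(percent_dimorphism(60.41, 60.77), 2))
print(round(percent_dimorphism(84.36, 89.88), 2))
```

The three calls reproduce the tabulated values of 3.21, -0.59 and -6.14 respectively.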

7.6 Analysis of metric data from Saqqara-West sample

The Shapiro-Wilk test for normality demonstrated that the data for all five variables were normally distributed (P>0.05 for all variables).

7.6.1 Outliers and extreme scores

Exploration of the Saqqara-West metric data identified seven individual measurements that were considered to be outliers (two very small measurements of GO in two males; one very large measurement of DB in a male; one very large measurement of FHD in a female; one very large measurement of PEB in a male; two very large measurements of XUL in two males). After careful consideration of the measurement scores, it was decided that all outliers should be kept in the data set to reflect natural human and sample variation. No extreme scores were identified.
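The distinction between outliers and extreme scores can be illustrated with the common boxplot convention, in which values lying more than 1.5 interquartile ranges beyond the quartiles are flagged as outliers and values lying more than 3 interquartile ranges beyond them are flagged as extreme. Whether this exact rule was applied to the Saqqara-West data is an assumption. The function name, the quartile method and the example values below are hypothetical.

```python
from statistics import quantiles


def classify_scores(values):
    """Split values into boxplot outliers and extreme scores.

    Assumption: outliers lie more than 1.5 x IQR beyond the quartiles and
    extreme scores more than 3 x IQR beyond them.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    outliers, extremes = [], []
    for value in values:
        # Distance beyond the nearer quartile (negative if inside the box)
        distance = max(q1 - value, value - q3)
        if distance > 3 * iqr:
            extremes.append(value)
        elif distance > 1.5 * iqr:
            outliers.append(value)
    return outliers, extremes


# Hypothetical measurements (mm), not taken from the thesis data
sample = [40, 42, 43, 44, 44, 45, 46, 52, 60]
outliers, extremes = classify_scores(sample)
print(outliers, extremes)
```

Different software packages compute quartiles slightly differently, so borderline values may be classified differently from one package to another.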

7.6.2 Descriptive statistics

The descriptive statistics for the Saqqara-West necropolis sample are given in Table 7.5A. In this table, N denotes the number of individuals for whom the dimension could be measured. The minimum and maximum values for the dimension are additionally provided, as well as the overall mean, and the male and female, and Old Kingdom and Ptolemaic Period means for each dimension.

Table 7.5A: Descriptive statistics for the Saqqara-West sample.

Dimension N Range Min. Max. Mean* M mean F mean OK mean PP mean

GO 84 31.00 166.00 197.00 181.96 184.19 177.95 181.40 182.13

MW 91 22.00 125.00 147.00 137.92 138.72 136.59 138.95 137.63

DB 59 34.00 108.00 142.00 126.92 129.80 122.09 125.89 127.38

FHD 108 15.90 35.50 51.40 44.14 46.08 40.26 43.49 44.33

PEB 95 23.70 60.90 84.60 72.90 75.76 67.53 71.14 73.34

GO, glabello-occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic breadth of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia; N, the number of individuals for whom the dimension could be measured; M, male; F, female; OK, Old Kingdom; PP, Ptolemaic Period. *Overall mean.

The variability statistics for the five skeletal dimensions measured by both EJM and IK-O and required for the blind test are given in Table 7.5B.

Table 7.5B: Variability statistics of skeletal dimensions.

Dimension N Mean SE of mean SD Variance 95% confidence interval (lower, upper)

GO 84 181.96 0.67960 6.22868 38.796 180.61 183.31

MW 91 137.92 0.52139 4.97378 24.738 136.89 138.96

DB 59 126.92 0.79869 6.13485 37.636 125.33 128.52

FHD 108 44.14 0.33781 3.51062 12.324 43.47 44.81

PEB 95 72.90 0.51427 5.01246 25.125 71.88 73.92

GO, glabello-occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic breadth of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia; N, the number of individuals for whom the dimension could be measured; SE, standard error; SD, standard deviation.

7.6.2.1 Comparison of means

An independent samples t-test demonstrated statistically significant differences between male and female means for all five dimensions collected from the Saqqara-West necropolis sample (P≤0.047 for all analyses; Table 7.5C).

Table 7.5C: Results of the independent samples t-test comparing male and female means for all skeletal dimensions collected from the Saqqara-West sample.

Dimension N (male) N (female) Mean, male (mm) Mean, female (mm) Levene’s Test* F Levene’s Test* Sig. T-value P-value

GO 54 30 184.19 177.95 0.001 0.975 4.989 <0.001

MW 57 34 138.72 136.59 0.011 0.916 2.010 0.047

DB 37 22 129.80 122.09 0.223 0.638 5.852 <0.001

FHD 72 36 46.08 40.26 0.203 0.654 13.050 <0.001

PEB 62 33 75.76 67.53 0.193 0.661 12.242 <0.001

*Levene’s Test for Equality of Variances. **Equal variances not assumed. GO, glabello-occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic breadth of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia; N, the number of individuals in whom the dimension could be measured.

A two-factor ANOVA test was used to evaluate the main effects of sex and time period on skeletal dimensions, and to establish the presence of interactions between sex and time period.

As shown in Table 7.5D, four of the five variables demonstrated statistically significant differences between males and females, while only two variables demonstrated statistically significant differences between the two time periods, the Old Kingdom and the Ptolemaic Period (FHD, P=0.021; PEB, P=0.003). As shown in Table 7.5A, the mean measurements for both FHD and PEB increased over time. There were no statistically significant interactions between sex and time period.

Table 7.5D: Results of a two-factor ANOVA test exploring interactions between sex differences and time period for five skeletal dimensions.

ANOVA effects (F statistic and significance):

Variable (n) Statistic Levene’s test† Sex Time Sex*Time
GO (84) F 0.97 19.56 0.001 0.07
GO (84) Sig. 0.409 <0.001 0.982 0.799
MW (91) F 0.98 2.68 1.58 0.07
MW (91) Sig. 0.407 0.105 0.212 0.791
DB (59) F 1.79 22.83 0.27 1.00
DB (59) Sig. 0.160 <0.001 0.606 0.322
FHD (108) F 0.78 143.78 5.47 2.33
FHD (108) Sig. 0.506 <0.001 0.021 0.130
PEB (95) F 0.16 129.24 9.02 3.48
PEB (95) Sig. 0.921 <0.001 0.003 0.065

Bonferroni test (Time 1 vs. Time 2), significance by sex:

Variable (n) Time 1 Time 2 Male sig. Female sig.
GO (84) OK PP 0.862 0.848
MW (91) OK PP 0.445 0.332
DB (59) OK PP 0.222 0.775
FHD (108) OK PP 0.497 0.013*
PEB (95) OK PP 0.363 0.003*

*The mean difference is significant at the 0.05 level. †Levene’s test of the equality of error variances; tests the null hypothesis that the error variance is equal across groups. Design: intercept + sex + time + sex*time. GO, glabello- occipital length of cranium; MW, maximum width of cranium; DB, maximum bizygomatic breadth of cranium; FHD, femoral head diameter; PEB, proximal epiphyseal breadth of tibia; OK, Old Kingdom; PP, Ptolemaic Period; Sig., significance.
