FORDISC AND THE DETERMINATION OF ANCESTRY FROM CRANIOMETRIC DATA

By

Marina Elliott

B.A., The University of British Columbia, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS

In

THE DEPARTMENT OF ARCHAEOLOGY

© Marina Elliott, 2008

SIMON FRASER UNIVERSITY

Summer 2008

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

APPROVAL

Name: Marina Elliott

Degree: Master of Arts

Title of Thesis: FORDISC and the determination of ancestry from craniometric data

Examining Committee:

Chair: Catherine D'Andrea Graduate Program Chair

Mark Collard Senior Supervisor Associate Professor, Archaeology

Mark Skinner Supervisor Professor, Archaeology

Brian Chisholm Internal Examiner Senior Instructor, University of British Columbia

Date Defended/Approved:

SIMON FRASER UNIVERSITY LIBRARY

Declaration of Partial Copyright Licence

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection (currently available to the public at the "Institutional Repository" link of the SFU Library website at: ) and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

While licensing SFU to permit the above uses, the author retains copyright in the thesis, project or extended essays, including the right to change the work for subsequent purposes, including editing and publishing the work in whole or in part, and licensing other parties, as the author may desire.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library Burnaby, BC, Canada

Revised: Fall 2007

Abstract

FORDISC is a computer program designed to determine ancestry from human skeletal remains. It is widely used, yet its accuracy has been challenged. In this study, 200 specimens from one of FORDISC's reference samples are used to investigate four issues that are central to the debate: (1) the inclusion of the source population in the reference sample, (2) the influence of sex, (3) the impact of variable number, and (4) the effect of different anatomical regions.

The results indicate that the source population must be present and the sex of the specimen known before FORDISC can provide an accurate determination of ancestry. Additionally, a determination will be successful only if more than 10 measurements pertaining to multiple anatomical regions are used. Even when these conditions are met, few determinations may be considered unambiguously correct. Overall, FORDISC performed below expectations and the results suggest that the program should be used cautiously.

Keywords: FORDISC; ancestry determination; cranial morphology; forensic identification; discriminant function analysis

Subject Terms: ; forensics; craniometry; ; human variation

Acknowledgements

This research could not have happened without the encouragement and assistance of many people. In particular, I would like to thank my supervisor, Dr. Mark Collard, for his generous advice, support and patience throughout this process. In addition to all of his other duties and responsibilities, he always seemed to have time for my questions and concerns. I would also like to thank my committee members, Dr. Mark Skinner and Dr. Brian Chisholm, both of whom took precious time out of their summer schedules to read and provide feedback on this research.

In addition, I am extremely fortunate to have an excellent group of colleagues, friends and family members. I am especially grateful to Alan Cross, Mana Dembo, Kevan Edinborough, Luseadra McKerracher and the other members of the Laboratory of Biological Anthropology whose intelligence, curiosity and enthusiasm for their research inspired my own efforts. Many thanks also go to my friends and family for providing valuable comments, welcome distractions and incalculable kindnesses along the way. Although no words can truly express how lucky I am to have them, thanks also go to my parents - their example gives me something to strive for.

Finally, I would like to thank my husband, Robin Elliott. His writing and editing contributions were invaluable, as were his computer skills when things went awry. More importantly, his love, support, encouragement and apparently endless tolerance of my interests (academic and otherwise) are a constant source of wonder and admiration to me. I hope I have made him proud.

Table of Contents

Approval
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

1. Introduction
1.1. Aims and objectives
1.2. FORDISC and its applications
1.3. The FORDISC debate
1.4. Issues investigated
1.5. Outline of analyses

2. Materials and Methods
2.1. Data
2.2. Analyses

3. Results
3.1. Impact of including source population and specifying sex
3.1.1. Number of correct assignments accepting all posterior and typicality probabilities
3.1.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.1.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.1.4. Summary
3.2. Impact of variable number
3.2.1. Number of correct assignments accepting all posterior and typicality probabilities
3.2.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.2.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.2.4. Variable number and population differences
3.2.4.1. Number of correct assignments accepting all posterior and typicality probabilities
3.2.4.2. Number of correct classifications using >0.5 posterior probability and >0.01 typicality probability
3.2.4.3. Number of correct classifications using >0.8 posterior probability and >0.01 typicality probability
3.2.5. Summary
3.3. Impact of cranial region
3.3.1. Number of correct assignments accepting all posterior and typicality probabilities
3.3.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.3.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability
3.3.4. Cranial region and population differences
3.3.4.1. Number of correct assignments accepting all posterior and typicality probabilities
3.3.4.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability
3.3.5. Summary

4. Discussion
4.1. Main findings
4.2. Implications for use of FORDISC
4.3. Future considerations

5. Conclusions

References
Appendix I
Appendix II
Appendix III

List of Tables

Table 1. Total number of test specimens correctly classified (n=200)

Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Table 3. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Table 4. Total number of test specimens correctly classified by variable number (n=200)

Table 5. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Table 6. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Table 7. Results by population accepting all posterior and typicality probabilities (n=40)

Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40)

Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40)

Table 10. Total number of test specimens correctly classified (n=200)

Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200)

Table 12. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Table 13. Total results for each cranial region by population (n=40)

Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40)

Table 15. Range of posterior and typicality probabilities for correct and incorrect assignments by population

List of Figures

Figure 1: Genetic tree for 26 European populations

Figure 2: Genetic tree for 33 African populations

Figure 3: Genetic tree for 21 Asian populations

Figure 4: Genetic tree for 23 American populations

1. Introduction

1.1. Aims and objectives

Determining ancestry from skeletonized human remains is an important task for bioarchaeologists and forensic anthropologists. As part of a biological profile, this information is used in a wide range of contexts, including the study of the movements and interactions of past populations, ancestral land claims, repatriation requests and the investigation of unlawful deaths and human rights violations (Cox et al. 2006).

Despite attempts to use other skeletal elements (e.g. Marino 1997; Ballard 1999; Holliday and Falsetti 1999; Patriquin et al. 2002), the skull continues to be regarded as the most reliable area for determining ancestry (Bass 1995). As a result, both non-metric and metric techniques have been developed to effect ancestry determinations from the skull. The use of discrete traits, such as the presence or absence of shoveled incisors, is common. However, non-metric characteristics are neither exhaustive nor always consistently defined, and few standards exist for their collection (Buikstra and Ubelaker 1994). Additionally, non-metric methods have been criticized as being more susceptible to inter-observer error (Corruccini 1974).

Due to their perceived objectivity and accuracy, metric assessments of the skull have achieved wide acceptance for assessing ancestry from skeletal remains (e.g. Giles and Elliot 1962). Furthermore, the development of statistical methods and computer technologies to manipulate large datasets has contributed to the widespread use of craniometric methods. In particular, user-friendly computer software applications designed to make ancestry determinations quickly and easily have become popular.

Currently, FORDISC (Jantz and Ousley 2005) is the leading computer program for ancestry determination. Although it is widely used, its application to questions of ancestry is not unproblematic and its accuracy and reliability have been questioned (Fukuzawa and Maish 1997; Kosiba 2000; Belcher et al. 2002; Leathers et al. 2002; Ubelaker et al. 2002; Williams et al. 2005; Hubbe and Neves 2007). In response, FORDISC's developers argue that the program's apparent failures are due to inappropriate use of the program and/or interpretation of results (Freid et al. 2005). In particular, they warn against testing individuals whose populations are not represented in the database. They also claim that using too many variables reduces success.

Given the importance of determining ancestry from skeletonized remains and the confidence placed in FORDISC, there was a pressing need to resolve the issues that have been raised regarding its accuracy. Accordingly, this study focused on several key areas of debate. In particular, it evaluated the effect of including or excluding the source population from the reference sample on FORDISC's accuracy. It also examined how the number of variables affects FORDISC's success rate. To test the impact of constraining sex, test specimens were compared to reference groups of both sexes and to same-sex groups alone. The effect of using specific cranial regions on FORDISC's ability to determine ancestry was also tested using datasets that isolated the basicranium, neurocranium and face.

1.2. FORDISC and its applications

Developed by Richard Jantz and Steve Ousley in association with the University of Tennessee, FORDISC (short for Forensic Discriminants) was designed to provide rapid and accurate ancestry determinations for crania of unknown origin through Discriminant Function Analysis (DFA) of skull measurements. It also offers ancestry and stature estimations from postcranial measurements. The program was first commercially released in 1992. A second version followed in 1996. The current version, FORDISC 3.0, was released in 2005.

Before the program became publicly available, Jantz and Ousley provided custom discriminants for individual specimens by request (Jantz and Ousley 2005). These ancestry determinations were made through comparisons with the Forensic Anthropology Data Bank (FDB), a repository of U.S. forensic cases from the 19th and 20th centuries. By the time FORDISC 1.0 was released, the program incorporated a much larger database of craniometric measurements collected by W.W. Howells (Howells 1973; 1989). Howells' dataset includes values for 70 measurements recorded on more than 2500 crania from 29 populations. The populations come from Africa, Europe, Asia, Australia/Pacific Islands and the Americas, and range in time from 600 B.C. to the mid-20th century. Incorporating Howells' dataset significantly broadened the geographic and temporal range of FORDISC's comparative sample. Since the release of the first version of the program, Jantz and Ousley have augmented the Forensic Data Bank with data from new U.S. cases and added a sample of males taken from modern forensic cases in Guatemala.

The first two versions of FORDISC offered ancestry determinations through DFA of up to 21 cranial measurements. In the current version, users may select up to 82 measurements when using Howells' data or 42 when using the Forensic Data Bank. However, Jantz and Ousley (2005) note that because some measurements were not taken on some individuals, sample sizes are limited by the measurements selected.

From its inception, FORDISC has been a popular tool among bioarchaeologists. Shortly after the release of version 1.0, Mangold et al. (1993) used FORDISC to perform a two-group DFA of 21 cranial measurements to corroborate a qualitative trait-based analysis of the sex of a set of pre-contact Native American skeletal remains. In this study, Mangold et al. concluded that the results "strongly aligned the specimen with Amerindian females rather than males" (p. 2). Williams et al. (2001) used FORDISC 2.0 to explore the ancestry of several individuals buried in a German settler's graveyard in Halifax, Nova Scotia. The FORDISC results led Williams et al. to conclude that the remains were "non-European" and to involve the local Mi'kmaq chief in the investigation. FORDISC 2.0 was also used to assess 80 historical crania in the collections of the Institute of Forensic Medicine in Copenhagen (Sejrsen et al. 2005). Although many of the crania were marked only with "a general geographic or racial descriptor" (p. 40), the authors of the study claimed to confirm ethnicity in 70% of the cases.

FORDISC has also been used to analyze more ancient human remains. Lovvorn et al. (1999) used FORDISC 2.0 to compare a male burial specimen from Sidney, Nebraska with males from Howells' database. With only six measurements available because the midface and orbits were missing, FORDISC 2.0 selected "Eskimo" as the most likely population, followed by "Ainu" (Japan). Based on these results, the authors concluded that the specimen possessed a "blend of Amerindian and earlier protomongoloid traits" (p. 527) and that this was consistent with the "hypothesis that Plains Amerindians descended from the earliest wave of Paleoindians who crossed the Bering Straits" (p. 527). In another study, Pleistocene remains from the site of Zhoukoudian (UC101, UC102 and UC103) were compared to a reference sample that combined Howells' data with data from additional Amerindian groups (Cunningham and Wescott 2002). The authors concluded that their results supported the contention that the remains "do not represent a family but are relatively contemporaneous" (p. 636).

In addition to being used in historical research, FORDISC is used regularly to assist with identifications in forensic cases. For example, in 2000, an Arlington Cemetery press report described FORDISC as "a key piece of software" used by the U.S. Army Central Identification Laboratory in Hawaii for "automating the process of matching skeletal remains" (ANC 2007).

FORDISC has become sufficiently popular that Jantz and Ousley now run workshops focusing on the program during the American Academy of Forensic Sciences annual meetings (Anthropolog 2005). Designed to help anthropologists, archaeologists and forensic professionals carry out and interpret the results of FORDISC analyses, these workshops cover a variety of topics such as statistical parameters, the estimation of ancestry from postcranial material, "problem" crania, and secular change (Jantz and Ousley 2007).

1.3. The FORDISC debate

Despite its popularity, the utility of FORDISC for ancestry determination has been challenged. Fukuzawa and Maish (1997) tested FORDISC 2.0 with 59 crania from two known Ontario Iroquoian sites. Using both complete and partial crania, the authors compared Iroquoian individuals with seven populations from Howells' dataset and found FORDISC to be an unreliable identifier of ancestry. Similarly, Kosiba (2000) tested a series of East Indian crania and found that FORDISC 2.0 was unable to consistently classify the sample.

In 2002, two studies used ancient Nubian crania to test FORDISC 2.0. In the first, Belcher et al. (2002) analyzed 47 Meroitic Nubians and found little consistency in either biological affinity or sex attribution. The authors concluded that the program was flawed and challenged "the utility of any forensic application that attempts to constrain worldwide human cranial variability" (p. 42). In the second study, Leathers et al. (2002) tested FORDISC 2.0 with a collection of post-Meroitic Nubian crania using 12 cranial measurements. Only 57% of the 89 specimens were classified as African and the research team concluded that FORDISC 2.0's classifications were "not morphologically or biologically accurate" (p. 99).

A third evaluation of FORDISC 2.0 was published in 2002 (Ubelaker et al. 2002). This study tested the program with a medieval Spanish sample. The authors reasoned that, if the program was accurate, the test specimens should be classified as one of the European populations in the reference sample. The study achieved "a variety of results" (p. 3). Using the Forensic Data Bank, FORDISC 2.0 classified 44% of the test sample as white, 35% as black, 9% as Hispanic, 4% as American Indian and 3% each as Chinese or Vietnamese. Using Howells' database, the 95 test individuals were classified into 21 different groups. Twenty-five percent were classified as Egyptian, followed by Austrian with 11%. The remaining specimens were scattered across 19 different populations ranging from Andaman Islanders (7%) to Zulu (2%). Despite these diverse results, the authors concluded that FORDISC was still a "useful forensic tool" (p. 4).

In 2005, another study using Nubian crania was published by the same authors as the 2002 Meroitic paper (Williams et al. 2005). This study used 42 test specimens instead of the 47 used previously, and 12 variables selected on the basis of their availability and diagnostic value. The authors reasoned that, if FORDISC was accurate, it would group the Nubians together and classify them as Howells' 26th-30th dynasty Egyptians since the two groups were geographic neighbours. According to Williams et al. (2005), FORDISC "failed both tests" (p. 345).

FORDISC's developers did not respond to the various criticisms of the program until the Williams et al. (2005) study was published. At that point, they suggested that the "disputed results" were due to the use of "inappropriate reference samples" (Freid et al. 2005: 103). Citing the limitations of Discriminant Function Analysis, Jantz and Ousley (2005: 16) noted that any function "will classify an unknown...regardless of its actual ethnic group" and cautioned against testing "an individual whose race or ethnic group is not represented in the reference samples".

Jantz and Ousley (2005) also suggested that the critics had failed to properly evaluate the posterior and typicality probabilities provided by FORDISC. These probabilities are mathematical calculations used to evaluate the likelihood of group membership (Pietrusewsky 2000). Posterior probabilities are a relative measure of membership and sum to 1, while typicality probabilities assess "whether the unknown individual could belong to any of the groups" based on the absolute distances to each group (Albanese and Saunders 2006: 287). Jantz and Ousley (2005) recommended that a population attribution be accepted only if the posterior probabilities were 0.5 or more and the typicality probabilities higher than 0.01.¹
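As a rough illustration of the two quantities described above, the following Python sketch converts squared Mahalanobis distances into posterior and typicality probabilities. It is a textbook formulation and not FORDISC's own code: the equal prior probabilities, the chi-square reference distribution used for typicality, the function names and the example distances are all assumptions made for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def posterior_probabilities(sq_distances):
    """Relative group membership assuming equal priors; the values sum to 1."""
    smallest = min(sq_distances.values())
    # Subtracting the smallest distance avoids underflow without changing the ratios.
    weights = {g: math.exp(-0.5 * (d - smallest)) for g, d in sq_distances.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def typicality_probabilities(sq_distances, n_variables):
    """Absolute fit to each group: chance of a distance at least this large."""
    return {g: chi2_sf(d, n_variables) for g, d in sq_distances.items()}


# Hypothetical squared distances from one cranium to three reference groups.
distances = {"Group A": 9.5, "Group B": 12.1, "Group C": 30.4}
posteriors = posterior_probabilities(distances)
typicalities = typicality_probabilities(distances, n_variables=10)
```

Because posterior probabilities are relative, a specimen that lies far from every group can still receive a high posterior probability for the least distant one; the typicality probability is what flags such a case.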

The debate was not settled at this point, however. In 2007 Current Anthropology published a discussion of FORDISC 2.0. In a reassessment of the Williams et al. (2005) paper, Hubbe and Neves (2007) suggested that the study was flawed because it had used only 12 variables, a number they considered to be "far from enough to classify a skull on the basis of discriminant functions" (p. 285). In response, Williams and Armelagos (2007) pointed out that FORDISC tutorials frequently use 12 or fewer variables and that "there is no stipulated number of variables ... simply because forensic evidence is often fragmentary" (p. 286). They also pointed out that more variables would not necessarily improve success if the measurements are collinear or non-diagnostic.

Williams and Armelagos (2007) also criticized Hubbe and Neves (2007) for using Howells' data as both test and control. They suggested that an 'independent' sample - one whose individuals were not specifically included in the database - should have been used and cited Naar et al. (2006) as an example of such a study. However, Naar et al. (2006) also used Howells' data, specifically the 111 crania that make up the entire Egyptian sample. FORDISC only placed 55 (49.5%) of the sample back into the Egyptian group at the appropriate statistical level. While the use of individuals from within the database may not provide an independent test of the program, doing so should result in high levels of success because the individuals already exist in the reference sample. A failure on FORDISC's part to attribute members of its core sample appropriately would suggest a significant problem with the program.

¹ In the Freid et al. (2005) paper, Jantz and Ousley recommend accepting a determination only if the typicality probability is 0.1 or more. However, this appears to be a typographical error. The FORDISC 3.0 manual suggests that typicality probabilities are "interpretively similar to the univariate p value based on the normal distribution" (Jantz and Ousley 2005: 48) and that "TPs below 0.05 (5%), or certainly 0.01 (1%) for a group...indicate questionable membership...or measurement error" (Jantz and Ousley 2005: 46). These comments lead me to believe that 0.01 is the acceptable typicality probability value rather than 0.1.

In the next issue of Current Anthropology, Williams, Belcher and Armelagos (2007) replied to another critique of the 2005 study. In this discussion, Keita (2007) suggested that Williams et al. (2005) had overemphasized the role of non-genetic factors in cranial development and noted that a "demonstration of similarity using multivariate analyses does not always mean identity, close recent common origin, or even origin in an adjacent region" (p. 425). Williams and Armelagos (2007) responded by saying they had been criticized "for a paper that we did not write" (p. 426) and that Keita had misunderstood their intent in highlighting the conditions of growth. They stressed that their previous study had been undertaken to demonstrate the "lack of fit between conceptual models... and actual patterns of human biological variation" (p. 426) and maintained the position that FORDISC is both functionally and conceptually flawed due to the complexity of this variation.

Most recently, Campbell and Armelagos (2007) used a new individual scores option in FORDISC to test samples taken from within both the W.W. Howells and Forensic Databank reference groups. In this study, FORDISC was able to correctly classify 73.1% of Howells' individuals and 72.0% of the FDB individuals using the Freid et al. (2005) probability levels when the sex was unspecified. When sex was constrained, the results improved to 80.7% and 78.6% respectively. Although Armelagos had previously contributed to almost every study that challenged FORDISC and been a vocal opponent of the program (Belcher et al. 2002; Leathers et al. 2002; Williams et al. 2005; Naar et al. 2006), Campbell and Armelagos (2007) did not suggest that the program was flawed. Instead, they concluded that the results achieved by FORDISC were "approaching the limit of craniometric analysis to assign group membership" (p. 84).

Last, Jantz and Ousley have suggested that secular change may be responsible for FORDISC's inconsistent performance in some cases. In particular, they suggest that Americans (both "White" and "Black") have changed significantly over the past 150 years in "response to unparalleled environmental change" (Wescott and Jantz 2005: 242). As a result, they recommend that the Forensic Data Bank should only be used "on individuals born in the 20th century" while Howells' data "may be more appropriate for older specimens" (Jantz and Ousley 2005: 17). Certainly secular changes have been well documented (Boas 1911; Angel 1976; Smith et al. 1986; Jantz and Meadows-Jantz 2000). However, the extent to which they complicate ancestry determination is not well understood.

In sum, there are a number of unresolved issues with respect to FORDISC. In particular, the significance of testing individuals whose populations are not represented in the reference sample has still not been determined. There are also inconsistencies with respect to how specifying sex affects FORDISC's accuracy. The guidelines for determining which variables are the most effective and how many to use are also unclear. Lastly, the recommendations for accepting an attribution based on the posterior and typicality probabilities differ in the FORDISC literature. While the manual still recommends using a posterior probability of 0.5, the FORDISC 3.0 workshops run by Jantz and Ousley now suggest that "posterior probabilities <0.8 have a higher probability of being incorrect than correct" (Jantz and Ousley 2007: 33). Since FORDISC continues to be used regularly in biological anthropology and forensic settings, the study reported here was undertaken to contribute to the resolution of these important questions.
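To make the competing acceptance rules concrete, the sketch below applies the three criteria used in this study (all probabilities accepted; posterior probability >0.5 with typicality probability >0.01; posterior probability >0.8 with typicality probability >0.01) to a single assignment. The function name, argument names and example values are hypothetical and serve only as an illustration.

```python
def is_accepted(posterior, typicality, min_posterior=None, min_typicality=None):
    """Return True if an assignment passes the given cut-offs (None = no cut-off)."""
    if min_posterior is not None and not posterior > min_posterior:
        return False
    if min_typicality is not None and not typicality > min_typicality:
        return False
    return True


# The three acceptance criteria as (posterior cut-off, typicality cut-off).
CRITERIA = {
    "all probabilities accepted": (None, None),
    "posterior >0.5, typicality >0.01": (0.5, 0.01),
    "posterior >0.8, typicality >0.01": (0.8, 0.01),
}

# Hypothetical assignment: posterior probability 0.62, typicality probability 0.04.
for label, (min_pp, min_tp) in CRITERIA.items():
    print(label, is_accepted(0.62, 0.04, min_pp, min_tp))
```

With these assumed values the assignment passes the first two criteria and fails the third, which shows how the same output can count as a success or a failure depending on the threshold adopted.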

1.4. Issues investigated

The first issue addressed in this study is the impact of the presence or absence of a specimen's source population in FORDISC's reference sample. As mentioned above, a number of researchers have sought to test FORDISC by analyzing specimens of known origin. FORDISC's developers have rejected most of these tests on the grounds that the test specimens' source populations were not included in FORDISC's reference sample. It is true that DFA "require[s] in principle that unknowns belong to one of the groups in the analysis from which the functions were derived" (Keita 2007: 425). However, biodistance research is based on a close relationship between morphology and ancestry. As Roseman (2004: 12824) notes, biodistance studies assume that "populations that share recent common ancestry and or exchange a large number of migrants should resemble one another more than geographically isolated and distantly related populations". Thus, if there were no relationship between craniometrics and ancestry, Jantz and Ousley could not continue to claim that FORDISC will classify individual crania "into the group with which they have the closest affinity" (Spradley et al. 2008). Furthermore, in our increasingly mobile society "a representative of almost any population in the world could end up being a forensic case in almost any place in the world" (Ubelaker et al. 2002: 2). Consequently, a program that requires an investigator to determine which populations are represented before running an analysis may have very limited application for real-world investigations.

The second issue investigated in the study is the effect of specifying the sex of a target specimen versus leaving its sex unspecified. Several studies have found differences in affinity attribution when the sex was altered (Belcher et al. 2002; Williams et al. 2005; Campbell and Armelagos 2007). By comparing a test specimen to both males and females, these studies expected FORDISC to correctly identify both population and sex on the basis that male and female skulls of a given population are more similar to each other than either is to another population (Williams et al. 2005). With this in mind, this study tested whether the population attribution changed when the sex was unspecified versus when it was restricted to the sex provided by Howells (sex specified).

The third issue addressed in this study is the impact of the number of variables on FORDISC's accuracy. While Jantz and Ousley (2005: 44) admit that "good separation and classification of many groups requires many variables" they also argue that "using too many variables produces overfitting and unreliable apparent accuracy". Similarly, Williams and Armelagos (2007: 286) suggest that using "additional variables that are collinear or that are not diagnostic may reduce the efficacy of classification." In contrast, Hubbe and Neves (2007: 285) found that "the number of variables used rather than the anatomical region measured" was the most critical factor affecting FORDISC's discriminant ability. Although there is little consensus as to what constitutes a "sufficiently" large number of variables in a multivariate analysis (Pietrusewsky 2000), Jantz and Ousley (2005: 49) suggest that a "reasonable recommended maximum number of variables seems to be the minimum sample size among all groups divided by three". This is based on Huberty's (1994) results. Although they suggest fewer variables may be effective, as a minimum, Jantz and Ousley recommend "10 variables for reliable comparisons" (2005: 49).

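The two guidelines quoted above (a maximum equal to the smallest group size divided by three, and a minimum of 10 variables) can be sketched as a small calculation. The function name and the example group sizes are hypothetical, and rounding the maximum down to a whole number is an assumption, since the quoted guideline does not say how fractions should be handled.

```python
def recommended_variable_range(group_sizes, minimum=10):
    """Return (minimum, maximum) variable counts for a set of reference groups.

    The maximum is the smallest group size divided by three, rounded down
    (the rounding is an assumption; the quoted guideline does not specify it).
    """
    if not group_sizes:
        raise ValueError("at least one reference group is required")
    maximum = min(group_sizes) // 3
    return minimum, maximum


# Hypothetical reference groups containing 55, 48 and 111 crania.
low, high = recommended_variable_range([55, 48, 111])
print(low, high)  # 10 16
```

Under this reading, a reference group of fewer than 30 crania pushes the maximum below the recommended minimum of 10, so the two guidelines cannot both be satisfied.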
A fourth factor that may be contributing to FORDISC's inconsistent performance relates to the anatomical region analyzed. While most researchers recognize that all morphology is the result of combined genetic, developmental, and environmental factors, cranial morphology has been considered a reasonable proxy for geographic origin. This is particularly true of the facial region, with the midface and nose considered the most diagnostic of ancestry (Brues 1990; Gill and Gilbert 1990). However, many studies have shown the face to be particularly susceptible to external stresses related to diet, conditions of growth, cultural practices and/or climatic adaptations (Coon et al. 1950; Hiernaux 1963; Hughes 1968; Hylander 1977; Carey and Steegmann 1981; Franciscus and Long 1991; Skelton and McHenry 1992; Lieberman et al. 2004; Roseman 2004; Roseman and Weaver 2004; Nicholson and Harvati 2006). As a result, facial anatomy may not preserve population history adequately. Instead, the basicranium has been put forward as a better indicator of ancestry because it is more phylogenetically stable (Olson 1981; Wood and Lieberman 2001; Harvati and Weaver 2006b). Although the neurocranium may also be subject to climatic influences (Beals et al. 1983), it too has been suggested as a reasonable proxy for population history (Roseman 2004). In light of these issues, and the fact that more than 50% of FORDISC's measurements relate to the face, it was deemed important to determine how anatomical region affected the program's success rate.

The fourth issue investigated in the study is the effect of specifying the sex of a target specimen versus leaving its sex unspecified. Several studies have found differences in affinity attribution when the sex was altered (Belcher et al. 2002; Williams et al. 2005; Campbell and Armelagos 2007). By comparing a test specimen to both males and females, these studies expected FORDISC to correctly identify both population and sex on the basis that male and female skulls of a given population are more similar to each other than either is to another population (Williams et al. 2005). With this in mind, this study tested whether the population attribution changed when the sex was unspecified versus when it was restricted to the sex provided by Howells (sex specified).

1.5. Outline of analyses

All analyses were conducted on individuals taken directly from the Howells reference sample employed by FORDISC. These individuals were only analyzed against the Howells reference groups and not against the Forensic Databank samples. This was done to address the question of using members whose populations are not represented in the database and to give FORDISC the greatest opportunity for success. As mentioned in the Introduction, there is disagreement as to whether or not this is an appropriate test of the program's accuracy in attributing affinity to unknown remains (Hubbe and Neves 2007; Williams and Armelagos 2007). However, because the test individuals are part of the reference sample, if the program functions correctly, it should successfully place the majority with their source population.

To determine the effect of using an individual whose population was not represented in the database, all analyses were run once with the source population included and once with it excluded. To test the effect of using different numbers of variables, the analyses used variable sets containing the maximum number of variables common to all groups (56) and the minimum number recommended by FORDISC (10). To assess the effect of anatomical region on FORDISC's ability to identify ancestry, the measurements were divided into sets of basicranial, neurocranial and facial variables. Lastly, to test the effect of sex selection, analyses used both sexes as well as the appropriate sex for the test individual. For all analyses, the results were calculated three times: once with no probability or typicality limitations, once with 0.5 posterior probability and 0.01 typicality probability values, and once with a stricter 0.8 posterior probability criterion.

Given the above, the following results were expected. Using individuals whose populations were represented in the database would result in high numbers of correct returns for all analyses. FORDISC was expected to be able to classify individuals using either 56 or 10 variables. With the source population excluded, FORDISC was expected to place test individuals into a closely related group as determined by genetic and linguistic data.

In general, if sex is not a confounding factor, the sex-unspecified (SU) and sex-specified (SS) analyses should return similar results, but practically, the results for SS could be expected to be better as the number of groups in the comparison is reduced.

With respect to variable number, FORDISC was expected to classify the greatest number of test specimens correctly using the 56-variable dataset. Following Hubbe and Neves (2007), more variables should provide better discriminatory power. At worst, adding more variables would simply fail to improve discrimination and result in a plateau effect.

For the anatomical regions, if cranial morphology tracks population history then the basicranium should produce the best results (Olson 1981; Wood and Lieberman 2001; Harvati and Weaver 2006). Although it is still not clear whether the neurocranium relates more closely to climate or to population history (Beals et al. 1983; Roseman 2004), on the basis of Harvati and Weaver's later work (2006b), FORDISC was expected to return fewer correct assignments using the neurocranial variable set than the basicranium. Because studies have shown the face to be the most susceptible to external stresses, the facial variables were expected to be the least accurate. If, however, cranial morphology correlates with a factor other than genetic history, then these predictions would not be supported.

2. Materials and Methods

2.1. Data

The craniometric data used in this study were collected by William Howells between 1965 and 1980 (Howells 1996). Howells published the data in a series of monographs (1973; 1989; 1995) and also made them available upon request and via the internet. Although the dataset does not cover certain areas (e.g., the Indian subcontinent), and the sample sizes for some groups are small (e.g., 29 males and 18 females for Taiwanese Atayal), it is the most comprehensive and accessible collection of human craniometric data available. As noted earlier, it also forms the bulk of the reference sample for FORDISC.

The version of Howells' dataset used in this study consists of values for 74 linear measurements and angles recorded on 2504 crania from 28 populations representing five geographic regions: Europe, Africa, East Asia, Australia-Pacific, and the Americas. In an effort to maintain equal sample sizes, Howells tried to select 50-55 males and females for each of his 28 populations. Although some groups were deficient in this number, most were reasonably close. Details of the measurements and angles are given in Appendix I. The names, geographic locations and sample sizes of the populations are presented in Appendix II.

Although some of the names Howells and FORDISC use for the groups in the reference sample are no longer considered appropriate, the designations were maintained to avoid confusion.

The test sample consisted of 200 individuals taken directly from Howells' dataset: 20 males and 20 females from one population in each of the major geographic areas. The five populations from which the test sample was drawn are the Berg (Europe), Zulu (Africa), Hokkaido Japanese (East Asia), Tasmanians (Australia-Pacific) and Santa Cruz (Americas). These groups were chosen because their sample sizes were relatively large (32-56, mean 48) and related populations were available within the FORDISC reference sample. Test individuals were not compared to the Forensic Databank groups as they are not included in that reference sample.

To evaluate the impact of variable number and cranial region on the accuracy of ancestry determination in FORDISC, four datasets were created for each test individual. Hereinafter, these will be referred to as the whole cranium dataset, the basicranium dataset, the neurocranium dataset and the face dataset. Appendix III lists the variables used to create the four datasets.

The whole cranium dataset was based on the 56 variables that are common to all groups represented in Howells' dataset. The complete set of 74 variables was not employed because Jantz and Ousley (2005: 7) suggest that using measurements that are not common to all groups "will limit sample sizes somewhat". The 56 variables used in the whole cranium dataset were selected with the aid of FORDISC 3.0's select all variables function.

The basicranium, neurocranium and face datasets were each based on 10 variables. Landmarks employed by Roseman (2004), Harvati and Weaver (2006), and Hubbe and Neves (2007) were used to divide Howells' variables into cranial region-specific groups. Of all of the measurements available for conducting an analysis in FORDISC, 10 were associated with the basicranium, 14 related solely to the neurocranium, and 42 were face-specific. However, to ensure consistency, each set needed to include the same number of variables. Since the basicranium was represented by only 10 measurements, all of the available basicranial measurements were used, while 10 measurements for each of the neurocranium and face datasets were randomly selected from their respective totals.
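The construction of the three region-specific datasets can be sketched as follows. This is an illustration only: the thesis does not state how the random draw was performed, so the seeded generator is an assumption, and the variable names are placeholders rather than Howells' measurement codes.

```python
import random


def build_region_dataset(available, k=10, seed=None):
    """Return k variables: all of them if exactly k exist, otherwise a random draw."""
    if len(available) < k:
        raise ValueError("fewer variables available than requested")
    if len(available) == k:
        return list(available)
    rng = random.Random(seed)  # seeded so that the draw can be repeated
    return rng.sample(list(available), k)


# Placeholder names; the real measurements are listed in the appendices.
basicranium = [f"basi_{i:02d}" for i in range(1, 11)]    # 10 available
neurocranium = [f"neuro_{i:02d}" for i in range(1, 15)]  # 14 available
face = [f"face_{i:02d}" for i in range(1, 43)]           # 42 available

datasets = {
    "basicranium": build_region_dataset(basicranium),
    "neurocranium": build_region_dataset(neurocranium, seed=1),
    "face": build_region_dataset(face, seed=1),
}
for region, variables in datasets.items():
    print(region, len(variables))
```

Drawing without replacement guarantees that each region contributes 10 distinct variables, which keeps the three sets comparable in size.
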

2.2. Analyses

Each dataset was subjected to four analyses. In the first, the source population was included in the reference sample and the test individual was compared to both males and females of all available populations (population included/both sexes). The source population was also included in the reference sample in the second analysis but the test individual was only compared to specimens of the relevant sex (population included/same sex). In the third analysis, the source population was excluded from the reference sample and the test individual was compared to both males and females (population excluded/both sexes). In the fourth, the source population was excluded from the reference sample and the test individual was only compared to specimens of the relevant sex (population excluded/same sex).
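The design described above is a full crossing of dataset, source-population treatment and sex comparison. A minimal sketch that enumerates the combinations is given below; the labels are taken from the text, but the code itself is illustrative and is not how the analyses were run in FORDISC.

```python
from itertools import product

DATASETS = ["whole cranium", "basicranium", "neurocranium", "face"]
SOURCE_POPULATION = ["included", "excluded"]
SEX_COMPARISON = ["both sexes", "same sex"]

# Each dataset is analysed under every inclusion/sex combination.
analyses = list(product(DATASETS, SOURCE_POPULATION, SEX_COMPARISON))

for dataset, source, sex in analyses:
    print(f"{dataset}: population {source}/{sex}")

print(len(analyses))  # 4 datasets x 2 x 2 = 16 analyses per test individual
```
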

Analyses were conducted with and without the source population included because of the disagreement regarding how FORDISC should be applied. As mentioned in the introduction, several studies have tested FORDISC's accuracy using specimens whose source populations were not present in the reference sample (e.g., Williams et al. 2005). The researchers responsible for these studies argue that FORDISC should assign a test specimen to a closely related population in the reference sample in the absence of the source population. However, Jantz and Ousley (2005) contend that FORDISC should only be used on a specimen if its population is represented in the reference sample.

'Both sexes' and 'same sex' analyses were carried out to control for the potentially confounding effects of sexual dimorphism. When the test specimen was compared only to reference specimens of the same sex, the select all males or select all females function was used in FORDISC 3.0. The sex of the test specimen was taken from the "sex" column in the Howells dataset.

With the exception of the source population excluded analyses, test specimens were compared with all available groups. This was done because of confusion regarding how many groups to use in an analysis. While Jantz and Ousley (2005) acknowledge that "discriminant analyses should be run initially using all possible groups that an unknown may classify into" (p. 44), they also suggest that using two to five groups will be "more accurate than those involving many more groups" (ibid). To achieve this improved accuracy, they suggest identifying the groups with the lowest typicality probabilities and removing them after repeated runs. However, they admit that typicality probabilities "are by no means foolproof" (Jantz and Ousley 2005: 16) and do not clarify how many groups or runs are sufficient. Furthermore, the presence or absence of a group in a particular region cannot be assumed a priori. Nor, as Keita (2007) points out, can one "ever really know if an individual's origin population is actually represented" (p. 425).

Overall, the arguments for limiting the number of groups were judged to be insufficient to justify reducing the number of comparative sample groups in this study.

To identify the closest relative of a test population, published genetic and linguistic data were consulted. The best match was then chosen from the populations available in Howells' dataset. These were the Norse (Europe) for the Berg, the Teita (Africa) for the Zulu, the Kyushu (East Asia) for the Hokkaido Japanese, the Yauyos (Americas) for the Santa Cruz, and the mainland Australian Aborigines for the Tasmanians (Australia-Pacific). These groups were selected for the following reasons:

1. Norse and Berg. As the ancestors of present-day Nordic populations, the Norse are most closely related to Norwegians and Swedes and are a nearer genetic match in the database for FORDISC's Berg (Austria) group than the more distantly related Zalavar (Hungary) group (Figure 1) (Cavalli-Sforza et al. 1994).

2. Teita and Zulu. 'Teita' is a disused name for a North-Eastern Bantu speaking people of Kenya (Kitson 1931). They share genetic and linguistic ties with the Zulu, a South-Central Bantu speaking group (Bendor-Samuel and Hartell 1989). Although the Bushmen (San) tribes are geographically closer to the Zulu, research shows them to be both genetically and linguistically more distant from the Zulu than are the Teita (Figure 2) (Cavalli-Sforza et al. 1994; Knight et al. 2003).

3. Kyushu and Hokkaido Japanese. Cavalli-Sforza et al. (1994) consider the Kyushu to be an outlier among the Japanese groups (Figure 3). However, they are genetically closer to the Hokkaido Japanese than the other East Asian groups in the FORDISC sample, the Ainu and the Anyang (Omoto and Saitou 1997).

4. Yauyos and Santa Cruz. The indigenous groups of the Yauyos District in Peru speak Quechua, a dialect in the Andean language group (Kaestle and Smith 2001). Figure 4 shows Andean speakers as closest to those who speak Penutian, the language of the Santa Cruz Amerindians (Cavalli-Sforza et al. 1994). Although the Arikara are geographically closer to the Santa Cruz Amerindians than the Peruvians, they are Caddoan speakers in a more distantly related Keresiouan language group (Campbell 1997).

5. Mainland Australian Aborigines and Tasmanians. The exact timing of the first migration of humans into Sahul - the Pleistocene landmass that once connected New Guinea, Australia and Tasmania - is still being debated (Hudjashov et al. 2007; Redd and Stoneking 1999; Webb and Rindos 1997). However, the current consensus is that humans colonized Sahul between 50,000 and 40,000 years ago (Walsh and Eckhoff 2007). Radiocarbon dates of multiple sites suggest that Tasmania may have been settled as early as 35,000 years ago (O'Connell and Allen 1998), which implies a prolonged period of genetic exchange with other Sahul migrants until ~12,000 years ago, when rising sea levels cut Tasmania off from mainland Australia (Redd and Stoneking 1999). As such, the mainland Australian Aborigines were considered to be the closest match for the Tasmanian group in the FORDISC sample.

To score the results for the analyses that included the source population, an assignment was considered 'correct' if FORDISC chose the test individual's own population as the most likely population. For the analyses that excluded the source population, an assignment was 'correct' when FORDISC selected the population most closely related to the test individual's source population.
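The scoring rule can be expressed as a small lookup. The sketch below uses the closest-relative pairings listed in this section; the function name and the plain-text population labels are hypothetical and need not match the group labels that FORDISC itself reports.

```python
# Closest related population for each test population, as listed in this section.
CLOSEST_RELATIVE = {
    "Berg": "Norse",
    "Zulu": "Teita",
    "Hokkaido Japanese": "Kyushu",
    "Santa Cruz": "Yauyos",
    "Tasmanians": "Australian Aborigines",
}


def is_correct(source_population, assigned_population, source_included):
    """Apply the scoring rule for one test individual."""
    if source_included:
        expected = source_population  # own population must be chosen
    else:
        expected = CLOSEST_RELATIVE[source_population]  # nearest relative must be chosen
    return assigned_population == expected


print(is_correct("Zulu", "Zulu", source_included=True))
print(is_correct("Zulu", "Teita", source_included=False))
```

Note that, under this rule, an assignment to any group other than the single designated relative is scored as incorrect when the source population is excluded.
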

As noted earlier, several combinations of acceptable posterior and typicality probabilities have been proposed. To reiterate, Jantz and Ousley (2005) recommended that determinations should be accepted only if the posterior probability exceeds 0.5 and the typicality probability exceeds 0.01. Later they suggested that determinations with posterior probabilities less than 0.8 are more likely to be incorrect than correct (Jantz and Ousley 2007). With this lack of consensus in mind, the number of correctly classified specimens was calculated three times: once by accepting all posterior probabilities and typicality probabilities, once by accepting a determination if the posterior probability was >0.5 and typicality probability >0.01, and once using a posterior probability >0.8 and typicality probability >0.01. FORDISC 3.0 provides three typicality values: 'ranked', 'F' and 'Chi'. The FORDISC 3.0 manual suggests that ranked or 'R' typicalities are the most reliable since they do not require multivariate normality. In contrast, the 'F' ratio typicality can be artificially inflated "as the number of variables approaches a group's sample size" and the Chi-square typicality probabilities "tend to call more individuals atypical than F typicality probabilities" (Jantz and Ousley 2005: 54). Accordingly, the R typicality values were used.
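The three acceptance criteria can be sketched as a filter applied to each determination. This is an illustration under stated assumptions: the criterion labels and function name are hypothetical, and the thresholds are treated as strict inequalities because the text states them as ">0.5", ">0.8" and ">0.01".

```python
# (minimum posterior probability, minimum typicality probability); None = no limit.
CRITERIA = {
    "all": (None, None),
    "pp>0.5": (0.5, 0.01),
    "pp>0.8": (0.8, 0.01),
}


def counts_as_correct(correct, posterior, typicality, criterion):
    """A determination counts only if it is correct and passes the thresholds."""
    min_posterior, min_typicality = CRITERIA[criterion]
    if min_posterior is None:
        return correct
    return correct and posterior > min_posterior and typicality > min_typicality


# A correct determination with posterior 0.65 and R typicality 0.20 (made-up values).
for name in CRITERIA:
    print(name, counts_as_correct(True, 0.65, 0.20, name))
```

Because each stricter criterion only removes determinations, the counts obtained under the three criteria can never increase from one to the next.
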

Figure 1: Genetic tree for 26 European populations (Cavalli-Sforza et al. 1994: 268)

[Tree diagram not reproduced. Horizontal axis: genetic distance.]

Figure 2: Genetic tree for 33 African populations (Cavalli-Sforza et al. 1994: 169)

[Tree diagram not reproduced. Horizontal axis: genetic distance.]

Figure 3: Genetic tree for 21 Asian populations (Cavalli-Sforza et al. 1994: 231)

[Tree diagram not reproduced. Horizontal axis: genetic distance.]

Figure 4: Genetic tree for 23 American populations (Cavalli-Sforza et al. 1994: 323)

[Tree diagram not reproduced. Horizontal axis: genetic distance.]

3. Results

3.1. Impact of including source population and specifying sex

This section addresses the impact of population inclusion and sex selection on FORDISC's success rate. As discussed, debate surrounds whether or not a specimen should be tested with FORDISC if its source population is not represented in the reference database. Similarly, although some studies have found inconsistencies in classification between using both sexes and using only the relevant sex, the effect of sex selection on FORDISC's success rate has not been fully tested.

3.1.1. Number of correct assignments accepting all posterior and typicality probabilities. The totals for the number of correct assignments for each set of analyses are given in Table 1. In the analyses of the whole cranium dataset, the best test results were obtained when the source population was included and the sex specified (i.e. only relevant sex included in reference sample). In this analysis, 88.5% of the test specimens were assigned correctly. The next-best results were obtained when the source population was included and the sex unspecified (i.e. both sexes included in reference sample). In this case, 82.5% of the test specimens were assigned correctly. The third-best results were obtained when the source population was excluded and the sex specified. Here, FORDISC correctly classified 39.5% of the test specimens. The worst results were obtained when the source population was excluded and sex unspecified. This analysis returned 36.5% correct classifications.

The results of the analyses using the basicranium dataset followed the same pattern as those for the whole cranium dataset. Again, the best results were achieved when the source population was included and the sex specified. In this analysis, FORDISC correctly assigned 33.5% of the test specimens. With the source population included and the sex unspecified, FORDISC assigned 22.5% of the test sample correctly. With the source population removed, FORDISC assigned more test specimens correctly with the sex specified than with it unspecified. In this case, 10.5% of the test sample was correctly classified with the sex specified compared to 8.5% with the sex unspecified.

The results of the analyses using the neurocranium dataset also followed the same pattern as those for the whole cranium dataset. The best result (48.5%) was achieved with the source population included and the sex specified. The next-best result occurred when the source population was included and the sex unspecified. In this case, FORDISC classified 41.5% of the test sample correctly. When the source population was excluded, FORDISC assigned 23.0% of the test sample correctly with the sex specified and 16.0% with the sex unspecified.

The results of the analyses using the face dataset followed the same pattern as the results for the previous three sets of analyses. As before, the best result was achieved when the source population was included and the sex specified. In this analysis, FORDISC assigned 41.5% of the test sample correctly. When the sex was unspecified, it assigned 34.0% correctly. With the source population excluded, FORDISC assigned 23.0% of the test sample correctly with the sex specified and 15.0% with the sex unspecified.

Table 1. Total number of test specimens correctly classified² (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    165 (82.5%)    177 (88.5%)    73 (36.5%)    79 (39.5%)
Basicranium      45 (22.5%)     67 (33.5%)     17 (8.5%)     21 (10.5%)
Neurocranium     83 (41.5%)     97 (48.5%)     32 (16.0%)    46 (23.0%)
Face             68 (34.0%)     83 (41.5%)     30 (15.0%)    46 (23.0%)

² All tables use the following format: ISU = source population included, sex unspecified. ISS = source population included, sex specified. ESU = source population excluded, sex unspecified. ESS = source population excluded, sex specified. Bold cell indicates the variable set with the highest success rate for each population. The first value in each cell is the number of test specimens correctly classified. The value in parentheses is the percentage of test specimens correctly classified.

In all four sets of analyses, then, markedly more individuals were correctly classified when the source population was included than when it was excluded. More individuals were also correctly classified when comparisons were limited to specimens of the same sex rather than when the test specimens were compared to both males and females.
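The percentages and the ordering described in this subsection can be recomputed from the counts reported in Table 1. The following is a minimal sketch; the counts are copied from the table, while the function names are hypothetical.

```python
N_TEST = 200  # number of test specimens

# Counts of correct assignments copied from Table 1.
TABLE_1 = {
    "whole cranium": {"ISU": 165, "ISS": 177, "ESU": 73, "ESS": 79},
    "basicranium": {"ISU": 45, "ISS": 67, "ESU": 17, "ESS": 21},
    "neurocranium": {"ISU": 83, "ISS": 97, "ESU": 32, "ESS": 46},
    "face": {"ISU": 68, "ISS": 83, "ESU": 30, "ESS": 46},
}


def percent_correct(count, n=N_TEST):
    """Percentage of the test sample classified correctly, to one decimal place."""
    return round(100 * count / n, 1)


def follows_reported_pattern(row):
    """True if ISS beats ISU, which beats ESS, which beats ESU."""
    return row["ISS"] > row["ISU"] > row["ESS"] > row["ESU"]


for dataset, row in TABLE_1.items():
    rates = {analysis: percent_correct(count) for analysis, count in row.items()}
    print(dataset, rates, follows_reported_pattern(row))
```
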

3.1.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability. Table 2 shows the scores recalculated based on these criteria. As before, FORDISC achieved the best results using the whole cranium dataset with the source population included and the sex specified. This was followed by the source population included and sex unspecified results. Third-best results for the whole cranium were achieved when the source population was excluded and the sex specified. FORDISC returned the fewest correct assignments using the whole cranium when the source population was excluded and the sex unspecified. As with the total number of correct assignments, the results for the three other datasets followed the same pattern.

Table 2. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    160 (80.0%)    171 (85.5%)    49 (24.5%)    54 (27.0%)
Basicranium      4 (2.0%)       10 (5.0%)      2 (1.0%)      4 (2.0%)
Neurocranium     34 (17.0%)     54 (27.0%)     5 (2.5%)      16 (8.0%)
Face             29 (14.5%)     44 (22.0%)     6 (3.0%)      14 (7.0%)

3.1.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability. Once again, for each dataset, FORDISC classified the highest number of test specimens correctly when the source population was included and the sex specified (Table 3). The next-best results were achieved with the source population included and the sex unspecified. In the case of the basicranium dataset, the sex-unspecified and sex-specified analyses returned the same proportion (0.5%) of correct classifications. The population-excluded results followed a similar pattern. FORDISC achieved better success when the sex was specified than when it was unspecified. However, it is worth noting that, with the source population excluded, FORDISC failed to classify a single individual out of 200 in three cases.

Overall, the numbers of correct assignments were low at the 0.8 posterior probability level. With one exception, the best results were obtained when the source population was included and the sex specified. The worst results occurred when the source population was excluded and the sex unspecified.

Table 3. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Dataset          ISU            ISS            ESU           ESS
Whole cranium    139 (69.5%)    156 (78.0%)    24 (12.0%)    38 (19.0%)
Basicranium      1 (0.5%)       1 (0.5%)       0 (0.0%)      0 (0.0%)
Neurocranium     9 (4.5%)       26 (13.0%)     1 (0.5%)      4 (2.0%)
Face             5 (2.5%)       13 (6.5%)      0 (0.0%)      2 (1.0%)

3.1.4. Summary. Regardless of the criteria used to assess the results, many more test specimens were correctly classified when the source population was included than when it was excluded. Better results were also obtained when a test specimen was compared only to reference specimens of the same sex, rather than to both sexes. It is important to note here, however, that the numbers of correct classifications were extremely low in most analyses. Even when all posterior and typicality probabilities were accepted, FORDISC achieved no better than a 48.5% success rate in the majority of analyses. Only in the whole-cranium, population-included analyses were more than half of the test specimens correctly classified.

3.2. Impact of variable number

This section considers the effect of variable number on FORDISC's classification rate. To reiterate, although studies have shown more variables to be more effective in discriminating among groups, Jantz and Ousley (2005) maintain that using large numbers of variables with FORDISC produces poor results due to a phenomenon they refer to as 'overfitting'.

3.2.1. Number of correct assignments accepting all posterior and typicality probabilities. The total number of correct assignments for each set of variables is given in Table 4. For the population-included, sex-unspecified analyses, the 56-variable whole cranium dataset returned the greatest number of correct classifications. Here, FORDISC correctly assigned 82.5% of the test individuals to the appropriate population. This was followed by the 10-variable neurocranium dataset, with which 41.5% of the test specimens were classified correctly. The face and basicranium datasets returned 34.0% and 22.5% correct assignments, respectively.

The results were similar for the population-included, sex-specified analyses. Again, FORDISC was the most successful with the 56-variable dataset, correctly assigning 88.5% of the test individuals to the appropriate population. The 10-variable neurocranium dataset achieved the next-best result with 48.5%, followed by the face with 41.5% and the basicranium dataset with 33.5%.

The results for the analyses that excluded the source population were similar to the population-included results. With the sex unspecified, FORDISC assigned 36.5% of the test specimens to the most closely related population using the 56-variable dataset, compared to 16.0%, 15.0% and 8.5% for the 10-variable datasets (neurocranium, face and basicranium, respectively).

As with the other analyses, when the source population was excluded and only the same-sex reference groups used, the best result was achieved using the 56-variable dataset. In this case, FORDISC assigned 39.5% of the test sample to the most closely related population. Unlike the previous results, however, the next-best result was shared by the neurocranium and face. Using each of these datasets, FORDISC classified 23.0% of the test specimens correctly. In keeping with the other analyses, the basicranium dataset returned the fewest correct classifications, assigning 10.5% of the test specimens to the most closely related population.

Thus, regardless of other factors, FORDISC returned significantly more correct classifications using the 56-variable whole-cranium dataset than with any of the 10-variable datasets.

Table 4. Total number of test specimens correctly classified by variable number (n=200)

Analysis    56 variables    10 variables     10 variables      10 variables
            (cranium)       (basicranium)    (neurocranium)    (face)
ISU         165 (82.5%)     45 (22.5%)       83 (41.5%)        68 (34.0%)
ISS         177 (88.5%)     67 (33.5%)       97 (48.5%)        83 (41.5%)
ESU         73 (36.5%)      17 (8.5%)        32 (16.0%)        30 (15.0%)
ESS         79 (39.5%)      21 (10.5%)       46 (23.0%)        46 (23.0%)

3.2.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability. Table 5 shows the recalculated scores based on 0.5 posterior probability and 0.01 typicality values for the four sets of analyses. The impact of number of variables on the number of correct classifications was heightened when the recommended probability and typicality values were taken into account.

For the population-included, sex-unspecified analyses, the best results were achieved when FORDISC used 56 variables. In this analysis, 80.0% of the test specimens were correctly classified. Of the 10-variable datasets, the neurocranium returned the next-best result (17.0%), followed by the face dataset (14.5%). The basicranium achieved the poorest result, with only four test individuals (2.0%) correctly assigned.

The results were similar for the population-included, sex-specified analyses. FORDISC correctly classified 85.5% of the test sample using 56 variables as opposed to 27.0%, 22.0% and 5.0% for the 10-variable neurocranium, face and basicranium datasets, respectively.

As in the population-included analyses, the best results for the population-excluded, sex-unspecified analyses were obtained using the whole cranium dataset. Here, FORDISC assigned 24.5% of the test individuals to the most closely related group. Surprisingly, the next-best results used the face dataset rather than the neurocranium. In this case, 3.0% of the test specimens were correctly classified using the face variables versus 2.5% using the neurocranium. Once again, however, the basicranium dataset achieved the poorest results, with FORDISC placing only two individuals (1.0%) with their related population.

In the population-excluded, sex-specified analyses, the best results were obtained using the 56-variable whole-cranium dataset. Here, FORDISC correctly assigned 27.0% of the test sample to the most closely related group. The pattern for the 10-variable datasets followed the first two sets of analyses: the neurocranium returned the next-best result with 8.0%, followed by the face (7.0%) and basicranium (2.0%).

Although the number of correct assignments fell significantly when the recommended posterior probability and typicality values were used, the 56-variable dataset continued to achieve considerably better results than the three 10-variable datasets.

Table 5. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality probability (n=200)

Analysis   56 variables   10 variables    10 variables     10 variables
           (cranium)      (basicranium)   (neurocranium)   (face)
ISU        160 (80.0%)    4 (2.0%)        34 (17.0%)       29 (14.5%)
ISS        171 (85.5%)    10 (5.0%)       54 (27.0%)       44 (22.0%)
ESU        49 (24.5%)     2 (1.0%)        5 (2.5%)         6 (3.0%)
ESS        54 (27.0%)     4 (2.0%)        16 (8.0%)        14 (7.0%)
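The effect of the probability criteria on the counts reported above can be illustrated with a short Python sketch. It is offered only as an illustration: the record layout, the function names and the five sample values are assumptions introduced here, and they are neither FORDISC output nor data from this study. An assignment is counted only if it is correct and both its posterior and typicality probabilities exceed the chosen cut-offs.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """One test specimen's result (hypothetical record layout)."""
    correct: bool        # placed in its source or most closely related group
    posterior: float     # posterior probability for the assigned group
    typicality: float    # typicality probability for the assigned group


def count_correct(assignments, min_posterior=0.0, min_typicality=0.0):
    """Count correct assignments whose probabilities exceed both cut-offs."""
    return sum(
        1
        for a in assignments
        if a.correct
        and a.posterior > min_posterior
        and a.typicality > min_typicality
    )


def as_percent(count, n):
    """Express a count as a percentage of n, to one decimal place."""
    return round(100 * count / n, 1)


# Invented values for illustration only; not specimens from the study.
sample = [
    Assignment(True, 0.92, 0.30),
    Assignment(True, 0.62, 0.02),
    Assignment(True, 0.55, 0.005),   # fails the 0.01 typicality cut-off
    Assignment(True, 0.45, 0.20),    # fails the 0.5 posterior cut-off
    Assignment(False, 0.99, 0.50),   # high probability but wrong group
]

all_accepted = count_correct(sample)            # no cut-offs applied
recommended = count_correct(sample, 0.5, 0.01)  # >0.5 posterior, >0.01 typicality
strict = count_correct(sample, 0.8, 0.01)       # >0.8 posterior, >0.01 typicality
```

With these invented values the count falls at each step as the criteria tighten, which is the same direction of change seen in the reported results. The helper as_percent reproduces the way counts are expressed in the tables, for example 160 of 200 as 80.0%.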

3.2.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability. Using the 0.8 posterior probability criterion, the number of correct assignments fell again. Table 6 summarizes these results.

Once again, for all analyses, the best results were achieved using the 56-variable whole-cranium dataset. This was followed by the neurocranium, face and basicranium datasets. However, the 10-variable datasets performed poorly in general, especially when the source population was excluded.

Table 6. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Analysis   56 variables   10 variables    10 variables     10 variables
           (cranium)      (basicranium)   (neurocranium)   (face)
ISU        139 (69.5%)    1 (0.5%)        9 (4.5%)         5 (2.5%)
ISS        156 (78.0%)    1 (0.5%)        26 (13.0%)       13 (6.5%)
ESU        24 (12.0%)     0 (0.0%)        1 (0.5%)         0 (0.0%)
ESS        38 (19.0%)     0 (0.0%)        4 (2.0%)         2 (1.0%)

3.2.4. Variable number and population differences. The previous sections combined the results for all 200 test individuals. To determine whether the results are consistent among the test populations, this section re-examines the results for each geographic group in relation to variable number.

3.2.4.1. Number of correct assignments accepting all posterior and typicality probabilities. Table 7 shows the total number of correct classifications for each population. For the population-included, sex-unspecified analyses, FORDISC achieved the best results using the 56-variable dataset. This was consistent across all five populations. For the Berg, 85.0% of the test specimens were classified correctly using the 56-variable dataset as opposed to 47.5% for the next-best 10-variable dataset (neurocranium). For the Santa Cruz Amerindians, FORDISC correctly classified 92.5% of the test specimens using 56 variables in contrast to 60.0% for the next-best result (face dataset). Of the Northern Japanese, 70.0% were correctly classified using 56 variables, with the next-best result returning only 25.0% (face dataset). The Tasmanian group was correctly classified in 80.0% of the cases using the 56-variable dataset, with the neurocranium dataset returning the next-best result with 57.5%. For the Zulu, FORDISC also achieved the best results using 56 variables (85.0%) and the next-best result using the 10-variable neurocranium dataset (50.0%).

The population-included, sex-specified analyses followed a similar pattern to the sex-unspecified results. For all five populations, FORDISC was the most successful using the 56-variable dataset. For the Berg, FORDISC classified 85.0% of the test specimens correctly using 56 variables. The neurocranium dataset returned the next-best result with 57.5%. All 40 Santa Cruz individuals (100%) were correctly classified using the 56-variable dataset, with the face returning the next-best result with 62.5%. Of the Northern Japanese, 77.5% were correctly classified using 56 variables, in contrast to 30.0% for the best 10-variable dataset (basicranium). The Tasmanian group was correctly classified 82.5% of the time using 56 variables, with the next-best result coming from the neurocranium dataset (75.0%). The Zulu group also achieved the best result using the 56-variable dataset (97.5%) and the next-best using the neurocranium dataset (47.5%).

The pattern of correct assignments for the population-excluded, sex-unspecified analyses followed the population-included results for four out of the five populations. FORDISC correctly classified 15.0% of the Berg specimens using 56 variables as opposed to 10.0% using the next-best 10-variable dataset (face). For the Santa Cruz population, 45.0% were correctly classified using the 56-variable dataset, followed by 20.0% using the facial dataset. Of the Northern Japanese, 60.0% were correctly classified using 56 variables, with the 10-variable face dataset returning the next-best result (15.0%). The Tasmanian group also achieved better results using 56 variables, classifying 40.0% with this dataset versus 30.0% using the neurocranium dataset. In contrast, FORDISC achieved the best results for the Zulu group using the neurocranium dataset. Here, 27.5% of the test sample was correctly assigned in comparison to 22.5% when 56 variables were used.

For the population-excluded, sex-specified analyses, three out of the five groups achieved the best results using the 56-variable dataset. For the Santa Cruz population, FORDISC classified more specimens correctly using 56 variables (45.0%) than 10 variables (35.0% using the face dataset). This was also the case for the Northern Japanese. For this group, FORDISC correctly classified 65.0% of the specimens using the 56-variable dataset versus 20.0% using the face. FORDISC also classified more Tasmanians correctly using 56 variables (50.0%) than with 10 variables (40.0% using the neurocranium dataset). In contrast, both the Berg and the Zulu groups deviated from the general pattern. For the Berg, FORDISC classified the same number of specimens using the 10-variable face dataset as the 56-variable whole cranium. In each case, 17.5% of the test specimens were correctly classified. For the Zulu, 47.5% of the test specimens were correctly classified using the neurocranium dataset and only 20.0% using the 56-variable whole-cranium dataset.

Table 7. Results by population accepting all posterior and typicality probabilities (n=40)

                               Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)   34 (85.0%)   37 (92.5%)   28 (70.0%)   32 (80.0%)   34 (85.0%)
10 variables (basicranium)     11 (27.5%)   8 (20.0%)    7 (17.5%)    7 (17.5%)    12 (30.0%)
10 variables (neurocranium)    19 (47.5%)   15 (37.5%)   6 (15.0%)    23 (57.5%)   20 (50.0%)
10 variables (face)            13 (32.5%)   24 (60.0%)   10 (25.0%)   15 (37.5%)   6 (15.0%)
ISS
56 variables (whole cranium)   34 (85.0%)   40 (100%)    31 (77.5%)   33 (82.5%)   39 (97.5%)
10 variables (basicranium)     15 (37.5%)   12 (30.0%)   12 (30.0%)   13 (32.5%)   15 (37.5%)
10 variables (neurocranium)    23 (57.5%)   18 (45.0%)   7 (17.5%)    30 (75.0%)   19 (47.5%)
10 variables (face)            18 (45.0%)   25 (62.5%)   11 (27.5%)   20 (50.0%)   9 (22.5%)
ESU
56 variables (whole cranium)   6 (15.0%)    18 (45.0%)   24 (60.0%)   16 (40.0%)   9 (22.5%)
10 variables (basicranium)     1 (2.5%)     5 (12.5%)    3 (7.5%)     5 (12.5%)    3 (7.5%)
10 variables (neurocranium)    3 (7.5%)     4 (10.0%)    2 (5.0%)     12 (30.0%)   11 (27.5%)
10 variables (face)            4 (10.0%)    8 (20.0%)    6 (15.0%)    10 (25.0%)   2 (5.0%)
ESS
56 variables (whole cranium)   7 (17.5%)    18 (45.0%)   26 (65.0%)   20 (50.0%)   8 (20.0%)
10 variables (basicranium)     1 (2.5%)     8 (20.0%)    4 (10.0%)    5 (12.5%)    3 (7.5%)
10 variables (neurocranium)    5 (12.5%)    4 (10.0%)    2 (5.0%)     16 (40.0%)   19 (47.5%)
10 variables (face)            7 (17.5%)    14 (35.0%)   8 (20.0%)    13 (32.5%)   4 (10.0%)
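The population-by-population comparison described for the ESS analyses can be checked mechanically. The following Python sketch is illustrative only: the counts are the ESS rows of Table 7, while the function name and the verdict labels are assumptions introduced here. For each population it compares the whole-cranium count with the best of the three 10-variable datasets.

```python
# Correct classifications out of 40 per population, ESS rows of Table 7.
POPULATIONS = ["Berg", "Santa Cruz", "N. Japan", "Tasmanian", "Zulu"]
ESS = {
    "whole cranium": [7, 18, 26, 20, 8],
    "basicranium": [1, 8, 4, 5, 3],
    "neurocranium": [5, 4, 2, 16, 19],
    "face": [7, 14, 8, 13, 4],
}


def compare_with_whole_cranium(table):
    """Compare the 56-variable count with the best 10-variable count."""
    outcome = {}
    for i, population in enumerate(POPULATIONS):
        whole = table["whole cranium"][i]
        regional = {
            name: counts[i]
            for name, counts in table.items()
            if name != "whole cranium"
        }
        best_region = max(regional, key=regional.get)
        best = regional[best_region]
        if whole > best:
            verdict = "56 variables better"
        elif whole == best:
            verdict = "tie"
        else:
            verdict = "10 variables better"
        outcome[population] = (verdict, best_region)
    return outcome


ess_outcome = compare_with_whole_cranium(ESS)
```

Run on these counts, the sketch flags the Berg as a tie with the face dataset and the Zulu as better served by the neurocranium dataset, leaving three populations for which the 56-variable dataset is clearly ahead.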

3.2.4.2. Number of correct classifications using >0.5 posterior probability and >0.01 typicality probability. Table 8 shows the number of correct classifications for each population when the recommended posterior and typicality probabilities are considered. The number of correctly classified specimens fell when the posterior probability and typicality criteria were used, but the results followed the same pattern as those obtained in the analyses in which all posterior and typicality probabilities were employed. Thus, in the ISU and ISS analyses the 56-variable dataset outperformed all the 10-variable datasets, while in the ESU and ESS analyses the 56-variable dataset outperformed the 10-variable datasets in the case of the Berg, Santa Cruz, Northern Japanese and Tasmanians, but not the Zulu. In the ESU analyses, the Zulu neurocranium 10-variable dataset performed as well as the 56-variable dataset (10.0% of specimens correctly classified in both cases). In the ESS analyses, the Zulu neurocranium 10-variable dataset performed better than the 56-variable dataset (25.0% versus 17.5%).

Table 8. Results by population using >0.5 posterior probability and >0.01 typicality probability (n=40)

                               Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)   33 (82.5%)   37 (92.5%)   27 (67.5%)   32 (80.0%)   31 (77.5%)
10 variables (basicranium)     4 (10.0%)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)    11 (27.5%)   5 (12.5%)    0 (0.0%)     13 (32.5%)   5 (12.5%)
10 variables (face)            7 (17.5%)    16 (40.0%)   1 (2.5%)     5 (12.5%)    0 (0.0%)
ISS
56 variables (whole cranium)   33 (82.5%)   40 (100%)    30 (75.0%)   33 (82.5%)   35 (87.5%)
10 variables (basicranium)     4 (10.0%)    3 (7.5%)     0 (0.0%)     0 (0.0%)     3 (7.5%)
10 variables (neurocranium)    14 (35.0%)   6 (15.0%)    0 (0.0%)     20 (50.0%)   14 (35.0%)
10 variables (face)            11 (27.5%)   18 (45.0%)   1 (2.5%)     14 (35.0%)   0 (0.0%)
ESU
56 variables (whole cranium)   6 (15.0%)    15 (37.5%)   17 (42.5%)   7 (17.5%)    4 (10.0%)
10 variables (basicranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)
10 variables (neurocranium)    0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     4 (10.0%)
10 variables (face)            0 (0.0%)     1 (2.5%)     1 (2.5%)     3 (7.5%)     1 (2.5%)
ESS
56 variables (whole cranium)   5 (12.5%)    17 (42.5%)   18 (45.0%)   7 (17.5%)    7 (17.5%)
10 variables (basicranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     2 (5.0%)     2 (5.0%)
10 variables (neurocranium)    0 (0.0%)     1 (2.5%)     0 (0.0%)     5 (12.5%)    10 (25.0%)
10 variables (face)            2 (5.0%)     3 (7.5%)     3 (7.5%)     4 (10.0%)    2 (5.0%)

3.2.4.3. Number of correct classifications using >0.8 posterior probability and >0.01 typicality probability. Table 9 shows the number of correct classifications for each population when the stricter posterior and typicality probabilities are considered. Again, the number of correctly classified specimens fell and the results were very low in general. For the ISU and ISS analyses, the 56-variable dataset outperformed all the 10-variable datasets. In the ESU analyses, the 56-variable dataset outperformed the 10-variable datasets in the case of the Berg, Santa Cruz, Northern Japanese and Tasmanians, but not the Zulu. Here, the Zulu neurocranium 10-variable dataset and the 56-variable dataset achieved the same result (2.5% of specimens correctly classified in both cases). In the ESS analyses, the 56-variable dataset outperformed all of the 10-variable datasets.

Table 9. Results by population using >0.8 posterior probability and >0.01 typicality probability (n=40)

                               Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
56 variables (whole cranium)   27 (67.5%)   36 (90.0%)   21 (52.5%)   29 (72.5%)   26 (65.0%)
10 variables (basicranium)     1 (2.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)    1 (2.5%)     0 (0.0%)     0 (0.0%)     8 (20.0%)    0 (0.0%)
10 variables (face)            2 (5.0%)     3 (7.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
ISS
56 variables (whole cranium)   30 (75.0%)   37 (92.5%)   24 (60.0%)   34 (85.0%)   33 (82.5%)
10 variables (basicranium)     1 (2.5%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)    6 (15.0%)    2 (5.0%)     0 (0.0%)     14 (35.0%)   4 (10.0%)
10 variables (face)            3 (7.5%)     8 (20.0%)    0 (0.0%)     2 (5.0%)     0 (0.0%)
ESU
56 variables (whole cranium)   2 (5.0%)     6 (15.0%)    9 (22.5%)    6 (15.0%)    1 (2.5%)
10 variables (basicranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)
10 variables (face)            0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
ESS
56 variables (whole cranium)   5 (12.5%)    12 (30.0%)   10 (25.0%)   7 (17.5%)    6 (15.0%)
10 variables (basicranium)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
10 variables (neurocranium)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)     4 (10.0%)
10 variables (face)            0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)

3.2.5. Summary. Overall, these results indicate that the number of variables used has a significant impact on FORDISC's ability to identify ancestral group, regardless of other factors. FORDISC correctly classified more test specimens using the 56-variable whole-cranium dataset than with any of the 10-variable datasets. The 10-variable datasets never achieved higher than 48.5% correct classifications (total result for neurocranium dataset, ISS). This compares unfavourably to 88.5% for the 56-variable dataset under the same conditions. However, this finding did not hold for all the test populations or at different probability levels.
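The pooled percentages quoted in this summary follow directly from the per-population counts. As a minimal sketch, assuming only the ISS rows of Table 7 and a hypothetical helper named pooled, the totals for the 200 test specimens can be recovered in Python as follows.

```python
N_PER_POPULATION = 40

# Correct classifications per population (Berg, Santa Cruz, N. Japan,
# Tasmanian, Zulu), ISS rows of Table 7.
ISS_COUNTS = {
    "whole cranium": [34, 40, 31, 33, 39],
    "basicranium": [15, 12, 12, 13, 15],
    "neurocranium": [23, 18, 7, 30, 19],
    "face": [18, 25, 11, 20, 9],
}


def pooled(counts, n_per_population=N_PER_POPULATION):
    """Return the pooled count and its percentage of all test specimens."""
    total = sum(counts)
    n = n_per_population * len(counts)
    return total, round(100 * total / n, 1)


pooled_iss = {name: pooled(counts) for name, counts in ISS_COUNTS.items()}
```

The whole-cranium row sums to 177 of 200 (88.5%) and the neurocranium row to 97 of 200 (48.5%), matching the two figures cited above.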

3.3 Impact of cranial region

The effect of cranial region on FORDISC's ability to identify ancestry has not previously been addressed in the literature. Accordingly, this section compares the results of the four sets of analyses using equal numbers of variables selected to isolate the neurocranium, basicranium and face.

3.3.1. Number of correct assignments accepting all posterior and typicality probabilities. Table 10 provides the total number of correct assignments for each cranial region accepting all posterior and typicality probabilities. For the population-included, sex-unspecified analyses, the neurocranium dataset returned the greatest number of correct classifications. Here, FORDISC correctly assigned 41.5% of the test individuals to the appropriate population. The face and basicranium datasets returned 34.0% and 22.5% correct assignments, respectively.

The results were similar for the population-included, sex-specified analyses. Again, FORDISC was the most successful using the neurocranium dataset. In this analysis, 48.5% of the test specimens were correctly classified versus 41.5% using the face and 33.5% using the basicranium dataset.

When the sex was left unspecified, the results for the analyses that excluded the source population followed a similar pattern to the population-included results. In this analysis, FORDISC classified more test specimens with the most closely related population using the neurocranium dataset (16.0%) than with either of the other datasets. This was followed by the face dataset (15.0%) and the basicranium dataset (8.5%).

In contrast, when the source population was excluded and only reference groups of the same sex were used, the neurocranium and face datasets returned the same results. In both cases, FORDISC classified 23.0% of the test specimens correctly. In keeping with the other analyses, the basicranium dataset returned the fewest correct classifications, assigning 10.5% of the test specimens to the most closely related population. Thus, with only one exception, FORDISC achieved the best results using the neurocranium dataset, followed by the face and basicranium datasets.

Table 10. Total number of test specimens correctly classified (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        45 (22.5%)    83 (41.5%)     68 (34.0%)
ISS        67 (33.5%)    97 (48.5%)     83 (41.5%)
ESU        17 (8.5%)     32 (16.0%)     30 (15.0%)
ESS        21 (10.5%)    46 (23.0%)     46 (23.0%)
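The ranking of cranial regions within each analysis, including the one tie, can be read off Table 10 programmatically. The Python sketch below is illustrative; the counts are those of Table 10, and the function name best_regions is an assumption introduced here.

```python
# Correct classifications out of 200, from Table 10.
TABLE_10 = {
    "ISU": {"Basicranium": 45, "Neurocranium": 83, "Face": 68},
    "ISS": {"Basicranium": 67, "Neurocranium": 97, "Face": 83},
    "ESU": {"Basicranium": 17, "Neurocranium": 32, "Face": 30},
    "ESS": {"Basicranium": 21, "Neurocranium": 46, "Face": 46},
}


def best_regions(row):
    """Return every region that shares the highest count in one analysis."""
    top = max(row.values())
    return [region for region, count in row.items() if count == top]


leaders = {analysis: best_regions(row) for analysis, row in TABLE_10.items()}
```

Returning a list rather than a single name keeps the ESS tie between the neurocranium and face datasets visible instead of silently picking one of them.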

3.3.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability. Table 11 shows the recalculated scores based on 0.5 posterior probability and 0.01 typicality values for the three sets of cranial-region analyses. For the population-included, sex-unspecified analyses, the best results were achieved when FORDISC used the neurocranium dataset. Here, 17.0% of the test specimens were classified correctly. This was followed by the face dataset (14.5%). The basicranium achieved the poorest result, with only four test individuals (2.0%) correctly assigned.

The pattern was similar for the population-included, sex-specified analyses. FORDISC correctly classified 27.0% of the test population using the neurocranium dataset as opposed to 22.0% and 5.0% for the face and basicranium datasets, respectively.

Surprisingly, the best results for the population-excluded, sex-unspecified analyses were obtained using the face dataset. Here, FORDISC assigned 3.0% of the test individuals to the most closely related group compared to 2.5% using the neurocranial variables. Once again, the basicranium dataset achieved the poorest results with FORDISC placing only two individuals (1.0%) with the most closely related population.

In the population-excluded, sex-specified analyses, the best result was once again obtained using the neurocranium dataset. Here, FORDISC correctly classified 8.0% of the test sample to the most closely related group. The face dataset achieved the next-best result with 7.0%, followed by the basicranium with 2.0%.

Although the number of correct assignments fell significantly when the recommended posterior probability and typicality values were used, the neurocranium dataset returned the highest number of correct classifications in all but one case. With this one exception, the face dataset obtained the next-best results. The basicranium consistently returned the lowest number of correct assignments.

Table 11. Number of test specimens correctly classified using >0.5 posterior probability and >0.01 typicality criteria (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        4 (2.0%)      34 (17.0%)     29 (14.5%)
ISS        10 (5.0%)     54 (27.0%)     44 (22.0%)
ESU        2 (1.0%)      5 (2.5%)       6 (3.0%)
ESS        4 (2.0%)      16 (8.0%)      14 (7.0%)

3.3.3. Number of correct assignments using >0.8 posterior probability and >0.01 typicality probability. Using the 0.8 posterior probability criterion, the number of correct assignments fell again. Table 12 shows these results. As with the previous results, FORDISC classified more test specimens for the population-included, sex-unspecified analyses using the neurocranium dataset. Here, 4.5% of the specimens were correctly classified using the neurocranium dataset, in contrast to 2.5% using the face dataset and 0.5% using the basicranium dataset.

The pattern was similar for the population-included, sex-specified analyses. Using the neurocranium dataset, FORDISC correctly classified 13.0% of the test specimens, followed by 6.5% using the facial variables and 0.5% using the basicranial variables. For the population-excluded, sex-unspecified analyses, the neurocranium again achieved the best results. However, since only one individual (0.5%) was classified correctly and none were classified correctly using either the face or basicranium datasets, the term 'best' is used loosely. FORDISC fared little better with the population-excluded, sex-specified analyses. Here, 2.0% of the test specimens were correctly classified using the neurocranium dataset. Only 1.0% was classified correctly using the face dataset and no individuals were classified correctly using the basicranium dataset.

Once again, the neurocranium dataset achieved the highest number of correct determinations, followed by the face and the basicranium. However, at the 0.8 posterior probability level, the results are generally poor and the population-excluded results are extremely low.

Table 12. Number of test specimens correctly classified using >0.8 posterior probability and >0.01 typicality probability (n=200)

Analysis   Basicranium   Neurocranium   Face
ISU        1 (0.5%)      9 (4.5%)       5 (2.5%)
ISS        1 (0.5%)      26 (13.0%)     13 (6.5%)
ESU        0 (0.0%)      1 (0.5%)       0 (0.0%)
ESS        0 (0.0%)      4 (2.0%)       2 (1.0%)

3.3.4. Cranial region and population differences. As with the variable number analyses, the previous sections combined the results for all 200 test individuals. To establish whether or not FORDISC is consistent between the test populations, this section considers the results for each cranial region according to geographic group.

3.3.4.1. Number of correct assignments accepting all posterior and typicality probabilities. Table 13 breaks down the results for each cranial region by population accepting all posterior and typicality probabilities. For the Berg, when the population was included, FORDISC was the most successful with the neurocranial variables, placing 47.5% (sex unspecified) and 57.5% (sex specified) of the individuals into the Berg group. The facial region was the next most successful, followed by the basicranium. However, when the Berg group was excluded, the facial dataset achieved the best results. Here, the program assigned 10.0% (sex unspecified) and 17.5% (sex specified) of the specimens to the Norse group compared to 7.5%/12.5% for the neurocranium and 2.5%/2.5% for the basicranium.

The situation was different for the Santa Cruz group. FORDISC was the most successful in attributing ancestry using the facial variables in all analyses. With the population included and the sex unselected, 60.0% of the sample was correctly classified using the face, versus 37.5% for the neurocranium and 20.0% for the basicranium. With the population included and sex selected, 62.5% of the sample was classified using the facial variables, in comparison to 45.0% using the neurocranium and 30.0% using the basicranium dataset. When the Santa Cruz group was excluded from the analysis, with the sex unselected, FORDISC assigned 20.0% of the sample to the Peruvian group using the face dataset, 10.0% using the neurocranium and 12.5% using the basicranial variable set.

For the Northern Japanese, FORDISC returned more correct assignments using the facial region in three out of four analyses. However, when the population was included and the sex specified, FORDISC obtained the best results using the basicranium variable set, placing 30.0% of the individuals back into the Northern Japanese group versus 27.5% for the face and 17.5% for the neurocranium.

For the Tasmanians, FORDISC achieved the highest success rate when using the neurocranial variables in all analyses. For the population-included analyses, 57.5% (sex unspecified) and 75.0% (sex specified) of the specimens were assigned correctly, compared with 37.5% and 50.0% using facial variables and 17.5% and 32.5% using the basicranium.

The Zulu group also showed the best results when FORDISC used the neurocranial variables. When the population was included, 50.0% (sex unspecified) and 47.5% (sex specified) of the individuals were correctly identified, versus 30.0% and 37.5% for the basicranium and 15.0% and 22.5% for the facial region.

Table 13. Total results for each cranial region by population (n=40)

               Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
Basicranium    11 (27.5%)   8 (20.0%)    7 (17.5%)    7 (17.5%)    12 (30.0%)
Neurocranium   19 (47.5%)   15 (37.5%)   6 (15.0%)    23 (57.5%)   20 (50.0%)
Face           13 (32.5%)   24 (60.0%)   10 (25.0%)   15 (37.5%)   6 (15.0%)
ISS
Basicranium    15 (37.5%)   12 (30.0%)   12 (30.0%)   13 (32.5%)   15 (37.5%)
Neurocranium   23 (57.5%)   18 (45.0%)   7 (17.5%)    30 (75.0%)   19 (47.5%)
Face           18 (45.0%)   25 (62.5%)   11 (27.5%)   20 (50.0%)   9 (22.5%)
ESU
Basicranium    1 (2.5%)     5 (12.5%)    3 (7.5%)     5 (12.5%)    3 (7.5%)
Neurocranium   3 (7.5%)     4 (10.0%)    2 (5.0%)     12 (30.0%)   11 (27.5%)
Face           4 (10.0%)    8 (20.0%)    6 (15.0%)    10 (25.0%)   2 (5.0%)
ESS
Basicranium    1 (2.5%)     8 (20.0%)    4 (10.0%)    5 (12.5%)    3 (7.5%)
Neurocranium   5 (12.5%)    4 (10.0%)    2 (5.0%)     16 (40.0%)   19 (47.5%)
Face           7 (17.5%)    14 (35.0%)   8 (20.0%)    13 (32.5%)   4 (10.0%)

3.3.4.2. Number of correct assignments using >0.5 posterior probability and >0.01 typicality probability. Table 14 lists the results for each cranial region by population using 0.5 posterior probability and 0.01 typicality probability. Using the recommended probability criteria with the source population included, FORDISC achieved the best results for the Berg using the neurocranial variables. This was followed by the face and basicranium datasets. However, when the Berg population was excluded and the sex specified, FORDISC was only able to place two specimens into the target group (the Norse), and only with the facial variables. FORDISC could not place any individuals correctly when the sex was unspecified or by using the other cranial regions.

In the analysis of the Santa Cruz specimens, FORDISC achieved the highest rate of success when the population was included using the facial variables (40.0% with sex unspecified, 45.0% with sex specified). The facial variables were also the most successful when the source population was excluded. However, as with the Berg, the classification rates were very low and no test specimens were correctly classified using the basicranium dataset.

With the Northern Japanese group, FORDISC assigned specimens correctly at the recommended probability levels only when using the facial variables. No individuals were correctly assigned using the other two variable sets. Surprisingly, FORDISC placed more specimens correctly when the source population was excluded. Three individuals (7.5%) were placed with the Southern Japanese group when the sex was specified, while only one each was correctly assigned when the source population was included and the sex unspecified or specified.

For the Tasmanian sample, FORDISC returned more correct assignments using the neurocranium variables when the population was included. When the Tasmanian group was excluded and the sex unspecified, the best result was obtained using the face dataset (7.5% compared to 2.5% for neurocranium or basicranium). However, when the sex was specified, FORDISC placed more specimens into the Australian group using the neurocranial dataset (12.5% versus 10.0% using the face and 5.0% using the basicranium).

Lastly, for the Zulu group, at the recommended probability levels, FORDISC achieved the best results using the neurocranium dataset in all analyses. However, the population-included results were very poor for the other cranial regions and more individuals were correctly placed when the source population was excluded.

Table 14. Results for each cranial region using >0.5 posterior probability and >0.01 typicality probability (n=40)

               Berg         Santa Cruz   N. Japan     Tasmanian    Zulu
ISU
Basicranium    4 (10.0%)    0 (0.0%)     0 (0.0%)     0 (0.0%)     0 (0.0%)
Neurocranium   11 (27.5%)   5 (12.5%)    0 (0.0%)     13 (32.5%)   5 (12.5%)
Face           7 (17.5%)    16 (40.0%)   1 (2.5%)     5 (12.5%)    0 (0.0%)
ISS
Basicranium    4 (10.0%)    3 (7.5%)     0 (0.0%)     0 (0.0%)     3 (7.5%)
Neurocranium   14 (35.0%)   6 (15.0%)    0 (0.0%)     20 (50.0%)   14 (35.0%)
Face           11 (27.5%)   18 (45.0%)   1 (2.5%)     14 (35.0%)   0 (0.0%)
ESU
Basicranium    0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     1 (2.5%)
Neurocranium   0 (0.0%)     0 (0.0%)     0 (0.0%)     1 (2.5%)     4 (10.0%)
Face           0 (0.0%)     1 (2.5%)     1 (2.5%)     3 (7.5%)     1 (2.5%)
ESS
Basicranium    0 (0.0%)     0 (0.0%)     0 (0.0%)     2 (5.0%)     2 (5.0%)
Neurocranium   0 (0.0%)     1 (2.5%)     0 (0.0%)     5 (12.5%)    10 (25.0%)
Face           2 (5.0%)     3 (7.5%)     3 (7.5%)     4 (10.0%)    2 (5.0%)

3.3.5. Summary. When all the test populations were pooled, the neurocranium produced the best results of the three cranial regions. With two exceptions, the face dataset obtained the next-best results. The basicranium consistently returned the lowest number of correct assignments. Indeed, in two cases, FORDISC was unable to classify a single individual out of 200 using this dataset. However, when the populations were considered individually, the results were inconsistent and the rates of correct classification were very low in general.

In sum, FORDISC varied in its ability to classify individuals correctly with respect to cranial region.

4. Discussion

4.1. Main findings

The difference between FORDISC's success rate in the source population-included analyses and its success rate in the source population-excluded analyses was substantial. When the whole-cranium dataset was analyzed with the source population included, more than two thirds of the test specimens were classified correctly (70-89%), whereas when the whole-cranium dataset was analyzed with the source population excluded, fewer than half of the test specimens were classified correctly (12-40%). Far fewer specimens were classified correctly in the analyses that focused on an individual anatomical region, but the number classified correctly in the source population-included analyses was always at least twice the number classified correctly in the source population-excluded analyses. Thus, the presence or absence of the source population in the reference sample greatly affects the accuracy of FORDISC. Specifically, the analyses suggest that if a test specimen's source population is represented in FORDISC's reference sample, there is a reasonable chance that the ancestry will be accurately determined, whereas if the specimen's source population is not represented in FORDISC's reference sample, there is little chance that its ancestry will be accurately determined.

The finding that a test specimen's source population has to be represented in the reference sample in order for there to be a reasonable chance for its ancestry to be accurately determined is consistent with Jantz and Ousley's (2005) cautions regarding the use of FORDISC. However, it also challenges the utility of the program in any but the most restricted circumstances. As noted in the Introduction, it is entirely possible for a set of remains to be from any place in the world, particularly if they are recent. Consequently, the likelihood of being able to determine in advance if an unknown specimen's population is represented in the FORDISC reference group sample is extremely low. If FORDISC is only effective when an individual's source population is represented in the reference sample and a researcher must establish this in order to be confident about the program's determinations, there is no point in actually undertaking a FORDISC analysis. At best it will only confirm a determination made by some other means. Furthermore, if the test specimen's source population is not represented in the program's reference sample and a specimen is analyzed anyway, an investigator cannot be confident that the resulting determination actually corresponds to a closely related population. In the end, the analysis has not assisted in narrowing an individual's ancestry.

As discussed earlier, Jantz and Ousley have also suggested that secular change may be responsible for some of FORDISC's poor performance in previous tests. This means that in addition to needing to have the source population represented, FORDISC also requires a specimen to be contemporaneous with the specimens in the reference sample to be reliable. The Forensic Databank includes modern forensic cases as well as "mid to late 19th century Amerindian remains" (Jantz and Ousley 2005:35) while Howells' reference populations range from 26th-30th Dynasty Egyptians (600-200 B.C.) to mid-20th century dissection-room cadavers (Howells 1973). As a result, even if an investigator knew that an unknown specimen came from Egypt, for example, if they could not also say it came from the same time period as Howells' group, FORDISC's attribution would have to be considered suspect. Furthermore, if modern Americans have changed so significantly in the last 150 years, the point at which secular changes override population differences needs to be clearly established.

This study also determined that restricting the program to the relevant sex improved FORDISC's ability to correctly attribute ancestry. When the source population was included and 56 variables were used, selecting the sex resulted in a six percent improvement over not doing so. For the 10-variable datasets, selecting the sex achieved between seven and 11 percent better results than when it was left unselected. The results were similar for the source population-excluded analyses, with between three and eight percent improvement when the sex was selected. This suggests that accurately sexing an unknown specimen through morphological examination is advisable before using FORDISC to determine a specimen's ancestry.
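The improvements quoted for the source population-included analyses are differences in percentage points between the sex-unspecified and sex-specified totals. A minimal Python sketch of that arithmetic follows; the regional counts are taken from Table 10 and the whole-cranium counts are the sums of the ISU and ISS rows of Table 7, while the function name gain_in_points is an assumption introduced here.

```python
N_TEST_SPECIMENS = 200

# (sex unspecified, sex specified) correct classifications, all posterior
# and typicality probabilities accepted, source population included.
INCLUDED = {
    "56 variables": (165, 177),   # sums of the ISU and ISS rows of Table 7
    "basicranium": (45, 67),      # Table 10
    "neurocranium": (83, 97),     # Table 10
    "face": (68, 83),             # Table 10
}


def gain_in_points(unspecified, specified, n=N_TEST_SPECIMENS):
    """Percentage-point gain from specifying sex."""
    return round(100 * (specified - unspecified) / n, 1)


gains = {name: gain_in_points(*pair) for name, pair in INCLUDED.items()}
```

On these counts the gain is 6.0 points for the 56-variable dataset and between 7.0 and 11.0 points for the three 10-variable datasets.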

The results of the analyses also suggest that the number of variables greatly affects FORDISC's ability to determine ancestry. When the 200 test specimens were considered together, using 56 variables consistently returned the highest rate of correct assignments, regardless of other criteria. Even in the "best-case scenario" where the source population was represented in the reference sample, the sex of the test specimen was specified, and all posterior and typicality probabilities were accepted, 10 variables achieved less than half the success rate that 56 variables obtained.

The 56-variable dataset also outperformed the three 10-variable datasets when the test specimens were broken down by population. The only exceptions were the analyses in which the Zulu test specimens were analyzed without the Zulu population being represented in the reference sample. In these analyses, the 10-variable neurocranium dataset returned more correct assignments than the 56-variable dataset.

In general, these results contradict the claim by Jantz and Ousley (2005) that "as more variables are added, there is a tendency for the classification accuracy to plateau and then decrease" (p. 50) and support the findings of other researchers that better discrimination is achieved by maximizing the number of variables (Hubbe and Neves 2005). Furthermore, they suggest that, contrary to claims regarding FORDISC (Ubelaker et al. 2002), the program cannot be used with confidence on incomplete remains from which only a few measurements can be obtained.

The effect of anatomical region on FORDISC's ability to identify ancestry was not resolved by this study. Although FORDISC achieved the best results on average for the five groups using the neurocranium, it did not do so consistently across populations and the returns were very low in general. When the results were considered as a whole, the neurocranium was the most effective for determining ancestry, followed by the face and basicranium.

These results conflict with the prediction that the basicranium would be the most successful because it is the most phylogenetically and ontogenetically stable of the three regions, while the face would be the least successful due to non-genetic influences on its shape. In fact, while the neurocranium and facial regions vied for the highest success rate, the basicranial variable set consistently returned the fewest correct assignments. However, because all three regional datasets performed so poorly, this question was not fully resolved by the current study.

4.2 Implications for use of FORDISC

The results of this study suggest that the utility of FORDISC is limited. In order for the program to yield an accurate determination of ancestry, the target specimen's source population must be present in FORDISC's reference sample and its sex must be known. In addition, the target specimen must be complete enough for more than 10 measurements to be recorded on it and for those measurements to relate to more than one region of the cranium.

The utility of FORDISC may in fact be more limited than the analyses reported here suggest. During the course of the study, it became apparent that the evaluation criteria that have been recommended are ineffective. The following figures relate to the set of analyses that yielded the highest number of correctly classified specimens, that is, the analyses in which the source population was included in the reference sample, sex was specified and 56 variables were employed. Using the 0.5 posterior probability/0.01 typicality probability combination, five "correct" test individuals (2.5%) would be falsely rejected. Using the same criteria, 16 (8%) incorrect determinations would be falsely accepted.

Using the 0.8 posterior probability/0.01 typicality probability combination, 17 (8.5%) of the test individuals would be rejected even though they were correct, and five (2.5%) "incorrect" determinations would be considered correct. Thus, neither of the recommended combinations of posterior probability and typicality probability enables us to be confident that the ancestry of a specimen has been correctly determined.
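The percentages quoted for the two recommended combinations follow directly from the reported counts. As a minimal sketch of that arithmetic (the function and variable names are illustrative and are not part of FORDISC), the figures can be checked as follows:

```python
# Counts reported in the text for the best-performing set of analyses
# (source population included, sex specified, 56 variables).
N_TEST = 200  # total number of test specimens

# criterion label: (correct assignments falsely rejected,
#                   incorrect assignments falsely accepted)
REPORTED_COUNTS = {
    "PP 0.5 / TP 0.01": (5, 16),
    "PP 0.8 / TP 0.01": (17, 5),
}


def percent_of_test_sample(count, total=N_TEST):
    """Express a count as a percentage of the test sample."""
    return 100.0 * count / total


def error_summary(counts=REPORTED_COUNTS):
    """Return {criterion: (false rejection %, false acceptance %, combined %)}."""
    summary = {}
    for criterion, (false_rejects, false_accepts) in counts.items():
        summary[criterion] = (
            percent_of_test_sample(false_rejects),
            percent_of_test_sample(false_accepts),
            percent_of_test_sample(false_rejects + false_accepts),
        )
    return summary


if __name__ == "__main__":
    for criterion, (rejected, accepted, combined) in error_summary().items():
        print(f"{criterion}: {rejected:.1f}% falsely rejected, "
              f"{accepted:.1f}% falsely accepted, {combined:.1f}% misjudged")
```

Under either combination roughly one determination in ten is misjudged. Raising the posterior probability threshold mainly exchanges false acceptances for false rejections.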

With the foregoing in mind, a sectioning point for the posterior and typicality probabilities was calculated from the results of the analyses that yielded the highest number of correctly classified specimens. The posterior probabilities associated with incorrect assignments ranged from 0.389 to 0.991, while the typicality probabilities ranged from 0.000 to 0.952 (Table 15). This indicates that, for an ancestry determination to be considered correct without ambiguity, the posterior probability must be greater than 0.991 and the typicality probability must be higher than 0.952. Using these criteria, only two determinations out of 200 (1.0%) would be considered unambiguously correct and the rest would have to be considered unclassifiable. Clearly, if in the best-case scenario only 1.0% of FORDISC's attributions can be accepted with confidence, this has serious implications for the program's utility.

Table 15: Range of posterior and typicality probabilities for correct and incorrect assignments by population

                     CORRECT                            INCORRECT
Population    PP MIN  PP MAX  TP MIN  TP MAX     PP MIN  PP MAX  TP MIN  TP MAX
Berg          .646    1.0     0.0     .947       .521    .876    0.0     .643
Santa Cruz    .752    1.0     0.077   .942       --      --      --      --
N. Japan      .593    1.0     .196    .964       .447    .850    0.0     .952
Tasmania      .873    1.0     .043    .935       .436    .991    .327    .690
Zulu          .546    1.0     0.0     .964       .389    .939    .440    .482
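The sectioning point described above is simply the largest posterior probability and the largest typicality probability observed among the incorrect assignments in Table 15. The following sketch reproduces that derivation from the tabulated ranges. The function names are hypothetical and the approach is an illustration of the reasoning, not a feature of FORDISC.

```python
# Ranges for INCORRECT assignments, transcribed from Table 15.
# Santa Cruz is omitted because it had no incorrect assignments.
INCORRECT_RANGES = {
    # population: (PP min, PP max, TP min, TP max)
    "Berg": (0.521, 0.876, 0.0, 0.643),
    "N. Japan": (0.447, 0.850, 0.0, 0.952),
    "Tasmania": (0.436, 0.991, 0.327, 0.690),
    "Zulu": (0.389, 0.939, 0.440, 0.482),
}


def sectioning_point(ranges=INCORRECT_RANGES):
    """Return the (PP, TP) values that no incorrect assignment exceeded."""
    pp_cutoff = max(pp_max for _, pp_max, _, _ in ranges.values())
    tp_cutoff = max(tp_max for _, _, _, tp_max in ranges.values())
    return pp_cutoff, tp_cutoff


def unambiguous(pp, tp, ranges=INCORRECT_RANGES):
    """True if both probabilities exceed anything seen for an incorrect assignment."""
    pp_cutoff, tp_cutoff = sectioning_point(ranges)
    return pp > pp_cutoff and tp > tp_cutoff


if __name__ == "__main__":
    print(sectioning_point())
```

Because both cutoffs must be exceeded at once, a determination with a very high posterior probability is still treated as ambiguous when its typicality probability falls within the range observed for incorrect assignments.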

Even this may overestimate FORDISC's accuracy. As noted in the Materials and Methods, Howells selected 50-55 crania of each sex to represent each group. For a number of groups, this meant that only a small percentage of the available individuals were measured. For example, the 26th-30th Dynasty Egyptian crania were selected from a sample of nearly 1800. Significantly, the individuals were not chosen at random. Rather, Howells "carefully selected" specimens that he considered to be typical of the group (Howells 1995: 3). Crania that were "morphologically unusual for the population as a whole" (Howells 1989: 89) were not included, even if there were no obvious pathological changes to account for the differences. Thus, Howells' data collection strategy was such that the degree of overlap among the reference populations is likely to be artificially low. Given that the accuracy of classification in DFA is inversely correlated with the degree of overlap among groups, it is likely that the analyses reported here overestimate the accuracy of FORDISC.

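The link between selecting "typical" specimens and inflated classification accuracy can be illustrated with a toy simulation. The sketch below is not FORDISC's algorithm and uses no craniometric data: it assumes two invented groups measured on a single variable, with hypothetical means of 0 and 2 and a standard deviation of 1, and it assigns each value to the nearer group mean as a one-variable stand-in for DFA.

```python
import random

MEANS = {"A": 0.0, "B": 2.0}  # hypothetical group means; SD is 1 for both


def classify(value):
    """Assign a value to whichever group mean is nearer."""
    return "A" if abs(value - MEANS["A"]) < abs(value - MEANS["B"]) else "B"


def accuracy(sample):
    """Proportion of (value, true_group) pairs that are classified correctly."""
    hits = sum(1 for value, group in sample if classify(value) == group)
    return hits / len(sample)


def simulate(n_per_group=2000, typical_within=1.0, seed=1):
    """Compare accuracy on full samples with accuracy on 'typical-only' samples."""
    rng = random.Random(seed)
    full = [(rng.gauss(mean, 1.0), group)
            for group, mean in MEANS.items()
            for _ in range(n_per_group)]
    # Mimic keeping only specimens judged typical of their group:
    # retain values lying within `typical_within` SD of their own group mean.
    typical = [(value, group) for value, group in full
               if abs(value - MEANS[group]) < typical_within]
    return accuracy(full), accuracy(typical)


if __name__ == "__main__":
    full_accuracy, typical_accuracy = simulate()
    print(f"all specimens: {full_accuracy:.3f}")
    print(f"typical only:  {typical_accuracy:.3f}")
```

In this invented setting the trimmed samples no longer overlap, so every retained value is classified correctly, whereas the untrimmed samples are not. The size of the effect in real reference samples would depend on how strongly the selection favoured typical crania.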
There is a further reason for suspecting that the study reported here may have overestimated the utility of FORDISC. A number of the collections Howells analyzed did not include "mandibles or skeletal parts to aid in the diagnosis" of sex (Howells 1989: 91). Consequently, sex was frequently assessed on cranial morphology alone. Although Howells attempted to corroborate his estimates with those of other researchers who had examined the remains, he admitted that some of the skulls of known sex "would certainly have been assigned to the wrong sex if it had been done by inspection" (Howells 1989: 94). This suggests that the sexes of Howells' populations may be more different than they should be. The corollary of this is that the success rate of FORDISC in the analyses in which sex was specified may have been artificially high.

4.3. Future considerations

FORDISC's utility may be limited because the nature of human variation is such that ancestry cannot be determined from skeletal remains, as Williams et al. (2005) have suggested. However, the importance of determining ancestry is great enough that it would seem sensible to investigate other possibilities before concluding that ancestry is an aspect of the biological profile that cannot be accessed from the human skeleton.

One potential cause of FORDISC's poor performance is its reliance on two-dimensional measurement data. Three-dimensional landmark data may capture more of the morphological differences among populations and therefore provide a better basis for determining the ancestry of unknown specimens. Although studies are beginning to use three-dimensional geometric morphometric methods to explore population history and climate signals in modern human cranial morphology (e.g. Harvati and Weaver 2006b), none has attempted to apply these methods to estimate ancestry of unknown remains.

A second potential cause of FORDISC's poor performance is its reliance on cranial data. Work on the utility of the cranium for reconstructing primate phylogeny raises the possibility that the cranium is either an inadequate source of information regarding ancestry or perhaps even a misleading one (cf. Collard and Wood 2000). Although earlier studies met with limited success using postcranial data for ancestry determination (Marino 1997; Ballard 1999; Holliday and Falsetti 1999; Patriquin et al. 2002), it may be worthwhile investigating whether supplementing cranial data with postcranial data and/or data from the teeth and lower jaw provides more accurate determinations of ancestry.

A third potential cause of FORDISC's poor performance is its reliance on Discriminant Function Analysis. It is possible that FORDISC's success rate is so limited because DFA does not distinguish the form of similarity that is informative with respect to ancestry (shared derived similarity) from forms of similarity that are not informative regarding ancestry, such as shared primitive similarity and convergent similarity. Accordingly, it would be worthwhile trying to adapt phylogenetic methods that focus on shared derived similarity, such as cladistics (Hennig 1966), to the problem of determining the ancestry of unknown skeletal specimens.

While these possibilities are being explored, FORDISC will almost certainly continue to be used to assist with ancestry determinations. With this in mind, there would seem to be a pressing need to expand FORDISC's reference samples. Ideally this would involve maximizing both the numbers of individuals and populations represented, and ensuring that as many temporal periods are covered as possible. Although Jantz and Ousley have supplemented the Forensic Databank with new material, they have not similarly augmented the Howells samples in FORDISC. While some remains have already been repatriated, it would seem advisable to take advantage of the large number of skeletal collections available in institutions around the world to fill in the temporal, geographic or representational gaps in FORDISC's reference sample.

There is also a pressing need to investigate the relationship between number of variables and success rate in greater detail. In the current study the maximum number of variables common to all groups was compared to the recommended minimum according to the FORDISC manual to determine how variable number affected FORDISC's success rate. Although this provided a clear indication that 10 variables are insufficient to achieve good results, it did not establish what a reasonable minimum might be. Given the fragmentary nature of many bioarchaeological and forensic specimens, it would be useful to repeat the analyses with 20, 30 and 40 variables to determine whether the classification rate improves consistently as the number of variables increases or whether it levels off.

Lastly, during the 2007 FORDISC 3.0 workshop, Jantz and Ousley outlined a new option in the program that allows a specimen to be analyzed on the basis of shape alone. The option was developed to "neutralize" the confounding effects of sex (Jantz and Ousley 2007: 40). Given the marked impact that controlling for sex had on FORDISC's success rate in the current study, it would be sensible to examine whether employing the shape-only option results in more specimens being correctly classified than when ancestry is determined on the basis of shape and size. If the former proves to be the case, then the shape-only option may improve the success rate of FORDISC when dealing with specimens that cannot be sexed with confidence.

While this new transformation option might ensure that an unknown is assessed on the basis of shape alone and is not significantly smaller than the reference samples, it is not uncomplicated. Other evidence suggests that males and females of a given population are not simply differently sized variants of the same basic form (Wood and Lynch 1996). As non-metric assessments attest, there are clear shape differences between males and females irrespective of ancestry. If males tend to have similar proportions regardless of size or population, removing size would not necessarily help FORDISC achieve the correct ancestry. If this is the case, then using the new shape transformation function in FORDISC 3.0 would result in males clustering with males and prove only that a skull has a male shape and not that the shape necessarily relates to ancestry.
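To make the idea of a shape-only analysis concrete, the sketch below shows one widely used way of removing overall size: dividing each measurement by the geometric mean of all measurements for that specimen. Whether FORDISC 3.0 implements its shape option in exactly this way is an assumption made here for illustration, and the measurement values are hypothetical.

```python
import math


def shape_variables(measurements):
    """Divide each measurement by the geometric mean of all of them.

    `measurements` maps a measurement name to a positive value in millimetres.
    The geometric mean acts as the overall size measure that is removed.
    """
    values = list(measurements.values())
    log_mean = sum(math.log(value) for value in values) / len(values)
    size = math.exp(log_mean)
    return {name: value / size for name, value in measurements.items()}


if __name__ == "__main__":
    # Hypothetical crania with identical proportions but different overall size.
    larger = {"GOL": 180.0, "XCB": 140.0, "BBH": 130.0}
    smaller = {name: 0.9 * value for name, value in larger.items()}
    print(shape_variables(larger))
    print(shape_variables(smaller))
```

Two crania with identical proportions but different overall size receive the same shape values under this transformation. Differences between the sexes that are expressed in proportions rather than in overall size would therefore remain in the data, which is the complication raised above.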

5. Conclusions

This study explored several issues related to the computer program FORDISC. In particular, it addressed problems related to population representation in the database, the number of variables to use in an analysis, the effect of constraining sex, the effect of anatomical region, and the challenge of interpreting the results.

This research was undertaken in part because these issues are fundamental to the appropriate use of the program. As FORDISC becomes more popular, a danger lies in investigators using the program without fully understanding its limitations. Additionally, the ongoing FORDISC debate has done little to resolve the questions that have arisen around the program's performance. In fact, it seems that each time a criticism of the program is raised, FORDISC's developers add a new caveat to its use. Given the popularity of FORDISC and the confidence placed in it, it was deemed important to determine not only how effective the program is, but whether or not the criticisms of it are valid.

In total, this study carried out four sets of analyses on four separate datasets for 200 individuals from within FORDISC's reference sample. The test datasets were selected to include the range of possibilities in terms of both variable number and anatomical region, while the test individuals were chosen from five populations representing separate geographic regions. The first set of analyses tested each dataset using all populations (including the one from which the test individual was drawn) and both sexes. The second set of analyses also included all populations, but restricted FORDISC's comparison to members of the same sex. The third set of analyses excluded the test individual's source population but used both males and females of the remaining groups. The fourth set of analyses excluded the test individual's source population and compared it only to members of the same sex.
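The study design described above amounts to crossing two reference-sample settings (source population included or excluded, and comparison restricted to the same sex or not) with the four variable sets. The sketch below enumerates that design and shows how the reference groups for one test individual would be chosen under each setting. The names are hypothetical, the dataset labels are shorthand for the variable sets, and the code ignores the fact that a few Howells populations contain no females.

```python
from itertools import product

DATASETS = ["56 whole cranium", "10 basicranium", "10 neurocranium", "10 face"]


def analysis_design():
    """List every (analysis set, dataset) configuration in the order described."""
    configurations = []
    # Order matches the text: sets 1-2 include the source population,
    # sets 3-4 exclude it; odd sets use both sexes, even sets one sex.
    settings = product([True, False], [False, True])
    for set_number, (source_included, sex_restricted) in enumerate(settings, start=1):
        for dataset in DATASETS:
            configurations.append({
                "set": set_number,
                "source_included": source_included,
                "sex_restricted": sex_restricted,
                "dataset": dataset,
            })
    return configurations


def reference_groups(populations, test_population, test_sex,
                     source_included, sex_restricted):
    """Return the (population, sex) groups a test individual is compared with."""
    sexes = [test_sex] if sex_restricted else ["M", "F"]
    return [(population, sex)
            for population in populations
            if source_included or population != test_population
            for sex in sexes]


if __name__ == "__main__":
    for configuration in analysis_design():
        print(configuration)
```

Each test individual is therefore analysed under 16 configurations: four analysis sets for each of the four variable sets.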

The results of this study support FORDISC's developers' caution against using the program if a representative population is not available. However, if a population is not represented in the database, FORDISC cannot be expected to find a closely related population - either geographically or genetically. This suggests that while FORDISC may be useful in very restricted contexts, its widespread use on geographically or temporally remote populations is not acceptable.

With respect to variable number, the results contradict FORDISC's developers' contention that using too many variables reduces performance. Instead, this study found that FORDISC only achieved reasonable rates of success when the number of variables was maximized. Reducing the number of variables to the level recommended by the developers for the size of the reference sample resulted in exceedingly low success rates. The results were also not consistent between test populations. Consequently, these results suggest that the program should not be used on incomplete remains if sufficient numbers of measurements cannot be obtained.

This study also determined that FORDISC was more accurate in assigning ancestry when comparing a specimen only to members of its own sex. When both sexes of each population were included in the comparison, FORDISC did not consistently select the appropriate ancestry. Unfortunately, it did not necessarily select the same sex either. While these results suggest that size may be confounding FORDISC's determination of ancestry, the problem requires further investigation.

The issue of how anatomical region affects FORDISC's ability to determine ancestry was not fully resolved by this research. Although the neurocranial region achieved the best results overall, all three regions performed very poorly. Furthermore, the results varied across the five test populations. However, it was not possible to settle this question through the current FORDISC program as the number of variables associated with each anatomical region is limited.

Lastly, the issue of how best to interpret the results in terms of the recommended posterior and typicality probabilities was also not fully resolved. At the levels recommended by the FORDISC 3.0 manual, more incorrect determinations would erroneously be considered correct. However, at the levels recommended by the FORDISC 3.0 workshops, more correct determinations would be rejected as incorrect. Neither of these recommendations appeared to correspond with a natural sectioning point between correct and incorrect attributions. However, when a sectioning point was calculated directly from the data, almost every ancestry determination had to be considered either inconclusive or incorrect.

As it stands, FORDISC requires the population, the time period, the sex and as many measurements as possible for a set of remains before it can be expected to return a reasonable estimation of ancestry. Furthermore, if FORDISC does not achieve a posterior probability higher than 0.991 together with a typicality probability higher than 0.952, the resulting ancestry determination must be considered ambiguous.

Given this situation, the only conclusion that can be drawn is that if FORDISC is used at all, it should only be under extremely restricted circumstances or to provide limited confirmation of information gathered through other means.

References

Albanese, J. and S.R. Saunders 2006 Is it possible to escape racial typology in Forensic Identification? In Forensic Anthropology and Medicine: Complementary Sciences from Recovery to Cause of Death. Schmitt, A., Cunha, E. and J. Pinheiro eds. Totowa: Humana Press Inc.

Angel, J. L. 1976 Colonial to modern skeletal change in the U.S.A. American Journal of Physical Anthropology. 45:723-736.

Anthropolog 2005 Newsletter of the Department of Anthropology. National Museum of Natural History. Accessed 02/22/08 via http://www.google.com/search?q=american+academy+of+forensic+science+fordisc+workshop&sourceid=navclient-ff&ie=UTF-8&rlz=1B2GGFB_enCA218&aq=t

Arlington National Cemetery Website 2005 Richard Vandergeer, Second Lieutenant, USAF memorial page (http://www.arlingtoncemetery.net/rvandergeer.html) Accessed: 01/25/2007.

Ballard, M.E. 1999 Anterior femoral curvature revisited: race assessment from the femur. Journal of Forensic Sciences. Vol. 44:4.

Bass, W.M. 1995 Human Osteology: A Laboratory and Field Manual. Columbia, MO: Missouri Archaeological Society.

Beals, K., Smith, C.L. and S.M. Dodd 1983 Climate and the evolution of brachycephalization. American Journal of Physical Anthropology. Vol. 62:4.

Belcher, R, Williams, F. & GJ Armelagos 2002 Misidentification of Meroitic Nubians using Fordisc 2.0. (Abstract) American Journal of Physical Anthropology. Vol 117, Supplement 34:42.

Bendor-Samuel J, and RL. Hartell (editors) 1989 The Niger-Congo Languages - A classification and description of Africa's largest language family. Lanham, Maryland: University Press of America.

Boas, F. 1911 Changes in bodily form of descendants of immigrants. In Reports of the Immigration Commission. (1907-1910), Vol 38. Washington: Government Printing Office.

Brues, A.M. 1991 The Once and Future Diagnosis of Race. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.

Buikstra, J.E. and D.H. Ubelaker. 1994 Standards for Data Collection from Human Skeletal Remains. Fayetteville, AR: Arkansas Archaeological Society.

Campbell, L. 1997 American Indian Languages: The Historical Linguistics of Native America. New York: Oxford University Press.

Campbell A.R, and G.J. Armelagos 2007 Assessment of FORDISC 3.0's accuracy in classifying individuals from WW Howell's populations and the forensic data bank. (Abstract) American Journal of Physical Anthropology Vol. 132 Suppl. 44, p. 83-84.

Carey, J.W. and A.T. Steegmann Jr. 1981 Human Nasal Protrusion, Latitude, and Climate. American Journal of Physical Anthropology. Vol. 56: 3.

Cavalli-Sforza, LL, Menozzi, P and A. Piazza 1994 The history and geography of human genes. Princeton: Princeton University Press.

Collard, M. and B. Wood 2000 How reliable are human phylogenetic hypotheses? Proceedings of the National Academy of Sciences (PNAS). Vol. 97:9.

Coon, C.S., Garn, S.M. and J.B. Birdsell 1950 Races: a study of the problems of race formation in man. Springfield, IL: Charles C. Thomas.

Corruccini, R.S. 1974 An examination of the meaning of cranial discrete traits for human skeletal biological studies. American Journal of Physical Anthropology. Vol. 40: 3.

Cox, Katharine, N.G Tayles, & H.R Buckley 2006 Forensic Identification of 'Race': The Issues in New Zealand. Current Anthropology. Vol. 47: 5.

Cunningham, D.L. & D.J. Wescott 2002 Within-group human variation in the Asian Pleistocene: the three Upper Cave crania. Journal of Human Evolution. Vol. 42: 627-638.

Franciscus, R.G and J.C. Long 1991 Variation in human nasal height and breadth. American Journal of Physical Anthropology. Vol. 85: 4.

Freid, D., Spradley, M.K., Jantz, R.L. and S.D. Ousley 2005 The truth is out there: how NOT to use FORDISC. (Abstract) American Journal of Physical Anthropology, Vol 126, Supplement 40.

Fukuzawa, S. and A. Maish 1997 Racial Identification of Ontario Iroquoian Crania Using FORDISC 2.0 (Abstract) from the 44th annual meeting of the Canadian Society of Forensic Science. Accessed via http://www.csfs.ca/journal/reginabstr.htm

Giles E. and O. Elliot 1962 Race identification from cranial measurements. Journal of Forensic Sciences. Vol. 7: 147-157.

Gill, G.W. and M. Gilbert 1990 Race identification from the midfacial skeleton: American blacks and whites. In Skeletal Attribution of Race. Gill, G.W. and S. Rhine eds. Albuquerque, NM: Maxwell Museum of Anthropology.

Harvati, K and T.D. Weaver 2006a Reliability of cranial morphology in reconstructing Neandertal phylogeny. In Neandertals revisited: new approaches and perspectives. Harvati, K and T. Harrison, eds. Dordrecht: Springer 239-254.

-- 2006b Human Cranial Anatomy and the Differential Preservation of Population History and Climate Signatures. The Anatomical Record, Part A, 288A:1225-1233.

Hennig, W. 1966 Phylogenetic systematics. Urbana: University of Illinois Press.

Hiernaux, J. 1963 Heredity and environment: their influence on human morphology; a comparison of two independent lines of study. American Journal of Physical Anthropology. Vol 21: 575-589.

Holliday, T.W. and A.B. Falsetti 1999 A new method for discriminating African-American from European-American skeletons using postcranial osteometrics reflective of body shape. Journal of Forensic Sciences. Vol. 44: 5. 926-30.

Howells, W.W. 1973 Cranial Variation in Man: A Study by Multivariate Analysis of Patterns of Difference Among Recent Human Populations. Papers of the Peabody Museum of Archaeology and Ethnology, Volume 67.

-- 1989 Skull Shapes and the Map. Cambridge, MA: Papers of the Peabody Museum of Archaeology and Ethnology, Volume 78.

-- 1995 Who's who in skulls: ethnic identification of crania from measurements. Cambridge, MA, Peabody Museum of Archaeology and Ethnology, Volume 82.

-- 1996 Howells' craniometric data on the internet. American Journal of Physical Anthropology. Vol. 101: 3.

Hubbe, M & WA Neves 2007 On the Misclassification of Human Crania. Discussion. Current Anthropology, volume 48, pp. 285-288.

Huberty, C.J. 1994 Applied Discriminant Analysis. In Wiley series in probability and mathematical statistics. Applied probability and statistics. New York, NY: Wiley.

Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P, Oefner P, Renfrew C, and R. Villems 2007 Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proceedings of the National Academy of Sciences (PNAS). 104(21):8726-8730.

Hughes, D.R. 1968 Skeletal plasticity and its relevance in the Study of Earlier Populations. In The Skeletal Biology of Earlier Human Populations. D. R. Brothwell editor, pp. 31-55. London: Thames and Hudson.

Hylander, W. L. 1977 The adaptive significance of Eskimo craniofacial morphology. In Orofacial Growth and Development. Dahlberg, A.A and T.M. Graber eds. Chicago, IL: Mouton 129-170.

Jantz, R.L. and L. Meadows Jantz 2000 Secular change in craniofacial morphology. American Journal of Human Biology. Vol. 12:327-338.

Jantz, RL & SD. Ousley 1992 FORDISC 1.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.

-- 1996 FORDISC 2.0: Computerized Forensic Discriminant Functions. The University of Tennessee, Knoxville.

-- 2005 FORDISC 3: Computerized Forensic Discriminant Functions. Version 3.0. The University of Tennessee, Knoxville.

-- 2007 FORDISC 3.0: Theory, Methods and Applications. Workshop held in San Antonio, TX. February 20, 2007.

Kaestle, F.A, and D.G. Smith 2001 Ancient mitochondrial DNA evidence for prehistoric population movement: The Numic expansion. American Journal of Physical Anthropology 115(1): 1-12.

Keita, S. O. Y. 2007 On Meroitic Nubian Crania, Fordisc 2.0 and Human Biological History. Discussion. Current Anthropology 48: 425-427.

Kitson, E. 1931 A Study of the Negro Skull with Special Reference to the Crania from Kenya Colony. Biometrika 23(3/4): 271-314.

Knight, A, Underhill, PA, Mortensen, HM, Zhivotovsky, LA, Lin, AA, Henn, BM, Louis, D, Ruhlen, M, and J.L. Mountain 2003 African Y Chromosome and mtDNA Divergence Provides Insight into the History of Click Languages. Current Biology 13(6):464-473.

Kosiba, S. 2000 Assessing the Efficacy and Pragmatism of "Race" Designation in Human Skeletal Identification: A Test of Fordisc 2.0 Program (Abstract). American Journal of Physical Anthropology, Vol 111, Supplement 30:200.

Leathers, A, Edwards, J, & GJ Armelagos 2002 Assessment of Classification of Crania Using Fordisc 2.0: Nubian X-Group Test (Abstract). American Journal of Physical Anthropology Vol. 117, S34:99-100.

Lieberman, D., Krovitz, G.E., Yates, F.W., Devlin, M. and M. St.Claire 2004 Effects of food processing on masticatory strain and craniofacial growth in a retrognathic face. Journal of Human Evolution. Vol. 46: 6.

Lovvorn, MB, Gill, GW, Carlson, GF, Bozell, JR, & TL. Steinacher 1999 Microevolution and the Skeletal Traits of a Middle Archaic Burial: Metric and Multivariate Comparison to Paleoindians and Modern Amerindians. American Antiquity, Vol. 64, No. 3, pp. 527-545.

Mangold, WL, Nawrocki, SP, & J. Scherbauer 1993 The Shaffer Site (12 GR 109): Additional information on an Albee Phase Site in the White River Valley. Indiana University. Accessed 01/06/07 via www.gbl.indiana.edu/abstracts/93/mangold_93.html

Naar, N. A., D. Hilgenberg, and G.J Armelagos 2006 Fordisc 2.0 the ultimate test: What is the truth? (Abstract) American Journal of Physical Anthropology, Vol 129, Supplement 42:136.

Nicholson, E. and K. Harvati 2006 Quantitative analysis of human mandibular shape using three-dimensional geometric morphometrics. American Journal of Physical Anthropology. Vol. 131: 3, 368-383.

O'Connell, JF and J. Allen 1998 When Did Humans First Arrive in Greater Australia and Why Is It Important to Know? Evolutionary Anthropology. Vol 6:132-146.

Omoto K, and Saitou N. 1997 Genetic origins of the Japanese: A partial support for the dual structure hypothesis. American Journal of Physical Anthropology 102(4):437-446.

Patriquin, M.L., Steyn, M. and S.R. Loth. 2002 Metric assessment of race from the pelvis in South Africans. Forensic Science International. Vol. 127:1-2, pp. 104-113.

Pietrusewsky, M. 2000 Metric Analysis of Skeletal Remains: Methods and Applications. In Biological Anthropology of the Human Skeleton. M.A. Katzenberg and S. Saunders eds. New York, NY: Wiley-Liss. 375-416.

Redd, A.J, and M. Stoneking 1999 Peopling of Sahul: mtDNA Variation in Aboriginal Australian and Papua New Guinean Populations. American Journal of Human Genetics 65(3).

Roseman, Charles C. 2004 Detecting interregionally diversifying natural selection on modern human cranial form by using matched molecular and morphometric data. Proceedings of the National Academy of Sciences (PNAS). Vol. 101:35, 12824-12829.

Roseman, C.C. and T.D. Weaver 2004 Multivariate apportionment of global human craniometric diversity. American Journal of Physical Anthropology. Vol. 125: 257-263.

Sejrsen B, Lynnerup N & Hejmadi M. 2005 An historical skull collection and its use in forensic odontology and anthropology. Journal of Forensic Odontostomatology. 2005 Dec. 23(2):40-4.

Skelton, RR 1996 A Suggested Method for Using Means Data in Discriminant Functions Using Anthropometric Data. Journal of World Anthropology. Vol 1(4).

Skelton, R. and H. McHenry 1992 Evolutionary relationships among early hominids. Journal of Human Evolution. Vol 23: 309-349.

Smith, BH, Garn, SM and WS Hunter 1986 Secular trend in face size. Angle Orthodontist. Vol. 56: 196-204.

Spradley, M.K, Ousley, SD and RL Jantz 2008 Evaluating Cranial Morphometric Relationships using Discriminant Function Analysis. (Abstract) American Journal of Physical Anthropology. Vol. 135: S46, 199.

Steadman, DW, Adams, BJ, & LW. Konigsberg 2006 Statistical basis for positive identification in forensic anthropology. American Journal of Physical Anthropology. Vol 131(1), pp. 15-26.

Ubelaker, DH., Ross, AH and SM Graver 2002 Application of Forensic Discriminant Functions to a Spanish Cranial Sample. Forensic Science Communications 4(3).

Walsh, SJ and C. Eckhoff 2007 Australian Aboriginal population genetics at the D1S80 VNTR locus. Annals of Human Biology. Vol. 34: 5, 557-565.

Webb RE, and Rindos DJ. 1997 The Mode and Tempo of the Initial Human Colonization of Empty Landmasses: Sahul and the Americas Compared. p 233-250.

Wescott, D.J and R.L. Jantz 2005 Assessing Craniofacial Secular Change in American Blacks and Whites Using Geometric Morphometry. In Modern Morphometrics in Physical Anthropology. New York: Kluwer Academic/Plenum Publishers. p. 231-45.

Williams, F L'Engle, Belcher, RL. & GJ. Armelagos 2005 Forensic Misclassification of Ancient Nubian Crania: Implications for Assumptions about Human Variation. Current Anthropology 46(2): 340-346.

Williams, Paul B., Erickson, P and L. Niven 2001 Retrieving History: The 18th Century Mortuary History of the Little Dutch Church, Halifax. Paper Presented At The 33rd Annual Meeting of The Canadian Archaeological Association.

Wood, B. and D. Lieberman 2001 Craniodental variation in Paranthropus boisei: a developmental and functional perspective. American Journal of Physical Anthropology. 116:13-25.

Wood, C. and J.M. Lynch 1996 Sexual dimorphism in the craniofacial skeleton of modern humans. In Advances in Morphometrics. F.L. Marcus, M. Corti, A. Loy, G.J.P Naylor and D.E. Slice, editors. NATO ASI Series A: Life Sciences Vol. 284.

Wright, R. V. S. 1992 Correlation between cranial form and geography in Homo sapiens: CRANID -A computer program for forensic and other applications. Archaeology in Oceania (27): 128-34.

-- 2005 Guide to using the CRANID program CR5Ind.exe. Accessed via http://box.net/public/richwrig/dfiles/CR5Ind.ZIP

Appendix I

Howells' measurements used in FORDISC

Measurement   Description
GOL           glabello-occipital (maximum cranial) length
NOL           nasio-occipital length
BNL           basion nasion (cranial base) length
BBH           basion bregma height
XCB           maximum cranial width
XFB           max frontal breadth
STB           bistephanic breadth
ZYB           bizygomatic breadth
AUB           biauricular breadth
WCB           minimum cranial breadth
ASB           biasterionic breadth
BPL           basion prosthion length
NPH           nasion prosthion height
NLH           nasal height
OBH           orbital height
OBB           orbital breadth
JUB           bijugal breadth
NLB           nasal breadth
MAB           palate breadth
MDH           mastoid height
MDB           mastoid width
ZMB           bimaxillary breadth
SSS           zygomaxillary subtense
FMB           bifrontal breadth
NAS           nasio-frontal subtense
EKB           biorbital breadth
DKS           dacryon subtense
DKB           interorbital breadth
NDS           naso-dacryal subtense
WNB           simotic chord
SIS           simotic subtense
IML           malar length, inferior
XML           malar length, maximum
MLS           malar subtense
WMH           cheek height
SOS           supraorbital projection
GLS           glabella projection
FOL           foramen magnum length
FRC           nasion-bregma chord
FRS           nasion-bregma subtense
FRF           nasion-subtense fraction
PAC           bregma-lambda chord
PAS           bregma-lambda subtense
PAF           bregma-subtense fraction
OCC           lambda-opisthion chord
OCS           lambda-opisthion subtense
OCF           lambda-subtense fraction
VRR           vertex radius
NAR           nasion radius
SSR           subspinale radius
PRR           prosthion radius
DKR           dacryon radius
ZOR           zygoorbitale radius
FMR           frontomalare radius
EKR           ectoconchion radius
ZMR           zygomaxillare radius
AVR           M1 alveolus radius
NAA           nasion angle ba-pr
PRA           prosthion angle na-ba
BAA           basion angle na-pr
NBA           nasion angle ba-br
BBA           basion angle na-br
SSA           zygomaxillare angle
NFA           nasio-frontal angle
DKA           dacryal angle
NDA           naso-dacryal angle
SIA           simotic angle
FRA           frontal angle
PAA           parietal angle
OCA           occipital angle
BRR           bregma radius
LAR           lambda radius
OSR           opisthion radius
BAR           basion radius

Appendix II

Howells' populations used in FORDISC and their sample sizes (test samples in bold).

Abbreviation3   Population                   Location            Males/Females
NOR             Medieval Norse               Norway              55/55
ZAL             Medieval Zalavar             Hungary             53/45
BER             Berg                         Austria             56/53
EGY             Egyptian (26-30 Dynasty)     Egypt               58/53
TEI             Teita                        Kenya               33/50
DOG             Dogon                        Mali                47/52
ZUL             Zulu                         South Africa        55/46
BUS             Bushman                      South Africa        41/49
AND             Andaman Islanders            Indian Ocean        35/35
AUS             Lake Alexandrina Tribes      South Australia     52/49
TAS             Tasmanian                    Tasmania            45/42
TOL             Tolai                        Papua New Guinea    56/54
MOK             Mokapu                       Hawaii              51/49
BUR             Buriat                       Siberia             55/54
ESK             Inugsuk Eskimo               Greenland           53/55
ARI             Arikara                      South Dakota        42/27
PER             Yauyos                       Peru                55/55
EAS             Easter Islanders             South Pacific       49/37
AIN             Ainu                         Japan               48/38
NJA             Hokkaido                     North Japan         55/32
SJA             Kyushu                       South Japan         50/41
HAI             Hainan                       South China Sea     45/38
ANY             Anyang                       Northeast China     42/0
ATA             Atayal                       Taiwan              29/18
PHI             Philippino                   Philippines         50/0
GUA             Indigenous Guam              South Pacific       30/27
MOR             Moriori                      Chatham Islands     57/51
SAN             Santa Cruz                   California          51/51

3 Used by FORDISC 3.0 when displaying the results of an analysis.

Appendix III

Variable sets used in the current study

Variable Sets       Variables Used

56 whole cranium    ASB, AUB, AVR, BBH, BNL, BPL, DKB, DKR, DKS, EKB, EKR,
                    FMB, FMR, FOL, FRC, FRF, FRS, GLS, GOL, IML, JUB, MAB,
                    MDH, MLS, NAR, NAS, NDS, NLB, NLH, NOL, NPH, OBB, OBH,
                    OCC, OCF, OCS, PAC, PAF, PAS, PRR, SIS, SOS, SSR, SSS,
                    STB, VRR, WCB, WMH, WNB, XCB, XFB, XML, ZMB, ZMR, ZOR
                    and ZYB

10 basicranium      AUB, WCB, ASB, MDH, MDB, OCC, OCS, OCF, FOL and OCA

10 neurocranium     GOL, NOL, XCB, XFB, FMB, FRS, PAC, PAF, FRA and PAA

10 face             BNL, NLB, MAB, DKB, NDS, WNB, NAR, DKR, PRA and DKA
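For readers who wish to reproduce the variable sets in their own analyses, the four sets in this appendix can be written down as a simple lookup structure. The following is a minimal sketch in Python and is not part of FORDISC: the names VARIABLE_SETS and select_variables are hypothetical, and the abbreviations are assumed to follow the standard Howells spellings listed in Appendix I.

```python
# Variable sets from Appendix III. The dictionary keys are hypothetical
# short names chosen for this sketch; the abbreviations are assumed to be
# the standard Howells spellings given in Appendix I.
VARIABLE_SETS = {
    "whole_cranium": [
        "ASB", "AUB", "AVR", "BBH", "BNL", "BPL", "DKB", "DKR", "DKS",
        "EKB", "EKR", "FMB", "FMR", "FOL", "FRC", "FRF", "FRS", "GLS",
        "GOL", "IML", "JUB", "MAB", "MDH", "MLS", "NAR", "NAS", "NDS",
        "NLB", "NLH", "NOL", "NPH", "OBB", "OBH", "OCC", "OCF", "OCS",
        "PAC", "PAF", "PAS", "PRR", "SIS", "SOS", "SSR", "SSS", "STB",
        "VRR", "WCB", "WMH", "WNB", "XCB", "XFB", "XML", "ZMB", "ZMR",
        "ZOR", "ZYB",
    ],
    "basicranium": [
        "AUB", "WCB", "ASB", "MDH", "MDB", "OCC", "OCS", "OCF", "FOL", "OCA",
    ],
    "neurocranium": [
        "GOL", "NOL", "XCB", "XFB", "FMB", "FRS", "PAC", "PAF", "FRA", "PAA",
    ],
    "face": [
        "BNL", "NLB", "MAB", "DKB", "NDS", "WNB", "NAR", "DKR", "PRA", "DKA",
    ],
}


def select_variables(record, set_name):
    """Return the measurements in `record` that belong to the named set.

    `record` maps a Howells abbreviation to its measured value.
    Measurements missing from the record are omitted from the result.
    """
    return {v: record[v] for v in VARIABLE_SETS[set_name] if v in record}


if __name__ == "__main__":
    # Print the size of each set as a quick check against the appendix.
    for name, variables in VARIABLE_SETS.items():
        print(name, len(variables))
```

Keeping the sets as plain lists preserves the order in which the appendix lists them, and a specimen's measurements can then be restricted to one set before being entered into an analysis.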
