Description of the Populations and Linguistic Data Used in Dediu & Ladd

Description of the populations and linguistic data used in Dediu & Ladd (2007) This document represents the detailed description of the populations used in Dediu & Ladd (2007) as well as the relevant linguistic data, their sources and the methodology employed. It must be highlighted that the process of gathering this type of data is still highly non-standardized, time-consuming and difficult, and the apparently simple coding used hides a huge amount of complexity. Therefore, this type of data must not be used without due consideration for this hidden complexity and without reference to the actual meaning of the linguistic features and their values in each individual case. ID Population full name Population short Country/ region Lng. Linguistic family Main city/ region Geographical name coordinates 01 Southeastern and SESWBantu South Africa Niger-Congo Pretoria 25°45′S/ 28°17′E Southwestern Bantu 02 San San Namibia naq Khoisan Windhoek 22°56′S/ 17°9′E 03 Mbuti Pygmy Mbuti Democratic Republic of Congo efe Nilo-Saharan Bunia 1°34′N/ 30°15′E 04 Masai Tanzania mas Nilo-Saharan Arusha 3°22'S/ 36°40'E 05 Sandawe Tanzania sad Khoisan Dodoma 6°15'S/ 35°45'E 06 Burunge Tanzania bds Afro-Asiatic Kondoa 4°54'S/ 35°47'E 07 Turu Turu Tanzania rim Niger-Congo Singida 6°00'S/ 34°30'E 08 Northeastern Bantu Kikuyu Kenya kik Niger-Congo Nairobi 1°16′S/ 36°48′E 09 Biaka Pygmy Biaka Central African Republic axk Niger-Congo Nola 3°35'N/ 16°04'E 10 Zime Cameroon lme Afro-Asiatic Garoua 9°19'N/ 13°21'E 11 Bakola Pygmy Bakola Cameroon gyi Niger-Congo Kribi 2°57'N/ 9°56'E 12 Bamoun Bamoun Cameroon bax Niger-Congo Foumban 5°45'N/ 10°50'E 13 Yoruba Yoruba Nigeria yor Niger-Congo Ibadan 7°22'N/ 03°58'E 14 Mandenka Mandenka Senegal mnk Niger-Congo Ziguinchor 12°34' N/ 16°16'W 15 Mozabite Mozabite Algeria (Mzab) mzb Afro-Asiatic Ghardaia 32°20'N/ 03°37'E 16 Druze Druze Israel (Carmel) apc Afro-Asiatic Haifa 32°46'N/ 35°00'E 17 Palestinian Palestinian Israel (central) ajp Afro-Asiatic Jerusalem 31°47'N/ 35°10'E 18 Bedouin Bedouin Israel (Negev) ayl Afro-Asiatic Rahat 31°33'N/ 34°47'E 19 Hazara Hazara Pakistan haz Indo-European Quetta 31°15'N/ 66°55'E 20 Balochi Balochi Pakistan bgp Indo-European Quetta 31°15'N/ 66°55'E 21 Pathan Pathan Pakistan pst Indo-European Quetta 31°15'N/ 66°55'E 22 Burusho Burusho Pakistan bsk Burushaski Balochistan 27°30'N/ 65°00'E 23 Makrani Makrani Pakistan bcc Indo-European Gwadar 25°10'N/ 62°18'E 24 Brahui Brahui Pakistan brh Dravidian Kalat 29°08'N/ 66°31'E 25 Kalash Kalash Pakistan kls Indo-European Balanguru 35°44'N/ 71°46'E ID Population full name Population short Country/ region Lng. Linguistic family Main city/ region Geographical name coordinates 26 Sindhi Sindhi Pakistan snd Indo-European Karachi 24°53'N/ 67°00'E 27 Hezhen Hezhen China gld Altaic Harbin 45°48'N/ 126°40'E 28 Mongola Mongola China mvf Altaic Hohhot 40°52'N/ 111°40'E 29 Daur Daur China dta Altaic Nirji 48°28'N/ 124°28'E 30 Orogen (Oroqen) Orogen China orh Altaic Alihe 50°34'N/ 123°43'E 31 Miaozu Miaozu China hmy Hmong-Mien Guizhou 27°00'N/ 107°00'E 32 Yizu Yizu China yif Sino-Tibetan Minjian 28°50'N/ 103°32'E 33 Tujia Tujia China tji Sino-Tibetan Jishou 28°19'N/ 109°43'E 34 Han Han China cmn Sino-Tibetan Beijing 39°55'N/ 116°20'E 35 Xibo Xibo China sjo Altaic Shenyang 41°48'N/ 123°27'E 36 Uygur Uygur China uig Altaic Urumqi 43°43'N/ 87°38'E 37 Dai Dai China tdd Tai-Kadai Jinghong 21°27'N/ 100°25'E 38 Lahu Lahu China lhu Sino-Tibetan Kunming 25°01'N/ 102°41'E 39 She She China shx Hmong-Mien Fuzhou 26°05'N/ 119°16'E 40 Naxi Naxi China nbf Sino-Tibetan Lijiang 23°12'N/ 108°9'E 41 Tu Tu China mjg Altaic Xining 36°34'N/ 101°40'E 42 Cambodian Cambodian Cambodia khm Austro-Asiatic Phnom Penh 11°33'N/ 104°55'E 43 Japanese Japanese Japan jpn Japanese Tokyo 35°41'N/ 139°46'E 44 Yakut Yakut Russia (Siberia) sah Altaic Yakutsk 62°12'N/ 129°44'E 45 Papuan New Guinea New Guinea Island 05°00'S/ 140°00'E 46 NAN Melanesian NANMelanesian Bougainville nas East Papuan Bougainville 06°00'S/ 155°00'E 47 French Basque FrBasque France eus Basque Bayonne 43°30'N/ 01°28'W 48 French French France fra Indo-European Paris 48°50'N/ 02°20'E 49 Sardinian Sardinian Italy src Indo-European Cagliari 39°13'N/ 09°07'E 50 North Italian NItalian Italy (Bergamo) vec Indo-European Bergamo 45°41'N/ 09°43'E 51 Tuscan Tuscan Italy ita Indo-European Firenze 43°47'N/ 11°15'E ID Population full name Population short Country/ region Lng. Linguistic family Main city/ region Geographical name coordinates 52 Orcadian Orcadian Orkney Islands sco Indo-European Kirkwall 59°09'N/ 02°59'W 53 Russian Russian Russia rus Indo-European Moskva 55°45'N/ 37°37'E 54 Adygei Adygei Russia (Caucasus) ady North Caucasian Maykop 44°36'N/ 40°05'E 55 Karitiana Brazil ktn Tupi Porto Velho 08°46'S/ 63°54'W 56 Surui Brazil sru Tupi Vilhena 12°40'S/ 60°05'W 57 Colombian Colombia pio Arawakan Puerto Carreno 06°12'N/ 67°22' W 58 Pima Mexico pia Uto-Aztecan Hermosillo 29°10' N/ 111°00' W 59 Maya Mexico yua Mayan Mérida 20°58' N/ 89°37' W Table 1: Geographic/politic and linguistic information for the 59 populations in the entire sample. ID: the population's numeric identification used in the original papers, Population full name: the population's name in the original papers, Population short name: he population's name as used in the present work, Country/region: the population's geographical information as reported in the original papers, Lng.: the 3 letter language code as defined in Gordon (2005), Linguistic family: the historical linguistic affiliation, as given by Gordon (2005), Main city/region: the capital or most important city of the region where the language is reportedly spoken (or, if none, the entire region is reported), and Geographical coordinates: the “Main city/region”'s geographical coordinates. The populations without a short name have not been included in the final sample. Figure 1: The approximate geographical positions of the 54 populations in the Old World sample. Black: ID, gray : population name. The positions have been adjusted to fit. The linguistic data The attribution of languages to populations is based mainly on information in the Ethnologue (Gordon, 2005) and is summarized in Table 1 above, and further detailed below. Most such attributions are fairly straightforward and uncontroversial, but some must be discussed: ● SESWBantu: no single language was attributed, but data from Xhosa (xho) and Zulu (zul) was preferentially used; ● San: Nama (naq) was chosen to represent this population linguistically, because of the number of speakers and availability of information; ● NANMelanesian: as described below, its linguistic attribution was far from obvious, but it turned out to be the Naasioi, [nas]; ● Papuan: It is impossible (and unwarranted) to assume a single linguistic attribution for this sample: therefore, majority judgments at the scale of Papua-New Guinea were used. For each language, its linguistic family was recorded, using a mostly uncontroversial classification, based on Gordon (2005), and avoiding any hazardous claims involving macrofamilies or, for example, the inclusion of Japanese in Altaic. The 141 linguistic features database in Haspelmath, Dryer, Gil & Comrie (2005) was carefully screened and 24 linguistic features were retained, which met the selection criteria: ● as good as possible coverage of the languages corresponding to Old World sample; ● meaningful collapsing of the values range into a binary classification. While the requirement for the best covering of the considered languages (populations) is obvious, the second condition might seem artificial. Nevertheless, it is justified by the restrictions inherent in the statistical theory of measurement and its influence on the range of statistical tests applicable (de Vaus, 2002:40-46; Howitt & Cramer, 2003:5-8). If all linguistic features considered are uniformly binary, then the results can be meaningfully compared across features and, very importantly, most multivariate and spatial statistical methods treat binary variables as interval (Tabachnick & Fidell, 2001:112). Besides these 24 binary linguistic features retained from Haspelmath, Dryer, Gil & Comrie (2005), another 2 binary and 2 numeric (interval) linguistic features were added, resulting in a gross total of 28 linguistic features. The names and short descriptions on these features are given in Table 2. Linguistic feature Description ConsCat The richness of consonant inventory. Cons* The actual number of consonants. VowelsCat The richness of vowel inventory. Linguistic feature Description Vowels* The actual number of vowels. UvularC Are there uvular consonants? GlotC Are there glottalized consonants? VelarNasal Are there velar nasals? FrontRdV Are there front rounded vowels? Codas Are codas allowed? OnsetClust Are onset clusters allowed? WALSSylStr The complexity of syllable structure. Tone Does the language have a tonal system? RareC Does the language have any rare consonants? Affixation How much affixation does the language use? CaseAffixes Are cases marked with affixes? NumClassifiers Does the language have numeral classifiers? TenseAspect Are there tense-aspect marking inflections? MorphImpv Are there dedicated morphological categories for second person imperatives? SVWO What is the dominant Subject-Verb word order (if any)? OVWO What is the dominant Object-Verb word order (if any)? AdposNP What is the dominant order (if any) between adposition and noun phrase? GenNoun What is the dominant order (if any) between genitive and noun? AdjNoun What is the dominant order (if any) between adjective and noun? NumNoun What is the dominant order (if any) between numeral and noun? InterrPhr Is the interrogative phrase initial? Passive Is there a passive construction? NomLoc Are the encoding strategies for locational and nominal predications identical? ZeroCopula Is the omission of copula allowed? Table 2: Summary listing of the 28 considered linguistic features Details in Annexes 6.1 and 6.2; the starred (*) linguistic features are not binary.

Description of the Populations and Linguistic Data Used in Dediu & Ladd

547 References

Development of Basic Literacy Learning Materials for Minority Peoples in Asia and the Pacific

The POSS-Final Suffix Order in Dagur∗

LCSH Section Y

Computing a World Tree of Languages from Word Lists

Book of Abstract Cantonese Syntax

Using Toponyms to Analyze the Endangered Manchu Language in Northeast China

Features and Changes of Vowels of Eastern Yugur Language Han Wu

LCSH Section Y

The World Tree of Languages: How to Infer It from Data, and What It Is Good For

Negation in Mongolic

National Languages