www.nature.com/scientificreports OPEN

Contrasting Linguistic and Genetic Origins of the Asian Source Populations of Malagasy

Pradiptajati Kusuma1,2, Nicolas Brucato1, Murray P. Cox3, Denis Pierron1, Harilanto Razafindrazaka1, Alexander Adelaar4, Herawati Sudoyo2,5, Thierry Letellier1 & François-Xavier Ricaut1

Received: 08 February 2016
Accepted: 26 April 2016
Published: 18 May 2016

The Austronesian expansion, one of the last major human migrations, influenced regions as distant as tropical Asia, Remote Oceania and Madagascar, off the east coast of Africa. The identity of the Asian groups that settled Madagascar is particularly mysterious. While language connects Madagascar to the Ma’anyan of southern Borneo, haploid genetic data are more ambiguous. Here, we screened genome-wide diversity in 211 individuals from the Ma’anyan and surrounding groups in southern Borneo. Surprisingly, the Ma’anyan are characterized by a distinct, high-frequency genomic component that is not found in Malagasy. This novel genetic layer occurs at low levels across Island Southeast Asia and hints at a more complex model for the Austronesian expansion in this region. In contrast, Malagasy show genomic links to a range of Island Southeast Asian groups, particularly from southern Borneo, but do not have a clear genetic connection with the Ma’anyan despite the obvious linguistic association.

The Austronesian expansion was a major human migration in Southeast Asia, triggered by the spread of agricultural populations approximately 5,000 years ago [1–3]. Thought to have originated in Taiwan, its influence spread through the Philippines and the Indonesian archipelago, ultimately affecting a wide geographical area ranging from Remote Oceania in the east to Madagascar and the eastern coast of Africa in the west [2,4,5]. This expansion had an outsized cultural and genetic impact on these territories, but the populations caught up in the dispersal differed from region to region across the Indo-Pacific.
This created a diverse modern range of Austronesian populations with their own cultural traits and genetic heritage, among which Madagascar is a unique case. Despite clear evidence, based on biological [6–10] and linguistic data [11,12], of Malagasy’s mixed ancestry with both African and Southeast Asian groups, identifying the parental populations of Malagasy and clarifying the process of settling Madagascar around the middle of the first millennium AD [13–15] has remained complex.

Language studies have identified many linguistic characters that relate Malagasy to languages spoken in Borneo, notably in the Southeast Barito region. These include a large body of vocabulary and structural agreement shared between Malagasy and Southeast Barito languages, which form a subgroup of West Malayo-Polynesian languages in the Austronesian language family [11,16–21]. Among the communities speaking Southeast Barito languages, the Ma’anyan show linguistic characteristics that place them as the closest known Asian parental population to Malagasy [16–18,22,23].

Curiously, the Ma’anyan are an indigenous ethnic group of approximately 70,000 individuals who live in remote inland areas of central and southeastern Kalimantan (the Indonesian part of the island of Borneo). Today, the Ma’anyan are largely agricultural, cultivating dry rice on shifting fields, but also gathering forest products [24]. They do not exhibit any particular mastery of seafaring technologies or navigational knowledge [22], raising questions about how a closely related language travelled across the vast Indian Ocean and came to be spoken in Madagascar. However, in historical times, the south Borneo coastline was split by a gulf that may have extended 200 kilometres into the interior [25,26], thus potentially placing Ma’anyan communities that are firmly inland today in what was then a coastal environment.
1Evolutionary Medicine Group, Laboratoire d’Anthropologie Moléculaire et Imagerie de Synthèse UMR-5288, Université de Toulouse, Toulouse, France.
2Genome Diversity and Diseases Laboratory, Eijkman Institute for Molecular Biology, Jakarta, Indonesia.
3Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand.
4Asia Institute, University of Melbourne, Melbourne, Australia.
5Department of Medical Biology, Faculty of Medicine, University of Indonesia, Jakarta, Indonesia.
Correspondence and requests for materials should be addressed to F.-X.R. (email: [email protected])

SCIENTIFIC REPORTS | 6:26066 | DOI: 10.1038/srep26066

Figure 1. Map showing the location of each population group studied in this work. The map was generated using Global Mapper v.15 software (http://www.bluemarblegeo.com/products/global-mapper.php).

Several genetic studies have sought to detect Indonesian genetic connections in the Malagasy genome (including mitochondrial DNA, Y chromosome and autosomal markers) [6–10], but no clear parental groups in Southeast Asia have yet been identified. The limited geographical coverage of Indonesian populations in these studies (including the absence of key populations such as the Ma’anyan) has often prevented precise conclusions. The possibility that the Ma’anyan are the Asian parental source of Malagasy was first explored genetically, using uniparental markers (mitochondrial DNA and the Y chromosome), only in 2015 [10]. This preliminary study, which covered a range of Southeast Asian groups, linked the origins of the Asian genetic components in Malagasy to modern populations located between Sulawesi (eastern Indonesia) and eastern Borneo (western Indonesia), thus confirming the general results of earlier studies [8]. However, surprisingly, the Ma’anyan shared few mtDNA or Y chromosome lineages with Malagasy.
Given this apparent contradiction between linguistic evidence and genetic analyses of uniparental markers, and to overcome the potential bias of this lineage-based approach (which is more sensitive to genetic drift), a genome-wide analysis of Southeast Borneo individuals was deemed necessary to better explore the link between Madagascar and Borneo.

Here, we perform that genome-wide analysis in the Ma’anyan and other groups from southern Borneo to determine the genetic background and potential Asian sources of the Malagasy. Using Illumina HumanOmniExpress BeadChips, we genotyped over 700,000 genomic markers in 169 Ma’anyan individuals, together with a further 42 individuals from Dayak ethnic groups across southern Borneo. The aims of this study were twofold: i) to examine the genetic diversity of populations in southeastern Borneo (focusing on the Ma’anyan and other indigenous Dayak groups), and thereby determine their place in the wider genetic diversity of Island Southeast Asia; and ii) to identify whether the clear linguistic relationship between the Ma’anyan and Malagasy is also reflected in a shared genetic inheritance.

Results

The unique Austronesian origin of the Ma’anyan. Following quality control, we obtained genotypes for 701,211 SNPs in a new set of 202 individuals from Borneo: 162 Ma’anyan and 40 South Kalimantan Dayak (SK-Dayak). To characterize the Ma’anyan and SK-Dayak gene pool within an Asian context, we focused our analyses on Island Southeast Asian, East Asian and Mainland Southeast Asian populations (Fig. 1). In a Principal Component Analysis (PCA) using a subset of the SNPs that intersect with published data from an extensive range of regional populations (the low density dataset) (Supplementary Fig.
S1), the first principal component (explaining 19.3% of the variance) separates Island Southeast Asian populations from East Asian and Mainland Southeast Asian groups, while the second principal component (explaining 17.5% of the variance) places the Igorot at the positive end of the axis and the Ma’anyan at the negative end, with other Austronesian-speaking populations falling in between, such as Taiwanese aborigines, Filipinos, Borneo populations (Murut, Dusun, Lebbo’ and South Kalimantan Dayak) and Sumatran populations (Sumatran Malay and Karo). Other Austronesian-speaking groups, like the Bidayuh, Javanese and Malaysians, cluster towards mainland Southeast Asia, likely due to the historical influence of that region on these groups. Interestingly, the Ma’anyan form their own pole on the plot and do not cluster closely with other populations from Borneo, although the genetically closest population is still the South Kalimantan Dayak group, which is also geographically the nearest neighbour to the Ma’anyan. A similar population clustering pattern is observed with both the low and high density SNP datasets (Supplementary Fig. S2). This observation also agrees with FST values calculated on the low density dataset (Supplementary Table S1).

Figure 2. ADMIXTURE plot using the low density database with K = 14 (the optimum determined by cross-validation). Each component is identified by a specific color and a C label which corresponds to its order of appearance from K = 2 to K = 14.

This unique genetic placement of the Ma’anyan is supported by admixture estimates, also performed on the low density dataset (Fig. 2), especially at K = 14, where the model achieves its lowest cross-validation value (Supplementary Fig. S3).
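
As a rough illustration of how an optimum such as K = 14 can be read off cross-validation output, the following minimal Python sketch picks the K with the lowest cross-validation error. It assumes log lines of the form "CV error (K=...): ..." as printed by ADMIXTURE when run with its cross-validation option; the function names (parse_cv_errors, best_k) are hypothetical, and the error values are invented for illustration and are not the values reported in Supplementary Fig. S3.

```python
import re

# Hypothetical log excerpt in the style of ADMIXTURE cross-validation output.
# The error values are made up for illustration; they are NOT the study's results.
EXAMPLE_LOG = """\
CV error (K=12): 0.5512
CV error (K=13): 0.5498
CV error (K=14): 0.5491
CV error (K=15): 0.5503
"""

# Matches lines such as "CV error (K=14): 0.5491".
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_errors(log_text):
    """Return a dict {K: cross-validation error} for every CV line in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K whose cross-validation error is lowest."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


cv_errors = parse_cv_errors(EXAMPLE_LOG)
chosen_k = best_k(cv_errors)
print(chosen_k)
```

With the invented values above, the sketch selects K = 14 only because that entry was given the smallest error; on real output the choice depends entirely on the reported errors, and neighbouring values of K with similar errors are usually worth inspecting as well.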
The main ancestral components observed in Southeast Asian populations are: i) an Austronesian Igorot and indigenous Formosan component (C3; light green), ii) a Mainland Southeast Asian (MSEA) component