Genetic and Historical Migration Relationships of Three Northern Sabah Native Ethnic Groups with Their Southern China and Southeast Asia Neighbours
C. W. Yew1, M. Z. Hoque2, J. Pugh-Kitingan3, C. L. Y. Voo1, J. Rangsangan4, S. T. Y. Lau1, X. Wang5, W. Y. Saw5, T. H. Ong5, Y. Y. Teo5, S. H. Xu6, B. P. Hoh7, M. E. Phipps8 and S. V. Kumar1*

1Biotechnology Research Institute, 2School of Medicine, 3School of Social Science, 4Borneo Marine Research Institute, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia.
5Department of Statistics and Applied Probability, National University of Singapore, Singapore.
6Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, China.
7Institute for Molecular Medical Biotechnology, Universiti Teknologi MARA, Malaysia.
8Jeffrey Cheah School of Medicine and Health Sciences, Monash University, Malaysia.
*Email: [email protected]

Abstract

The native ethnic groups of Sabah are divided into Dusunic-, Paitanic- and Murutic-speaking groups under the North Borneo stock of the Austronesian linguistic family. As this region is a putative entry and transition point of the ‘Out-of-Taiwan’ human migration, founder effects may have given rise to multiple new ethnic groups. Nevertheless, there is no evidence to support this hypothesis, as the population structure of the indigenous ethnic groups in Sabah and their genetic relationships with populations in Southeast Asia and Southern China are unknown. As such, this study aims to characterise the population structure and genetic relationships of the Northern Borneo populations and compare them against the regional populations, for subsequent inference of migration history. Ethical clearance was obtained and blood samples were collected from healthy individuals of aboriginal ethnic groups.
A total of 63 individuals from three ethnic groups, namely Rungus, Sonsogon and Sungai-Lingkabau, were genotyped with ~2.4 million genome-wide SNP markers. The genotyping data were then merged with the Pan-Asian SNP Consortium (PASNP) data set to form a comprehensive data set, SC-SEA-NB, which comprises 58 neighbouring populations. The genetic relationships were then inferred via complementary analyses of principal component clustering and admixture of ancestry proportion. The Principal Component Analysis (PCA) revealed that the three Sabah ethnic groups form a distinct cluster that is close to the Filipino Austronesians and Taiwanese native groups. Interestingly, the Sabah ethnic groups carry a distinct genetic ancestry, denoted ‘North-Borneo’, which shows a decreasing cline of admixture predominantly towards populations in Taiwan, the Philippines and Indonesia. The Sonsogon group had an average of 95% ‘North-Borneo’ ancestry, suggesting that the Northern Sabah population had undergone a population isolation event. As such, this work suggests that the Northern Borneo population followed the ‘Out-of-Taiwan’ migration, underwent population isolation, and subsequently admixed with the regional populations. The findings indicate that Sabah’s indigenous population, as a whole, consists of a distinct pool of genetic variants, which are important for future anthropological and medical genetic studies.

Keywords: Genetic diversity, Ethnic populations, Sabah, Admixture

Short Communications in Biotechnology Vol. 4 / 2017

INTRODUCTION

Borneo is home to multiple ethnic groups with diverse languages, cultures and, plausibly, genetic backgrounds. The previous genome-wide SNP genotyping analysis by the PASNP consortium had sparse sampling coverage of Borneo. Apart from the Bidayuh from Sarawak and the Dayak from Kalimantan, there were no ethnic groups from Sabah (Northern Borneo) in the study.
As Northern Borneo is geographically the nearest to the Southern Philippine islands, the lack of data from Sabah leaves a gap in the picture of migration history inferred from genetic studies (The HUGO Pan-Asian SNP Consortium, 2009; Xing et al., 2009). Generally, the Austronesians in Island Southeast Asia (ISEA) are believed to have originated from the ‘Out-of-Taiwan’ exodus, which happened approximately 5000 years ago (Jinam et al., 2012). It is believed that once the ancestors of the Bornean populations entered the island, the North-East region of Sabah became the transition point to other parts of ISEA via the land route. Previous analyses of migration history have drawn only a simplistic model of migration; in particular, no clear route of migration has been described for the period after the putative ancestors from Taiwan entered Borneo (Macauley et al., 2005; Jinam et al., 2012). In addition, there has been no quantitative analysis measuring the proportion of admixture among the SEA populations. The current inhabitants of Sabah are plausibly the closest living representatives of the ancestors of other ISEA Austronesians. A comparative analysis of their population structure against the Southern China and Southeast Asia populations is thus expected to provide insight into the migration history of the region and an overview of the population substructure of the ISEA populations.

Sabah itself has more than 30 officially recognized aboriginal ethnic groups. Linguistic studies show that the North Borneo language stock extends from the Southern Philippines across the vast majority of Sabah and into most of the interior of Sarawak and Kalimantan (Lewis, 2009). This strongly implies genetic relatedness among these aboriginal ethnic groups in Borneo, as linguistic affiliations tend to reflect genetic relatedness (Gray et al., 2009).
The North Borneo language stock in Sabah can be divided into three families, namely Dusunic, Paitanic and Murutic. The ethnic groups speaking these languages are distributed by region. The Dusunic family, which is the major population in the state, ranges from the Northeast through Central Sabah to the West Coast; the Paitanic family is concentrated in the interior of the East; and the Murutic family ranges from the interior of the South-West into the heartland of Borneo (Lewis, 2009). As such, this study aims to discover the population structure and proportions of genetic ancestry of the ethnic groups in the North-East region of Sabah, which borders the putative migration entrance from the Southern Philippines route of ‘Out-of-Taiwan’. The findings will be used to infer a better description of the migration history in this region.

METHODS

Ethical clearance and sample collection

Ethical clearance was obtained from the Committee of Research Ethics of the university (code: JKEtika 4/10(3)). Subsequently, approval was also granted by the District Officers of Pitas and Kota Marudu to collect blood samples from the Rungus, Sonsogon and Sungai-Lingkabau ethnic groups in the districts. Prior to sample collection, volunteers were briefed on the project objectives and future applications. Their protected rights, such as confidentiality of identity and the handling of samples, were also explained. A brief interview was conducted to obtain the background of the volunteers, particularly the ethnicity, origin, and health history of the donor and family. A consent form was then signed by each volunteer. All data were kept private and confidential to avoid exposure of the volunteers’ data. Next, 10 mL of peripheral blood was collected from each healthy individual.

SNP genotyping

A total of 63 samples, comprising 21 samples from each ethnic group, were selected.
Genomic DNA was isolated from whole blood or buffy coat with the DNeasy Blood and Tissue kit (Qiagen) in accordance with the manufacturer’s protocol. Next, 200 ng of DNA was prepared for SNP genotyping with Illumina’s Omni2.5 bead array, which contains ~2,379,855 SNPs, as described in the manufacturer’s protocol.

Quality assessment and merging of the SC-SEA-NB data set

The SNP genotyping data were visualised with Genome Studio (Illumina) and converted to PLINK format. Quality assessment was then performed to remove, prior to subsequent analysis, samples which i) had a call rate of <99%, ii) deviated from Hardy-Weinberg equilibrium, iii) were discrepant with the reported gender, or iv) were first-degree relatives. Next, all SNPs that were monomorphic among the three ethnic groups were removed. Principal component analysis (PCA) was conducted to identify putative admixed individuals, who were excluded from further analysis.

For comparative analysis, a total of 50 populations comprising 978 unrelated individuals originating from Southern China and Southeast Asia were extracted from the Pan-Asian SNP Consortium (PASNP) data set. In addition, 30 randomly selected unrelated individuals each from the Yoruba in Nigeria (YRI), Caucasians from North-West Europe (CEU), Han Chinese from Beijing (CHB) and Japanese from Tokyo (JPT) were extracted from the HapMap3 data set and served as reference populations representing continental Africa (YRI), Europe (CEU) and East Asia (CHB and JPT). These data sets were then merged with the North Borneo samples on the SNP markers common to all data sets to form the SC-SEA-NB merged data set. The web tool LiftOver was then used to update the chromosomal position of each SNP marker. Obsolete markers were removed. Moreover, PASNP samples which were in first-degree relationships with others or discrepant with the reported gender were removed. This