Sibship Assignment to the Founders of a Bangladeshi Catla Catla Breeding Population Matthew G
Total Page:16
File Type:pdf, Size:1020Kb
Hamilton et al. Genet Sel Evol (2019) 51:17 https://doi.org/10.1186/s12711-019-0454-x Genetics Selection Evolution SHORT COMMUNICATION Open Access Sibship assignment to the founders of a Bangladeshi Catla catla breeding population Matthew G. Hamilton1* , Wagdy Mekkawy1,2 and John A. H. Benzie1,3 Abstract Catla catla (Hamilton) fertilised spawn was collected from the Halda, Jamuna and Padma rivers in Bangladesh from which approximately 900 individuals were retained as ‘candidate founders’ of a breeding population. These fsh were fn-clipped and genotyped using the DArTseq platform to obtain, 3048 single nucleotide polymorphisms (SNPs) and 4726 silicoDArT markers. Using SNP data, individuals that shared no putative parents were identifed using the pro- gram COLONY, i.e. 140, 47 and 23 from the Halda, Jamuna and Padma rivers, respectively. Allele frequencies from these individuals were considered as representative of those of the river populations, and genomic relationship matrices were generated. Then, half-sibling and full-sibling relationships between individuals were assigned manually based on the genomic relationship matrices. Many putative half-sibling and full-sibling relationships were found between indi- viduals from the Halda and Jamuna rivers, which suggests that catla sampled from rivers as spawn are not necessarily representative of river populations. This has implications for the interpretation of past population genetics studies, the sampling strategies to be adopted in future studies and the management of broodstock sourced as river spawn in commercial hatcheries. Using data from individuals that shared no putative parents, overall multi-locus pairwise estimates of Wright’s fxation index (FST) were low ( 0.013) and the optimum number of clusters using unsupervised K-means clustering was equal to 1, which indicates≤ little genetic divergence among the SNPs included in our study within and among river populations. Background International Development (USAID) to restock Bangla- In terms of quantity produced, Catla catla (Hamilton) deshi catla hatcheries with genetically diverse and non- is the sixth most important fnfsh aquaculture species, inbred broodstock [6]. Te collection of these fsh was 6 with approximately 2.8 × 10 tons produced globally in subsequently recognised as an opportunity to establish a 2015 [1]. It is primarily grown in South Asia, often on breeding population for the long-term genetic improve- a small scale in polyculture with other species [2–4]. In ment of the species. Te aims of this study were to (1) spite of its economic importance, in a number of coun- develop and describe a panel of catla DArTseq-based tries, including Bangladesh, the quality of catla seed pro- silicoDArT and single nucleotide polymorphism (SNP) duced in hatcheries has historically sufered from high markers; (2) determine the extent of genetic relationships levels of inbreeding, uncontrolled interspecifc hybridisa- between, and assign putative sibship to, ‘candidate found- tion and negative selection [2, 5]. In an efort to address ers’ of the breeding population; and (3) examine molec- these issues, in 2012, fertilised spawn was collected ular genetic variability among and within catla sampled from the Halda, Jamuna and Padma (Ganges) rivers, as from the three river systems. part of a project funded by the United States Agency for Methods *Correspondence: [email protected] Fertilised spawn were collected in 2012 from the Halda, 1 WorldFish, Jalan Batu Maung, 11960 Bayan Lepas, Penang, Malaysia Jamuna and Padma rivers as part of a program to replace Full list of author information is available at the end of the article inbred broodstock in Bangladeshi hatcheries [6]. Fish © The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creat iveco mmons .org/licen ses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creat iveco mmons .org/ publi cdoma in/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Hamilton et al. Genet Sel Evol (2019) 51:17 Page 2 of 8 from each river were reared separately by two commer- code from Gondro [15] that was modifed to replace cial hatcheries in the case of the Halda and Padma, and missing observations in SNP data with the average of the by one hatchery in the case of the Jamuna. At 1 year of observed allele frequency. Te G matrix was constructed age, approximately three hundred catla individuals were separately for individuals from each river. Clustering of randomly selected from each river as candidate found- genomic relationships using the ‘Ward2’ algorithm was ers of a breeding population. Tese fsh were fn-clipped implemented using the ‘hclust’ function [16]. Individuals and samples were archived in 2015, as part of the routine were reordered according to clustering and heatmaps of husbandry of the breeding population. Fin-clips were genomic relationships between individuals were gener- obtained from fsh that were anesthetized with clove oil ated. Tese heatmaps revealed the presence of full-sib- by removing an approximately 2-mm-wide sample from ling and half-sibling relationships between the sampled the extremities of the dorsal fn. Subsequently, fsh were individuals from each of the three river populations. To placed in tanks for monitoring and released back into account for sibship in subsequent analyses and produce a ponds once they had satisfactorily recovered from anaes- pedigree for genetic analyses of the breeding population, thesia. All fsh in the breeding population are managed sibship was assigned to individuals. Sibship was initially in accordance with the Guiding Principles of the Animal assigned using the program COLONY (version 2.0.6.4 Care, Welfare and Ethics Policy of the WorldFish Center [17]). For the COLONY analyses: (1) only SNPs with a [7]. MAF higher than 0.2 were retained, i.e. 571 from Halda, For the purpose of the current study, in 2016, archived 569 from Jamuna, 518 from Padma; (2) individuals from fn-clip samples were genotyped. Genotyping was con- diferent rivers were assumed to be unrelated; and (3) ducted using the DArTseq platform [8] according to the SNPs were assumed to be on separate chromosomes (i.e. laboratory procedures and analytical pipelines outlined unlinked). COLONY inputs were generated by means of in Lind et al. [9], except that the complexity reduction the ‘write_colony’ function (‘radiator’ package version method involved a combination of PstI and SphI enzymes 0.0.11 [18]), using the default settings except that ‘update (SphI replacing HpaII used in Lind et al. [9]). Raw allele frequency’ was set to true [19]. Errors noted in the DArTseq data are available at https ://doi.org/10.7910/ COLONY inputs generated by ‘write_colony’ were man- DVN/6LEU9 O [10]. ually corrected. Quality control procedures were implemented to Comparison of the G matrices with the pedigree-based ensure that only high-quality and informative SNPs, in additive (i.e. numerator) relationship matrices (A) [20], approximate linkage equilibrium, were retained for anal- derived from COLONY sibship assignments, revealed ysis. First, SNPs with an observed minor allele frequency that a large number of putatively full sibling relation- (MAF) lower than 0.05 or a rate of missing observa- ships in the G matrices were assigned as half-siblings by tions higher than 0.05 were excluded. Second, only one COLONY, particularly in the case of the Padma river fsh. randomly-selected SNP was retained from each unique Tis disparity was attributed to COLONY falsely splitting DNA fragment. Tird, as a measure of linkage disequilib- large full-sibship groups into multiple full-sibship groups rium (LD), pairwise squared Pearson’s correlations (r 2) of [19, 21, 22]. However, by using the COLONY sibship genotypic allele counts were computed, and then a ran- assignments, putatively unrelated individuals were iden- dom SNP from the pair with the highest r 2 was excluded tifed. For each river, these individuals were identifed iteratively until all pairwise r 2 values were lower than by (1) generating the A matrix (‘makeA’ function; ‘nadiv’ 0.2. Finally, SNPs that deviated from Hardy–Weinberg package version 2.16.0.0 [23]); (2) listing individuals that equilibrium (HWE) were fltered out. To achieve this, a were unrelated (aij = 0) to other individuals in A and then preliminary analysis (method outlined below) was under- removing these individuals from A; (3) appending to the taken to identify and remove close relatives, and thus list that was generated in step (2) the individual remain- reduce the risk of false identifcation of SNPs with geno- ing in A with the lowest average relationship with the typing problems (see [11]). Ten, data were converted other individuals and then removing this individual and to a ‘genind’ object using the ‘df2genind’ function (‘ade- its relatives (a ij > 0) from A; and (4) iteratively repeating genet’ package, version 2.1.1 [12]) and the deviation from step (3) until no individuals remained in A. Ten, allele HWE for each SNP and sampled population was tested frequencies using data from the listed individuals only using the ‘hw.test’ function (‘pegas’ package, version 0.10 were taken as estimates of allele frequencies in the sam- [13]). All the SNPs that signifcantly deviated from HWE pled river populations. Tese allele frequencies were pro- in any sampled population were excluded (classical χ2 vided as inputs to regenerate the G for each river. Tese test; P < 0.05 after Dunn–Šidák correction). G matrices were then used to manually assign sibship and To construct genomic relationship matrices (G), the dummy parents. Tis was achieved by (1) visually iden- method of VanRaden [14] was implemented using the tifying groups of individuals with genomic relationships Hamilton et al.