Evolutionary Triangulation to Refine Genetic Association Studies Of
Total Page:16
File Type:pdf, Size:1020Kb
Original Article 1041 Evolutionary Triangulation to Refine Genetic Association Studies of Spontaneous Preterm Birth Tracy A. Manuck, MD, MSCI1 Minjun Huang, MS2 Louis Muglia, MD3 Scott M. Williams, PhD4 1 Department of Obstetrics & Gynecology, University of North Address for correspondence Tracy A. Manuck, MD, MSCI, Division of Carolina at Chapel Hill, Chapel Hill, North Carolina Maternal-Fetal Medicine, Department of Obstetrics & Gynecology, 2 Department of Molecular and Systems Biology, Dartmouth College, University of North Carolina at Chapel Hill, 3010 Old Clinic Building, Hanover, New Hampshire CB#7516, Chapel Hill, NC 27599-7516 3 Department of Pediatrics, Cincinnati Children’sHospital, (e-mail: [email protected]). Cincinnati, Ohio 4 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio Am J Perinatol 2017;34:1041–1047. Abstract Objective The objective of this study was to apply evolutionary triangulation, a novel technique exploiting evolutionary differentiation among three populations with variable disease prevalence, to spontaneous preterm birth (PTB) genetic association studies. Study Design Single nucleotide polymorphism (SNP) allele frequency data were obtained from HapMap for CEU, GIH/MEX, and YRI/ASW populations. Evolutionary triangulation SNPs, then genes, were selected according to the overlaps of genetic population differences (CEU ¼ outlier). Evolutionary triangulation genes were then compared with three PTB gene lists: (1) top maternal and fetal genes from a large genome-wide association study of PTB, (2) 640 genes from the database for PTB, and (3) 118 genes from a recent systematic review. Empirical p-values were calculated to determine whether evolutionary triangulation enriched for putative PTB associating genes compared with randomly selected sample genes. Keywords Results Evolutionary triangulation identified 5/17 maternal genes and 8/16 fetal genes ► preterm birth from PTB gene list 1. From list 2, 79/640 were identified by CEU_GIH_YRI evolutionary ► evolutionary triangulation, and 57/640 were identified by CEU_ASW_MEX evolutionary triangulation. triangulation Finally, 20/118 genes were identified by evolutionary triangulation from gene list 3. For all ► genetic analyses, p < 0.001 except CEU_ASW_MEX analysis of list 3 where p ¼ 0.002. ► susceptibility Conclusion Genes identified in prior PTB association studies confirmed by evolu- ► racial disparity tionary triangulation should be prioritized for further genetic prematurity research. Preterm birth (PTB) remains a major public health problem Though sociodemographic factors can account for some of the worldwide,1 and is theleading cause of neonatal morbidityand differences in spontaneous PTB rates, it cannot account for all mortality of nonanomalous infants in the United States.2,3 differences. Variations in rates of prematurity by population Approximately two-thirds of PTB are spontaneous, and though are also seen within countries, including the United States, there appears to be a genetic component to spontaneous PTB where disparities are pronounced. Non-Hispanic black women susceptibility, genetic association studies have traditionally have a rate of spontaneous PTB that is 49% higher than non- yielded inconsistent results and have been difficult to repli- Hispanic white women.3,4 Hispanic women also have an cate. Spontaneous PTB is known to vary by population, and elevated rate of spontaneous PTB compared with non-Hispanic significant differences are seen in PTB rates across the world. white women. accepted Copyright © 2017 by Thieme Medical DOI https://doi.org/ April 20, 2017 Publishers, Inc., 333 Seventh Avenue, 10.1055/s-0037-1603508. published online New York, NY 10001, USA. ISSN 0735-1631. May 22, 2017 Tel: +1(212) 584-4662. 1042 Evolutionary Triangulation for Spontaneous Preterm Birth Manuck et al. Genetic variation (e.g., allele frequencies at specific single high and low FST, that is, several degrees of differentiation. Each nucleotide polymorphisms [SNPs]) is known to be popula- evolutionary triangulation SNP was then used to identify tion specific. All populations have unique genetic variation nearby genes as evolutionary triangulation genes, as recent inherent to each group. For example, in general, non-Hispa- literature suggests that variants in the regulatory regions or nic black populations are known to have more variation and near transcription start sites of genes may impact function and many more low-frequency variation compared with other possibly disease. For that reason,genes located within100 kb of populations. Across the genome, though some regions are each evolutionary triangulation SNP were compiled to com- highly conserved with minimal variations between indivi- prise the final evolutionary triangulation gene list. This meth- dual and between populations, at other sites, minor allele odology is summarized in ►Fig. 1. We generated two separate frequencies have the potential to vary between individuals evolutionary triangulation SNP lists, each containing a repre- and between populations. sentative non-Hispanic white, Hispanic, and non-Hispanic Evolutionary triangulation is a novel genetic filtering and black populations: (1) CEU_GIH_YRI and (2) CEU_MEX_ASW. prioritization method that capitalizes on the differences in Next, we compared the evolutionary triangulation gene allele frequencies between populations to refine results from lists generated earlier to three separate gene lists. Sponta- genetic association studies.5 The technique identifies alleles neous PTB gene list 1 comprised the top 20 maternal and fetal that differ among populations with different disease pre- SNPs from a multicenter genome-wide association study of valence, such as spontaneous PTB. It has previously success- 1,025 spontaneous PTB cases delivering less than 34 weeks’ fully identified genes implicated in lactose intolerance, gestation and 1,015 matched term controls with sponta- Smith–Lemli–Opitz, and albinism.5 neous labor; these top 20 SNPs were located in 32 different We hypothesized that genes identified by evolutionary genes.13 Spontaneous PTB gene list 2 consisted of 640 genes triangulation could refine results from previous spontaneous on the online database for PTB “dbPTB” Web site (www. PTB gene association studies. ptbdb.cs.brown.edu).14,15 This Web site, hosted by Brown University, provides a web-based aggregation site of pub- Materials and Methods lished genes previously associated with spontaneous PTB in one or more studies.14,15 Our third spontaneous PTB gene list First, we generated our evolutionary triangulation gene lists. comprised the 118 candidate genes identified from a recent We obtained baseline SNP allele frequency data for populations systematic review of 92 genetic studies of spontaneous PTB selected to represent non-Hispanic black, non-Hispanic white, published from 2007 to 2015.16 and Hispanic women using data from the International Haplo- Finally, we applied principles of evolutionary triangula- type Map Project HapMap6,7 (►Table 1). Each population was tion to each gene list. Genes found in both our evolutionary chosen based onpopulationdifferences inrates of PTB. HapMap triangulation gene list and the spontaneous PTB gene lists is an international consortium with publically available SNP were considered top candidate genes, and were examined for frequencies, genotypes, and haplotypes by population, and can evidence of expression in the myometrium or placenta. We – be applied to estimate genetic ancestry.7 9 Next, to assess also analyzed genes identified by evolutionary triangulation population differences, we calculated Wright’s fixation index using the STRING database (http://string-db.org)17 to assess 10,11 (FST), a metric assessing population genetic differences by functional relationships, interactions between genes, and to pairwise allele comparisons between groups, according to the classify genes into ontology groups as applicable. Cockerham and Weir’sformula.12 We first compared alleles Permutation testing was performed to determine between the first two populations, and then compared the whether evolutionary triangulation significantly enriched allelic differences between populations 1 and 3. Finally, we in the lists of genes associated with spontaneous PTB. compared differences between populations 2 and 3. A list of Specifically, this was performed by calculating empirical evolutionary triangulation SNPs was then generated according p-values, comparing the ability of evolutionary triangulation to the overlaps of low FST between the populations with to detect putative spontaneous PTB-associated genes from a similarly high rates of spontaneous PTB (YRI, GIH, MEX, and list of randomly sample genes. We first generated a null ASW) and high FST with CEU as the outlier population with (random) list of genes by sampling from the list of genes in lower rates of spontaneous PTB. We used several thresholds of the entire genome 10,000 times. For each resample, we Table 1 Selected populations used to generate evolutionary triangulation gene lists Population Source from HapMap Representative population CEU Utah residents with European ancestry Non-Hispanic white GIH Gujarati Indians (Houston, TX) Hispanic MEX Mexican ancestry (Los Angeles, CA) Hispanic YRI Yoruba (Ibadan, Nigeria) Non-Hispanic black ASW African ancestry in southwestern United States Non-Hispanic black American Journal of Perinatology Vol. 34 No. 11/2017 Evolutionary Triangulation for Spontaneous Preterm Birth Manuck et al. 1043 Fig.