Genetic Variants Contribute to Gene Expression Variability in Humans
INVESTIGATION

Amanda M. Hulse* and James J. Cai*,†,1

*Interdisciplinary Program in Genetics and †Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843-4458

ABSTRACT Expression quantitative trait loci (eQTL) studies have established convincing relationships between genetic variants and gene expression. Most of these studies focused on the mean of gene expression level, but not the variance of gene expression level (i.e., gene expression variability). In the present study, we systematically explore genome-wide association between genetic variants and gene expression variability in humans. We adapt the double generalized linear model (dglm) to simultaneously fit the means and the variances of gene expression among the three possible genotypes of a biallelic SNP. The genomic loci showing significant association between the variances of gene expression and the genotypes are termed expression variability QTL (evQTL). Using a data set of gene expression in lymphoblastoid cell lines (LCLs) derived from 210 HapMap individuals, we identify cis-acting evQTL involving 218 distinct genes, among which 8 genes, ADCY1, CTNNA2, DAAM2, FERMT2, IL6, PLOD2, SNX7, and TNFRSF11B, are cross-validated using an extra expression data set of the same LCLs. We also identify 300 trans-acting evQTL between >13,000 common SNPs and 500 randomly selected representative genes. We employ two distinct scenarios, emphasizing single-SNP and multiple-SNP effects on expression variability, to explain the formation of evQTL. We argue that detecting evQTL may represent a novel method for effectively screening for genetic interactions, especially when the multiple-SNP influence on expression variability is implied. The implication of our results for revealing genetic mechanisms of gene expression variability is discussed.
Quantitative genetic analysis has long focused on detecting genetic variants that affect organismal phenotypes. This is often done by contrasting mean differences in phenotypes among genotypes. Despite increasing evidence across several species for genetic control of phenotypic variance (Ansel et al. 2008; Hill and Mulder 2010; Jimenez-Gomez et al. 2011), variance differences in phenotypes have been largely ignored. Recently, however, the fact that variance of phenotypes is genotype dependent has inspired the detection of genetic variants associated with phenotypic variability (Pare et al. 2010; Sudmant et al. 2010; Yang et al. 2012).

When gene expression level is considered as a heritable, quantitative trait, statistical associations between mean gene expression and genotype can be established to identify those genomic loci associated with or linked to gene expression level (i.e., expression QTL, eQTL) (Montgomery and Dermitzakis 2011). The difference in mean gene expression and its genetic control have been extensively examined in humans (Stranger et al. 2005, 2007b; Choy et al. 2008; Montgomery et al. 2010; Pickrell et al. 2010). The difference in variance of gene expression (i.e., gene expression variability) is genetically controlled and likely to be selectable (Raser and O’Shea 2005; Blake et al. 2006; Maheshri and O’Shea 2007; Cheung and Spielman 2009; Zhang et al. 2009). A small number of initial efforts have been made to quantify the difference in gene expression variability (that is, variance of gene expression) (Ho et al. 2008; Li et al. 2010a; Mar et al. 2011; Xu et al. 2011b). Yet, little attention has been paid to the genetic control of gene expression variability in humans.

In the present study, we seek to discover genome-wide genetic variants (i.e., SNPs) associated with differences in the variance of gene expression among individuals.
We adapt the double generalized linear model (dglm) (Verbyla and Smyth 1998) to test for the inequality of expression variances and measure the contribution of genetic variants to the expression heteroscedasticity. The model has been recently used to detect genetic loci controlling phenotypic variability in chicken F2 crosses (Ronnegard and Valdar 2011). A likelihood ratio test (LRT) of the dglm method allows us to compare the fit of a “full model” and a “mean model.” The full model takes into account the contribution of genotype to both the mean and the variance of gene expression simultaneously, while the mean model takes into account only the contribution of genotype to the mean, ignoring the contribution to the variance. A significant result of the LRT indicates the nonrandom association between the genotypes and the variances of gene expression. Here we designate the genomic loci statistically associated with gene expression variability expression variability QTL (evQTL). The results of our genome-wide scan for evQTL provide a glimpse into the abundance and distribution of expression variability controlling variants in the human genome. Given that the variance of a quantitative trait is likely to differ under the influence of genetic interactions (Pare et al. 2010; Ronnegard and Valdar 2011), our evQTL detecting method may be used to help detect the interactions between genetic variants controlling gene expression.

Copyright © 2013 by the Genetics Society of America
doi: 10.1534/genetics.112.146779
Manuscript received September 13, 2012; accepted for publication October 31, 2012
Supporting information is available online at http://www.genetics.org/lookup/suppl/doi:10.1534/genetics.112.146779/-/DC1.
1Corresponding author: Department of Veterinary Integrative Biosciences, Texas A&M University, TAMU 4458, College Station, TX 77843. E-mail: [email protected]
Genetics, Vol. 193, 95–108 January 2013

Methods

Expression data

Gene expression data from the studies of Stranger et al. (2007a) and Choy et al. (2008) were downloaded from the Gene Expression Omnibus (GEO) website with accession nos. GSE6536 and GSE11582, respectively. The two data sets were designated GSE6536 and GSE11582 thereafter. In the two studies, the expression levels were measured in lymphoblastoid cell lines (LCLs) derived from HapMap individuals using two different platforms: Illumina human whole-genome expression array (WG-6 version 1) for GSE6536 (Stranger et al. 2007a,b) and Affymetrix human genome U133A array for GSE11582 (Choy et al. 2008). The downloaded data had been normalized by using quantile normalization across replicates of a single individual and then median normalized across all HapMap individuals. The downloaded data sets included 16,992 genes (19,440 probes) and 13,012 genes (20,995 probes) in GSE6536 and GSE11582 data sets, respectively, and 11,633 shared genes.

The Sequence Alignment/Map (SAM) files of the RNA sequencing (RNA-seq) data for 60 individuals of European origin in Utah (CEU) HapMap individuals from the study of Montgomery et al. (2010) were downloaded at the website http://jungle.unige.ch/rnaseq_CEU60. We used SAMMate (Xu et al. 2011a) to estimate the expression level using the number of reads per kilobase of transcript per million mapped reads (RPKM).

Genotype data

Human polymorphism data were obtained from the HapMap project (International Hapmap Consortium 2007) and the pilot phase of The 1000 Genomes (1000G) Project (The 1000 Genomes Project Consortium 2010). The HapMap data release 28 includes genotypes of 4 million SNPs merged from phases 1, 2, and 3 of the project (International Hapmap Consortium 2007). HapMap samples are from four different populations: Yorubans from Ibadan, Nigeria (YRI), individuals of European origin in Utah (CEU), Han Chinese from Beijing (CHB), and Japanese from Tokyo (JPT). The raw data from the pilot study of the 1000G project contains 15 million SNPs from a total of 179 samples also from the four HapMap populations.

From the HapMap data, we extracted genotypes of individuals whose gene expression data are included in the GSE6536 and GSE11582 data sets. The HapMap CEU and YRI populations consist of 60 trios. We removed the 60 child samples from trios and used the remaining 210 unrelated samples in our analysis. From the 1000G data, we extracted genotypes of 153 and 149 individuals whose gene expression data were available in GSE6536 and GSE11582, respectively. The 1000G project data give a snapshot of human polymorphism at an unprecedented scale and resolution. The released 1000G low-coverage data captures nearly all (95%) of the common polymorphism in a relatively ascertainment-free manner (The 1000 Genomes Project Consortium 2010). One disadvantage of using the 1000G genotype data in our case was that the sample size became smaller. In addition, genotypes of 59 individuals were extracted from the HapMap release and paired with expression data in the RNA-seq data set.

The minor allele frequency (MAF) and F_ST statistic for SNPs, and LD estimates between SNP pairs (r² and D′) were computed using Matlab functions in PGEToolbox (Cai 2008). To eliminate low-frequency polymorphisms, we discarded SNPs with MAFs <10% in the YRI, CEU, and CHB/JPT populations. To control for the effect of population stratification, we excluded SNPs with F_ST ≥ 0.2. We also excluded SNPs that were found to be deviated from the Hardy-Weinberg (HW) equilibrium by using hweStrata, an exact stratified test across populations (Schaid et al. 2006).

Regression models

For each transcript-SNP pair, the association between gene expression level and the genotype is assumed to be linear. The conventional linear regression model is

$$y_i = \mu + g_i \alpha + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \tag{1}$$
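The contrast between a "mean model" and a "full model" can be illustrated with a small likelihood ratio test. The sketch below is not the dglm used in this study. It is a simplified, closed-form stand-in which assumes that genotype is treated as a categorical factor: under the mean model each genotype class has its own mean and one shared variance, and under the full model each class has its own mean and its own variance. The function names `variance_lrt` and `chi2_sf` are hypothetical and are introduced only for illustration.

```python
import math
import random
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (df of 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 is supported in this sketch")


def variance_lrt(expression, genotypes):
    """Gaussian LRT for unequal expression variances across genotype classes.

    Assumption (not the paper's dglm): genotype is a categorical factor.
    Mean model: class-specific means, one shared variance.
    Full model: class-specific means and class-specific variances.
    Returns (lrt_statistic, degrees_of_freedom, p_value).
    """
    groups = defaultdict(list)
    for y, g in zip(expression, genotypes):
        groups[g].append(y)
    if not 2 <= len(groups) <= 3:
        raise ValueError("expected two or three genotype classes")

    n = 0
    pooled_ss = 0.0   # residual sum of squares around the class means
    full_term = 0.0   # sum over classes of n_k * log(ML variance of class k)
    for values in groups.values():
        n_k = len(values)
        mean_k = sum(values) / n_k
        ss_k = sum((v - mean_k) ** 2 for v in values)
        if n_k < 2 or ss_k == 0.0:
            raise ValueError("each class needs >= 2 samples with nonzero variance")
        n += n_k
        pooled_ss += ss_k
        full_term += n_k * math.log(ss_k / n_k)

    # 2 * (logL_full - logL_mean); constants cancel between the two models.
    stat = max(0.0, n * math.log(pooled_ss / n) - full_term)
    df = len(groups) - 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    # Simulated data: the standard deviation grows with the genotype code.
    random.seed(1)
    genotypes = [random.choice([0, 1, 2]) for _ in range(210)]
    expression = [random.gauss(0.0, 1.0 + g) for g in genotypes]
    print(variance_lrt(expression, genotypes))
```

A small p-value indicates that the class-specific variances fit better than a single shared variance. This Gaussian test is sensitive to departures from normality, so the sketch is a way to see the logic of the comparison and not a substitute for the model-based analysis.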