Joiret et al. BioData Mining (2019) 12:11 https://doi.org/10.1186/s13040-019-0199-7 RESEARCH Open Access Confounding of linkage disequilibrium patterns in large scale DNA based gene-gene interaction studies Marc Joiret1,2* , Jestinah M. Mahachie John1, Elena S. Gusareva1 and Kristel Van Steen1,3 *Correspondence:
[email protected] Abstract 1BIO3, GIGA-R Medical Genomics, Background: In Genome-Wide Association Studies (GWAS), the concept of linkage Avenue de l’Hôpital 1-B34-CHU, disequilibrium is important as it allows identifying genetic markers that tag the actual 4000 Liège, Belgium 2Biomechanics Research Unit, causal variants. In Genome-Wide Association Interaction Studies (GWAIS), similar GIGA-R in-silico medicine, Liège, principles hold for pairs of causal variants. However, Linkage Disequilibrium (LD) may Avenue de l’Hôpital 1-B34-CHU, 4000 Liège, Belgium also interfere with the detection of genuine epistasis signals in that there may be Full list of author information is complete confounding between Gametic Phase Disequilibrium (GPD) and interaction. available at the end of the article GPD may involve unlinked genetic markers, even residing on different chromosomes. Often GPD is eliminated in GWAIS, via feature selection schemes or so-called pruning algorithms, to obtain unconfounded epistasis results. However, little is known about the optimal degree of GPD/LD-pruning that gives a balance between false positive control and sufficient power of epistasis detection statistics. Here, we focus on Model-Based Multifactor Dimensionality Reduction as one large-scale epistasis detection tool. Its performance has been thoroughly investigated in terms of false positive control and power, under a variety of scenarios involving different trait types and study designs, as well as error-free and noisy data, but never with respect to multicollinear SNPs.