medRxiv preprint doi: https://doi.org/10.1101/2020.09.02.20187179; this version posted September 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Mendelian randomization analysis identified genes pleiotropically associated with the risk and prognosis of COVID-19

Di Liu1*, Jingyun Yang2,3*, Bowen Feng4, Wenjin Lu5, Chuntao Zhao6, Lizhuo Li7

1 Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, Beijing, China
2 Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA
3 Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
4 Odette School of Business, University of Windsor, Windsor, ON, Canada
5 Department of Mathematics, University College London, London, United Kingdom
6 Brain Tumor Center, Cancer & Blood Diseases Institute, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
7 Emergency Department, Xuanwu Hospital, Capital Medical University, Beijing, China

*The two authors contributed equally to this paper and share first authorship.

Correspondence to Lizhuo Li, E-mail: [email protected]

Running title: Genes pleiotropically associated with risk and prognosis of COVID-19

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Abstract

Objectives: COVID-19 has caused a large global pandemic.
Patients with COVID-19 exhibited considerable variation in disease behavior. Previous genome-wide association studies have identified potential genetic variants involved in the risk and prognosis of COVID-19, but the underlying biological interpretation remains largely unclear.

Methods: We applied the summary data-based Mendelian randomization (SMR) method to identify genes that were pleiotropically associated with the risk and various outcomes of COVID-19, including severe respiratory confirmed COVID-19 and hospitalized COVID-19.

Results: In blood, we identified 2 probes tagging IFNAR2, ILMN_1765146 and ILMN_1791057, that showed pleiotropic association with hospitalized COVID-19 (β [SE]=0.42 [0.09], P=4.75×10^-6 and β [SE]=-0.48 [0.11], P=6.76×10^-6, respectively). Although no other probes were significant after correction for multiple testing in either blood or lung, multiple genes tagged by the top 5 probes were involved in inflammation or antiviral immunity, and several other tagged genes, such as PON2 and HPS5, were involved in blood coagulation.

Conclusions: We identified IFNAR2 and other potential genes that could be involved in the susceptibility or prognosis of COVID-19. These findings provide important leads to a better understanding of the mechanisms of cytokine storm and venous thromboembolism in COVID-19 and to potential therapeutic targets for the effective treatment of COVID-19.

Key words: coronavirus disease 2019; IFNAR2; gene expression quantitative trait loci; summary Mendelian randomization

Introduction

Coronavirus disease 2019 (COVID-19), which is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has created a large global pandemic and poses a serious threat to public health (1, 2). As of August 7, 2020, there were more than 19.3 million confirmed cases worldwide, with total deaths exceeding 719,830 (3). SARS-CoV-2 is a highly pathogenic and transmissible coronavirus that primarily spreads through respiratory droplets and close contact (4). Seeking solutions to control the spread of COVID-19 and exploring effective treatments are of utmost importance to address the global challenge posed by COVID-19. Therefore, there is a pressing need to further identify the pathological mechanisms underlying COVID-19.

Patients with COVID-19 exhibited considerable variation in disease behavior. Recently, genome-wide association studies (GWAS) have been performed to identify genetic variants associated with the diagnosis and prognosis of COVID-19 (5, 6), but the biological interpretation of their findings remains largely unclear. Previous research found that approximately 88% of trait-associated genetic variants detected by GWAS resided in non-coding regions of the genome and might have regulatory functions on gene expression (7). In the context of COVID-19 research, it is therefore important to explore genes whose expression is pleiotropically/potentially causally associated with susceptibility to or the development of COVID-19.

Unlike conventional randomized controlled trials (RCTs), Mendelian randomization (MR) uses genetic variants as a proxy for randomization (8) and is a promising tool to search for pleiotropic/potentially causal effects of an exposure (e.g., gene expression) on an outcome (e.g., COVID-19 susceptibility). MR minimizes the confounding and reverse causation that are commonly encountered in traditional association studies (8, 9), and has been successful in identifying gene expression sites or DNA methylation loci that are pleiotropically/potentially causally associated with various phenotypes, such as cardiovascular diseases, BMI, and rheumatoid arthritis (10-13). In this paper, we applied the summary data-based MR (SMR) method, integrating summarized GWAS data for COVID-19 and cis-eQTL (expression quantitative trait loci) data, to prioritize genes that are pleiotropically/potentially causally associated with the risk and prognosis of COVID-19.

Methods

Data sources

eQTL data

In the SMR analysis, cis-eQTL genetic variants were used as the instrumental variables (IVs) for gene expression. We performed SMR analysis for gene expression in blood and lung separately. For blood, we used the CAGE eQTL summarized data (14), which included 2,765 participants. For lung, we used the V7 release of the GTEx eQTL summarized data (15), which included 278 participants. The eQTL data for blood and lung can be downloaded at https://cnsgenomics.com/data/SMR/#eQTLsummarydata.
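To make the role of the cis-eQTL instrument concrete, the following is a minimal sketch of the single-instrument SMR calculation as it is described in the original SMR method paper; it is not the authors' actual pipeline. It assumes one top cis-eQTL variant with effect b_zx (standard error se_zx) on gene expression and effect b_zy (standard error se_zy) on the COVID-19 phenotype. The function name and all numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def smr_estimate(b_zx, se_zx, b_zy, se_zy):
    """Single-instrument SMR sketch (hypothetical helper, not the SMR software).

    b_zx, se_zx: effect of the cis-eQTL variant on gene expression and its SE.
    b_zy, se_zy: effect of the same variant on the trait (GWAS) and its SE.
    Returns (b_xy, se_xy, p): the ratio estimate of the effect of expression
    on the trait, its delta-method SE, and the P value of the SMR test.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy

    # Ratio estimate: effect of expression on the trait.
    b_xy = b_zy / b_zx

    # Delta-method standard error of the ratio (covariance term ignored,
    # since the eQTL and GWAS estimates come from separate samples).
    se_xy = math.sqrt(se_zy**2 / b_zx**2 + b_zy**2 * se_zx**2 / b_zx**4)

    # SMR test statistic, approximately chi-square with 1 degree of freedom.
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)

    # Upper tail of a chi-square(1) distribution via the complementary
    # error function: P(X > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, p


# Illustrative, made-up inputs: a strong eQTL (z = 10) and a modest GWAS
# signal (z = 4) at the same variant.
b_xy, se_xy, p = smr_estimate(b_zx=0.5, se_zx=0.05, b_zy=0.2, se_zy=0.05)
print(f"b_xy={b_xy:.3f}, se_xy={se_xy:.4f}, p={p:.2e}")
```

Because the standard error comes from the delta method, the squared ratio (b_xy/se_xy)^2 equals the test statistic, and that statistic can never exceed the smaller of the two squared z-scores. A weak eQTL therefore limits the significance an SMR test can reach, which is one reason the much smaller lung eQTL sample would be expected to yield less power than the blood sample.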
GWAS data for COVID-19

The GWAS summarized data were provided by the COVID-19 Host Genetics Initiative (6) and can be downloaded at https://www.covid19hg.org/results/. Three phenotypes were examined: severe respiratory confirmed COVID-19, COVID-19, and hospitalized COVID-19. Details on the definitions of the phenotypes can be found in Table S1. The control groups were subjects from the general population without the specific phenotype, subjects who were COVID-19 negative based on prediction or self-report, or subjects who had COVID-19 without hospitalization, making a total of five comparisons: severe respiratory confirmed COVID-19 (n=536) vs. population (n=329,391; hereafter severe COVID-19); COVID-19 (n=6,696) vs. population (n=1,073,072; hereafter COVID-19); COVID-19 (n=3,523) vs. COVID-19 negative (n=36,634; hereafter COVID-19 negative); hospitalized COVID-19 (n=3,199) vs. population (n=897,488; hereafter hospitalized COVID-19); and hospitalized COVID-19 (n=928) vs. COVID-19 without hospitalization (n=2,028; hereafter COVID-19 without hospitalization).

In addition, we repeated the SMR analysis for severe respiratory confirmed COVID-19 using GWAS summarized data from the Severe COVID-19 GWAS Group (5) (hereafter severe COVID-19 NEJM). The study included 1,160 patients who had severe respiratory confirmed COVID-19 and 2,205 participants from the general population without COVID-19 as controls (Table 1). The definition of severe respiratory confirmed COVID-19 in this study differed from that used by the COVID-19 Host Genetics Initiative (Table S1). The GWAS summarized data can be downloaded at https://ikmb.shinyapps.io/COVID-