medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
A global overview of genetically interpretable comorbidities among
common diseases in UK Biobank
Guiying Dong1,2, Jianfeng Feng1,2, Fengzhu Sun3, Jingqi Chen1,2,*, and Xing-Ming Zhao1,2,*
1. Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, China
2. Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of
Education, Shanghai, 200433, China
3. Molecular and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
* Corresponding authors
Abstract
Background: Comorbidities greatly increase global health burdens, but the landscapes of their
genetic factors have not been systematically investigated.
Methods: We used the hospital inpatient data of 385,335 patients in UK Biobank to investigate the
comorbid relations among 439 common diseases. Post-GWAS analyses were performed to identify
comorbidity shared genetic risks at the genomic loci, network, as well as overall genetic
architecture levels. We conducted network decomposition for interpretable comorbidity networks to
detect the hub diseases and the involved molecules in comorbidity modules.
Results: 11,285 comorbidities among 439 common diseases were identified, and 46% of them were
genetically interpretable at the loci, network, or overall genetic architecture level. The comorbidities
affecting the same and different physiological systems showed different patterns at the shared
genetic components, with the former more likely to share loci-level genetic components while the
1
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
latter more likely to share network-level genetic components. Moreover, both the loci- and
network-level genetic components shared by comorbidities mainly converged on cell immunity,
protein metabolism, and gene silencing. Furthermore, we found that the genetically interpretable
comorbidities tend to form network modules, mediated by hub diseases and featuring physiological
categories. Finally, we showcased how hub diseases mediating the comorbidity modules could help
provide useful insights into the genetic contributors for comorbiditities.
Conclusions: Our results provide a systematic resource for understanding the genetic
predispositions of comorbidity, and indicate that hub diseases and converged molecules and
functions may be the key for treating comorbidity. We have created an online database to facilitate
researchers and physicians to browse, search or download these comorbidities
(https://comorbidity.comp-sysbio.org).
Keywords: Comorbidity, Comorbidity pattern, Genetic factors, Genetic association pattern,
Converged biological function, Hub diseases, Comorbidity module
Background
Comorbidity, the coexistence of more than one disease in a patient not by chance, presents great
challenges for disease diagnosis and treatment [1, 2]. Compared with single diseases, comorbidities
are usually associated with more adverse health outcomes, such as lower quality of life and higher
mortality rate, and with higher economic burden [3-5]. Understanding the mechanisms of
comorbidities may be helpful for their early diagnosis, treatment and management, thereby helping
reduce the global disease burden associated with comorbidity.
During the last decade, large-scale genome-wide association studies (GWASs) have found
overlapped genetic risks for a few frequently comorbid diseases at the genomic loci level, i.e.
single-nucleotide polymorphisms (SNPs) or genes, suggesting that there might be a molecular basis
of comorbid relations. For example, GWASs have uncovered 38 SNPs associated with both asthma
2 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
and allergic diseases [6], and 244 genome loci associated with ankylosing spondylitis, Crohn's
disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis [7]. Additionally,
Sánchez-Valle et al. found that disease interactions inferred from similarities between patients’
gene expression profiles have significant overlaps with epidemiologically documented comorbid
relations [8], further supporting the genetic basis of comorbidity. Moreover, several studies also
point out that diseases with high probability of concurrency tend to share more genes [9, 10]. These
findings have added useful information to inform the biological etiology of comorbidity [10, 11].
The malfunctions caused by disease risk loci can spread via cellular networks owing to molecular
interactions among genes. To this end, some studies capture the genetic overlaps between
comorbidities by network level evidence, including protein-protein interactions (PPIs) and
molecular pathways [9, 12, 13]. For example, Park et al. found a significantly positive correlation
between the number of shared PPIs and the extent of disease concurrency by combining
information on cellular interactions, disease–gene associations, and Medicare data [9]. Moreover, a
significantly increased number of shared pathways between cancers and comorbid Mendelian
diseases have also been observed [10]. These results indicate that dysfunctional entanglement in
molecular networks might contribute to the coexistence of comorbid diseases in a patient.
Additionally, some comorbidities have also been reported to be similar in overall genetic
architectures measured by genetic correlations, such as the widespread genetic correlations among
comorbid psychiatric disorders [14-17].
Due to the limited access to the matched epidemiological and genomic data of the same
population group, existing studies either used matched data for a limited number of diseases, or
collected large-scale genomic data and epidemiology data from different sources. However, the
separation of genetic and epidemiological data makes it tricky to decide whether the shared genetic
risks identified from one group can actually explain comorbidity identified in another group. In the
3 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
past few years, the UK Biobank (UKB) has collected hospital inpatient data and genetic data for
about four hundred thousand individuals, providing a unique opportunity to investigate the genetics
underlying comorbid relationships between hundreds of common diseases [18].
In this study, we take advantage of the large scale, matched epidemiological and genetic data
hosted in UKB to systematically investigate the comorbid relationships among 439 common
diseases as well as their shared genetic factors. We have identified the comorbidity shared genetic
components at loci and network levels, and performed the functional analyses on them to uncover
converged biological functions. Furthermore, the shared genetic patterns of the comorbidities
affecting the same and different physiological systems have been explored. Finally, we construct
and decompose two comorbidity networks to find the hub diseases that mediate lots of comorbid
relationships in comorbidity modules and to highlight the corresponding molecular mechanisms.
Our results provide a systematic resource of comorbidities among common diseases, as well as their
shared genetic risks (online database at https://comorbidity.comp-sysbio.org). The converged
biological molecules and functions identified in this study are responsible for many comorbidities,
which may serve as the key factors for management and treatment of comorbidity.
Methods
Population data: Population data used in this study is collected from the UKB [18]. More than
500,000 individuals aged 40-69 living in the UK were recruited to the assessment centers and
signed an electronic consent to allow a broad range of access to their anonymized data for
health-related research.
Disease selection and classification: In UKB, field-41270 is the summary diagnoses for 410,293
patients across all their hospital inpatient records, which are coded according to the International
Classification of Disease version 10 (ICD10). A total number of 11,727 ICD10 codes are recorded
with affected patients. We define common diseases as the level 2 ICD10 codes of chapter I ~ XIV
4 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
with prevalence > 0.1%, since most of the publically available GWAS summary statistics of the
UKB diagnoses are based on level 2 codes [19, 20]. Patients diagnosed with codes at or under level
2 are considered as suffering from the corresponding level 2 diseases. Due to the too-detailed
phenotype description of ICD10 codes, such as F20 Schizophrenia and F25 Schizoaffective
disorders, we further aggregated the highly similar ICD10 codes into one disease according to the
phecodes [21]. Phecode is a collection of manually curated phenotypes by experts with the
advantage of better aligning with diseases mentioned in clinical and genomic research. Finally, the
diseases are manually classified, mostly according to their affected physiological system while also
considering their origins [22].
Results
Comorbidities among common diseases in UKB
439 common diseases (prevalence > 0.1%) are selected for comorbidity analysis from UKB hospital
inpatient data (see Methods for details, Additional file 2: Table S1), covering 385,335 patients. The
average age of the patients for their first hospital diagnosis is 54, with more female patients than
male patients (55% VS. 45%).
In all, 11,285 comorbidities are identified, involving 438 out of the 439 diseases, with D21
(“other benign neoplasms of connective and other soft tissue”) being the only exception (RR >1,
P-value < 4.1e-6 with Bonferroni correction for all disease-pairs with RR > 1, Fig. 1a, Addiitonal
file 2: Table S2; see Methods for details). Most diseases have less than 100 comorbid partners
(average of 51, Fig. 1b). We observe that diseases with high prevalences tend to have more
comorbid partners (Pearson’s correlation, r = 0.69, P-value = 3.3e-64; Additional file 1: Fig. S3).
For example, the top three diseases with the most comorbid partners--Hypertension (I10),
Hyperlipidemia (E78), and Type 2 diabetes (E11), have a prevalence of 27.5%, 13.1%, and 7.1%,
respectively.
5 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
To validate the credibility of the comorbid relationships found through our analysis, we firstly
compare them to the comorbid relationships identified in a recent study by Blair et al [24]. Blair et
al. studied the comorbid relationships between Mendelian diseases (14 out of the 439 diseases used
in our study) and complex diseases (62 out of the 439 diseases). Among the comorbid relationships
they identified, 184 reach the prevalence criteria set for our analysis (for each comorbid relationship,
patients with both diseases have to exceed 1% of patients for at least one of the two diseases). In
comparison, the number of comorbid relationships we find between the 14 Mendelian diseases and
62 complex diseases is 175, 152 (86.9%) of which are among the comorbid relationships found by
Blair et al. (Fig. 1c). We examine the 32 comorbid relationships identified by Blair et al. but missed
by us, and find that 23 of them fulfill RR>1 and are significant in single test (P-value < 0.05), but
fail to pass the multi-test correction. This is possibly due to that in our analysis, we test more than
12000 pairs of diseases (RR > 1) at once, which incurs a strict multi-test correction with a trade-off
between type I error (false positives) and type II error (false negatives). In other words, our analysis
may err on the more conservative side. On the other hand, among the 23 comorbid relationships
identified by us but not by Blair et al., 22 of them have literature evidence supporting their
existence, indicating their potential validity (Additional file 2: Table S3). Additionally, we also
compare the comorbidities identified by us to those identified by Jensen et al [25] (Additional file 3:
Supplementary Text). The comparison results show that the comorbidities identified by us and
Jensen et al. have a significant overlap (OR=6.8, P=0, Fisher exact test). Taken together, we
consider that our results can be confirmed by previous results, and are of high reliability.
6 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Importantly, benefiting from the long follow-up time (25 years) and the extensive coverage of
diseases provided by UKB hospital inpatient data, the collection of comorbidities reported here is
the largest one of common disease comorbidity by far, and thus provide an atlas of comorbidities
useful for further analysis.
Prevalent comorbidity intra- and inter-physiological systems
In the classic Human Disease Network (HDN), diseases are connected if they share at least one
disease-associated gene [22]. We have observed several clusters in HDN formed by diseases
affecting the same physiological systems, such as cardiovascular diseases and nutritional diseases.
This inspires us to explore whether diseases affecting the same physiological system also tend to be
comorbidity. We divide the 439 diseases into 24 categories, mostly according to their affected
physiological systems while also considering their origins (e.g. “Neoplasm”) (Additional file 2:
Table S1), and calculate the disease comorbidity tendency among the 24 categories (see Methods).
As expected, we find significantly prevalent comorbidity within 19 out of 24 categories (Fig. 1d).
For example, 89% (32/36) of all possible disease-pairs within the “Spine” category are
comorbidities (adjusted P= 5e-5), and all the 4 nutritional diseases are comorbid with one another
(adjusted P = 8.9e-3). To get a visual and more confident observation, we construct a
high-confidence comorbidity network comprised of comorbidities with RR>15 (Fig. 1e), and are
able to observe small and large clusters of comorbidities affecting the same physiological system,
many of which are supported by previous studies for examples, comorbidity clusters affecting the
“Cardiovascular” [39], the “Ophthalmological” [40], the “Ear, Nose, Throat” [41], and the
“Psychiatric” [42] categories. These findings suggest shared mechanisms for diseases affecting the
same physiological systems.
7 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Interesting, apart from the intra-category comorbidity, we also find significantly more comorbid
relationships between 41 pairs of different categories (Fig. 1d). Diseases affecting the “Respiratory”
system are significantly more likely to be comorbid with diseases from 11 other categories,
followed by diseases affecting the “Metabolic” system with diseases from 10 other categories, both
suggesting shared etiologies beyond the boundaries of physiological systems. Moreover, metabolic
diseases have overall the highest comorbid rate with diseases of other categories, which is
consistent with many reports on the involvement of altered metabolism in a wide range of diseases
[43, 44]. It is noteworthy that sometimes the significant cross-category comorbidity patterns are
mediated by a small number of diseases that have a large number of comorbid partners. For
example, in the high-confidence comorbid network (Fig. 1e), psychiatric disorders comorbid with
neurological disorders predominantly through F05 (“Delirium, not induced by alcohol and other
psychoactive substances”) and F06 (“Other mental disorders due to brain damage and dysfunction
and to physical disease”). In fact, for 33 out of the 41 significant comorbid category-pairs, more
than 50% of the inter-category comorbid relationships are mediated through no more than 3 “hub”
diseases. Take one of the most centered hubs as an example, E66 (“Obesity”) mediates more than
half of the comorbid relationships between the “Nutritional” category and three other categories
“Psychiatric”, “Spine”, and “Joint”. This is consistent with previous findings that obesity is usually
associated with mental, joint, and spinal diseases, such as depressive disorders, anxiety disorders,
gout, and spondylosis [45-47]. As a result, understanding the mechanisms underlying comorbidities,
especially those mediated by the “hub” diseases, may provide a way forward to understand how
they happen, and seek to manage or treat them simultaneously.
46% of the comorbidities are genetically interpretable
Previous studies have shown that diseases with more genes or PPIs shared are more likely to be
comorbidities [9, 10]. However, it still remains unclear that how many comorbidities share genetic
8 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
components (deemed as genetically interpretable), and whether there are any specific patterns in
shared genetic components for different types of comorbidities. To explore these two questions, we
capture the genetic associations of comorbidities at 3 levels, i.e. loci level (SNP and genes),
network level (PPI and pathways) and overall genetic architecture level (genetic correlation) (see
Methods).
All available GWAS summary statistics based on UKB subjects are collected from geneAtlas
[19], covering 332 out of 439 diseases used in this study and comprising 8,212 comorbidies. We
find 46% (3,766) of comorbidities have shared genetic components: 147, 1,463, 1,803, and 1,959
comorbidities share SNPs, genes, PPIs, and pathways, respectively; and 1,970 comorbidities have
significant genetic correlations (Fig. 2a, Additional file 2: Table S4, S5, S6, S7, S8, see Methods).
Comorbidities are significantly more likely to share genetic components, compared with
non-comorbidities across all genetic levels as well as their aggregation (Fig. 2a, see Methods).
Additionally, we also find that 98%, 70% and 100% of the genetically interpretable comorbidities
share significantly more SNPs, genes and pathways than expected, respectively (see Methods).
Only 5% of comorbidities share significantly more PPIs than expected, possibly due to the
universality of PPIs. Moreover, the genetically interpretable comorbidities have a significant
overlap with disease-pairs reported by Park et al., which have shared genes, PPIs or co-expressed
genes (P= 2.4e-6, Fisher exact test), further confirming the genetic associations of comorbidities
identified in this study [9]. As the genetic information and the epidemiological information come
from the same subjects, we consider our results relatively robust against the usual confounding
factors for genetic analysis of comorbidity, such as differences in genetic background. Thus, our
results strongly support the existence of genetic predispositions for almost half of the identified
comorbid relationships. We have created a queryable online database to facilitate researchers and
physicians to explore the comorbidities of interest (https://comorbidity.comp-sysbio.org).
9 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
We then explore whether there are any specific patterns in shared genetic components for
different types of comorbidities, i.e. exploring differences in the types of genetic components
mediating intra- and inter-category comorbidity. Overall, we find that comorbidities of 9 (37.5%)
intra-categories and 52 (18.8%) inter-categories significantly share genetic components, suggesting
a high probability of genetic involvement for comorbidities affecting the same physiological
systems (Fig. 2b). Interestingly, as in Figs. 2b and 2c, intra-category comorbidities are slightly more
likely to share loci-level genetic components (P=4.6e-4 for SNPs; difference not significant for
genes after Bonferroni correction; Fisher exact test) compared to inter-category comorbidities,
while the latter are more likely to share network-level genetic components (P-values =4.6e-17 and
8.3e-17 for PPIs and pathways, respectively). There is no significant difference in how likely
comorbidities of intra- and inter-category have genetic correlations. These results suggest that
comorbidities affecting the same and different physiological systems may have different biological
origins with the former tend to directly originate from pleiotropic loci and the latter tend to
indirectly originate from converged functions.
The above statistical observation is best illustrated by the diseases affecting male genital organs
(Fig. 2b). The comorbidities within “Male genital organs”category tend to share genes (adjusted
P-value = 4.1e-2, FDR corrected). In fact, 50% (8/16) of the comorbidities within this category
share disease-associated genes. 20 genes are involved in these intra-category comorbid relationships,
and are mainly related to the human leukocyte antigen (HLA) complex (such as HLA-DQA1 and
HLA-DRB1), histone clusters (such as HIST1H1B and HIST1H2AJ), and tumors (such as TERT and
NOTCH4 for prostate cancer) [48, 49]. In contrast, comorbid relationships of inter-category
involving the “Male genital organs” category tend to share pathways (72/142, 51%). The KEGG
pathway “cell adhesion molecules cams” is shared by half (36/72) of these cross-category comorbid
10 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
relationships, followed by “antigen processing and presentation”, which is shared by 32 of these
comorbidities.
To summarize, based on matched genetic and epidemiological data, we find that almost half
(46%) of comorbidities identified in this study are genetically interpretable, indicating a strong
genetic role in the origin of comorbidity. Among these genetically interpretable comorbidities, the
intra-category and inter-category ones tend to share genetic components in different levels (loci VS.
network), suggesting their different biology origins.
Genetically interpretable comorbidities converge on cell immunity, protein metabolism, and
gene silencing
To enhance the understanding about the biological mechanisms of comorbidities, we conduct
functional analyses of the loci level and network level genetic components. The overall genetic
architecture is not analyzed here, as it reflects statistical correlations but not detailed functions (see
Discussion).
We firstly examine the genome-wide distribution and the deleteriousness of the
comorbidity-SNPs (see Methods). We find that comorbidity-SNPs tend to be located in noncoding
RNA (P-values = 1.2e-44 and 4.1e-147) and intergenic regions (P-values = 1.7e-54 and 2.9e-48)
(Fig. 3a), but with slightly higher CADD scores (P-values = 9.8e-129 and 0) than other-disease
SNPs (SNPs that are disease-associated but not shared by comorbidities) and non-disease SNPs (Fig.
3b). These results suggest that comorbidity-SNPs are slightly more deleterious, possibly through
playing important roles in gene transcriptional regulation [50]. We have also examined the effects
of comorbidity-SNPs on splicing by dbscSNC splicing score [35], but found no difference among
the comorbidity-SNPs, other disease-SNPs and non-disease SNPs (Additional file 1: Fig. S4).
Additionally, we find that 73% of the comorbidity-SNPs locate in a small region of the genome
the HLA region (chr6:29,691,116–33,054,976), and 51% of the comorbidities which are
11 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
interpretable through SNPs share at least one HLA-region SNP. The HLA region is well known for
its high degree, long-ranged LD blocks, which may help explain the pleiotropy of these SNPs in
comorbidities [51]. SNPs in this region have been previously predicted to be relevant for multiple
autoimmune diseases through disrupting the regulation of immune-related genes [50]. Consistent
with this, most of the top comorbidities with the largest number of shared HLA-region SNPs
contain autoimmune diseases or contain a significant autoimmune-related origin. For example, E03
“Other hypothyroidism”, J45 “Asthma”, K90 “Intestinal malabsorption”, E10 “Insulin-dependent
diabetes mellitus”, and G35 “Multiple sclerosis”. Finally, we test whether the genomic location and
CADD score distributions of comorbidity-SNPs are mainly determined by the HLA-region variants.
After removing the HLA-region SNPs, comorbidity-SNPs are still overrepresented in noncoding
RNA regions (P-values=3.5e-47 and 1.2e-61), and still have significantly higher CADD scores
(P-values=0.02 and 1.5e-15) than other-disease SNPs and non-disease SNPs, but are no longer
overrepresented in intergenic regions (Additional file 1: Figs. S5A, S5B). We conclude that
comorbidity-SNPs, no matter whether in the HLA region or not, are slightly more deleteriousness
and more likely to locate in noncoding RNA region than other SNPs.
Goh et al. reported that most (78%) disease-genes (i.e. genes associated with at least one disease
recorded in Online Mendelian Inheritance in Man) are not essential genes critical for survival, and
that disease-genes are less likely to be housekeeping genes that express in all tissues [22]. We then
test whether our comorbidity-genes behave similarly or differently. Here, we obtain 2,852 essential
genes, which are human orthologs of mouse genes whose disruptions are embryonically or
postnatally lethal (see Methods). We find that only 17% of the disease-genes (identified by us) and
17% of the comorbidity-genes are essential (Fig. 3c), although disease-genes and
comorbidity-genes are more enriched in essential genes than non-disease genes (P-values = 2.9e-38
and 8.8e-15, respectively). Essential genes are reported to have a tendency to encode hub proteins
12 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
in human interactome, and play an important role in maintaining normal developmental and/or
physiological function [22]. These results indicate that most comorbidity-genes are functionally
peripheral in human interactome, and their mutations are compatible with survival into reproductive
years so that these comorbidity phenotypes are conveyed in a population. Although most
comorbidity-genes are not essential, we find a higher probability of loss of function intolerance (pLI)
for comorbidity-genes, compared to other disease-genes as well as non-disease genes (P-values =
0.01 and 3.6e-11, respectively, t test; Fig. 3d). Removing the essential genes, this trend remains
unchanged, suggesting the higher pLI distribution of comorbidity-genes is not just due to the
essential genes (Additional file 1: Fig. S6). To examine whether comorbidity-genes tend to be
housekeeping genes, we summarize the number of tissues each gene is expressed in based on the
gene expression data for 53 tissues in GTEx [37]. We find that comorbidity-genes are more likely to
be expressed in the majority of tissues, compared to other disease-genes and non-disease genes (P =
4.7e-4 and 8.1e-44, respectively, two-sided Mann–Whitney U-test; Fig. 3e). Considering the high
pleiotropy of HLA regions, we recalculate the properties of the comorbidity-genes after removing
HLA variants. In this case, we find that most comorbidity-genes are still nonessential and
comorbidity-genes still tend to have higher pLI (P-values=0.04 and 2.9e-10) and express in more
tissues(P-values=4.4e-3 and 1.2e-32) compared with other-disease genes and non-disease genes
(Additional file 1: Figs. S5C, S5D, S5E). As a result, we consider that most comorbidity-genes are
very important for normal biological mechanisms, though most of them are not essential for
survival, and disrupted comorbidity-genes may have clinical consequences affecting slightly more
tissues than other disease-genes.
For the network level genetic components, the genes involved in the top 10 PPIs shared by the
most numbers of comorbidities are significantly enriched in GO terms related to biological
processes of gene silencing and protein metabolism including localization, acetylation,
13 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
ubiquitination, catabolism (see Methods). These 10 PPIs account for 18% of the PPI interpretable
comorbidities. Moreover, as shown in Fig. 3f, the top 10 pathways shared by the most numbers of
comorbidities are predominantly immune-related processes, and correspond to 56% of all the
pathway interpretable comorbidities. Most of the top pathway involved diseases are autoimmune or
inflammatory diseases, such as J45 “Asthma”, K20 “Oesophagitis”, M06 “Other rheumatoid
arthritis”, L40 “Psoriasis”, and E10 “Insulin-dependent diabetes mellitus”. The findings based on
network level genetic components suggest a phenomenon that a significant portion of genetically
interpretable comorbidities may converge on a handful of biological mechanisms, with the most
common mechanisms related to cell immunity, gene silencing and protein metabolism. This
phenomenon is further supported by the loci level genetic components: the top 10 genes can
interpret as much as 41% of the gene interpretable comorbidities, and are enriched in
immune-related GOs, such as “interferon gamma mediated signaling pathway”, “antigen processing
and presentation of peptide antigen”, and “regulation of T cell mediated cytotoxicity”. Moreover,
after removing HLA-region SNPs, we still find a few genetic components interpreting many
comorbidities (the top 10 SNPs, genes, PPIs and pathways can interpret 23%, 30%, 15%, 34% of
comorbidities interpretable through SNP, gene, PPI, pathway, respectively.). The top enriched GO
terms are “protein localization to chromosome telomeric region” and “beta-catenin-tcf complex
assembly”, and the top enriched pathways are “rna pol i rna pol iii and mitochondrial transcription”
and “meiosis” (Additional file 1: Figs. S5F).
“Hub” disease-mediated genetically interpretable comorbidity modules
The fact that a small number of genetic components can interpret a large portion of the genetically
interpretable comorbidities, inspires us to examine whether the “small world” phenomenon exist in
the shared genetic architectures of those comorbidities. Therefore, we construct two comorbidity
networks by connecting comorbid diseases that share loci level genetic components (denote as the
14 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
LG-network) and network level genetic components (denote as the NG-network), respectively. As
expected, for the LG- and NG-networks, their degrees follow the power-law distribution
(Additional file 1: Fig. S7), and they have the attributes of small worldness based on their average
clustering coefficient and average shortest path length (sigma=1.15 and 1.03, respectively; see
Methods). In previous sections, we have shown that many of the inter-category comorbidities are
mediated through hub diseases. Given the “small world” properties of the LG- and NG-networks,
we hypothesize that there are comorbidity modules in the two networks, possibly featured by hub
diseases and specific genetic components. In order to test this hypothesis, we perform network
decomposition to detect comorbidity modules by the Louvain algorithm [38].
We first arbitrarily define nodes (diseases) connected with more than 25% of all nodes in each
network as “universal hub diseases”. It is not appropriate to assign the “universal hub diseases” into
any single comorbidity module, as each module usually contains far fewer than 25% of all nodes.
Seven and thirteen “universal hub diseases” are found for the LG-network and the NG-network,
respectively (Fig. 4, Additional file 2: Table S9). The “universal hub diseases” are usually known to
co-occur with many diseases. For example, Hypertension (I10) connect to 235 (80%) diseases in the
NG-network, and is well-known for having heavy comorbidity burdens [52].
Next, based on the remained nodes that are not “universal hub diseases”, we have detected 11
comorbidity modules for the LG-network and 10 comorbidity modules for the NG-network
(Modularity Q = 0.40 and 0.32 , respectively) (Figs. 4a and 4b, Additional file 2: Table S10, S11).
Overall, the module sizes (number of nodes in a module) range from 2 to 49, and the LG- and
NG-networks have 8 and 9 comorbidity modules whose sizes are larger than 5, respectively. Each
comorbidity module is assigned with “featured categories”, which are the categories that the
diseases in this module are significantly overrepresent (Fisher exact test, adjusted P-value <= 0.05,
FDR corrected). Within each module (size > 5), if diseases with top 3 largest number of
15 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
within-module comorbidities mediate more than half of the within-module edges, we define them as
“local hub diseases” of the module. As shown in Table 1, we have identified “featured categories”
for all comorbidity modules in the LG- and NG-networks, and “local hub diseases” for 7 and 7
comorbidity modules in the LG- and NG-network, respectively. We find that most local hub
diseases belong to the featured categories of their modules, highlighting the prevalence of
genetically interpretable comorbidities within the same physiological system. Nonetheless, most
modules have more than one featured category, demonstrating that genetically interpretable
comorbidities are not limited by physiological boundaries.
We will describe several cases to illustrate how the network and module structures can help us
understand the genetics underlying large numbers of comorbid relationships. Firstly, some
categories are consistently grouped together. In both networks, we identify modules that feature
Male genital organs-Urinary, Dermatological-Neoplasm, and Neurological-Spine categories,
supporting the existence of genetic associations between the involved comorbidities from multiple
genetic levels (Table 1). Secondly, modules can help distinguish different comorbidity tendencies
and the corresponding genetic mechanisms among diseases of the same categories. In the
LG-network, Neoplasm diseases are featured in two modules LG-module2 (Male genital
organs-Neoplasm-Urinary) and LG-module8 (Dermatological-Neoplasm). Neoplasm diseases in
LG-module2 are mainly prostate (C61), bladder (C67), urinary organs (D41) and intestinal
(C18,C19;C20) cancers, while Neoplasm diseases in LG-module8 are all skin cancers (C43, C44,
D04) (Fig. 5a, Additional file 2: Table S10). Except for TERT and CLPTM1L, genes shared by
Neoplasm diseases and other diseases in the two modules are different, reflecting diverse
mechanisms among Neoplasms related to different tissues. Thirdly, hub diseases may provide a new
perspective for understanding comorbidity with different categories of diseases. For example, as
shown in Fig. 5b, the Psychiatric disorder F17 “Mental and behavioral disorders due to use of
16 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
tobacco” is a hub diseases of LG-module5 (Cardiovascular-Respiratory), and is comorbid with 5
respiratory diseases (emphysema (J43), pyothorax (J86;J93), respiratory failure (J96), chronic
obstructive pulmonary disease (COPD, J42;J44), pneumonia (J18)), 3 cardiovascular diseases
(atherosclerosis (I70), aortic aneurysm and dissection (I71), peripheral vascular diseases (I73)), and
lung cancer (C34). The most often genes shared by these comorbidities are IREB2 and CHRNA3,
located in 15q25, a well-known region for association with COPD, lung cancer, and smoking [53].
The IREB2 gene encodes an iron-responsive element-binding protein (IRP) that regulates the iron
metabolism [53]. CHRNA3 encodes the neuronal nicotinic acetylcholine receptor, and its mutation
is associated with lung function and COPD severity in ever-smokers [54]. Though there are many
other possible pathways individually associated with the above diseases, our analysis indicates that
iron metabolism and the neuronal nicotinic acetylcholine receptor pathway may be the top
candidates to examine when study the comorbidity of these diseases with F17. Lastly, “universal
hub diseases” connect to multiple modules, sometimes through different genetic components. In the
NG-network, the universal hub disease Obesity (E66) have multiple connections to NG-module1
(Bone-Joint-Muscular-Neurological-Spine) and NG-module6 (Hepatobiliary pancreas) (Fig. 4b).
Pathways shared by Obesity and Hepatobiliary pancreas diseases are mostly related to lipoprotein
metabolism, such as “lipid digestion mobilization and transport”, “fatty acid triacylglycerol ketone
body metabolism”, “cytosolic sulfonation of small molecules”, “mitochondrial protein import”,
while also related to biological oxidations, myogenesis etc (Fig. 5c, Additional file 2: Table S11). In
comparison, pathways shared by Obesity and Bone, Joint, Muscular, Neurological, Spine diseases
from NG-module1 are mostly related to immunity, cell adhesion and transcription. In summary, the
genetically interpretable network and modules can provide insights through hub diseases to the
understanding of the molecular mechanisms underlying comorbidity, and may help prioritize target
genes and pathways for designing new treatment.
17 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Discussion
In this study, we have profiled the comorbid relationships among the common diseases in UKB,
and systematically investigated the genetic risks shared by comorbidities. We report an atlas of
11,285 comorbid disease-pairs among 438 common diseases, which is by now the largest in scale.
We find that 46% of the comorbidities with available genetic information share genetic components
in at least one of the three levels loci, network, or overall genetic architecture, and show that
comorbidities affecting the same and different physiological systems tend to share different levels
of genetic components. Functional analyses show that the loci level genetic components shared by
comorbidities tend to be more deleterious (for SNPs) and affect more tissues (for genes), and both
loci and network level genetic components mainly converge on cell immunity, protein metabolism,
and gene silencing related functions. We also construct two comorbidity networks by genetically
interpretable comorbidities, and show that hub diseases mediating most connections among
comorbidity modules can provide useful insights into the genetic contributors for comorbidities
affecting different physiological systems. Therefore, our results provided a detailed comorbid and
genetic landscapes of many common diseases, which may be valuable for guiding the early
diagnosis, management, and treatment of comorbidity.
Our results highlight shared genetic predispositions or mechanisms underlying comorbidity,
which may provide useful information for drug discovery. Theoretically, it is plausible to repurpose
existing drugs that target shared genetic components of a pair of comorbid diseases, to treat the
comorbidity of the two diseases. In an exploratory test, we are able to identify 8,458
drug-comorbidity relationships with the drug known to target the comorbidity-genes (Additional
file 2: Table S12). Interestingly, some of these drugs are indeed used in the population with the
corresponding comorbidity. For example, the gene EDNRA, a known target of aspirin, is shared by
I20 “Angina pectoris” and I25 “Chronic ischaemic heart disease”, and we find that 65% of the
18 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
people suffering from both diseases report usage of aspirin in UKB. Moreover, the indication of
aspirin for I20 “Angina pectoris” and I25 “Chronic ischaemic heart disease” have been reported by
the Comparative Toxicogenomics Database (CTD, http://ctdbase.org/; Supplementary Data 12) [55].
Besides this encouraging case, we also found a surprising case concerning Lansoprazole, which
targets MAPT, a gene shared by E66 “Obesity” and J84 “Other interstitial pulmonary diseases”.
16.7% of the people suffering from both diseases ever used lansoprazole according to the UKB. We
find the indications of lansoprazole only include E66 “Obesity” in CTD, but lansoprazole was
reported to be able to induce interstitial lung disease [56], suggesting that some of the comorbidity
of E66 “Obesity” and J84 “Other interstitial pulmonary diseases” may be due to the use of
lansoprazole. Though very preliminary, these initial results shed light on the possibility that our
resource of comorbidities and their shared genetic components may help with drug discovery as
well as avoid severe side-effects for treating comorbidity in the future.
Notably, as much as 24% of the 8,218 comorbidities have significant genetic correlations,
supporting the polygenic architecture of complex diseases, while nearly half of them can not be
readily interpretable at either the loci or the network level (Additional file 1: Fig. S8). This indicates
unidentified genetic information within the genetic architecture that may require further
investigation, such as the copy number variants (CNVs). CNVs are the structural chromosomal
variants greater than 1 kb in size, and usually have dosage effects for genes. Several common
diseases have been reported to be associated with rare CNVs, such as Autism, schizophrenia [57,
58]. In addition, we find that genetic correlation is positively and significantly correlated with RR (r
= 0.39, P-value = 1.9e-72, Additional file 1: Fig. S9A), and with phenotype similarity of comorbid
diseases (r = 0.22, P-value = 0.03, Additional file 1: Fig. S9B, phenotype similarity pre-calculated
by van Driel et al [59].). Moreover, phenotype similarity and RR have a positive and significant
19 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
correlation (r = 0.27, P-value = 1.5e-8, Additional file 1: Fig. S9C). These results suggest that
genetic architecture might interpret comorbidity by contributing to symptom similarities.
One possible limitation with our study is the sample size. Though the overall sample size is not
small, we do not have that many cases for each disease or each comorbid disease-pair. As such, we
may fail to identify some comorbid disease-pairs that are disproportionally represented in our
dataset. Also, the GWAS analyses might miss variants with very small effects. Nevertheless, our
study is the first and largest study that combines the epidemiological and genetic information of the
same subjects to explore the genetic components underlying comorbidity. The matched phenotype
and genotype data makes our results less affected by population-related confounding factors. Based
on our current findings, one interesting future direction is to integrate more samples from other
studies, and incorporate more types of data, such as imaging data and quantitative traits, in order to
analyze the endophenotypes that bridge diseases and deepen our understanding of the mechanisms
of comorbidity. Conclusions
In summary, we have performed, for the first time, a systematic analysis of comorbid relations
among common diseases as well as their genetic associations based on the matched epidemiological
and genetic data of the same subjects. Our results illustrate the comorbidity tendency and the
genetic association patterns of comorbidities of intra- and inter-physiological systems, and indicate
that the hub diseases and converged biological molecules and functions may be the key for treating
comorbidity. An online database of UKB-Comorbidity is created to facilitate researchers and
physicians to search the comorbidities of interest.
Acknowledgements
Not applicable.
Funding
20 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
This work was partly supported by National Key R&D Program of China (2020YFA0712403,
2018YFC0910500), National Natural Science Foundation of China (61932008, 61772368), and
Shanghai Municipal Science and Technology Major Project (2018SHZDZX01).
Availability of data and materials
The epidemiological data used in the present study is available from UK Biobank with restrictions
applied. Data were used under license and thus not publicly available. Access to the UK Biobank
data can be requested through a standard protocol (https://www.ukbiobank.ac.uk/register-apply/).
GWAS summary statistics used in this study can be accessed from http://geneatlas.roslin.ed.ac.uk.
GRCh37.p13 is downloaded from https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/.
The CADD scores are downloaded from https://cadd.gs.washington.edu. The dbscSNC scores are
downloaded from ftp://dbnsfp:[email protected]. The GTEx eQTL data are
downloaded from https://gtexportal.org/home/datasets. The MAGMA tool can be found at
https://ctg.cncr.nl/software/magma. The Mouse Genome Informatics database is at
http://www.informatics.jax.org. The MSigDB database is at
https://www.gsea-msigdb.org/gsea/msigdb/index.jsp. The UMLS description can be found at
https://www.nlm.nih.gov/research/umls/index.html. The ExAC database is at
https://gnomad.broadinstitute.org.
Ethics approval and consent to participate
This work used data from UK Biobank (project 1954).
Competing interests
The authors declare no competing interests.
Consent for publication
Not applicable.
Author’s contributions
21 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Xing-Ming Zhao conceived the project. Guiying Dong collected, analyzed the data, and wrote the
manuscript. Jingqi Chen and Xing-Ming Zhao supervised the data analyses, and revised the
manuscript. Jianfeng Feng provided the UK Biobank data. Jianfeng Feng and Fengzhu Sun give
suggestions to the manuscripts.
Author details
1Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai,
China. 2Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan
University), Ministry of Education, Shanghai, China. 3Molecular and Computational Biology
Program, University of Southern California, Los Angeles, CA, USA.
Code availability
All code used for data preparation and analysis are available at
https://github.com/Arya-bioinformatics/ukb-comorbidity.
References
1. He F, Zhu G, Wang YY, Zhao XM, Huang DS. PCID: A Novel Approach for Predicting
Disease Comorbidity by Integrating Multi-Scale Data. IEEE/ACM Trans Comput Biol
Bioinform. 2017;14:678-686.
2. MacMahon S. Multimorbidity: a priority for global health research. The Academy of
Medical Sciences: London, UK 2018.
22 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
3. Jani BD, Hanlon P, Nicholl BI, McQueenie R, Gallacher KI, Lee D, Mair FS. Relationship
between multimorbidity, demographic factors and mortality: findings from the UK Biobank
cohort. BMC Med. 2019;17:74.
4. Nelis SM, Wu YT, Matthews FE, Martyr A, Quinn C, Rippon I, et al. The impact of
co-morbidity on the quality of life of people with dementia: findings from the IDEAL study.
Age Ageing. 2019;48:361-367.
5. Mannino DM, Higuchi K, Yu TC, Zhou H, Li Y, Tian H, et al. Economic Burden of COPD
in the Presence of Comorbidities. Chest. 2015;148:138-150.
6. Zhu Z, Lee PH, Chaffin MD, Chung W, Loh PR, Lu Q, et al. A genome-wide cross-trait
analysis from UK Biobank highlights the shared genetic architecture of asthma and allergic
diseases. Nat Genet. 2018;50:857-864.
7. Ellinghaus D, Jostins L, Spain SH, Cortes A, Bethune J, Han B, et al. Analysis of five
chronic inflammatory diseases identifies 27 new associations and highlights disease-specific
patterns at shared loci. Nat Genet. 2016;48:510-518.
8. Sánchez-Valle J, Tejero H, Fernández JM, Juan D, Urda-García B, Capella-Gutiérrez S, et al.
Interpreting molecular similarity between patients as a determinant of disease comorbidity
relationships. Nat Commun. 2020;11:1-13.
9. Park J, Lee DS, Christakis NA, Barabasi AL. The impact of cellular networks on disease
comorbidity. Mol Syst Biol. 2009;5:262.
23 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
10. Melamed RD, Emmett KJ, Madubata C, Rzhetsky A, Rabadan R. Genetic similarity
between cancers and comorbid Mendelian diseases identifies candidate driver genes. Nat
Commun. 2015;6:1-10.
11. Khandaker GM, Zuber V, Rees JMB, Carvalho L, Mason AM, Foley CN, et al. Shared
mechanisms between coronary heart disease and depression: findings from a large UK
general population-based cohort. Mol Psychiatry. 2019;25:1477-1486.
12. Lee DS, Park J, Kay KA, Christakis NA, Oltvai ZN, Barabasi AL. The implications of
human metabolic network topology for disease comorbidity. Proc Natl Acad Sci U S A.
2008;105:9880-9885.
13. Menche J, Sharma A, Kitsak M, Ghiassian SD, Vidal M, Loscalzo J, et al. Uncovering
disease-disease relationships through the incomplete interactome. Science.
2015;347:1257601.
14. Anttila V, Bulik-Sullivan B, Finucane HK, Walters RK, Bras J, Duncan L, et al. Analysis of
shared heritability in common disorders of the brain. Science. 2018;360:eaap8757.
15. Lee PH, Anttila V, Won H, Feng YA, Rosenthal J, Zhu Z, et al. Genomic Relationships,
Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders. Cell.
2019;179:1469-1482.
16. Walters RK, Polimanti R, Johnson EC, McClintick JN, Adams MJ, Adkins AE, et al.
Transancestral GWAS of alcohol dependence reveals common genetic underpinnings with
psychiatric disorders. Nat Neurosci. 2018;21:1656-1669.
24 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
17. Solberg BS, Zayats T, Posserud MB, Halmoy A, Engeland A, Haavik J, et al. Patterns of
Psychiatric Comorbidity and Genetic Correlations Provide New Insights Into Differences
Between Attention-Deficit/Hyperactivity Disorder and Autism Spectrum Disorder. Biol
Psychiatry. 2019;86:587-598.
18. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank
resource with deep phenotyping and genomic data. Nature. 2018;562:203-209.
19. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nat
Genet. 2018;50:1593-1599.
20. Claire C. Rapid gwas of thousands of phenotypes for 337,000 samples in the uk biobank.
Neale Lab. 2017;744.
21. Wei WQ, Bastarache LA, Carroll RJ, Marlo JE, Osterman TJ, Gamazon ER, et al.
Evaluating phecodes, clinical classification software, and ICD-9-CM codes for
phenome-wide association studies in the electronic health record. PLoS One.
2017;12:e0175508.
22. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network.
Proc Natl Acad Sci U S A. 2007;104:8685-8690.
23. Hidalgo CA, Blumm N, Barabasi AL, Christakis NA. A dynamic network approach for the
study of human phenotypes. PLoS Comput Biol. 2009;5:e1000353.
25 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
24. Blair DR, Lyttle CS, Mortensen JM, Bearden CF, Jensen AB, Khiabanian H, et al. A
nondegenerate code of deleterious variants in Mendelian loci contributes to complex disease
risk. Cell. 2013;155:70-80.
25. Jensen AB, Moseley PL, Oprea TI, Ellesoe SG, Eriksson R, Schmock H, et al. Temporal
disease trajectories condensed from population-wide registry data covering 6.2 million
patients. Nat Commun. 2014;5:4022.
26. 1000 Genomes Project Consortium. A map of human genome variation from
population-scale sequencing. Nature. 2010;467:1061-1073.
27. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis
of GWAS data. PLoS Comput Biol. 2015;11:e1004219.
28. Chatr-Aryamontri A, Oughtred R, Boucher L, Rust J, Chang C, Kolas NK, et al. The
BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017;45:D369-D379.
29. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP.
Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739-1740.
30. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, et al. Genetic
Consortium for Anorexia Nervosa of the Wellcome Trust Case Control C, Duncan L, et al:
An atlas of genetic correlations across human diseases and traits. Nat Genet.
2015;47:1236-1241.
26 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
31. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group
of the Psychiatric Genomics C, et al. LD Score regression distinguishes confounding from
polygenicity in genome-wide association studies. Nat Genet. 2015;47:291-295.
32. Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical
terminology. Nucleic Acids Res. 2004;32:D267-270.
33. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from
high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164.
34. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the
deleteriousness of variants throughout the human genome. Nucleic Acids Res.
2019;47:D886-D894.
35. Jian X, Boerwinkle E, Liu X. In silico prediction of splice-altering single nucleotide variants
in the human genome. Nucleic Acids Res. 2014;42:13534-13544.
36. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of
protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285-291.
37. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature.
2017;550:204-213.
38. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in
large networks. J Stat Mech. 2008;2008:P10008.
27 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
39. Cleland JG, Daubert JC, Erdmann E, Freemantle N, Gras D, Kappenberger L, et al. The
effect of cardiac resynchronization on morbidity and mortality in heart failure. N Engl J
Med. 2005;352:1539-1549.
40. Mokhles P, Gorcom Lv, Schouten JSAG, Berendschot TTJM, Beckers HJM, Webers CAB.
Contributing ocular comorbidity to end-of-life visual acuity in medically treated glaucoma
patients, ocular hypertension and glaucoma suspect patients. Eye.
2020;https://doi.org/10.1038/s41433-41020-40991-41430.
41. de Azevedo AF, Pinto DC, de Souza NJ, Greco DB, Goncalves DU. Sensorineural hearing
loss in chronic suppurative otitis media with and without cholesteatoma. Braz J
Otorhinolaryngol. 2007;73:671-674.
42. Buckley PF, Miller BJ, Lehrer DS, Castle DJ. Psychiatric comorbidities and schizophrenia.
Schizophr Bull. 2009;35:383-402.
43. Simon TG, King LY, Chong DQ, Nguyen LH, Ma Y, VoPham T, et al. Diabetes, metabolic
comorbidities, and risk of hepatocellular carcinoma: Results from two prospective cohort
studies. Hepatology. 2018;67:1797-1806.
44. Rios JA, Cisternas P, Arrese M, Barja S, Inestrosa NC. Is Alzheimer's disease related to
metabolic syndrome? A Wnt signaling conundrum. Prog Neurobiol. 2014;121:125-146.
45. Sheng B, Feng C, Zhang D, Spitler H, Shi L. Associations between obesity and spinal
diseases: a medical expenditure panel study analysis. Int J Environ Res Public Health.
2017;14:183.
28 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
46. Scott KM, Bruffaerts R, Simon GE, Alonso J, Angermeyer M, de Girolamo G, et al. Obesity
and mental disorders in the general population: results from the world mental health surveys.
Int J Obes (Lond). 2008;32:192-200.
47. Jung JH, Song GG, Ji JD, Lee YH, Kim JH, Seo YH, et al. Metabolic syndrome: prevalence
and risk factors in Korean gout patients. Korean J Intern Med. 2018;33:815-822.
48. Pedrosa AR, Graca JL, Carvalho S, Peleteiro MC, Duarte A, Trindade A. Notch signaling
dynamics in the adult healthy prostate and in prostatic tumor development. Prostate.
2016;76:80-96.
49. Rafnar T, Sulem P, Stacey SN, Geller F, Gudmundsson J, Sigurdsson A, et al. Sequence
variants at the TERT-CLPTM1L locus associate with many cancer types. Nat Genet.
2009;41:221-227.
50. Chen J, Tian W. Explaining the disease phenotype of intergenic SNP through predicted long
range regulation. Nucleic Acids Res. 2016;44:8641-8654.
51. Watanabe K, Stringer S, Frei O, Mirkov MU, de Leeuw C, Polderman TJC, et al. A global
overview of pleiotropy and genetic architecture in complex traits. Nat Genet.
2019;51:1339-1348.
52. Noh J, Kim HC, Shin A, Yeom H, Jang SY, Lee JH, et al. Prevalence of comorbidity among
people with hypertension: the Korea National health and nutrition examination survey
2007-2013. Korean Circ J. 2016;46:672-680.
29 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
53. Ziolkowska-Suchanek I, Mosor M, Gabryel P, Grabicki M, Zurawek M, Fichna M, et al.
Susceptibility loci in lung cancer and COPD: association of IREB2 and FAM13A with
pulmonary diseases. Sci Rep. 2015;5:13502.
54. Kaur-Knudsen D, Nordestgaard BG, Bojesen SE. CHRNA3 genotype, nicotine dependence,
lung function and disease in the general population. Eur Respir J. 2012;40:1538-1544.
55. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, et al. The
Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res.
2017;45:D972-D978.
56. Hwang KW, Woo OH, Yong HS, Shin BK, Shim JJ, Kang EY. Reversible
lansoprazole-induced interstitial lung disease showing improvement after drug cessation.
Korean J Radiol. 2008;9:175-178.
57. Pinto D, Pagnamenta AT, Klei L, Anney R, Merico D, Regan R, et al. Functional impact of
global rare copy number variation in autism spectrum disorders. Nature. 2010;466:368-372.
58. Marshall CR, Howrigan DP, Merico D, Thiruvahindrapuram B, Wu W, Greer DS, et al.
Contribution of copy number variants to schizophrenia from a genome-wide study of 41,321
subjects. Nat Genet. 2017;49:27-35.
59. van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA. A text-mining analysis
of the human phenome. Eur J Hum Genet. 2006;14:535-542.
30 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Figures
Fig. 1 Comorbidities identified in UKB. a Schematic illustration of how RR is calculated for each pair of
diseases. b The distribution of the number of comorbidities for UKB diseases. c Comparison of the comorbid
relationships identified by us (“UKB”) and by Blair et al. d Disease comorbidity tendency intra- and
inter-physiological categories. Color and size of the circles represent the proportions of comorbidities in all
disease-pairs within a category or between two categories. The deeper the color and the larger the size of a circle,
the higher the proportion is. Star represents adjusted P-value < 0.05 (FDR corrected). e The high confidence
comorbidity network constructed by only including comorbidities with RR > 15. Each node represents a disease,
and each edge represents a comorbid relationship between two diseases. The color code of a node represents the
category of the disease. The size of each node is proportional to the number of its comorbidties (not restricted to
RR > 15).
Fig. 2 Five types of genetic components shared by comorbidities. a Ratios of comorbidities interpretable
through SNP, gene, PPI, pathway, and genetic correlation, compared with non-comorbidities. b Intra- and
inter-category comorbidity that can be significantly interpreted through the five types of genetic components.
Each circle is divided into five parts, representing the five types of genetic components. Color-filled parts of each
circle represent the types of genetic components that can significantly inteprete the corresponding intra- or
inter-category comorbidity. No circle is drawn where none of the five types of genetic components is significant. c
Ratios of comorbidities of intra- and inter-categories interpretable through SNP, gene, PPI, pathway, and genetic
correlation.
Fig. 3 Characteristics of the genetic components shared by comorbidities. a The ratios of SNPs located in
genic region, intergenic region, and noncoding RNA region for comorbidity-SNPs, other disease-SNPs, and
non-disease SNPs. b CADD score distributions for comorbidity-SNPs, other disease-SNPs, and non-disease SNPs.
c Overlaps between comorbidity-genes and essential genes, and between disease-genes and essential genes. d The
pLI distributions of comorbidity-genes, other disease-genes, and non-disease genes. e The ratios of genes
expressed in few or many types of tissues, for comorbidity-genes, other disease-genes, and non-disease genes. f
Top ten pathways that are shared by the largest numbers of comorbidities.
31 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license .
Fig. 4 Loci-level (a) and Network-level (b) genetically interpretable comorbidity networks. a, b Each circle
represents a disease. Colors of the circles (diseases) correspond to the categories of the diseases. “Universal hub
diseases” are located at the center of each network, and diseases that belong to the same comorbidity module are
grouped close to each other. For each module, the featured categories are annotated. The black border of the nodes
indicates that they are the local hub diseases of the comorbidity modules. Diseases that belong to neither universal
hub diseases nor a comorbidity module are not included in this figure.
Fig. 5 Case studies for genetically interpretable comorbidity networks. a-c Circles, triangles, and squares
represent diseases, genes, and pathways, respectively. Colors of the circles (diseases) correspond to the categories
of the diseases, following the same color code as in Fig. 1. a Neoplasm diseases, their comorbid diseases, and
shared genes in LG-module2 and LG-module8. b Comorbidities of the hub disease F17 (Mental and behavioural
disorders due to use of tobacco) in LG-module5, and their shared genes. c Pathways shared by the universal hub
disease E66 (Obesity) and its comorbidities in NG-module1 and NG-module6. Only the top 5 pathways shared by
the largest numbers of comorbidities are shown.
Supplementary information
Additional file 1: Supplementary Figures and Tables. Number of comorbidities for diagnosis time windows of
no, same day, Reactome pathways, KEGG pathways, BioCarta pathways and PID pathways (top). Disease and comorbidity pathway size distribution of Reactome and non-Reactome when only using pathways with size <= 200 (bottom) (Fig. S2). Correlation between prevalences and number of comorbidities (Fig. S3). Splicing score of comorbidity-SNPs, other disease-SNPs and non-disease SNPs (Fig. S4). Characteristics of the genetic components shared by comorbidities when removing HLA-region variants (Fig. S5). pLIs of comorbidity-genes, other disease-genes and non-disease genes when removing essential genes (Fig. S6). Degree distributions of diseases in comorbidity networks that share loci and network level genetic components (Fig. S7). Comorbidity overlap interpreted by SNPs, genes, PPIs, pathways and genetic correlations (Fig. S8). Correlations among genetic correlation (rg), relative risk (RR), and phenotype similarity (PheSim) of comorbidities. (Fig. S9). 32 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Additional file 2: Supplementary Tables. Common disease summary information (Table S1). Comorbidity among common diseases in UK Biobank (Table S2). UK Biobank comorbidity supported by other studies but not Blair’s results (Table S3). Comorbidities interpreted by SNPs (Table S4). Comorbidities interpreted by genes (Table S5). Comorbidities interpreted by PPIs (Table S6). Comorbidities interpreted by pathways (Table S7). Comorbidities interpreted by genetic correlations (Table S8). Summary of universal hub diseases in LG- and NG-networks (Table S9). Comorbidity modules in LG-network (Table S10). Comorbidity modules in NG-network (Table S11). Comorbidity-drug-prescription relations derived from comorbidity genes, drug targets and UK Biobank self-report medications (Table S12). Additional file 3: Supplementary Text. 33 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Table 1 Genetically interpretable comorbidity modules and their hub diseases. Disease module Module size Enriched categories Hub diseases LG-module1 33 Neurological-Spine Obesity (E66) | Dorsalgia (M54) Hyperplasia of prostate (N40) | Other disorders of urinary system LG-module2 33 Male genital organs-Neoplasm-Urinary (N39) | Other disorders of bladder (N32) LG-module3 31 Gastrointestinal-Immunological / Angina pectoris (I20) | Chronic renal failure (N18) | Pulmonary LG-module4 29 Endocrine-Hematological-Ophthalmological embolism (H26) Mental and behavioural disorders due to use of tobacco (F17) | LG-module5 20 Cardiovascular-Respiratory Other chronic obstructive pulmonary disease (J42;J44) | Pneumonia, organism unspecified (J18) Cholelithiasis (K80) | Fibrosis and cirrhosis of liver (K74) | Other LG-module6 16 Hepatobiliary pancreas diseases of liver (K76) Phlebitis and thrombophlebitis (I80) | Other arthrosis (M19) | LG-module7 14 Joint Gonarthrosis [arthrosis of knee] (M17) | Other soft tissue disorders, not elsewhere classified (M79) | Pulmonary embolism (I26) LG-module8 10 Dermatological-Neoplasm Other malignant neoplasms of skin (C44) NG-module1 49 Bone-Joint-Muscular-Neurological-Spine / NG-module2 41 Hematological-Infectious / Hyperplasia of prostate (N40) | Other disorders of urinary system NG-module3 39 Male genital organs-Urinary (N39) Mental and behavioural disorders due to use of tobacco (F17) | Insulin-dependent diabetes mellitus (E10) | Acute myocardial NG-module4 33 Cardiovascular-Ophthalmological infarction (I21;I22) | Other systemic involvement of connective tissue (M35) Diaphragmatic hernia (K44) | Gastritis and duodenitis (K29) | Other NG-module5 27 Gastrointestinal diseases of digestive system (K92) | Other diseases of anus and rectum (K62) NG-module6 12 Hepatobiliary pancreas Cholelithiasis (K80) Carcinoma in situ of cervix uteri (D06;N87) | Excessive, frequent NG-module7 10 Female genital organs and irregular menstruation (N92) | Female genital prolapse (N81) NG-module8 9 Dermatological-Neoplasm Other malignant neoplasms of skin (C44) NG-module9 6 Ear, Nose, Throat Nasal polyp (J33) 34 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . A UK Biobank D ...... Bone Disease 1 Disease 2 ● Bone Cardiovascular ● ● Cardiovascular Dermatological ● *● ● Dermatological Ear, Nose, Throat ● ● ● ● Ear, Nose, Throat Endocrine ● ● ● *● ● Endocrine Female genital organs ● ● ● ● *● ● Female genital organs Gastrointestinal ● ● ● ● ● *● ● Gastrointestinal Hematological ● ● ● ● ● ● ● ... Hematological * ● Hepatobiliary pancreas ...... ● ● ● ● ● ● ●* *● Hepatobiliary pancreas ● Immunological ● ● ● ● ● ● *● ● *● ● Immunological Infectious ● ● ● ● ● ● ● ● ● ● ● Relative Risk Infectious Joint ● ● ● ● ● ● ● *● *● ● *● Joint ● Male genital organs *● ● ● ● ● ● ● ● ● *● ● *● Male genital organs ● Metabolic ● ● ● ● ● ● ● ● ● ● ● ● *● Metabolic ● Muscular 100 Muscular ● *● ● ● *● ● *● *● *● ● *● ● ● *● ● B C Neoplasm ● ● ● ● ● ● ● ● ● ● ● *● ● ● *● ● Neoplasm Neurological 80 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Neurological Nutritional ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Nutritional ● ● ● Ophthalmological 60 ● ● ● ● ● ● *● ● ● ● ● *● ● *● ● ● ● *● ● Ophthalmological Oral Oral ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● *● ● 23 152 32 Psychiatric 40 Psychiatric ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● *● ● Respiratory ● ●* ● ● *● ● ● ● ● ● ● *● ● * ● ● *● ●* ● ● *● Respiratory ● ● Spine 20 ● ● ● ● ● *● ● *● *● *● *● *● *● ● ● *● ● ● ●* ● *● *● Spine ● Urinary 0 ● ● ● ● ● ● ● ● ● ● ● ● Number of diseases Number Urinary ● ● ● ● *● ● * ● ● * *● *● ● * ● UKB Blair et al. * * * * * 0 1 2 Comorbid ratio, % Number of comorbidities (log10) 0 20 40 60 80 100 M41 M43 I21 L02 M51 M48 E L73 N73 N80 I24 D27 M50 E53 E61 N97 G50 C49 G99 D25 Bone C43 G54 N70 N83 E55 G35 D26 Cardiovascular L81 M62 G82 M88 G95 C80 N36 N35 N28 N21 D48 Dermatological N23 M90 N31 G83 E21 N88 M76 N20 E04 C90 C64 Ear, Nose, Throat C73 E06 N13 D35 N85 M70 Endocrine C82 D63 E22 E89 N90 J42 M07 C34 N02 C91 Female genital organs J39 N18 B00 J38 N19 E23 K12 N76 Cardiovascular J95 I12 Gastrointestinal L40 J10 D70 L53 L27 J86 N03 N08 Hematological K81 K82 I47 I44 J81 J15 E87 E27 D61 J47 M31 N61 N64 E83 Hepatobiliary pancreas I42 J43 J13 A49 M35 A41 I51 I50 J90 M32 D46 N60 J96 D68 D47 Immunological D24 N62 I34 J84 D69 I31 L89 C25 D89 C92 J69 D45 K74 Infectious B97 H83 I05 J18 I27 M05 C22 D73 C15 E14 E16 E86 N17 G62 K86 Joint N72 D06 I38 M06 E88 K70 I07 B95 C16 K83 I08 I35 D86 B18 Male genital organs K02 K08 I46 G63 L08 A40 K85 M45 M00 I85 I49 I82 K75 K05 K06 Metabolic I87 K72 I86 H36 K07 K01 H20 E10 K31 Muscular H47 H35 M86 L97 K09 H44 F10 N45 Neoplasm H34 G93 I89 I98 K76 H18 I70 K65 N50 I71 K55 Neurological N43 H54 L03 H33 D32 J03 C71 I74 I73 K66 I78 J35 H52 M95 I72 G30 K56 D37 L71 H00 Nutritional J36 H43 I77 C18 H01 G40 G91 K91 H40 H57 J34 I62 I61 F52 Ophthalmological H21 K43 Ophthalmological J33 G31 C19 H04 N47 J32 G81 G20 K46 J31 K50 N48 I67 Oral C67 I69 K51 D41 H50 E07 H05 H49 I60 I65 Psychiatric N30 F05 H10 H53 I63 N42 H11 F06 Respiratory G51 D33 D07 G45 N41 F31 Spine H90 F43 F20 M20 L84 H72 H73 Urinary H91 F33 F60 L85 M77 H60 H93 Psychiatric H81 F45 L57 D04 Ear, Nose, Throat H65 H92 F32 H71 H74 H61 medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . C 35 A within-category 60 comorbidity 30 cross-category non-comorbidity P=8.3e-17 P=0.677 50 P=8.3e-271 P=4.6e-17 25 P=0.024 40 20 P=8.2e-62 P=0 30 P=2.5e-98 15 P=2.6e-128 20 10 P=4.6e-4 Ratio of comorbidities Ratio of comorbidities 10 P=2.1e-20 5 explained by genetics, % explained by genetics, % 0 0 SNP PPI SNP PPI Gene Total Gene Pathway Genetic Pathway Genetic correlation B Bone Bone Throat Cardiovascular , SNP Cardiovascular Dermatologicalose Gene N Dermatological rine Ear, PPI Ear, Nose, Throat doc En tinal Pathway Endocrine Female genital organs Genetic correlation Female genital organs astrointes G Gastrointestinal Hematological gical Hematological Hepatobiliary pancreas Hepatobiliary pancreas munolo Im Immunological fectious In Infectious Joint Joint Male genital organs Male genital organs Metabolic Metabolic Muscular Muscular Neoplasm ical Neoplasm Neurological olog Neurological Nutritionalthalm Nutritional Oph Ophthalmological Oral chiatric Oral Psy Psychiatric Respiratory Respiratory Spine Spine Urinary Urinary medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . A B C 1.00 60 1198 248 2605 Region 50 0.75 Genic region Intergenic region 40 Comorbidity-genes Noncoding RNA Essential genes 0.50 30 Ratio of variants CADD score 20 0.25 4393 909 1944 10 0 0.00 Essential genes Disease-genes Comorbidity-SNPs Comorbidity-SNPs Non-diseaseOther SNPs disease-SNPs Non-disease SNPsOther disease-SNPs D E F Number of comorbidities 0 200100 400300 500 600 700 1.2 1.00 1 cell_adhesion_molecules_cams 1.0 No. of tissues 1 2 antigen_processing_and_presentation 0.75 2-5 0.8 6-42 3 interferon_gamma_signaling 43-53 0.6 4 interferon_signaling pLI 0.50 5 intestinal_immune_network_for_iga_production 0.4 Ratio of genes 6 mhc_class_ii_antigen_presentation 0.2 0.25 7 meiosis Top 10 pathways 0.0 8 pd1_signaling -0.2 0.00 9 generation_of_second_messenger_molecules 10 phosphorylation_of_cd3_and_tcr_zeta_chains Comorbidity-genes Comorbidity-genes Non-disease genesOther disease-genes Non-disease genesOther disease-genes medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Endocrine-Hematological-Ophthalmological A Gastrointestinal-Immunological Hepatobiliary pancreas Male genital organs-Neoplasm-Urinary H35 I24 K30 I65 K81 N43 K92 K41 J40 N47 H52 K82 L53 D75 K22 E21 H40 I45 C67 N48 M35 K50 H25 K83 N50 M81 M86 E05 E14 A04 D41 I20 K70 K90 H33 K42 N42 K80 E16 N40 M31 K29 D64 K43 B96 C61 N41 E10 D86 K52 N18 K40 K74 K75 G35 N21 F41 B37 M10 M06 D50 E83 K76 M07 N94 N39 N32 K51 E27 H26 H91 I85 N23 G62 D70 N35 I12 K20 N92 C19 J47 L97 D69 N84 N13 G56 N81 N20 M72 H43 I77 H36 K62 I47 K26 E89 D51 K56 M20 K58 M67 I84 L40 K63 K44 M05 M18 C18 K57 I25 K60 E78 I10 E11 J32 J33 E03 J45 J31 G45 G93 G51 N62 L03 I50 I67 G31 N60 F31 N28 F32 M43 I74 M79 F05 C50 I71 I21 L82 M48 M25 M54 L98 C43 J22 I35 I26 I80 I83 G99 G54 I73 M85 J18 D04 M13 G20 C44 J30 I63 C34 M47 M19 E66 M51 M15 I95 D68 J84 I70 F17 L30 L85 M16 H53 L57 M17 M24 H83 M84 J69 K14 J42 H81 N80 J96 C64 M22 M75 K25 G47 N83 K66 G40 J43 L72 M65 J86 H11 J90 Joint Neurological-Spine Cardiovascular-Respiratory Dermatological-Neoplasm Cardiovascular-Ophthalmological Male genital organs-Urinary E53 Gastrointestinal I42 B F31 G45 N50 K59 B34 N31 G40 Ear, Nose, Throat F10 H00 I71 N17 D73 G35 N47 J43 K75 K60 L30 L98 J31 K62 N48 B07 J32 J18 I47 D04 H04 C34 J47 K85 K22 C19 I69 K92 I27 J33 F32 F17 J84 C44 M75 C43 L57 H65 C18 N12 K44 I95 C61 I84 F41 N40 J34 K50 I08 L82 L72 I63 H92 E10 H93 A09 M10 K29 H36 I21 I50 K63 M35 K90 N42 N20 L02 D51 I35 N30 H33 N32 I65 N39 K51 N21 I31 K31 K30 K91 F40 D41 N45 H35 I67 N43 D70 L08 N18 H26 I73 E87 K56 G93 K40 K66 K42 H11 K41 H40 L29 A04 H25 L53 K43 E78 E66 E11 J42 I20 I10 E03 E21 J45 I25 D64 M46 M84 L60 D75 M06 H54 M70 E27 B98 M13 K20 D69 K57 M20 K52 M22 K58 D50 E86 I44 L03 G81 L97 M31 M07 M17 M67 I85 K14 G43 B37 K12 G56 J90 L43 M16 M19 M51 N13 M53 M15 M47 I83 G99 M72 I80 M85 I46 I45 M48 K74 B96 B95 M71 K82 K70 M79 I74 L40 M54 M65 N72 G62 G20 N83 M86 G31 H91 E14 E16 J98 M87 M05 K81 G47 N81 D06 K80 I77 E89 M25 E83 K76 A41 M66 M81 G54 N80 H72 B97 K83 I89 C82 M43 N92 I72 C50 I12 D86 J96 M80 I61 E04 N84 H83 F05 I26 K86 D68 J22 M18 G95 N93 K26 H53 J40 Female genital organs Hepatobiliary pancreas Hematological-Infectious Bone-Joint-Muscular-Neurological-Spine Bone Endocrine Hepatobiliary pancreas Male genital organs Neurological Psychiatric Cardiovascular Female genital organs Immunological Metabolic Nutritional Respiratory Dermatological Gastrointestinal Infectious Muscular Ophthalmological Spine Ear, Nose, Throat Hematological Joint Neoplasm Oral Urinary medRxiv preprint doi: https://doi.org/10.1101/2021.01.15.21249242; this version posted January 15, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . ANKRD11 FANCA CDKN2A CDK10 PRRC2A AGER A NCOA6 PSORS1C1 L30 Other dermatitis RNF5 RALY MSH5 LSM2 OR2B3 Neoplasm of uncertain OR2J3 TRIM27 MMP24 CNBD2 EGFL8 APOM BACH2 FOXP1 or unknown behaviour CPNE7 CDKN2B ALS2CR12 Other disorders of skin HLA-B of urinary organs TNXB BEND3 and subcutaneous tissue, D41 SPATA33 LOC102724392 not elsewhere classified KLK2 HNF1B Carcinoma in situ of skin L98 LOC100287036 VPS9D1 GAS8 CASP8 HSPA1L N40 D04 N32 C6orf15 Other epidermal thickening Hyperplasia of prostate L85 LOC101927817 Parkinson's disease Other disorders IRF4 G20 PRRT1 Malignant neoplasm HLA-DQB1 Skin changes due to CLPTM1L of bladder C67 SPATA2L MIR499A of bladder chronic exposure to Other malignant PPP1R14A Obstructive and N13 CHMP1A nonionising radiation neoplasms of skin TMEM237 N42 Other disorders of prostate reflux uropathy TYR GGT7 L57 C44 TERT KLK3 HIBADH CENPBD1 DBNDD1 C61 TCF25 HLA-C EIF2S2 Malignant neoplasm HLA-A PSORS1C2 of prostate TP53INP2 MIR499B TRPC4AP AGPAT1 ITCH CCHCR1 Melanoma in situ PBX2 MYH7B CLIC1 L82 VWA7 ZBTB38 VPS9D1-AS1 C43 Seborrhoeic keratosis RPL13 PRDM7 FAM157C DYNLRB1 ZNF276 RBKS COLCA1 SMAD7 GSS TUBB3 PIGU DEF8 Malignant neoplasm of colon SLC45A2 CHMP4B Other disorders of HLA-DQA1 N39 C19 C18 MAP1LC3A GAS8-AS1 SPG7 RBM39 urinary system MC1R Malignant neoplasm of rectum EDEM2 SPIRE2 DPEP1 CBFA2T3 Other diseases of Other diseases ASIP UQCC1 K62 COLCA2 K63 anus and rectum of intestine lipid_digestion_mobilization_and_transport cytosolic_sulfonation_of_small_molecules B C Cholecystitis K81 mitochondrial_protein_import Other diseases of liver K76 myogenesis Alcoholic liver disease K70 phase_ii_conjugation Malignant neoplasm C34 Other diseases K82 of bronchus and lung of gallbladder fatty_acid_triacylglycerol_and_ J43 Emphysema IREB2 ketone_body_metabolism Cholelithiasis K80 J86 Pneumothorax CHRNA3 biological_oxidations Obesity J96 Respiratory failure, ADGRB2 not elsewhere classified a6b1_a6b4_integrin_pathway E66 ppara_activates_gene_expression Pneumonia, J18 F17 Mental and behavioural amb2_neutrophils_pathway organism unspecified disorders due to use of tobacco Psoriatic and enteropathic arthropathies M07 J42 Other chronic obstructive spliceosome pulmonary disease K25 Gastric ulcer Psoriasis L40 cell_adhesion_molecules_cams MED27 Seropositive I71 Aortic aneurysm rheumatoid arthritis M05 and dissection costimulation_by_the_cd28_family Other rheumatoid arthritis CHRNA4 DRD2 I70 Atherosclerosis M06 rna_pol_i_rna_pol_iii_and_ K14 I73 Mononeuropathies of lower limb Other peripheral mitochondrial_transcription G56 Diseases of tongue vascular diseases Migraine G54 Nerve root and plexus Gonarthrosis M16 M17 M20 G99 M51 G43 E21 M81 compressions in diseases [arthrosis of knee] Coxarthrosis Other disordersOther ofintervertebral nervous Osteoporosis withoutclassified elsewhere Acquired deformities Hyperparathyroidism[arthrosis of hip] system in diskdiseases disorders pathological fracture of fingers and toes and other disorders classified elsewhere of parathyroid gland