<<

www.nature.com/scientificreports

OPEN Network‑based analysis of key regulatory implicated in Type 2 Diabetes Mellitus and Recurrent Miscarriages in Turner Syndrome Anam Farooqui1, Alaa Alhazmi2, Shaful Haque3, Naaila Tamkeen4, Mahboubeh Mehmankhah1, Safa Tazyeen1, Sher Ali5 & Romana Ishrat1*

The information on the genotype– relationship in Turner Syndrome (TS) is inadequate because very few specifc candidate genes are linked to its clinical features. We used the microarray data of TS to identify the key regulatory genes implicated with TS through a network approach. The causative factors of two common co-morbidities, Type 2 Diabetes Mellitus (T2DM) and Recurrent Miscarriages (RM), in the Turner population, are expected to be diferent from that of the general population. Through microarray analysis, we identifed nine signature genes of T2DM and three signature genes of RM in TS. The power-law distribution analysis showed that the TS network carries scale-free hierarchical fractal attributes. Through local-community-paradigm (LCP) estimation we fnd that a strong LCP is also maintained which means that networks are dynamic and heterogeneous. We identifed nine key regulators which serve as the backbone of the TS network. Furthermore, we recognized eight interologs functional in seven diferent organisms from lower to higher levels. Overall, these results ofer few key regulators and essential genes that we envisage have potential as therapeutic targets for the TS in the future and the animal models studied here may prove useful in the validation of such targets.

Te medical systems and scientists throughout the world are under an unprecedented challenge to meet the medical needs of much of the world’s population that are sufering from chromosomal anomalies. As there is no cure for such anomalies, early detection and specifc interventions are likely to be the key aspect of economically efcient and high-quality healthcare systems. Turner Syndrome (TS), also known as monosomy of X, is one such chromosomal disorder where there is a complete or partial deletion of an X in all or some of the somatic cells­ 1,2, usually due to sporadic chromosomal non-disjunction. TS is the only chromosome that represents haploinsufciency compatible with life in less than 1% surviving pregnancy­ 3. Te reason why some TS fetuses are miscarried while others make it through pregnancy without many complications is still unknown. Te most likely reason for the TS fetus’s survival may be the level of mosaicism that enables them to remain alive and develop. It has been reported that about 50% of TS patients have haplotype 45, X, while about 20–30% have mosaicism where 45, X line is accompanied by one or more other cell lines having a complete or structurally abnormal sex chromosome (X or Y)4,5. Some of the distinctive symptoms of TS include short height, webbed neck, low hairline at the back of the neck, low-set ears, markedly elevated levels of follicle-stimulating hormone (FSH), chronic otitis media (OM), lymphedema of extremities, small mandible, and multiple pigmented ­nevi6. TS is ofen accompanied by many other co-morbidities like autoimmune diseases (AD), hypothyroidism, kidney dysfunction, loss of ovarian function or other reproductive disorders, neurological or ophthalmological abnor- malities, osteoporosis, diabetes mellitus (DM), dyslipidemia, recurrent miscarriages (RM), hypertension, and heart disease­ 7–12. Since TS is non-heritable and no good animal model is available, this further complicates its ­status13. Terefore, the causal genes of TS co-morbidity are still unstated. Te prevailing theory of dosage imbalance due to the loss of an is not enough to justify this burden of co-morbidity which has worsened the living circumstances of these patients. Two such acquired

1Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi 110025, India. 2Medical Laboratory Technology Department, Jazan University, Jazan, Saudi Arabia. 3Research and Scientifc Studies Unit, College of Nursing and Allied Health Sciences, Jazan University, Jazan, Saudi Arabia. 4Department of Biosciences, Jamia Millia Islamia, New Delhi 110025, India. 5Department of Life Sciences, Sharda University, Greater Noida 201310, India. *email: [email protected]

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 1 Vol.:(0123456789) www.nature.com/scientificreports/

co-morbidities of TS which is the focus of our present work are Type 2 Diabetes Mellitus (T2DM) and Recur- rent Miscarriages (RM). Even though T2DM and RM seem to be extremely frequent, there is a void of literature related to them. Reports suggest that the prevalence of T2DM occurs at any age group in life. Some studies suggest that impaired beta-cell function or reduced sensitivity may be the causal factor for T2DM in TS­ 14. Te latest clinical practice guidelines for TS states that glucose intolerance and T2DM prevalence in TS is 15–50% and 10%, respectively; nevertheless, the frequency of T1DM is yet to be determined­ 1. It is expected that the factors that trigger T2DM in TS are diferent from that of the traditional risk factors of T2DM in the general population. It is currently believed that Xp haplotype gene defciency may be the reason for the high incidence rate of T2DM in the TS population which may lead to impaired β-cell ­function15. Further, overexpression of some genes of Xq may worsen the condition. Whilst these factors may account for T2DM in TS, the exact reason and the key genes involved remain obscure. Another distinctive feature of TS is ovarian failure which renders the TS women infertile in most cases. However, in a few of the TS cases i.e. 5–10%, spontaneous puberty occurs of them only 2–5% become pregnant spontaneously­ 16. Tese pregnancies are at high risks and must be followed up carefully. Recurrent miscarriages are defned as two or more successive pregnancy losses before 22 weeks of gestation. It has been assessed that RM occurs in 0.5–5% of all reproductive-age women­ 17, however, this rate is higher in turner’s population­ 18. Despite worthy studies that have ascribed the increased risk of miscarriage in TS patients to small uterine size and reduced endometrial thickness and receptivity­ 18,19, little is known about it at the genomic level. Terefore, such genetic alterations leading to the enhanced risk of RM in TS women must be evaluated. Many studies have shown that impaired glucose tolerance, diabetes mellitus and Insulin Resistance (IR) have somehow been responsible for adverse reproductive/pregnancy outcomes, including infertility and ­miscarriages20,21. However, there is little information and data to verify this claim. Since the data claims that in the general population, T2DM and RM may somehow be associated, we expect a similar scenario in the case of the TS population. Te molecular components of a cell are functionally interdependent which means that a disease or a syndrome is a consequence of the perturbations of the complex intracellular and intercellular interactions. With the advancement in the feld of Network biology, many potential disease genes and their better drug targets have been identifed. Te emerging tool of Network medicine has important applications to systematically explore drug targets, biomarker/key genes of the network through the identifcation of disease modules and pathways, thus providing improvements in the diagnosis, prognosis, and treatment of complex ­diseases22–26. In our study, the proposed network protocol not only provides a global analysis of the TS but also presents a detailed view on specifc proteins and their association with the specifc co-morbidities i.e., T2DM and RM. Te current research also sheds light on few key genes, pathways, and some interologs which can be targeted to ofer better interventions for TS along with its associated co-morbidities. Results Te workfow of the whole integrative network-based approach followed in this study is illustrated in Fig. 1. Te fgures in this study were drawn in Adobe Illustrator CS6.

Diferentially expressed genes (DEGs) of TS. Te information regarding the datasets pertaining to the microarray series used in the study is listed in Table 1. Te box plots for the normalized data sets are illustrated in Supplementary Figures S1−S3. Only those DEGs that surpassed the cut-of criteria of “P-value < 0.05” and “|logFC|≥ 0.5” in both the series of each category were considered as the signifcant DEGs. We did not use the adjusted p-value for multiple comparisons, rather we used the p-value cut-of < 0.05 to select the maximum number of DEGs of Turner Syndrome and then proceeded with the network construction. Te DEGs of each dataset that surpassed the cut-of criteria are listed in Supplementary Table S1. A total of 355 genes were found to be diferentially expressed in TS, of them 239 genes were upregulated, and 116 genes were downregulated (Supplementary Figure S4A). Tese genes were used to construct the PPI network to get the elaborated view in terms of –protein interactions in TS.

Signature genes of T2DM and RM in TS. Here, we adopted an integrative approach for the meta-anal- ysis of multiple profles of TS and its comorbidities i.e., T2DM and RM, specifcally. Using the same cut-of criteria, a total of 185 genes were found to be diferentially expressed in T2DM, of them 138 genes were upregulated, and 47 genes were downregulated (Supplementary Figure S4B, Supplementary Table S2). And in the case of RM, 112 genes were found to be diferentially expressed, 42 of them were upregulated, and 70 genes were downregulated (Supplementary Figure S4C, Supplementary Table S3). On overlapping these sets of DEGs, 9 genes were found to be common with TS and T2DM and 3 genes were common with TS and RM (Fig. 2a). We call them the “Signature Genes” that may be responsible for T2DM and RM in TS. Tese signature genes are listed in Table 2. Te values of the fold change and p-value of the DEGs and the signature genes of TS, T2DM, and RM are listed in the Supplementary Tables.

Association between T2DM and RM in TS through pathway analysis. Te pathway enrichment and functional enrichment was analyzed for all the DEGs of TS along with their interacting partner. We found that the DEGs were signifcantly enriched in many biological, cellular, and molecular functions as well as some pathways (Table 3). Most of these genes are enriched in protein binding, ATP binding, RNA binding, and trans- lational processes. Supplementary Figure S5 illustrates the involvement of diferent pathways in TS. Further- more, in the current study, we evaluated the association between RM and T2DM in TS. For this, we selected the

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 2 Vol:.(1234567890) www.nature.com/scientificreports/

Microarray Analysis of TS Microarray Analysis of T2DM Microarray Analysis of RM GSE46687 & GSE58435 GSE23343 & GSE25724 GSE22490 & GSE26787

Through Geo2r Through Geo2r Through Geo2r

Expressed Genes (DEGs) Expressed Genes (DEGs) Expressed Genes (DEGs) P value <0.05 P value <0.05 P value <0.05 |log2(FC)| > 0.5 |log2(FC)| > 0.5 |log2(FC)| > 0.5

Chromosomal contribution and Alteration in Gene Expression from

TS foetus to TS adult

n o

i DEGs of

t

s

a l

e Degree Distribution i

u TS and signature genes of T2DM and RM

t

c

r

l

e

a p

c (Similar Expression)

o P

r Avg. Clust. Coef.

P

D

- l

P STRING Database

a

c

C i

L Neighborhood

g

d

o l

n ConnecConnecttivitivityy o

a Construction of

p

) o

E Primary Network

T

H

f (

Betweeness

o

y

n

g r

Through R o

i

e

t

n

a

E

s

i

r Eigen value

n

e

a

i t

Community Detection c

n

a

o

r t

l (Newman and Girvan’s method)

a i

h Closeness

m

C

a H

Set of sub-communities (Communities of each community) Tracing of genes (Motif level)

of Key regulators

Predicting essential interactions (interologs) by integrating orthology with PPI

Figure 1. Illustration of the workfow of the integrative network-based approach of our study.

Geo accession Platform No. of probes No. of samples (control/disease) Sample type Organism (A) Turner Syndrome GSE46687 GPL570 54,675 36 samples (10/26) peripheral blood mononuclear cells Homo sapiens GSE58435 GPL570 54,675 10 samples (5/5) Second trimester amniotic fuid Homo sapiens (B) Type 2 Diabetes Mellitus GSE23343 GPL570 54,675 17 samples (7/10) Liver tissue Homo sapiens GSE25724 GPL96 22,283 13 samples (7/6) Pancreatic islets Homo sapiens (C) Recurrent Miscarriage GSE22490 GPL570 54,675 10 samples (6/4) Placenta Homo sapiens GSE26787 GPL570 54,675 15 samples (5/10) Endometrium Homo sapiens

Table 1. Microarray datasets of (A) Turner Syndrome (B) Type 2 Diabetes Mellitus (C) Recurrent Miscarriage.

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 3 Vol.:(0123456789) www.nature.com/scientificreports/

P<0.05 min row expressionmax row expression S S S S S S S S S S S S S S S S S S S S S S S S S S |log2(FC)| > 0.5 S S S S S 1026_C 1027_C 1028_C 1029_C 1030_C 1021_T 1022_T 1023_T 1024_T 1025_T 134051_T 134050_T 134049_T 134048_T 134047_T 134046_T 134045_T 134044_T 134043_T 134042_T 134041_T 134040_T 134039_T 134038_T 134037_T 134036_T 134035_T 134034_T 134033_T 134032_T 134031_T 134030_T 134029_T 134028_T 134027_T 134016_C 134017_C 134018_C 134019_C 134020_C 134021_C 134022_C 134023_C 134024_C 134025_C 134026_T Log FC in GSE58435 (Before Birth) Log FC in GSE46687 (After Birth) GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM141 GSM141 GSM141 GSM141 GSM141 GSM141 GSM141 GSM141 GSM141 GSM141 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GSM1 GENE NAME TREML1 OFCC1 BIRC5 T2DM NT5DC2 TS GABRB1 9 DEGs LOC100507053 344 DEGs 174 DEGs LINC00645 RNF217-AS1 MMP19 POLQ HIST1H1T CTDSPL ZNF280A CEND1 VN1R3 3 DEGs 2 DEGs ADAM12 NCS1 SEPP1 SOBP SLC35G1 CCDC171 RM CASC15 FAM78B HAGLR 107 DEGs WFDC9 TRDN LINC01198 [a] USP25 FBXO28 FILIP1 LOC100289090 PAPPA FGF10-AS1 FMO5 MCL1 TIA1 CXADR SAMHD1 DDX3Y BAIAP2 USP10 MCCC2 GART ATRX FCGR2C MICAL2 GNAL COPA FRMPD4 SCAND2P QTRT2 C11orf63 LRRC8E UEVLD MRVI1 SGPP2 AFF3 PFKFB4 USP9X ZNF226 STRIP2 CNOT7 DMBX1 GREM2 ZNF652 ACO1 NAIP LIPJ DLX6 X Y LPCAT2 ARHGAP19 [b] [c]

Figure 2. (a) Signature genes of T2DM and RM in TS. (b) Te percentage of genes on each chromosome that was diferentially expressed in TS at the signifcance threshold “P-value < 0.05” and “|logFC|≥ 0.5”. (c) Te heat map showing the change in expression of the DEGs from TS fetus (GSE58435) to TS adult (GSE46687).

S. no Gene name Chromosomal location Status in TS Status in T2DM 1 SLC29A2 11q13.2 Upregulated Upregulated 2 THBS1 15q14 Upregulated Upregulated 3 GPRC5B 16p12.3 Upregulated Upregulated 4 CSHL1 17q23.3 Upregulated Upregulated 5 ADAM22 7q21.12 Upregulated Upregulated 6 IGHM 14q32.33 Upregulated Upregulated 7 WIZ 19p13.12 Upregulated Upregulated 8 IGHD 14q32.33 Upregulated Upregulated 9 COX11 17q22 Downregulated Downregulated S. no Gene name Chromosomal location Status in TS Status in RM 1 ATXN7L1 7q22.3 Upregulated Upregulated 2 UBE3B 12q24.11 Upregulated Upregulated 3 FANCM 14q21.2 Downregulated Downregulated

Table 2. List of signature genes of T2DM and RM in TS.

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 4 Vol:.(1234567890) www.nature.com/scientificreports/

Go term Total no. of genes involved from the list Seed genes present P-value Molecular function ZNF703, ZNF24, ZKSCAN3, ZFYVE27, ZFHX3, ZBTB17, ZBTB10, YTHDC2, XPO5, WWOX, WRNIP1, WIZ, WBP2, VPS37B, USP22, UBE2Z, TYRP1, TTC23, TSPAN2, TPM1, TOPORS, TOB2, TNXB, TMEM8A, TMEM67, THY1, THBS1, TBX5, TAF1C, SYNE2, SYNCRIP, STMN2, STK16, STAT2, STAT1, SRGAP2, SRC, SPON1, SPG7, , SOX13, SOCS3, SNRPN, SMYD2, SLC9A9, SFI1, SETDB1, SEPT6, RYK, RPL31, RPH3AL, RAB3IP, PXN, PTPN22, PPP6R2, POU2F1, POSTN, POGK, PMEPA1, PKHD1, PKD1, PHF1, PEX3, PDLIM5, PDE4DIP, PCNX4, PBXIP1, PBX2, PAPOLA, OLFM1, OCIAD1, NR4A1, NPC1, NIPSNAP1, NFATC3, NFATC1, NF2, NEK8, MKL1, MAPT, MAPK1, Protein binding 403 MAGEC2, LRR1, LIFR, LGALS3, LCP2, LAPTM4B, KRR1, 1.3E-21 KRAS, KMT2A, KIF1C, KCNJ15, JRK, ITPR3, ITIH4, IL23A, IL1R2, IKZF3, IGHM, IFT27, HSPA1L, HIPK2, HIPK1, HFE, HEXIM1, HBB, GUCY1A3, GSE1, GOPC, GNAS, GADD45B, FOXN3, FOXK1, FLT1, FKBP8, FGFR1OP2, FBLIM1, FANCM, FANCC, FAM20A, FAM161A, FAF1, ETV6, ETS2, ENO2, ENAM, EDC3, DYRK1A, DTNA, DOT1L, DGKZ, DCAF8, CXCR5, CUL9, CRTC2, CNOT2, CEP152, CELF1, CEACAM1, CDK15, CDC34, CASP2, CASK, CARS, CACNG2, BICD1, BCL6, BCL2L1, BAZ2A, ATXN7L1, ATF6B, ASXL1, ARID4B, ARHGAP22, APLP2, AP5Z1, ANKRD44, ALOX5, AGO4, AGER, ADRB1, ADAM22, ACHE, ABHD1 YTHDC2, XPO5, SYNCRIP, SAMD4B, RPL31, PURB, OASL, poly(A) RNA binding 103 6.8E-21 LGALS3, KRR1, KIF1C, FNDC3B, FBRSL1, CELF1 YTHDC2, YME1L1, WRNIP1, UBE2Z, TRIB2, STK16, SRC, SPG7, SCYL1, RYK, RPS6KA6, RECQL5, PKDCC, PFKP, PDXK, PDK4, PCCB, PAPOLA, OASL, NEK8, MAST4, ATP binding 82 MAPK1, KRAS, KIF1C, KALRN, HSPA1L, HIPK2, HIPK1, 8.8E-6 FLT1, FANCM, EPHA8, EIF2AK4, EARS2, DYRK1A, DTYMK, DNAH1, DGKZ, CUL9, CDK15, CDC34, CASK, CARS, ATP9B, ABCC1 XPO5, SYNCRIP, SNRPN, SAMD4B, RPL31, PAPOLA, RNA binding 71 1.3E-22 EARS2, CELF1, BAZ2A IKZF3, IFIT3, EDC3, CASP8, SEPT7, IL12B, CASP2, AKT1, ABCD1, EPHA4, APAF1, EED, ACTN2, APLP2, DYRK1A, WRNIP1, CREB1, PRPF3, FKBP8, VWA1, RAF1, structural constituent of 48 SNRNP200, TP53, KMT2A, XIAP, THBS1, AGER, MAPK1, 9.9E-25 CSK, FYN, BAK1, AMELX, PLK4, JUN, STAT1, sSTAT2, STAT3, FN1, BRAF, S100B, VEGFA, RAD52, BCL6, BCL2, , BAX, FAS, GRB2, BCL2L1 Cellular component ZNF782, ZNF703, ZNF684, ZNF24, ZKSCAN3, ZFHX3, YTHDC2, XPO5, WWOX, WRNIP1, WIZ, UBE3B, TXNDC2, TRIB2, TOPORS, TOB2, TMEM57, TBX5, SYNE2, SYNCRIP, STC1, STAT2, STAT1, SRGAP2, SRC, SOX2, SOX13, SMYD2, SETDB1, SCYL1, SBF1, SAMD4B, RYK, RPS6KA6, RECQL5, RAB3IP, PURB, PTPN22, POU2F1, POGK, PKD1, PHF1, PFKP, PERM1, PDXK, PDE4DIP, PBXIP1, PBX2, PAPOLA, NR4A1, NFATC3, Nucleus 252 2.2E-12 NFATC1, NF2, MTF2, MOB2, MKL1, MAPK1, MAGEC2, LMO7, LGALS3, KRR1, KMT2A, KLF16, KDM4B, KANSL3, JRK, IKZF3, HIVEP2, HIPK2, HIPK1, HEXIM1, HES2, GPRC5B, GNAS, GADD45B, FOXN3, FOXK1, FANCC, FAM71B, FAF1, ETV6, ETS2, EFCAB13, DYRK1A, DTYMK, DOT1L, DGKZ, DCAF8, CRTC2, CNOT2, CELF1, CDC34, CASP2, CAMTA1, BTBD7, BCL6, BAZ2A, ATF6B, ARID4B, ARHGAP22, ARG1, APLP2, AP5Z1, AGO4, ACHE ZNF703, ZKSCAN3, ZFHX3, XPO5, WWOX, VPS37B, UBE3B, UBE2Z, TXNDC2, TRIB2, TOB2, TBX5, TBC1D32, SYNE2, STMN2, STK16, STC1, STAT2, STAT1, SRGAP2, SRC, SOX2, SOCS3, SNRPN, SMYD2, SETDB1, SCYL1, SAMD4B, RYK, RPS6KA6, RPH3AL, RNF213, RGS5, RECQL5, PXN, PTPN22, PPP6R2, POSTN, PKHD1, PKD1, PHF1, PFKP, PERM1, PDXK, PDLIM5, PDE4DIP, PAPOLA, OASL, NR4A1, NFATC3, NFATC1, NF2, NEK8, MYADML2, 230 MTF2, MOB2, MMP28, MKL1, MAST4, MAPT, MAPK1, 1.1E-8 MAGEC2, MAFIP, LST1, LMO7, LGALS3, KRR1, KRAS, KMT2A, KDM4B, JRK, ITPR3, IL1R2, IKZF3, HIPK2, HIPK1, HEXIM1, GOPC, GNB1L, GNAS, GADD45B, FGD6, FANCC, FAM161A, ETV6, ETS2, EML4, EFCAB13, EARS2, DTNA, DNAH1, DGKZ, DCAF8, CUL9, CRTC2, CNTLN, CNOT2, CELF1, CDH3, CDC34, CD96, CASP2, CASK, CARS, CAMTA1, BCL2L1, BAZ2A, ARID4B, ARG1, AQP4, AP5Z1, ANKRD13D, AKAP10, AGO4 Continued

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 5 Vol.:(0123456789) www.nature.com/scientificreports/

Go term Total no. of genes involved from the list Seed genes present P-value XPO5, WWOX, TPM1, THY1, TAT, STAT2, STAT1, SRGAP2, SRC, SOX2, SOCS3, SMYD2, SFI1, RPS6KA6, RPL31, RNF213, RGS1, RAB3IP, PXN, PFKP, PEX3, PDXK, PDLIM5, PCCB, PBXIP1, OASL, NFATC3, NFATC1, MTHFR, MKL1, MAPT, MAPK1, LCP2, KRAS, KALRN, 211 1.1E-25 HSPA1L, HBG2, HBG1, HBB, HAL, GNAS, FGFR1OP2, FERMT1, FBLIM1, FANCC, FAF1, ENO2, EDC3, DTYMK, DENND1C, CNOT2, CNBD2, CEP152, CASP2, CASK, CARS, BICD1, BET1L, BCL2L1, ARHGAP22, ARG1, ALOX5, AKAP10, AGO4 ZFHX3, ZBTB17, ZBTB10, YME1L1, XPO5, UBE2Z, TBX5, TAF1C, SYNE2, SYNCRIP, STAT2, STAT1, SRGAP2, SOX2, SOX13, SMYD2, SETDB1, RPS6KA6, RECQL5, PXN, POU2F1, PMEPA1, PHF1, PEX3, PDXK, PAPOLA, Nucleoplasm 186 NR4A1, NFATC3, NFATC1, MTF2, MKL1, MAPK1, KRR1, 1.5E-24 KMT2A, KDM4B, KANSL3, ITPR3, HSPA1L, HIVEP2, HIPK2, HIPK1, HEXIM1, FANCM, FANCC, EZH1, ETS2, EFCAB13, DYRK1A, DOT1L, DGKZ, CRTC2, CEP152, CELF1, CDC34, BOD1L1, BCL6, BAZ2A, ARID4B, AP5Z1 YME1L1, WRNIP1, TM7SF2, SYNCRIP, STMN2, STK16, SLC1A6, SLC12A9, SCYL1, RYK, RPL31, RNF213, PRSS12, PFKP, PEX3, PDLIM5, PCSK6, PCDHGC3, OCIAD1, OASL, NPC1, NNT, NF2, MUC4, LST1, LGALS3, LAPTM4B, Membrane 139 1.6E-15 KRR1, KRAS, KIAA2013, KCNJ15, ITPR3, GOPC, GNAS, GGCX, GALNT1, FKBP8, EML4, EDC3, CNOT2, CELF1, CEACAM1, CDH3, CASP2, BICD1, BET1L, BCL2L1, B4GALT1, AQP4, APLP2, ALG12, AGO4, ACHE, ABCC1 Biological process rRNA processing 62 RPL31, KRR1 1.4E-39 WWOX, WBP2, TBX5, STK16, STAT1, SOX2, PKD1, PBX2, positive regulation of from RNA polymerase II NR4A1, NFATC3, NFATC1, MTF2, MKL1, KMT2A, IL23A, 54 5.0E-4 IKZF3, HIPK2, EZH1, ETV6, ETS2, DOT1L, CRTC2, CASK, CAMTA1, ASXL1, ARID4B Translational 51 RPL31 1.4E-24 translational initiation 49 RPL31 5.1E-36 nuclear-transcribed mRNA catabolic process, nonsense- 47 RPL31 8.1E-37 mediated decay Reactome pathway Formation of a pool of free 40S subunits 53 RPL31 1.11E-16 Nonsense Mediated Decay (NMD) independent of the 49 RPL31 1.11E-16 Junction Complex (EJC) GTP hydrolysis and joining of the 60S ribosomal subunit 53 RPL31 1.11E-16 L13a-mediated translational silencing of 53 RPL31 1.11E-16 expression SRP-dependent co-translational to mem- 52 SEC11C, RPL31 1.11E-16 brane KEGG pathway hsa03010: Ribosome 48 RPL31 2.7E-27 hsa03040: 32 HSPA1L 6.2E-13 STAT2, STAT1, SRC, NFATC3, NFATC1, MAPK1, KRAS, hsa05161: Hepatitis B 32 7.1E-12 ATF6B hsa03460: Fanconi anemia pathway 20 FANCM, FANCC 5.7E-12 hsa04360: Axon guidance 28 SRGAP2, NFATC3, MAPK1, KRAS, EPHA8, EFNB3 2.0E-10

Table 3. Te and pathway enrichment of DEGs of TS, signature genes of T2DM and RM and their interacting partners. Key genes are bold and italicised and signature genes are bold.

signature genes of RM and T2DM in TS along with their interacting partners in the TS network. Tese genes are THBS1 (MIM: 188,060), NFATC1 (MIM: 600,489), GPRC5B (MIM: 605,948), FANCL (MIM: 608,111), WIZ (GenBank: NM_021241.2), JRK (MIM: 603,210), XAB2 (MIM: 610,850), CSHL1 (MIM: 603,515), LSM5 (MIM: 607,285), COX11 (MIM: 603,648), RPL3L (MIM: 617,416), ADAM22 (MIM: 603,709), LGI1 (MIM: 604,619), SLC29A2 (MIM: 602,110), IGHM (MIM: 147,020), IGHD (MIM: 147,170), ATXN7L1 (GenBank: NM_020725.1), SF3B3 (MIM: 605,592), UBE3B (MIM: 608,047), SLC9A9 (MIM: 608,396), FANCM (MIM: 609,644), SMARCAL1 (MIM: 606,622), MUS81 (MIM: 606,591), TIPIN (MIM: 610,716), RECQL5 (MIM: 603,781), MLH1 (MIM: 120,436), RMI1 (MIM: 610,404), RMI2 (MIM: 612,426), ERCC4 (MIM: 133,520), and RAD52 (MIM: 600,392). Te highlighted genes in bold are signature genes. It was found that ten of these genes (ERCC4, RMI1, FANCM, FANCL, RAD52, XAB2, MUS81, TIPIN, RMI2, and MLH1) are involved in DNA repair of which four of them i.e., ERCC4, FANCM, FANCL, and MUS81 are specifcally involved in Fanconi anemia pathway which is also a type of DNA repair pathway. We expect that these two pathways are important as they have common genes of RM and T2DM predicted in the study.

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 6 Vol:.(1234567890) www.nature.com/scientificreports/

Total no of nodes and edges in fnal No of signature genes present in fnal S. no No of nodes in 1st shell No of nodes in 2nd shell network (n,e) Clustering coefcient network 1 250 50 (561, 11,230) 0.523 8 out of 12 2 300 100 (672, 16,353) 0.532 9 out of 12 3 500 100 (887, 24,388) 0.534 9 out of 12 4 400 50 (723, 17,556) 0.539 8 out of 12 5 400 100 (775, 20,357) 0.544 9 out of 12

Table 4. Parameters of network construction (confdence level 0.4).

Chromosomal contribution of DEGs. To study the genomic imbalance, we studied the percentage con- tribution of the DEGs of TS across the . It was observed that all the chromosomes had altered gene expression except chromosomes 23 and Y. It ranged from 1.17% () to 11.17% () (Fig. 2b). Chromosome X contributes to 1.76% of DEGs. Chromosome 1 had the highest proportion of signif- cantly altered genes in the study. Tus, it was observed that TS phenotype is the result of global genomic imbal- ance, rather than the genes of individual X chromosome alone­ 27. One could therefore explain the TS phenotype only by the additive efect of genes based on diferent loci.

Change in expression of genes in TS fetus and TS adult. Genes identifed as expressed in fetal tis- sues may provide clues to developmental processes and are a candidate set for further analysis in disease studies. Tere are few genes whose expression changes on transformation from fetus to adult contributing to diferent developmental processes of that individual. As mentioned in Table 1, GSE58435 signifes the expression of genes in TS fetus while GSE46687 signifes the expression of genes in TS adult. We represented the transformation from TS fetus to TS adult in the form of a heat map (Fig. 2c) which was constructed using Morpheus online tool. In a comparison of genes expressed in fetal vs. adult turner patients, we identifed 72 genes whose expressions are altered from fetus to adult in TS (P-value < 0.05 and |logFC|≥ 1). Of these 72 genes, 24 genes are downregulated in fetal turner patients and upregulated in adult turner patients while 48 genes are upregulated in fetal turner patients and downregulated in adult turner patients. Tese genes are of interest as candidate genes because their expression levels are just the opposite in TS individuals (fetus vs adult) in comparison to healthy controls. Terefore, it is expected that when these genes are activated in adult or fetal tissues in an altered fashion, this may hinder the developmental processes. Te names of these genes are listed in Supplementary Table S4. Tese fuctuations of gene expressions from fetus to adult may contribute to the phenotypic features of the TS. Te study presented here is only a subset of the types of information that these data sets can yield. Although the role of these 72 genes in TS is still unclear, our results elucidate a new aspect of TS which is also crucial for under- standing its etiology which requires additional future analyses to provide further insights.

Turner Syndrome network follows hierarchical scale‑free features. Te PPI network of TS was constructed with the DEGs of TS including the DEGs of T2DM and RM that were common with TS (signature genes). So, a total of 355 genes that were diferentially expressed were used to construct the PPI network of TS. Our goal was to get a network that carries most of the signature genes in the same network with a condition that our network’s clustering coefcient must be greater than 0.5. Te clustering coefcient being greater than 0.5 signifes that the network and its genes are fnely clustered together. In our study, we tried diferent scenarios by changing the number of nodes in the frst shell and second shell as mentioned below in Table 4. We found that most numbers of signature genes are incorporated in the 2nd, 3rd, and 5th cases. We, however, selected the 5th case i.e., 400 nodes in the 1st shell and 100 nodes in the 2nd shell as we get the best clustering coefcient here. Terefore, we proceeded with this case. Of 355 DEGs, only 271 genes made it to the main network. We call these genes seed genes. Te main con- structed network consisted of 775 nodes and 20,357 edges. Te nodes here are the proteins and the edges are the interaction between these proteins. Protein–protein interactions (PPIs) can be conveniently represented as networks, allowing the use of graph theory for their study. Studying the topological properties of the TS network may reveal patterns associated with TS in . Te topological properties used here are the probability of degree distribution P(k), clustering coefcient C(k), and neighborhood connectivity C­ N(k). Tey characterize the structural and organizational features of the TS network. It was observed that these topological properties obey power-law behavior as a function of degree k (Fig. 3a). Te power law of the datasets of the topological variables of the TS network is ftted and verifed following a standard statistical ftting procedure proposed by Clauset et al.28. Te values of the exponents are attained from the power-law fttings. Te summarised results for the complete network are as follows, γ P k− 0.581 α C k+ 0.0929 (1) ∼  β  → � CN � k+ � 0.3467 �   Tese values suggest that the TS network follows a weak hierarchy. Te value of γ signifes that the number of nodes increases with the advancement of disease as a power of 0.581, thus, giving us the idea of the TS network

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 7 Vol.:(0123456789) www.nature.com/scientificreports/

100 0.7 1 0.1 1 100 10

0.4 ) 0.5

(k) (k)

(k)

(k 0.0001

B

C

N 0.0001 E P(k) C C C C(k) C

1 0.2

1e-008 1e-009 [a] 0.1 110100 1000 110100 1000 0.1110 100 1000 110 100 1000 k 110100 1000 110100 1000 k k k k k

e

m

a

N 1.1

e

n

e

1.05 G

1

0.95

0.9

0.85 Level01LevelLevel23LevelLevel4 Level5 [b] [c] [d] [e] Degree

Figure 3. (a) Te behaviours of degree distributions (P(k)), clustering coefcient (C(k)), neighborhood connectivity (C­ N(k)), betweenness ­(CB(k)), closeness (C­ C(k)) and eigen-vector ­(CE(k)) measurements as a function of degree k for TS network. (b) Corresponding Hamiltonian Energy (HE) as a function of levels of organization. (c) Corresponding modularity ­QN as a function of levels of organization (d) Variation in the calculated average LCP-corr for TS network as a function of network-level. (e) Characterization of top ffy leading hubs of the network by degrees. EFNB3 and LCP2 are the key regulators denoted by yellow color and THBS1 are the signature gene of T2DM denoted by green color.

being hierarchical as it shows the presence of modules in our clustering experiment. Te graphical representa- tion of degree distribution P(k) shows that the network is dominated by lower degree hubs than that of higher degree hubs. Tis signifes that the network follows the power law and thus the hierarchical, scale-free, and has fractal attributes. Te positive value of β suggests that the network carries the assortive mixing specifying that a large cluster of degree nodes (formation of the rich club) regulates the TS network. Te centrality measurements correspond to the fow of information in the network and predict the infuential candidates in the network that play important role in the fow of information in the network. Two such measures are betweenness centrality ­CB(k) and closeness centrality ­CC(k). Te eigenvector centrality ­CE(k) depicts the efcacy of the spreading (receiving) power of information of nodes from the network. Tese properties obey power-law behaviors as follows, µ CB k 0.4361 δ CC k 0.0979 (2) ∼  θ  → � CE � k � 1.376 �   Following the procedure of Clauset et al. (2009), these three centrality measurements are again verifed and confrmed for their statistical power-law fts. We found that only a few higher degree nodes have large centrality values which means that most of the infuencing hubs that can control the network are few. Tus, the TS network is predominated by the low degree nodes (genes/proteins). It is these low-degree nodes that control the working and organization of the network. However, some of the leading hubs that are scarcely distributed might show signifcant involvement in regulating as well as stabilizing the network. Here, the positive values of these centrality measurements show that the network exhibits hierarchical scale-free or fractal features. Tus, the overall topological properties of the TS network show that it self-organizes into a scale-free fractal state and is composed of successive interconnected communities which means the network has hierarchical organization.

Validation of the biological signifcance of TS network. A network that difers signifcantly from a random network could be viewed as containing relevant information, assuming that the comparison with the random network is meaningful. Construction of the null model will allow us to assess the signifcance of the TS network features. A null model consists of one network (or a set of networks) that matches a graph under study in some structural aspects while being random in all other characteristics. In our study, we constructed two dif- ferent null random networks of the same size and did the comparative analysis (Table 5). We can see that the clustering coefcient and the diameter of the random networks drastically decrease in comparison to the TS network. Te exponents of other topological properties are also diferent and do not fol- low the scale-free fractal attributes. Te graphical representation of topological properties of TS network and random networks as null models is illustrated in Supplementary Figure S6. Trough this comparative analysis, we found that the TS network constructed from the diferentially expressed genes of TS is exclusively associated with TS and is not by chance and is biologically signifcant.

Identifcation of key regulators and properties. To identify the systematic arrangements and modular structure of the TS network at their various levels of the organization, we followed Newman and Girvan’s stand- ard community fnding ­algorithm29. It was found that the TS network is hierarchically organized through six dif-

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 8 Vol:.(1234567890) www.nature.com/scientificreports/

Degree preserved random Random network (Erdos Properties Main turner syndrome network network of TS Renyi algorithm) Nodes 775 775 775 Edges 20,357 20,357 20,357 Clustering co-efcient 0.544 0.294 0.068 Network diameter 7 5 3 Network radius 4 3 3 Exponent of average clustering 0.092 − 0.091 − 0.0645 co-efcient Exponent of degree distribution − 0.581 − 0.581 0.125 Exponent of neighbourhood 0.346 − 0.0463 − 0.00934 connectivity Exponent of betweenness 0.436 2.1 2.029 centrality Exponent of closeness centrality 0.0979 0.092 0.084 Exponent of Eigen vector 1.357 0.915 1.016 centrality

Table 5. Comparative analysis of TS PPI network with random networks of the same size.

ferent levels. As one moves from top to down level of organization, the corresponding Hamiltonian Energy (HE) and modularity ­QN as a function of levels of the organization are found to be decreased (Fig. 3b,c, respectively). Te proteins that are deeply rooted from top to bottom of the network where the network cannot be further divided into sub-community and form motif are said to be the key regulators of the network which serve as the backbone of the network ­organization30. We identifed nine key regulators LCP2, PTPN22, CCL22, CXCL5, S1PR4, POU2F1, FAM20A, ENAM, and EFNB3 (Fig. 4a) in the TS network. Surprisingly, none of these KR genes fall among the categories of the top ten leading hubs. However, two of the key regulators LCP2 and EFNB3 were among the top 50 high degree hubs (Fig. 3e). Tus, we can say that it is not necessary for these KRs to be the large leading hubs in the network, however, their popularities are randomly changed at various levels of organization (Fig. 4a,b). Since the network qualifes hierarchical characteristics, the elimination of the leading hubs will not cause its breakdown. But it is expected that these KRs, if eliminated, may cause maximum local and global perturbations, especially at a deeper level of organization. Tese perturbations may reach out to the deeper levels of organization causing the topological change in the ­network30. Few more proteins of the seed genes i.e., CD96, CXCR5, IKZF3, SPON1, KALRN, and BTBD7 reached the sixth level but did not form the motif. Tus, they cannot be considered as the KRs. All the KRs maintained a low profle/popularity thereby regulating the network till the bottom level of the organization. None of the signature genes of T2DM and RM in TS reached the motif level, however, THBS1 and ADAM22 supported the network reached till the 5th level. THBS1 was among the top 50 high-degree hub genes. Tese KRs may propagate signals from top to bottom levels and vice versa of the network to maintain network stability and inherent properties. Tese key regulators are deeply rooted in the network, they serve as the backbone of the network for any network activities and regulations and could be a possible target gene for this disease control mechanisms. Since these identifed key regulators and the signature genes of TS are expected to play an important role in TS, we further explored them by searching the possible microRNAs that could regulate them. We used MIENTURNET tool­ 31 which is an interactive web-based tool for microRNA-target enrichment analysis. Te Supplementary Table S5 presents the MIENTURNET enrichment results of miRTarBase which gives the most up-to-date results for validated interactions. We used the p-value cut-of of < 0.05 to get the list of signifcant microRNAs responsible for the regulation of these key regulators and signature genes of TS. l Te graphical representation of probability Px y of all the key regulatory genes show an increase in ­Px from level 0 till 6th level (Fig. 5). Tis means the regulating ability of each key gene becomes more important and signifcant at the deeper level of organization. 

Evidence of self‑organization: local‑community‑paradigm (LCP) approach. Te TS network was analyzed to assess the maintenance of its self-organization at diferent levels of its organization using the LCP technique. We calculated the LCP-correlation of all the communities/sub-communities through six diferent lev- els of its organization. Te modules/communities having zero LCP-correlation were excluded in average. It was found that the average values of LCP-correlation at each level are greater than 0.95 and the values do not change with the error bar (Fig. 3d). Te LCP-decomposition-plot (LCP-DP) for the main TS network (Level 0) and its sub-modules (4 sub-modules at Level 1) are shown in Fig. 6. Based on the nodes and their links of each network and its sub-modules we can conclude that the TS network and its sub-modules are more strongly characterized by small-local communities and are compact. Tis shows that the network is self-organized and compact with efcient information processing. Te TS network represents a strong LCP network which lets us conclude that the network is dynamic and heterogeneous which enables network evolution and reorganization. Such architec- ture assists quick delivery of information across networks both locally and globally.

Status of essential interactions of TS in non‑human systems by integrating the Orthology with PPI. We examined the top hundred genes in each category i.e., degree distribution, betweenness cen-

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 9 Vol.:(0123456789) www.nature.com/scientificreports/

MAST4

PBXIP1 ADAM20 WBP2

TXNDC2 EFCAB13

OCIAD1 PPFIBP1

OGFOD2 ST7L POMGNT1 SOX13 FGD6 PERM1

LAPTM4B B4GALT1 LRRN3 PITPNM2 AMELX LIMS2 ADAM22 FAM20A PDK4 NPC1 ENAM AMBN IL1R2 BTBD7 PDE4DIP FERMT1 VWA1 TNNI1 ST3GAL6 EFNA2 FGFR1OP2 UROC1 POMT2 WWOX SBF1TSPAN2 EPHA5 TRPM1 TNXB DPF3 PARVB TFR2 CASK FBLIM1PBX2 NPC2 FUT9 SLCO1B3 PDLIM5 GABRB3 ATG14 SLC12A9 LIMS1 DTNA HIPK1 LGI1 S1PR4 CXCR2 FERMT2 GOPC EPHB4PGF EPHA6 CRTC2 CACNG2GALNT1 APBA1 GAB1 CXCL5 TBX5 TRIB2 ALOX5AP BRAF CXCL13 APLP2 EPHA3 PPP3CAIL23R TNNT2 EFNA4POSTN FLNA BECN1 EPHB6 PARVA ZFHX3 HAL EPHB1 IL23A VEGFB ZNF24 LIN7C RGS1 PTK2 MTF2 LALBA TSPAN33 EFNB3 CLEC4E TYRP1 ACTC1 EPHA4 EFNAITIH3 4IL12BBCL2L1TPM1NKX2−5FYB IFIT1 IFT27 EPHA7 EFNA1 FLNC LST1 CHGA PTPRREPHA1 MTHFR SEPT6 KCNJ15 EFNB1BCAR1CRADDPTGER3 NANOG FOXK1MKL1 PMAIP1 NFATC3 BICD1 EPHA8 PTK2BGNB3TNFRSF10BFN1 NEK8 RYK ALOXBAZ25A LAT PIK3CGCRK DUSP1 CCL22 PKD1 SSPN LIN7A GNB1 CDH1ACTN2 HIPK2 NTN5 ZKSCAN3 CAMTAEPHB1 ADRB3 1 HMGB1 YES1 SYKIRF7 EED HPD TAC9orf3T CD96 LIN7B ATF6B STMN2 GRB2SOX2CXCR5IRF9 SYNESEPT42 BAK1 AQP4 ILK BCL2 PIK3CA NRP1 FKBP8 AEBP2 EPHB2EFNA5 PXN EGFRFLT1GRAP2ZAP70THY1ITPR3 AP5ZRAB3IP1 VEGFVCLAIL6 MUC4 AGERGNG2 MAPTFYN BAX IL1B MYH6 SEPT9 STAT2 BAD SETDB1SOCS3SRFIFIT3 ZFYVE27 CREBDYRK1A1 NRASSTAT3 SUZ12 UBAP1 EFNB2 FAS CSKTNFRSF10A HFE RPH3AL DGKZ ACHE DUSP6 SHC1 PKHD1 ZNF703 BIDCASP2 ABCC1NFATC1SRCEGF PIK3C3MAPK8 KIF1C WNT10B BCL2L11 KRAS JAK1 ARG1 TMEM67RAB8A TMEM216 ABHD1 CDH3THBS1 EPHA2 HRASMAPK1SOS1 PTPN22LIFR LMO7 CEP135 JUN TSG101MAGEC2CUL2KDM4B CENPJ CTTNCASP9SIN3AAKT1 PTPN1FOS1 DLK1 SEPT2 PPP3R1CTNND1 JAK2CTNNB1 POU5F1 HGSNAT ITK GNB4 UBE2SSTAT1IRF4 KMT2A TIPIN SFI1 LIN28A CREBBPTCEB1 EZH2 IFITCAND2DOT11 L PLK4 XIAP CC2D2A FBRSL1 KALRN GNASCASP8NF2 EZH1RPS6KARNF2136 CEP152 VAV1 GNB2 RBBP4 IFNAR1TFRMLH1C FAF1 CDK15 CEP63 DENND1C STC1 CEACAM1 SAP130APAFBCL1TYK6 2 TP53 CDK9 SPG7 HES1 LRR1 DUSP16 PEA1ARID45 B ETS2 LGALS3 FANCBVPS37B PFKP JARID2 SKP1CUL9 FANCC CTNNA1 ZBTB17LCP2CELF1CYCSS100B USP18 KLF16 RPS6KAASXL1 EP3001 SUN1 TCEB2 EME1 FNDC3B HEXIM1 NR4A1 SPONRAF11 POU2F1RMISMYD2NIPSNAP2 1 STRA13 SLC15A4 ARHGAP22 NUTF2DICER1MDM2FAM161A YME1L1 HSPB11 GADD45B ETV6 CDC34BCORFANCD2MX2 GUCY2DGUCY1B3 EML4 DUT EIF2AK4 PHF1 WRN CETNSMARCAL12 MXRSAD21 CUL5 CUL1 UBE3B C17orf70 FOXN3 EMB AGO4 FANCI FANCG MTR IKZF3 VCP FANCFANCLE CNTLN ATF7IP CCNT1SNW1MDM4 BLM SEPT7PURB GUCY1A3 ISG15HP RPAUBE2Z3 PDXK MVB12B DDX6 PPP2CAUBB FANCA MVB12A GUCA1A RQCD1 PAPOLA CNOTHNRNPK1 UBCRAD18MUS81 CNOTCNOT8 4 RANBP2 OASL ENO2 EIF3IEEF1D RECQL5 ATXN7L1 TAF1C BCAS2XPO5 CDC5L UBA52 RMI1 DTYMK EDC4CNOT6L PRPF6 SF3B3 GNB2L1NEDD8 RAD5FANC2APITDFFANC1M HNRNPLSNRPKHDRBS1LSM3N RPL36 RPL10RBX1 RPAHSPA1L2 SRSF2CRNKLWDR31SF3B31 SNRNP200 RPA1 WRNIP1 BOD1L1 GUCY2F HNRNPAEIF2S1EEF1A11 RPL34RPS27A PRPF1PRPF89 EIF3DRPL23RPSRPL196 RANRPL14RPL35 UQCRFS1 VPS28 LEVEL 0 RPS16 XAB2HNRNPH1DCP1RPS20A EIF2SSEC61G3 C19orf4ERCC04 DSEL CNOT7 CNOT3 SNRPF EEF2RPL12WDR75EEF1G SNRPHNRNPA2B1A EIF3KRPL13RPLP2RPL11KPNBRPL381 RPL28 LSM6 PRPF4EIF3MHNRNPUHNRNPM SEC61A1 HBC1orf8B 6 DCP1BTAF1B LSM7 CPSF3 RPS3 RPL18AHEATRRPL10ARPS271RPL32 SLC9A9 SAMD4B DDX23 PPIH SYNCRIFBLPRPLRPS216RPL30NIFKRPS24RPL15 RPL18 LSM5 SNRPD2 RPL13ARPS19 RPL24 TOP3A EME2 U2AF2GNB1L WDR3EFTUD2SNRPGDKC1SNRPD3 RPSUTP11LA EIF3E RPL36A FAM49A CNOT6 LSM1CSTF2SF3BSF3A2CSTF33 EIF4ARPL7RPS103 RPL35A RPS25 SUN2 SARTSNRPD11 RPS23RPS26 RPL37ARPSRPS182 EEF1B2PWP2 PCCA CWC1LSM25 RPL4 RPS27LRPL39DDX5RPL292 HBA1 TOB2 TAF1D DCP2 NHP2L1UTP6WDR3RPLRPS129RPS296 RPLP0RPL22RPL26RPS11 BYSL DERL1 LSM4YTHDC2SNRPRPL27B RPL21RPL7ARPL8 RPS17RPS15RPLP1 RPL37 FAURPS8RPL3LRPS15ARPL22L1 EIF2SSEC11C2 CPSF2CPSF4CTNNBL1CNOT11SNRPB2PLRG1 RPL26L1SNRPE NOP58 RPL23ARPL27ARPL3 UTP14CEARS2 PRPF3EIF3BEIF3H RPS5 RPS14UTP3SPCS2 SNRPC NOC4L RPL31 UFD1L CDC40 CNOT2 PRPF31NHP2RPL17 RPSRPL9 5 TELO2 HBG2 HBG1 TOPORSPHF5ALSM8 RPSRPS4X7EIF3G IMP3CARSBOP1 EDC3 SF3A1 SNRNP70USP22 RPL10LEMG1 DCAF13SPCS3 SRGAP2 CSTF1 CPSF1NOP56WDR43 GAR1SEC61BRPS3A CIRH1A SNRNP40 RPS28RPS13 RPL39LUTP18 SNRPA1 NOP10 KRR1NOB1WDR46 RRP3RRP96 RPL36AL HBA2 PCCB RRP7TSR1A BMS1 LTV1 IMP4 UTP15 BET1L RIOK2PES1PDCD1NOP14SPCS1 1 SLC16A7 PEX19 WDR42A UTP14A NOL6PNO1 MPHOSPH10 TBL3 PRSS12 PCNXL2 FCF1 COX11 ENSG00000086848 ABCD1 CYC1 EMC2EMC8 PKDCC KRI1 MMGT1 JRK PPP6R2

SCYL1 TM7SF2 EMC7 [a] CSHL1 PEX3 EMC1 EMC4 WIZ EMC3 SLC5A9

NNT ABCD3 ALG12 ZZEF1 ANKRD44 FRYL DNAH1 EMC9 PCNXL4 GPRC5BZNF782 GSE1 C11orf57TMEM8A DNHD1 UBALD1

GGCX STK16 MOB2 KIAA2013

POGK Main Network of TS

ANKRD44 DNHD1

CSHL1 ZZEF1 PKDCC DTYMK PPFIBP1 EMC2 UROC1 GABRB3 ARID4B UBAP1 SOX13 DPF3 HBG1 TSPAN33 C9orf3 OCIAD1 PBX2 COX11 CYC1 BET1L TAT SLC9A9 KLF16 HBG2 TSPAN2 HPD CUL9 MVB12A NNT SLC15A4 EMC7 EZH1 UBE2Z GNB1L LRR1 ALG12 HAL TFR2 RQCD1 VPS28 TAF1B HBB DUT TM7SF2 CAND1 CDC34 MX2 KDM4B AEBP2 MVB12B NIPSNAP1 SMYD2CDH3 IFIT2 HBA2 SEPT7 VPS37B ZFHX3 IFITPDXK3 TRIB2 TAF1C TCEB1 RBBP4 ALOX5AP MTF2 PCNXL4 UBE3B CUL2 NPC2 SPCS2EIF3M GUCY2D UBE2S EED CNOT11 EEF1D EMC8 EMC3 HSPA1L HP SEC11C RPL26L1 SAP130 FGFR1OP2 TSG101EML4 SBF1 BAZ2A CAMTA1 RPL27 TRPM1 JARID2 RSAD2 UQCRFS1 RPL23 TOP3A MKL1 RPS6KA6ARG1RNF213 IRF4 HIPK1 EIF3D TCEB2UFD1L VCP GUCY1A3 TPM1 USP18 CNOT6LZNF24 UBB FANCE SYNE2 SUZ12 LIN28A EIF3I SEC61B RPL39 SEC61G LSM7 IRF9 IFIT1 ISG15 RPL17 RANBP2 EIF2AK4 CUL5 PEA1AGER5 SETDB1BECN1 RPL22RPL36ALNOP10 MTR RAD18 SLC12A9 IRF7 FAF1 EEF1ADKC11RPL22L1RPL19 SEPT6 DERL1 C19orf40 TELO2 SSPN RGS1 MYH6 ATF7IP EIF3RPLRPLP0E8 RANRPL10A RRP7A RECQL5 SIN3A GAB1 DLK1 SPCS1RPS1RPL51 LSM3HNRNPUCPSF4 EMC4 LMO7 FNDC3B CTTN IFNARLIFR1 SOX2EZH2STAT1 TFRC ASXL1GADD45B RPL15 DCP1B FLNC ALOX5 POU2F1 LSM6 EMC1 CDK9 MAGEC2 CXCL13 TYK2 NKX2−5BCL6 TXNDC2 RPL36 RPL29 NOP5CAR8 SOASL RBX1 FANCD2 C17orf70 GUCY2F IL23A ENO2DICER1EP300ETV6 MDM4 EIF3K CDC40 SMARCAL1FANCI TNNT2 NANOG POU5F1CTNNB1 HMGB1 CREBBP ATG1ATF6B4 RPL34DCP1A RPS2RPL36RPS4A X HNRNPM SNRPN SNRPC RAD52 WRN CXCR5 CSKJAK1 DUSP16 PEX3 EIF3HRPS27 RPL13EEF1GAEIF3BCIRH1A LSM2 GUCA1A CCL22 PHF1 ITMXK 1 KMT2A NFATC3 RPS5 DNAH1 EME2 TBX5STAT2 PIK3C3 GRAP2 RPL39RPL27L RPS15AARPS7 LSM5 CCNT1 FANCC WRNIP1 SEPT9 PXN CASP9S100JUBN STC1 EMB WBP2 RPS21 RPS17 EMC9 EME1 ACTC1 MDM2 CASP8PTPRRXIAPWWOCXCL5X FGD6 HNRNPSNRPK E HNRNPL ZAP70 SOCS3 MAPK8KLF4TP53 IL12B EIF3G EIF2SRPL302RPLRPL377RPL13RPL28A ATXN7L1 MLH1 FANCG CEACAMEFNA21 NF2 GRB2 STATKRAS3 EGF ZBTB17 EIF4ARPL123 RPL37 CNOT2 DTNA CTNND1 THY1 EFNA1 VAV1 NFATC1 RPS6 NHP2 CPSF3 SNRNP200 MUS81 FANCMSTRA13 TNNI1 DUSP6 CUL1 NR4A1 RPL11EEF2RPL35A RPL9 SNRPD3 U2AF2SNRNP40SNRPA1 RPA3 RMI1 PKD1 VCL PTPN1CDH11 AGO4SOS1 RPS8 EDC3PPIH PHF5A ERCC4RPA2 DUSP1 TNFRSF10A FKBP8 RPS10 RPS15 HBA1 FYB ILVEGFA6 AKTS1PRTYRP11 4 RPS2RPL26RPS13 4 RPS3RPS20 NOC4L KCNJ15 EPHB6 JAK2MAPK1 EGFR PPP3CA USP22 FAURPL38RPL14EEF1B2 BMS1 LSM1 HNRNPH1 HES1 RPA1 GUCY1B3 ACTN2 PIK3CAFN1 HFE SKP1 EFTUD2 RPS27ANHP2L1RPS12 RPS9 NOP56 SNRPF HNRNPA2B1 PTGER3 BCAR1 RPS6KA1 GNG2BCL2L1 ST7L IMP3 FRYL PTK2 SRF CASP2 RPLSPCS33 RPS2RPL244RPS25 RPS1RPL10L8 PRPF4 APITD1RMI2FANCL BOD1L1 YES1 HRASSHC1 IL23RBCL2L11 MAPT BAK1 RPL3L RPS1RPL18RRP93 ARPL18UBA5WDR32 6 KHDRBS1 CNOT1 CDC5L FANCAFANCB EPHA2 BRAF IL1B ITIH4 ZKSCAN3 RPS28 CNOT7 DDX6 HSPB11 ILK POSTN APAF1 CRADD SEPT4 RPL31 RPS19 OGFOD2 IL1RLCP22 GNB1EPHB2 KALRN NRASSYKFOS ITPR3BAMTHFX R LALBA EDC4 PRPF3 GNB2 CREB1 UTP6EIF2S1RPL7A PNO1 PRPF19 TIPIN EPHA1 SRC NPC1 TBL3 DCP2 XAB2 LEVEL 1 CRK FYN BCL2 BID RPS16 PRPF6 FANCF EFNB3GNAS PTPN2FLT12 FACTNNA1S YME1L1 RPS27LPES1RPL35RPL21 RPS2KPNB19 SNRPD2 SUN1 DOT1L ZNF703 EPHA4 RYK PCCA WDR7EIF2S35 UBC MPHOSPH10RPSA SART1 SF3A1 PITPNM2 EPHA5PTK2B GNB4 GNB3CXCR2 CYCS HIPK2 FBL BOPPPP2CA1 CNOT4 BLM LIN7B EFNBFLNA1 THBS1 GOPCSPG7 RPL10 LSM4SNRPSNRPD1A TNFRSF10B PMAIP1BCOR UTP18 RPS3NIFKA RPS2RPL32RPLP2RPL66 HEATR1 CNOT6 HNRNPA1 DDX23 C1orf86 EPHB4 EFNA5 IKZFPG3F ABCC1 GNB2L1 DDX52 CRNKL1CPSF1 CWC15 SF3B3 CDK15 SRGAP2 RAF1 PIK3CG HEXIM1ETS2 UBALD1 RPLP1 APBA1 EFNB2 SYNCRINEDD8PPDCD1NOL1 6RPL23A SF3A3 FERMT1 EPHA3 MUC4 UTP14C EPHA7EFNA4 AQP4 TSR1RPL4 EARS2 PRPF31YTHDC2CSTF3LSM8SRSF2 EPHA8FBLIM1LIMS1 CRTC2 DGKZ ABCD1 LTV1 WDR4FCF16 WDR43 PRPFCTNNBL18 WDR42A CETN2 NRP1 BAD SF3BWDR313 FOXK1 GPRC5B LIN7C PARVA PDK4 UTP11L CPSFPLRG2 1 SNW1 IFT27 EPHB3EFNA3 EPHA6EPHB1ACHE VEGFB LGALS3PPP3R1 RIOKSEC61A2 1 CSTF2 NUTF2 SUN2 ADRBLA1T ZFYVE27 UTP3 EMG1 SNRNP70 PLK4 LIN7A FERMT2 PWP2BYSL IMP4 PURB WNT10B DYRK1BTBD7A STK16 UTP14A PCCB STMN2 KRR1 NOP14 SNRPB LIMS2 PDLIM5APLP2 UTP15 SNRPG SPON1 DSELGALNT1 MMGT1 NOB1 VWA1 FUT9 B4GALT1 GAR1 CACNG2 WDR3DCAF1RRP33 6 SNRPB2 CD96 ST3GAL6 BCAS2 CEP152 PBXIP1 PARVB KRI1 PAPOLA PEX19 CHGA CSTF1 RAB3IP AMBN CNOT3 SF3B2 CEP135 CASK FAM20A SLCO1B3 CLEC4E LAPTM4TNXBB ENAM PERM1 SFI1 CNTLN AMELX LGI1 XPO5 HGSNAT TAF1D CEP63 ARHGAP22 ADAM20 CELF1 RAB8A AP5Z1 PDE4DIP BICD1 EFCAB13 DENND1C ENSG00000086848 TOPORS CC2D2A SEPT2 LST1 CNOT8 WIZ TMEM67 ADAM22 CENPJ FAM161A PKHD1 KIF1C TMEM216 LRRN3 PRSS12 ZNF782 ABCD3 PFKP RPH3AL JRK SLC16A7 SAMD4B FBRSL1 NEK8 POGK FOXN3 MAST4 TMEM8A C11orf57 ABHD1

PPP6R2 NTN5 SLC5A9

GGCX

UBAP1 TAF1B CYC1 ARID4B TAF1C RBBP4 TIPIN CASK CNOT4 CNOT3 CNOT2 FANCI NIPSNAP1 ALG12 SAP130 ADAM22 SSPN ENAM CARS ZNF24 PRSS12 BAZ2A UBB EMC7 PCCB EMC8 VCP LIMS2 ATF7IP MKL1 EIF2S1 RPL35 RPL19 RMI1 RECQL5 SEPT9 APLP2 LGI1 AMELX RPL26L1 KDM4B VWA1 UBC PURB PARVB ZNF782 RPL8 RPL36AEIF3I L RAD52 STRA13 ZNF703 FERMT1 RPL14 FANCA CLEC4E VEGFB AEBP2 CAMTA1 RPL18A UBA52 JRK CNOT8 RPA2 CUL9 SLC12A9 PTGER3 MTHFR DPF3 POSTN KRI1 RPL38 CSTF1 EMC9 EME2 S1PR4 PGF RPL37 EIF3D RPL29 NNT SNRPA CNOT7 UBE2Z STC1 EZH1 EED RPL5 RPA3 ALOX5 PPP2CAEIF3EIF2S2G CNOT6 EMC1 CUL2 VPS37B AGER AMBN RPL10L SF3B3 RPA1 AQP4 SEC11C YTHDC2 CXCL5 PARVA FAM20A PHF1 AGO4 GGCX RPS3IMP3 SNRPG RAD18 SYNE2 GALNT1 TFR2 SETDB1 RMI2 CDH3 LIMS1 DTNA EP300 HIPK1 RPL17 SPCS2 FOXN3 C19orf40 THBS1 HFE TNXB POU2F1 PRPF8 HMGB1 FN1 TBX5 UTP15 RPS15ARPS11 OASL RPL3ISG115 SPCSRPL33 0 SNRPSF3AN1 ERCC4 APITD1 CDC34 CXCR2 PDXK ASXL1 NOC4MPHOSPH10L RPS16 FAU EIF3B FANCM UBE2S FERMT2 PBX2 CXCR5 STMN2 NFATC1 IKZF3 JARID2 RIOK2 SEC61B RPL2RPL32 4 SNRNP40 LRR1 CEACAM1 FLNC ZBTB17 SRF BOP1 KPNB1NEDD8 SART1 MTR SMARCAL1 FANCC PCCA CXCL13 EFNA1 SMYD2 SIN3A HBA2 RPL7 MLH1 CTTN SPG7 LIFR ENO2 PEX3 TBL3NHP2L1 EIF2S3 RPL35ARPL12 CPSF2 SNRPC CNOT1 SLC9A9 UBE3B CTNND1 IL1R2 NRP1 BTBD7 KMT2A DCAF13 RPL27A PRPF3 BCAS2 EDC4 CUL5 FBLIM1 IL1B IL6 VEGFA RPL15 CPSF4 TSPAN2 BCL6 NKX2−5 RPS6KA1 UTP6 RPL13AEIF3M RPL3L FRYL SAMD4B S100B UROC1 EPHA2 MTF2 EMC4 JAK1 JAK2 ARG1 NOL6 SF3BSRSF2 2 PHF5A SYK MUC4 IL23A IRF7 HP EPHA6 EPHB4 SUZ12 NANOG LIN28AJUN FBL DKC1 SNRNP200 EMC3 WRNIP1 FANCG RBX1 OGFOD2 TAT RPS20 RPL6RPL28 FANCL PTPN11 CREBBP EFNA2 EPHB6 EEF1A1 RPL11 CTNNBL1 WRN VAV1ITK SHC1 VCL HAL EFNA5 CREB1 PES1 BMS1 WDR3 RPL10 RPL21 FANCB C1orf86 CAND1 IL12B STAT1 NOB1 RPL32 DDX6 HES1 CCNT1 LAT STAT3 EFNB1EFNA4 POGK RPL24 SF3B1 WIZ EDC3 BOD1L1 TCEB1 CTNNA1 CCL22 IFNAR1 TFRC BCOR ETV6 MYH6 RRP7A RPL22L1 RPL27 RPLP0 SNW1 SNRPA1 LSM1 DERL1 MUS81 PTPN22 PPFIBP1 IL23R IRF4 EPHA8 EZH2 NOP58 NOP14 WDR43 RPL36 DCP1A RPL9 PPIH YES1 ACTC1 IFIT1 MX1 CRK FOS RRP9 RPLPRPL36A1 HNRNPK SNRPD2 IFIT3 EPHB2 KLF4 EEF1G SPCS1 FANCE SRGAP2 PXN FLNA PITPNM2 OCIAD1 MX2 SOCS3 NPC2 RPS28RPS13RPS19 RPL26 RPL13 PRPF19 TOP3A GNG2 STAT2 ITIH4 RPL18 EEF1D FANCD2 LMO7 GRB2 FGFR1OP2 EPHA3 SOX2 DDX52 U2AF2 CSTF2 HNRNPL CELF1 LSM4 TCEB2 EPHA5 EFNA3 DICER1 STK16 PNO1 HBA1 EIF3K HNRNPH1 PRPF4 PTK2B IRF9 DUSP1 FLT1 EPHB1 EPHA1 FCF1RPS25 RPS27A GNB2L1 RPL10AEIF3RPLPE 2 EEF1B2 CDC40 EME1 GAB1 SRC ACTN2TPM1 RSAD2 EMB FYN ETS2 ILK RPS26UTP1WDR48 6 RPS9 EEF2 SYNCRIP GABRB3 DCP1B C17orf70 PTK2 CUL1 SKP1 RPS18 SNRPB2 ATXN7L1 DCP2 HBB SBF1 CTNNB1 RPS24 PRPF31 XAB2 FANCF CDK9 GRAP2 GNB1 CSK TYK2 EFNB2 CHGA RPS17 RPS12 SEC61A1 DNHD1 SNRPD1 EGFR PIK3CA USP18 NPC1 EFNB3 NOP56 RPL3EFTUD2 SEC61GRPL23A IFIT2 HBG2 EPHB3 POU5F1 TNNT2 CDH1 RPL37A SF3A3PRPF6 KHDRBS1 CD96 SOS1 ZFHX3 NHP2 UTP3 SNRNP70 CSTF3 DOT1L NRAS RNF213 TYRP1 IMP4 RPL23 CWC15 MAGEC2 EPHA4 EGF RPS7RPS29 HNRNPA1HNRNPU CPSF3 LSM2 UFD1L TNNI1 EPHA7 DUSP6 EIF4A3 RPL39L BET1L FYB ARHGAP22 THY1 RPS21PDCD11EMG1RRP36 RPL3EIF3H9 DNAH1 DUT FNDC3B HNRNPM SNRPF ANKRD44 BLM ZAP70 GUCA1A HBG1 LEVEL 2 ST7L ENSG00000086848RPSA LSM8 BCAR1 PKD1 EARS2 COX11 SNRPWDR3HNRNPA2BE 3 1 PIK3CGBRAF RPS6UTP11L RPL4 PLRG1 XPO5 LSM6 TELO2 HSPB11 LST1 LCP2 TSG101 DLK1 BYSLNIFKRPS14RPS10 RPL7A ZZEF1 CPSF1 CRNKL1 LSM3 PDLIM5 WNT10B LGALS3 RPS23 TM7SF2 SNRPD3 RYK WDR75 NF2 GUCY1A3 TRPM1 SLCO1B3 RPS8 RPS3AUTP14C C11orf57 SLC15A4 CDC5RANBP2L ALOX5AP C9orf3 TSR1 RPS4XHEATR1 RPS15 DDX23 NUTF2 TSPAN33 KALRN LTV1 SNRPPAPOLAB PFKP MVB12B GAR1 RPS27 CDK15 GUCY2F CIRH1A LSM7 FOXK1 PBXIP1 RPS27L LSM5 PDE4DIP VPS28 WDR36 GPRC5B HGSNATRGS1 PWP2 TRIB2 SOX13 UTP14A EIF2AK4 GUCY1B3 MVB12A UBALD1 LRRN3 RPS2 CSHL1 GNB1L KRR1RPS5 NOP10 PCNXL4 WDR42A SPON1 TMEM8A DSEL EMC2

RBBP4 DNHD1 SLC9A9

CASK NNT GPRC5B ALOX5AP BTBD7 SMARCAL1 EFNA4 ERCC4 SLCO1B3 RECQL5 RAD18 UBE2S MTHFR ALOX5 CSHL1 EDC4 IL1R2 EPHA6 TFR2 ADAM22 FLT1 RPA3 LRRN3 EPHA8 VWA1 UBE3B NRP1 LSM1 APITD1 RMI2MLH1 TAF1B EFNA2 FANCE HMGB1 EPHA7 SOX13 RPL37A COX11 MUS81 RPA2 RYK RPL38 EMC2 CDC34 PGF SEC61G LSM5 DCP1B LSM3 TIPIN LGI1 EFNAEPHB65 EPHA3 DUSP6 STRA13 KDM4B ENO2 NIPSNAP1 EPHB2 WNT10B RPL6 PHF5ACPSF3 FANCG FANCL PURB CXCR2 EFNA1 FAM20A CTNNB1 WDR33 BLM GRAP2 HFE ENAM FN1 RPL19 WIZ RAD52 RPA1 LCP2 TFRC VEGFB EGF ATXN7L1 FYB LAT PTK2B ZBTB17 PTGER3 RPLPSPCS0 RPL36AUBA52 RPL282 EDC3 FOXK1 EPHA1 EFNA3 ILK SEC61A1 SEC11CRPL30RPL13 CPSF2 CPSF1 FANCC C19orf40 FYN EFNB1 RPL26 RPL9 FANCM WRNIP1 AQP4 XAB2 LSM7 EME1 LRR1 ST7L RPL35 DCP2 WRN PIK3CG CXCL13 S1PR4 EPHB4 RPS6KA1 ISG15 CPSF4 CRNKLSNRPB21 LSM6 FANCB TOP3A VAV1 PTPN11 EPHB3 VEGFA TNXB JUN PPP2CA UBB WDR42A C17orf70 ZAP70 EPHA2 EFNB2 CDH1 RPL24RPL22L1 PTPN22 CCL22 AMELX RPL34SPCS1 RPL29RPL23A RPL21 SART1 RMI1 UBE2Z DPF3 CREBBP EPHA4 FOS RPL8RPL18 SF3B3 FANCD2 YES1 CRK ATF7IP RPL11RPL23 C1orf86 BCL6 S100B EPHA5 CREB1 RPL27A LSM2 LEVEL 3 SHC1 PRPF3 SYK GAB1 CXCL5 POSTN RPL37 RPLP1 SNRNP200 PLRG1 LSM4 LMO7 SRC CHGA RPL10A RPL18A DDX23 FANCF ITK GNB1 CXCR5 EFNB3 POU2F1 RPL15 HES1 PARVA SPCS3 NEDDRPL368 CSTF1 FANCA BOD1L1 EGFR ETS2 SRF RPL35ARPL17 LSM8 EME2 VPS37B STMN2 RPL27DCP1RPL36AA LRPL22 PPIH PRPF4 SOS1GRB2 EPHB1 ZNF24 DICER1 RPL3L RPL5 RPL39LRPL26L1 CD96 KALRN PCNXL4 RPL32UBC SEC61B FANCI PIK3CA NFATC1 RPL14RPL39 CWC15 PRPF31 JRK TELO2 EARS2 BCAS2 NRAS RPL12 PAPOLA EIF2AK4 GUCA1A MKL1 RPL31 OASL IKZF3 NF2 BRAF SRGAP2 C9orf3 MTR SPON1 C11orf57 THBS1 TSPAN2 GUCY2F WIZ CSHL1 CASK POU2F1 EFNB3 CREB1 NNT HES1 EFNB1 COX11 UBE3B ETS2 RMI1 CD96 JUN ADAM22 WDR33PAPOLA THBS1 EPHB1 C11orf57 CPSF3 MLH1 S1PR4 RAD52 EFNB2 ENAM VWA1 FOXK1 PTPN22 ZBTB17 RPS6KA1 PRPF3 BCAS2 CPSF2 EPHA4 FOS PHF5A FANCL RMI2 CCL22 SNRNP200 PPIH SNRPB2 ITK NFATC1 IKZF3 CXCL13 EPHB2 DICER1 RPL3L CPSF4 GRAP2 SPCS3 CRNKL1 CXCR5 SRF RECQL5 SMARCAL1 CRK RPL37 PLRG1 DDX23 MUS81 ALOX5AP VAV1 LGI1 RYK AMELX CWC15 SLC9A9 FN1 SEC61A1 KALRN SEC11C CPSF1 SF3B3 LEVEL 4 SYK MKL1 RPL17 SEC61G EMC2 LCP2 C9orf3 CSTF1 ERCC4 ZAP70 CXCL5 CXCR2 RPLP1 FAM20A RPL39 WDR42A GPRC5B TIPIN FYB CREBBP BTBD7 ATF7IP SPCS2 FANCM AQP4 SEC61B LAT RPL39L SPCS1 YES1 BCL6 IL1R2 PTK2B STMN2 ATXN7L1 SPON1

DNHD1 GPRC5B

SPON1

CREB1

CXCL13 CCL22 POU2F1 FAM20A COX11 RPS6KA1 S1PR4 NFATC1 KALRN FYB LGI1 CXCL5 PTPN22 BTBD7 JUN EPHB1 FN1 FOS CXCR5 ENAM

GRAP2 EPHA4 ATF7IP ITK AMELX EFNB3 ADAM22 ETS2 THBS1 LEVEL 5 IKZF3 LCP2 CD96 ALOX5AP THBS1 ADAM22 LCP2 POU2F1 CD96 ENAM CXCR5 CXCL13 BTBD7 S1PR4 SPON1 EPHB1 JUN ALOX5AP AMELX PTPN22 EFNB3 ITK GRAP2 CREB1 CCL22 FN1 RPS6KA1 LEVEL 6 CXCL5 FAM20A IKZF3 KALRN EPHA4

LCP2 FYB

PTPN22a CCL22 EFNB3 ENAM POU2F1 CXCL5 FAM20A S1PR4

CD9 PTPN2 6 X5AP 2 2 [b] ALO GRAP ITK FYB LCP2 c1a 1 1b1c1d1e2 c1a1b1c1d1e

c1d1 c1a1b1c1d2 c1a1b1 c1a1b2c1d1 d1 c1 b2c1 a1b2 c2a1 c2d1 d1 c1 2c2 a1 1b b2c2 c2a d2 c 2 1a d 1b c2 3c b2 2d a1 1 c2 c1a 1 2b 4d 1c c 1d 1b2 1 c2a c 1a c1a1b1c1 c1a1b1c2 c1a1b1c3 2 2a1b1c1 c1a1b2c1 b b1c2 c c1 1 2 c3 c2a1 a1b2c2 c1 d a1b1 d c2 c1a1b2 2 c4 2c1 c3 2 c2a1b c1a1b b c2 2c4 1 b2 c1 a a1 a1 c c2 b3 1 c2 c4 c1 a2 2 c1 1b a a 1b b c1 c2 3c 2 1 b1 2 c d 2 c1 1 2a a d c2 c 1b 1 CCL 2 3c 1c 3 b2 b c1a 2 2 2 a 2a b1 c 2 1 c c1 1 c1a2b6c1d1e2 c 2c a C b c1 2 a2 a2 b 22 c2 b1 2 XCL1 2 c2 c c2 2 c1a 2 b d2 2 2b d a 2 2 c 2 c 1 3 c 2c 2 b c1 b 2 c2a1b2 c2a1b1 a 2 a 2 c1 2 c2a2b1 b c 2 3 2a c a 1 c1a1b1 2 c c c2a2b2 2 1 c1 b b a 1 2 3 a b c1 3 c1a1b2 3 c c c3a1b1 1 d 2 2 d1 c c C 1 a 2 c1 b3 1 3a1b1 c1a1b3 XCL c c b c3a1b2 2 c 2 1 a c1 a2 6 c S1PR 3 1 1b a2b3 b c a 5 3 c1a2b1 3 c c c c3a1b3 3 2 2 c d c 6 1 2 2 b a 1 2b 2d 3a c1a2b2 4 c c1 4 c c3a1b4 c 1 c 1 7 c2a1 c 1 a b a2b1 1 c2a2 a 2 2 b 3a b c 4 c3 6 c1a2b3 c 2 c1a1 2 c c 1 7 b d c3a1b5 c 1 1 a 1 3 a c 2 b

5

1 c c 1 1 c1a2b4 b c c3a1 c 2 1 1

a a a 3

2 c c3a1b6 2 b

5 b

2

c c 7 2

1 c

b c1a2 1 2 c

a 1 d

3 a

c 1 c c1a2b5 2

b

5

1 C2

c 1

3 1a3b ENAM

1c c3a1b7

b c

1 c

1 a

a 1 2

4

a

c b

6 2 c

2 b 1 c c1a2b6

7 1

F b

c c

1a 1

2 a

AM 2

4 d

c c3a2b1 b

1

6 c

1 2

c

2 c3a2 C3 c b

1 20

1

c a

a

1 2

c1a2b7 b

c4

a

c

7

3

c 2

1 A

b

c

2 c3a2b2

2 1

b CN

c2

c1

a1

a

c4 d

2

d1e2

b 1

1 7

c AM c1

c1a2b8 2

b1

C c

2

1

a

a

2

c4a1b1 c c4

b

1

c4a1 7

EL

c

c1a3 a

3

2

3

c

1 b

X

b c1

1

2

a a

c1a3b1 c

3 4

2

c b

1 d

3

c4a1b2 c1

c C4

2

1

b

2

c

1

4a

a c

c4a2 3

c

b

1

1c

1

c

c1a3b2 a 2

2 c1a4 3

b

b c 1

a2 c4a2b1 2

a 4

3

c c1

b

2

2

c

c

d

2 1

c4a3 1 c1a3b2c1d1e1

2b

c

1

4a a

c c1a4b1 3

3 b

c

2

2 c c4a2b2 c

2

b

1

2 c1a5

a a

4

3 c

c

1

4

b a

c

4

2 2

c4a4 b

1

c1a4b2 c 2b

c

2

1

4a c4a5 d

c c4a3b1

1

c

1

1

c

a

1

4 E E

b b

3 1

c c

2

4a c1a4b3

PH 1 P c

2

a

c

c

1 3

c4a3b2 1

b

a

b2 HA4

3

4

a

b

4 2

c c c

c1a5b1 1 2

B 3

c

d 1

c4a3b3 c 2 1

3b

a

1

4

4a b

c 2

1

c

2

c c1a5b2

2

b

c c

3 c4a4b1

1

a 1

a

4

4 a

c

b3

2

4 c

c

2 c1a5b3

b1 1

b c4a4b2 3

c a

1

c1a5b4 c a

c4 1d 4

b3

3 c4a4b3

c

2 c

c4a5b1 c4a5b2 1

b 2

3

a

c

1

1 c4 a

4 4

d c b

E

2 3

2 c1

c

b

3 3 c

a

4 a

4

c

FN

1

c 4 1

a

c 5

b 1 b

5b

1

b

c

4 a

1

1

a

c

4

1 c2

2

a c

c

5 c1

1

b

1

d

c

4b

2

a

B

2

4

1 c

c

c

2 c1a4b1c1d1e1 1

a

5

4b

b1

a

1

3

c

4

3

c

d 1

c

c

c

2 1

1

1

a

b

5 c RPS6KA

4

b

a

a

1

1

4

c

4 c

4 4 c

3

b b

b

4

a

c 5

2

1

4 1 c

a5

c

3

b

a

2

b

c

c

4

1

c1

1

a

2

a5

4

b

3 c

c 2 c

c

4 2

d b

5

a

1 c

c 1

3

a

5

b

2

c

2

4

b

c3

5

a

1

c

c

1

a

1

5 c

b 4

2

b

3 5

c

a

1

1

c

4 c

1 c

a 3

5

b

b

d 5

3 a

c 1 2 c c 3

c 3 b 5

a 1 c 4

1

c

a

3

4

b

b

2

a5

c

1

1

c

d

1

c

1a 1 1

d

4

4

b

c

2 3

c

b

1

d

a5

2

1

c JU

c

1 1

a

d

4

b 1

2

c

c

3 2

d

1

5b N

a

1

c

c

1

a 1

4

d CR

b

2 2

c c

2

d

2 b1

5

a

1

c

c1

b a4

3c

1d

2 1 d

1

c

1

b

a5

1 c c 1 a 4 b 3 c

1

d

2

d1 1 c 1 b a5 1 c c1a4b1c1d1 EB1

U PO

e

2 2F 1

Figure 4. (a) Network/modules/sub-modules at diferent levels which accommodate leading hubs and key regulators. (b) Organization of the modules/sub-modules of the TS network.

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 10 Vol:.(1234567890)

www.nature.com/scientificreports/

0

7

6

6

.

0.

0.40 0

[CCL22] [CXCL5] [EFNB3]

0.30

0.40

0.43

0.25

0.31

0.17 0.25

P value 0.23

P value

P value

7

7

0.089

0.072

0.024

0.024

0.0031

0.00058

0.0049

0.008

0.00093 0.001

Levels Levels Levels

0.67 0.67

[ENAM] [FAM20A] [LCP2] 0.67

0.43

9

0.43

3

0.

0.33 0.33

P value

P value P value

0.22

8

0.11

0.11

3

9

51

0.09

6

0

.

0

1 0.01

0.0021

9 0.0003

1 0.01

4 0.0003

0.0018

1 0.00 0.00

Levels Levels Levels

0

0

5

4

.

0.40

0 0.

[POU2F1] [PTPN22] [S1PR4] 0

3

0.

31

.

0.25

0.25

0

1

2

. 0

P value 4

P value

P value

0.13

1

.

0

2

4

7

3

2

5 0.05

3

06

0.05

6 0.02

0.01

0.01

0.0023

0.00

9 0.0005

0.0046

3 0.0009 0.00

Levels Levels Levels

Figure 5. Te probability distribution of the KRs as a function of the level.

150

0

L C

L 100

√ LEVEL LEVEL 50

0 0 50 100 150 200 CN Main Network Nodes-775 Links-20357

100 50

150 40

1 50 L

L 30

L L

C

C L

C 100 L

50 L

√ LC √ √ 20

LEVEL LEVEL 50 10

0 0 0 0 0 50 100 0 50 100 150 200 0 50 100 0 20 40 CN CN CN CN C1 C2 C3 C4 Nodes-362 Links-3891 Nodes-191 Links-10582 Nodes-111 Links-2348 Nodes-102 Links-725

Figure 6. LCP-DP plots of main TS network (Level 0) and modules/sub-modules at the frst level of organization of TS network.

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 11 Vol.:(0123456789) www.nature.com/scientificreports/

S. no Conserved interactions (interologs) Genes forming motifs (conserved motifs) 1 ITK-GRAP2 2 ITK-LCP2 ITK, GRAP2, LCP2 3 GRAP2-LCP2 4 LCP2-FYB 5 EFNB3-EPHB1 RPL12, RPS27A, GNB2L1 6 EFNB3-EPHA4 7 ENAM-FAM20A 8 ENAM-AMELX RPS23, RPS27A, GNB2L1 9 KRR1-FBL 10 FBL-RPS23 11 RPS23-GNB2L1 RPS23, RPL31, GNB2L1 12 RPS23-RPL31 13 GNB2L1-RPL31 14 RPS23-RPS27A 15 RPS27A-GNB2L1 16 RPL12-GNB2L1 17 RPL12-RPS27A 18 JUN-CREB

Table 6. Essential PPI interactions in TS network.

trality, closeness centrality, and eigenvector centrality. Eight such genes were identifed which were common to these categories of Degree distribution and centralities and were also found to be the interacting partners of the seed genes in the TS network. Besides, we identifed 9 key regulators in the present study. Based on the assumption that the genes coding for the interacting proteins of disease-causing genes are putative, we also included the interacting partners of the key regulators of TS in the list (Supplementary Table S6). Monosomy X, commonly called Turner Syndrome is not only limited to humans but such cases have also been reported in many other animal species­ 32. In view of the facts that essential proteins evolve much slower than non- essential proteins­ 33, we identifed the orthologous counterparts of the proteins (Supplementary Table S6) in seven diferent organisms namely, Mus musculus (mice), Rattus norvegicus (Rat), Felis catus (domestic cat), Ovis aries (sheep), Macaca mulatta (rhesus macaque), Gorilla Gorilla (Gorilla), and Homo sapiens (Human) in our study. Animal models are important in generating gain- and loss-of-function of a syndrome/disease and have produced signifcant insights. In the case of TS, however, no animal models have been generated that exactly models it. In this study, we analyzed the status of essential interactions of TS in non-human systems by integrating the orthology with PPI. If two proteins physically interact in one species and they have orthologous counterparts in another species, it is likely that their orthologs interact in that species too. Such conserved inter- actions are called interologs which are of signifcant value in comparative genomics. So, the conserved interactions (interologs) were analyzed in these organisms (mentioned above). It was observed that only 18 protein–protein interactions involving 3 motifs (Table 6, seed genes highlighted in bold, Fig. 7) remained conserved in all organisms. Of these 18 interologs, 10 of them include the interaction of seed genes or key regulators with their neighbor which emphasizes their essential role in a living system. Teir loss or gain of function may somehow afect the physiology of an individual which may result in the loss of an essential function. Terefore, apart from the identifed key regulatory genes, we expect that these predicted interologs too might play a major role in the pathophysiology of TS. It is a matter of research for further insights. Te animal models studied here might prove to be useful in illuminating the biological functions of these genes and the pathophysiology of TS associated with these genes. Clearly, this study does not conclude that these non-human animals are complete models for Turner syndrome as TS involves many genes. However, this is a powerful approach that can be used to select an appropriate model to study human disease. Tus, such types of studies may identify gene targets for drug therapy of these individual pathologies in the general population and the animal models generated may prove useful in the validation of such targets. Discussion TS is a consequence of a partial or total loss of the X chromosome which results in the onset of highly variable clinical features. Surprisingly, our knowledge of genotype–phenotype relations in TS is rather inadequate where very few specifc candidate genes are linked to its clinical features. In this study, we used an integrative network- based approach to extract the information from the microarray datasets of TS. Te study presented here is only a subset of the types of information that these data sets can yield, which requires additional future analyses to provide further insights. It is expected that the causative factors of T2DM and RM in TS are diferent from that of traditional risk factors in the general population. We found out nine genes of T2DM namely, SLC29A2, THBS1, GPRC5B, CSHL1, ADAM22, IGHM, WIZ, IGHD, and COX11, and three genes of RM namely, ATXN7L1, UBE3B, and

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 12 Vol:.(1234567890) www.nature.com/scientificreports/

RPL31 RPL12 GRAP2 Conserved Protein-Protein Interactions (Interologs)

FBL RPS27A

FYB ITK

KRR1 JUN FBL

KRR1 GNB2L1 CARS RPL31 FBL EFTUD2

KRR1 RPL31 EFTUD2 GNB2L1 RPS23

LCP2 JUN RPL12 EARS2 RPL12

POU2F1 NHP2L1 AMELX FAM20A CREB1 GNB2L1

EPHA4 EPHB1 CREB1 RPS23 EFTUD2 RPS27A

FBL RPL12 CARS NHP2L1 EPHA4 EFNB3 ITK EARS2 RPS27A CREB1 RPS23 ENAM RPL31 RPS27A EPHB1 EFNB3 EFTUD2 GRAP2 PTPN22 EPHA4 EPHB1 FYB GRAP2 FBL NHP2L1

KRR1 GNB2L1 KRR1 RPL12 EFNB3 RPS23 LCP2 FYB ITK GRAP2 C-JUN GNB2L1 AMELX FAM20A CXCL13 JUN JUN AMELX FAM20A KRR1 RPL31 LCP2 S1PR4 CREB1 RPL31 ENAM RPL12 GNB2L1 FYB ITK AMELX FAM20A ENAM EARS2 RPS27A RPS23 EFTUD2 FBL ENAM Rpl12 Nhp2l1 H. sapiens FYB LCP2 EPHA4 EPHB1 LCP2 EPHA4 EPHB1 OASL RPS23 Eftud2 Rps27a CREB1 EFNB3 G. gorilla CREB1 NHP2L1 Jun EFNB3 Krr1 Rpl12 RPS27A PTPN22 ITK

Cars Gnb2l1 Fbl Rpl31 GRAP2 EPHA4 EPHB1 JUN GRAP2

Eftud2 Fbl EFNB3 AMELX FAM20A Krr1 Gnb2l1 FYB ITK M. mulatta Rps23 Pou2f1 Nhp2l1 ENAM Fyb Grap2 Creb1 Pou2f1 CARS Ears2 Rpl31 LCP2

Creb1 Rps27a Rps23 Jun AMELX FAM20A EARS2 O. aries Ptpn22 Lcp2 Efnb3 Ephb1 Itk Cxcl13 ENAM

Itk Epha4 Fyb Grap2 Efnb3 Ephb1 Amelx Fam20a S1pr4 F. catus

Epha4 Enam

Lcp2 Amelx Fam20a Cxcl13 R. norvegicus

Enam S1pr4

M. musculus

Figure 7. Interologs in the network from lower to higher organisms. Nodes in yellow are seed genes and nodes in red are seed genes that are the key regulators. Nodes in green are the interacting partners of seed genes.

FANCM in TS. We call these genes as “signature” genes of T2DM and RM in TS. Previously reported studies were found to show the involvement of these genes in these conditions. Te SLC29A2 gene which encodes the protein ENT2 (Equilibrative Nucleoside Transporter 2) has been proved sensitive to dysregulation in diabetes and acts as a target of insulin ­signaling34. It is known that dysfunction of pancreatic β-cell plays a critical role in the development of T2DM. In one of the studies, thrombospondin 1 (THBS1) has been found to play a crucial role in β-cell survival during lipotoxic stress in rat, mouse, and human models which suggests this to be an interesting therapeutic target to prevent oxidative stress in T2DM­ 35. In another study, it was proposed that with the increase in expression of GPRC5B (G Protein-Coupled Class C Group 5 Member B) there is a reduction in insulin secretion and β-cell viability in T2DM­ 36. Tus, GPRC5B too might prove to be a novel target for the prevention of T2DM. Te human GH/CSH genes (CSHL1 being one of them) regulates growth and are involved in fetal and adult glucose . Tis could act as a good target for gestational diabetes and diseases related to insulin ­resistance37,38. ADAM22 (ADAM metallopeptidase domain 22) was found to be a potential target in insulin-resistant (IR) subjects, identifed through an integrative miRNA–mRNA microarray and network approach in adipose tissue of IR and insulin-sensitive (IS) ­individuals39. A decreased expression of OXPHOS (oxidative phosphorylation) genes which included COX11 (Cytochrome c oxidase assembly protein COX11) from pancreatic islets of T2DM patients was found in one of the studies which may lead to impaired insulin ­secretion40. Similarly, FANCM, identifed to be the signature gene of RM in the present study is a DNA- damage response gene. FANCM protein was better expressed in pachytene cells where meiotic recombination occurs­ 41. Terefore, any or structural change in FANCM is expected to provoke meiotic defects result- ing in DNA damage. Te accumulation of such errors may ultimately lead to cell death. Tus, a change in the expression level of FANCM may result in pregnancy loss. While many of these identifed signature genes already have a background for the respective co-morbidity that we are studying here, some of them still are beref of literature. Tus, they could further be studied to get a better insight into their role in TS. As of now, in the turner population, the relationship between T2DM and RM has not been thoroughly iden- tifed. In the context of interacting genes coding for disease-causing putative proteins, we found that FANCM and its interacting partner participate in the DNA repair pathway and Fanconi anemia pathways (also involved in DNA repair). Te Fanconi anemia pathway repairs DNA interstrand crosslinks in the genome. Interestingly, one of the interacting partners of FANCM (signature gene of RM) is FANCL which is the interacting partner of GPRC5B (signature gene of T2DM) in the TS PPI network. Te association between DM (Type 1 and 2) and DNA damage is well ­recognized42,43 but very little is known about DNA damage in pregnancy, particularly when pregnancy is complicated by pre-gestational or gestational diabetes mellitus­ 44. By analyzing the proportions of chromosomes involved, it was observed that it is not just the genomic imbalance of the genes lying on the deleted pseudoautosomal regions of the X chromosome, but the additive infuences of the associated genes located on autosomal chromosomes as well, that may be responsible for TS

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 13 Vol.:(0123456789) www.nature.com/scientificreports/

phenotype. Te developmental transition from fetus to adult requires gene expression changes that help in this transition. When a fetus carries a partial or a total loss of the X chromosome, there is a disturbance in the gene expression as compared to a normal fetus. We identifed a list of genes that changed their expression pattern in transition from TS fetus to TS adult. It is expected that when these genes are activated in adult or fetal tissues in an altered fashion, this may hinder the developmental processes. While it cannot be explained what causes these fuctuations and how these fuctuations afect the phenotype of Turner patient, this would shed light on a new aspect which requires future insights. Tough the TS network shows a weak hierarchy, it exhibits system-level organization involving modules/com- munities which are interrelated. Being hierarchical means that there is no signifcance of individual gene activities rather they work in synchronization to regulate the network. Te leading hubs (high degree genes) present in the network play important functions by integrating the lower degree nodes for organizing and regulating activities like inter and intra crosstalk among various other essential genes and thereby maintains network stability and adjusts its signal processing. Of all the seed genes, we identifed nine key regulators, namely LCP2, PTPN22, CCL22, CXCL5, S1PR4, POU2F1, FAM20A, ENAM, and EFNB3 which infuences network/module regulation and maintains the network stability working as its backbone till the last level. Tese key regulators could be a pos- sible therapeutic target gene for TS. Earlier it has already been established that PTPN22 polymorphism is related to autoimmune disease risk in patients with Turner ­syndrome45. Surprisingly, only two of the key regulators i.e., LCP2 and EFNB3 were among the top 50 high degree hubs. Tus, it is not necessary for the leading hubs to be the key regulators in the network, their popularity can change randomly at diferent levels of an organization. All the KRs maintained a low profle/popularity thereby regulating the network till the last level of organization. Te regulating ability of KRs is more signifcant at the deeper level of the organization. Te network exhibits fractal nature because its topological properties obey a power law, and a strong LCP is also maintained which means that networks are dynamic and heterogeneous. Tis indicates that the network maintains self-organization and is compact and has efectual processing information. Essential evolutionary proteins being more conserved are expected to frequently interact with each other. Based on this fact, we found 10 important interologs (evolutionarily conserved protein–protein interactions) involving the interaction of seed/key genes and their neighbor in TS and 3 motifs in 7 diferent organisms from lower to a higher level. We considered these non-human systems because the cases of monosomy X have already been reported in these organisms­ 46–51. However, in the case of TS, no animal models have been generated that exactly models it. Terefore, as there is a great lack of non-human models to study TS, our fndings through orthologous study update current models of TS, thereby giving a bit clear picture of the interologs which are functional in other lower to a higher level of animal models. Taken together, these results ofer few key regulators and essential genes that may act as therapeutic targets for TS in the future. Although this study uncovers many aspects of TS, there are many limitations such as limited sample size and heterogeneity of the datasets used in this study. As no two turner patients are identical, every Turner is unique with respect to its genotype-phenotype. Te currently available datasets do not allow a more elaborated study at this moment. Tus, a larger sample size would provide a more elaborate result. Another limitation of our study is that we did not use the adjusted p-value for multiple comparisons, rather we used the p-value cut-of < 0.05 (nominal testing) to select maximum number of DEGs of Turner Syndrome. Also, the co-morbidities studied here are heterogeneous in nature and there may be other factors too that may contribute to their occurrence. Despite these biological limitations, our computational approach and the results ofer a comprehensive picture, elucidating the KRs of TS using network biology and demonstrating the importance of animal models in TS, which helps explore and understand diferent aspects of this syndrome. Methodology Retrieval of microarray data (TS, T2DM, and RM). Widely accessible gene expression datasets related to Turner Syndrome (TS), Type 2 Diabetes Mellitus (T2DM), and Recurrent Miscarriage (RM) of Homo sapiens were obtained from the Gene Expression Omnibus (GEO) database of ­NCBI52. Studies evaluated on Afymetrix human gene expression dataset containing samples from both normal and diseased tissues of women were taken. For TS, we retrieved GSE46687 deposited by Bondy et al. and ­GSE5843553. For T2DM, we retrieved GSE23343­ 54 and ­GSE2572455. And in the case of RM, we retrieved GSE22490­ 56 and GSE26787­ 57. Before fnding the difer- entially expressed genes, these datasets were pre-processed to remove the noise of obscure variations of these data to make them cross-comparable. Normalization is a key step in the process of pre-processing to remove such variations in the data. In this study, we used MAS5 ­algorithm58 which is sensitive and selective for identi- fying diferentially expressed genes. MAS5.0 combines the signals from the multiple Perfect-Match (PM) and Mismatch (MM) probes that target each transcript into a single value that sensitively and accurately represents its concentration by calculating a robust average of the (logged) PM-MM values. Afer normalizing the datasets, the diferentially expressed genes were identifed through GEO2r by fltering the genes based on Log­ 2FC and P-value.

Diferentially expressed genes of TS, T2DM, and RM. We performed the comparison on normal vs. disease samples in each GEO dataset to identify diferentially expressed genes (DEGs). We identifed these DEGs through the online program, ­GEO2R59, which is based on limma R ­package60. We chose the “P value < 0.05” and “|logFC|≥ 0.5” as the primary cut-of criteria to interpret the results. In each category (i.e., TS, T2DM, and RM), only those DEGs that satisfed the cut-of criteria in both its datasets were considered as the signifcant DEGs. To obtain the list of overlapping DEGs, we used Venny 2.1.0, an online tool that can calculate the intersection(s) of listed elements. Te enriched functions and biological pathways involved with these DEGs were identifed using

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 14 Vol:.(1234567890) www.nature.com/scientificreports/

DAVID (Te Database for Annotation, Visualization and Integrated Discovery) online ­server61 and REAC- TOME pathway ­browser62.

Protein–protein interaction network construction of TS and their topological properties. To analyze the interactive associations among the DEGs at the protein level, genes obtained from the TS were mapped on protein–protein interaction (PPI) data using STRING database­ 63 to construct the TS PPI network with a medium confdence score with 400 nodes in ­1st shell interactors and 100 nodes in ­2nd shell interactors so that more number of seed genes make into the network. Te Search Tool for the Retrieval of Interacting Genes/ Proteins (STRING) database aims to integrate comprehensive PPI data available from diferent databases for a large number of organisms from published literature with experimental information. Te network was visual- ized in Cytoscape 3.464. Tese DEGs are said to be the seed genes. Te structural properties of complex networks are characterized through the behaviors of their topological parameters. Te topological properties of the TS network were calculated by Network Analyzer and ­CytoNCA65 in Cytoscape. Te topological properties ana- lyzed in the present study are described below.

Degree distribution. In a PPI network, the number of contacts a node/protein has with other nodes/proteins are said to be its degree and the probability distribution of these degrees over the entire network is the degree distribution. Te networks whose degree distributions approximately follow a power law: P(k) ~ ­k−γ, where γ is a constant are termed as scale-free networks and appear linear in a log–log plot. Depending on the value of γ the networks are said to be hierarchical which further specifes the importance of hubs or modules in the ­network66. Te concept of a scale-free network is used to separate biological networks from random networks, which fol- low a Poisson distribution. For a PPI network defned by a graph G = (N, E), where N and E are the number of nodes and edges respectively, the probability of degree distribution (P(k)) is the ratio of the number of nodes with degree k to the network size. n P(k) k = N (3)

where ­nk is the number of nodes having degree k and N is the total number of nodes in the network. P(k) indi- cates the importance of hubs or modules in the network.

Neighborhood connectivity. In a PPI network, when a node/protein ‘n’ forms an association with its neighbor nodes/proteins, the average number of neighbors of all the nearest neighbors of this node ‘n’ is said to be its 67 Neighborhood ­connectivity . In the network ­(CN(k)) Neighborhood connectivity is given by, q C (k) qP N = (4) q k   q where, P(k ) is the conditional probability that a connection belonging to a node with connectivity k points to a node with connectivity q. Te positive power dependence of ­CN(k) indicates assortivity in the network topology.

Clustering co‑efcient. In a PPI network, the clustering coefcient represents the measure and strength of how connected the neighbors of a given node are in that network. It measures the tendency of a node to form a clus- ter. Identifying these modules/communities is signifcant because they can ultimately refect functional modules and protein complexes. When applied to an entire network, the clustering coefcient is its average over all the nodes in the network. It is calculated by the ratio of the number of its nearest neighborhood edges ­ei to the total likely number of edges of degree k­ i. For an undirected network, the clustering co-efcient (C(ki)) of ith node can be calculated by,

2ei C(ki) = k (k 1) (5) i i −

Betweenness centrality. Betweenness centrality (C­ B) of a node in a PPI network measures the degree of infor- mation fow in the network. It is the capacity of a protein/node to monitor communication between other pro- 68,69 teins/nodes in a ­network . If d­ ij (v) indicates the number of geodesic paths from node i to node j passing through node v, and dij indicates the number of geodesic paths from node i to j, then betweenness centrality ­(CB(v)) of a node v can be calculated by,

dij(v) CB(v) = dij (6) i,j,i j k �=�=

Closeness centrality. Closeness centrality ­(CC) measures how fast the fow of information is from a node to other nodes reachable from it in the ­network70. Terefore, it shows how close a node ‘n’ is to all other nodes in a network. C­ C of a node i is the reciprocal of the mean geodesic distance between the node and all other nodes connected to it in the network and is given by,

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 15 Vol.:(0123456789) www.nature.com/scientificreports/

n ( ) CC i (7) = j dij where d­ ij represents the geodesic path length from nodes i to j, and n is the total number of vertices in the graph reachable from node i.

Eigenvector centrality. In a PPI network, eigenvector centrality is a tendency of a node to enable the informa- tion to spread in a network. It measures the signifcance of a node while considering the signifcance of its neighbors. Te main idea behind eigenvector centrality is that connections from signifcant nodes are more important than connections from unimportant nodes. Eigenvector centrality of a node i ­(CE(i)) in a network is proportional to the sum of i’s neighbor ­centralities71, and it is given by, 1 C (i) v E = j (8) j nn(i) =

where nn(i) indicates the nearest neighbors of nodes i in the network. λ is eigen value of the eigenvector ­vi is given by, ­Avi = λvi where A is the adjacency matrix of the network (graph).

Validation of the biological signifcance of TS network. It is said that scale-free networks are robust against random removals of nodes because most nodes are poorly connected, and they play relatively unim- portant roles in organizing the global network structure. Te TS PPI network constructed in this study follows scale-free features and consists of 775 nodes and 20,357 edges. To check whether a similar network would arise if a random set of genes of the same size as the TS DEG-set are used as "seeds," or whether the topology of the resulting network would be obtained by randomly sampling the STRING-db, we constructed the null random networks of the same size and did the comparative analysis. In our study, we constructed the random networks through Network Randomizer ­App72 in Cytoscape considering two diferent scenarios:

• randomization of the constructed TS network by Preserving the Degree.

Trough Network Randomizer, we frst randomized the TS network by preserving the degree of each node. Te degree preserving shufing algorithm permits to randomize the current network considering the degree of each node. Tis means that in the randomized network, a node will have the same number of neighbors, but they can be diferent.

• construction of a random network of the same size through the Erdős and Rényi algorithm.

Next, we generated a random network with 775 nodes and 20,357 edges using an Erdös–Rényi ­model73 in which for each pair of nodes, a link was inserted with independent probability. We used the G (n, M) model to construct the uniform random graph where n is the number of nodes and M is the number of edges. We then did the comparative analysis of these random networks with the main TS network.

Community detection: leading eigen‑vector method and tracing of the genes. Te constructed PPI network is divided into discrete layers of hierarchy. Each layer or tier describes its activity which altogether defnes the modular nature, properties, and organizing principle of the hierarchical network. We used the Lead- ing Eigen Vector method (LEV)74,75 to detect the communities of the network in R from package ‘igraph’76 in this study. Te LEV method calculates the eigen value for each connection, demonstrating the importance of each connection, not nodes. Te modules were detected from the main network, then from the sub-modules of the modules, at each level of hierarchy to fnally obtain the motif. Te seed genes were then traced at each level of organization in various modules/sub-modules obtained from clustering. Te genes reaching the motif level (last level) are the main drivers of the TS network that helps in its regulation. We consider these genes as the most sig- nifcant and infuential ones within the network and call them the key regulators of the network. Te Probability l Px y of KR was then calculated to recognise the regulating ability of each of these KRs in the TS network, [l]  [l] y Px y (9) = E[l]  where x is the number of edges y[l] at level l and E[l] is the total number of edges of the network/ modules/ sub-modules.

Distribution of energy in the network: Hamiltonian energy calculation. Each level of the network is organized and maintained due to a certain level of energy. Tis energy is measured at each level using Hamil- tonian Energy (HE) within the formalism of Constant Potts ­Model77,78. HE calculates the energy distribution at the global level as well as at the modular level. HE of a network or module or sub-module can be calculated by, H[c] e γn2 =− [ c − c ] (10) c

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 16 Vol:.(1234567890) www.nature.com/scientificreports/

where ­ec and ­nc are the number of edges and nodes in a community ‘c’ and γ is the resolution parameter acting as an edge density threshold which is set to be 0.5.

Local‑community‑paradigm (LCP) approach: compactness of the network. LCP-Decomposi- tion Plot (LCP-DP) is a method to characterize the topological self-organization of a network as a local-commu- nity-paradigm (LCP). It is used to study the efect of LCP on network topology. It is a function of the common neighbors (CN) index and local community links (LCL) of each pair of interacting nodes in the network. Tis approach gives information on the number, size, and compactness of the communities in a ­network79. Te CN index between two nodes x and y is the measure of overlapping between their sets of frst node neighbors S(x) and S(y) given by, CN S(x) S y . A signifcant amount of overlapping indicates a possible likelihood of = ∩ interaction of these two nodes. Terefore, an increase in CN represents the increase in compactness of the net- work representing its faster information  processing abilities. Further, the LCLs between the two nodes x and y, 1 whose upper bound is defned by, (LCL) CN(CN 1) , is the number of internal links which is strongly = 2 − inter-linked in local-community (LC). Tese two nodes most probably link together if CN of these two nodes is members of ­LC79. LCP-DP has a linear dependence between CN and √LCL. Te LCP correlation (LCP-corr) is the Pearson correlation co-efcient between the variables CN and LCL defned by cov(CN, LCL) LCP corr (11) − = σCN σLCL

with CN > 1, where cov(CN, LCL) is the covariance between CN and LCL, σCN and σLCL are standard deviations of CN and LCL, respectively.

Status of essential interactions of TS in non‑human systems by integrating the Orthology with PPI. A positive relationship exists between essentialities (essential proteins) and topological proper- ties (centralities) of the proteins in PPI networks. Terefore, a series of network topological features based on centrality measures have been used to recognize essential proteins. Te properties considered in this study are Degree Distribution, Betweenness Centrality, Closeness Centrality, and Eigenvector Centrality. Te proteins of the TS network were graded in terms of these topological properties (top 100 in each category). Tese ranking scores were then used to predict whether a protein is essential or not. Further, the interacting partners of the seed genes/proteins were also identifed in the TS network. Te interacting partners of seed genes/proteins that were common to all these four properties were considered essential. In this study, we predicted the essential proteins in the TS network by integrating the orthology with the PPI network of TS. Te essential proteins are more evolutionarily conserved than non-essential proteins and they frequently interact with each other. To identify the conserved interaction of the TS network, the selected interactions were analyzed in 7 diferent species namely, Mus musculus (mouse), Rattus norvegicus (Rat), Ovis aries (sheep), Felis catus (domestic cat), Macaca mulatta (rhesus macaque), Gorilla Gorilla (Gorilla), and Homo sapiens (human) (lower to higher-level organisms). For this, information on orthologs of selected proteins was taken from Version 8 of the InParanoid database (an ortholog database)80 and Orthodb v10.181. Ten the net- works of the selected organisms were constructed considering these essential interacting proteins as seed genes for the analysis of the conserved interactions from the STRING database with a 0.7 confdence score to get high confdence interactions.

Received: 15 January 2021; Accepted: 6 May 2021

References 1. Gravholt, C. H. et al. Clinical practice guidelines for the care of girls and women with Turner syndrome: proceedings from the 2016 Cincinnati International Turner Syndrome Meeting. Eur. J. Endocrinol. 177, G1–G70 (2017). 2. Karyotype-phenotype analyses across the lifespan. Cameron- Pimblett, A., La Rosa, C., King, T. F. J., Davies, M. C. & Conway, G. S. Te Turner syndrome life course project. Clin. Endocrinol. (Oxf.) 87, 532–538 (2017). 3. Urbach, A. & Benvenisty, N. Studying Early Lethality of 45,XO (Turner’s Syndrome) Embryos Using Human Embryonic Stem Cells. PLoS ONE 4, e4175 (2009). 4. Sun, L. et al. Glucose Metabolism in Turner Syndrome. Front. Endocrinol. 10, 49 (2019). 5. Muntaj, S., Feroze A Ganaie, Purva, S. V., S Radhika & Tilak, P. Karyotypic variables in turner syndrome: A case series. Int. J. Sci. Study 3, 171–175 (2015). 6. Bondy, C. A. Care of girls and women with turner syndrome: A guideline of the turner syndrome study group. J. Clin. Endocrinol. Metab. 92, 10–25 (2007). 7. Davenport, M. L. Approach to the patient with turner syndrome. J. Clin. Endocrinol. Metab. 95, 1487–1495 (2010). 8. Livadas, S. et al. Prevalence of thyroid dysfunction in turner’s syndrome: A long-term follow-up study and brief literature review. Tyroid 15, 1061–1066 (2005). 9. Gravholt, C. H. et al. Body composition is distinctly altered in Turner syndrome: relations to glucose metabolism, circulating adipokines, and endothelial adhesion molecules. Eur. J. Endocrinol. 155, 583–592 (2006). 10. Bakalov, V. K. et al. Bone mineral density and fractures in Turner syndrome. Am. J. Med. 115, 259–264 (2003). 11. O’Gorman, C. S. et al. An evaluation of early cardiometabolic risk factors in children and adolescents with Turner syndrome. Clin. Endocrinol. (Oxf.) 78, 907–913 (2013). 12. Schoepp, M. et al. Coronary calcifcation in adults with Turner syndrome. Genet. Med. 20, 664–668 (2018). 13. Trolle, C. et al. Widespread DNA hypomethylation and diferential gene expression in Turner syndrome. Sci. Rep. 6, 34220 (2016). 14. Ibarra-Gasparini, D. et al. New insights on diabetes in Turner syndrome: Results from an observational study in adulthood. Endocrine 59, 651–660 (2018).

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 17 Vol.:(0123456789) www.nature.com/scientificreports/

15. Bakalov, V. K., Cheng, C., Zhou, J. & Bondy, C. A. X-Chromosome gene dosage and the risk of diabetes in turner syndrome. J. Clin. Endocrinol. Metab. 94, 3289–3296 (2009). 16. Hovatta, O. Pregnancies in women with Turner’s syndrome. Ann. Med. 31, 106–110 (1999). 17. Wang, Y. et al. Relationship between recurrent miscarriage and insulin resistance. Gynecol. Obstet. Invest. 72, 245–251 (2011). 18. Bernard, V. et al. Spontaneous fertility and pregnancy outcomes amongst 480 women with Turner syndrome. Hum. Reprod. 31, 782–788 (2016). 19. Yaron, Y. et al. Patients with Turner’s syndrome may have an inherent endometrial abnormality afecting receptivity in oocyte donation. Fertil. Steril. 65, 1249–1252 (1996). 20. Tian, L., Shen, H., Lu, Q., Norman, R. J. & Wang, J. Insulin resistance increases the risk of spontaneous abortion afer assisted reproduction technology treatment. J. Clin. Endocrinol. Metab. 92, 1430–1433 (2007). 21. Crane, J. P. & Wahl, N. Te role of maternal diabetes in repetitive spontaneous abortion. Fertil. Steril. 36, 477–479 (1981). 22. Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011). 23. Silverman, E. K. et al. Molecular networks in Network Medicine: Development and applications. Wiley Interdiscip. Rev. Syst. Biol. Med. e1489 (2020). doi:https://​doi.​org/​10.​1002/​wsbm.​1489. 24. Conte, F. et al. A paradigm shif in medicine: A comprehensive review of network-based approaches. Biochim. Biophys. Acta BBA Gene Regul. Mech. 1863, 194416 (2020). 25. Fiscon, G., Conte, F., Licursi, V., Nasi, S. & Paci, P. Computational identifcation of specifc genes for glioblastoma stem-like cells identity. Sci. Rep. 8, 7769 (2018). 26. Paci, P. et al. Gene co-expression in the interactome: Moving from correlation toward causation via an integrated approach to disease module discovery. NPJ Syst. Biol. Appl. 7, 3 (2021). 27. Álvarez-Nava, F. & Lanes, R. Epigenetics in turner syndrome. Clin. Epigenetics 10, 45 (2018). 28. Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-Law Distributions in Empirical Data. SIAM Rev. 51, 661–703 (2009). 29. Newman, M. E. J. & Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E Stat. Nonlin. Sof Matter Phys. 69, 026113 (2004). 30. Farooqui, A. et al. Assessment of the key regulatory genes and their Interologs for Turner Syndrome employing network approach. Sci. Rep. 8, 10091 (2018). 31. Licursi, V., Conte, F., Fiscon, G. & Paci, P. MIENTURNET: An interactive web tool for microRNA-target enrichment and network- based analysis. BMC Bioinformatics 20, 545 (2019). 32. Raudsepp, T., Das, P. J., Avila, F. & Chowdhary, B. P. Te pseudoautosomal region and sex chromosome aneuploidies in domestic species. Sex. Dev. 6, 72–83 (2012). 33. Peng, W. et al. Iteration method for predicting essential proteins based on orthology and protein-protein interaction networks. BMC Syst. Biol. 6, 87 (2012). 34. Alarcón, S. et al. Defcient Insulin-mediated upregulation of the equilibrative nucleoside transporter 2 contributes to chronically increased adenosine in diabetic glomerulopathy. Sci. Rep. 7, 9439 (2017). 35. Cunha, D. A. et al. Trombospondin 1 protects pancreatic β-cells from lipotoxicity via the PERK–NRF2 pathway. Cell Death Difer. 23, 1995–2006 (2016). 36. Soni, A., Amisten, S., Rorsman, P. & Salehi, A. GPRC5B a putative glutamate-receptor candidate is negative modulator of insulin secretion. Biochem. Biophys. Res. Commun. 441, 643–648 (2013). 37. Sedman, L., Padhukasahasram, B., Kelgo, P. & Laan, M. Complex signatures of -specifc selective pressures and gene conver- sion on Human Growth Hormone/Chorionic Somatomammotropin genes. Hum. Mutat. 29, 1181–1193 (2008). 38. Männik, J., Vaas, P., Rull, K., Teesalu, P. & Laan, M. Diferential placental expression profle of human Growth Hormone/Chori- onic Somatomammotropin genes in pregnancies with pre-eclampsia and gestational diabetes mellitus. Mol. Cell. Endocrinol. 355, 180–187 (2012). 39. Kirby, T. J. et al. Integrative mRNA-microRNA analyses reveal novel interactions related to insulin sensitivity in human adipose tissue. Physiol. Genomics 48, 145–153 (2016). 40. Olsson, A. H. et al. Decreased expression of genes involved in oxidative phosphorylation in human pancreatic islets from patients with type 2 diabetes. Eur. J. Endocrinol. 165, 589–595 (2011). 41. Fouquet, B. A homozygous FANCM mutation underlies a familial case of non-syndromic primary ovarian insufciency. eLife 6, e30490 (2017). 42. Blasiak, J. et al. DNA damage and repair in type 2 diabetes mellitus. Mutat. Res. Mol. Mech. Mutagen. 554, 297–304 (2004). 43. Collins, A. R. et al. DNA damage in diabetes: Correlation with a clinical marker. Free Radic. Biol. Med. 25, 373–377 (1998). 44. Moreli, J. B. et al. DNA damage and its cellular response in mother and fetus exposed to hyperglycemic environment. BioMed Res. Int. 2014, 1–9 (2014). 45. Bianco, B. et al. PTPN22 polymorphism is related to autoimmune disease risk in patients with turner syndrome: PTPN22 poly- morphism in turner syndrome. Scand. J. Immunol. 72, 256–259 (2010). 46. Arnold, A. P. Te mouse as a model of fundamental concepts related to Turner syndrome. Am. J. Med. Genet. C Semin. Med. Genet. 181, 133–142 (2019). 47. Szczerbal, I. et al. X monosomy in a virilized female cat. Reprod. Domest. Anim. 50, 344–348 (2015). 48. Zartman, D. L., Hinesley, L. L. & Gnatkowski, M. W. A 53, X female sheep (Ovis aries). Cytogenet. Genome Res. 30, 54–58 (1981). 49. Bradford, C. M., Tupa, L., Wiese, D., Hurley, T. J. & Zimmerman, R. Unusual turner syndrome mosaic with a triple x cell line (47, X/49, XXX) in a western lowland gorilla (Gorilla Gorilla Gorilla). J. Zoo Wildl. Med. 44, 1055–1058 (2013). 50. Omoe, K. & Endo, A. Relationship between the Monosomy X phenotype and Y-linked S4 (Rps4) in several species of mammals: A molecular evolutionary analysis of Rps4Homologs. Genomics 31, 44–50 (1996). 51. Rosenbluth, J., Perle, M. A., Shirasaki, N., Hasegawa, M. & Wolf, M. E. X-chromosome monosomy in the -defcient rat mutant. Anat. Rec. 226, 396–402 (1990). 52. Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002). 53. Massingham, L. J. et al. Amniotic fuid RNA gene expression profling provides insights into the phenotype of Turner syndrome. Hum. Genet. 133, 1075–1082 (2014). 54. Misu, H. et al. A liver-derived secretory protein, selenoprotein P, causes insulin resistance. Cell Metab. 12, 483–495 (2010). 55. Dominguez, V. et al. Class II phosphoinositide 3-kinase regulates exocytosis of insulin granules in pancreatic beta cells. J. Biol. Chem. 286, 4216–4225 (2011). 56. Rull, K. et al. Increased placental expression and maternal serum levels of -inducing TRAIL in recurrent miscarriage. Placenta 34, 141–148 (2013). 57. Lédée, N. et al. Specifc and extensive endometrial deregulation is present before conception in IVF/ICSI repeated implantation failures (IF) or recurrent miscarriages. J. Pathol. 225, 554–564 (2011). 58. Pepper, S. D., Saunders, E. K., Edwards, L. E., Wilson, C. L. & Miller, C. J. Te utility of MAS5 expression summary and detection call algorithms. BMC Bioinformatics 8, 273 (2007). 59. Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2012).

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 18 Vol:.(1234567890) www.nature.com/scientificreports/

60. Smyth, G. K. limma: Linear models for microarray data. in Bioinformatics and Computational Biology Solutions Using R and Bio‑ conductor (eds. Gentleman, R., Carey, V. J., Huber, W., Irizarry, R. A. & Dudoit, S.) 397–420 (Springer-Verlag, 2005). doi:https://​ doi.​org/​10.​1007/0-​387-​29362-0_​23. 61. Dennis, G. et al. DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 4, P3 (2003). 62. Jassal, B. et al. Te reactome pathway knowledgebase. Nucleic Acids Res. gkz1031 (2019) doi:https://doi.​ org/​ 10.​ 1093/​ ​nar/gkz10​ 31​ . 63. von Mering, C. et al. STRING: A database of predicted functional associations between proteins. Nucleic Acids Res. 31, 258–261 (2003). 64. Shannon, P. et al. Cytoscape: A sofware environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003). 65. Tang, Y., Li, M., Wang, J., Pan, Y. & Wu, F.-X. CytoNCA: A cytoscape plugin for centrality analysis and evaluation of protein interaction networks. Biosystems 127, 67–72 (2015). 66. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002). 67. Maslov, S. Specifcity and stability in topology of protein networks. Science 296, 910–913 (2002). 68. Brandes, U. A faster algorithm for betweenness centrality. J. Math. Sociol. 25, 163–177 (2001). 69. Mason, O. & Verwoerd, M. Graph theory and networks in Biology. IET Syst. Biol. 1, 89–119 (2007). 70. Canright, G. & Engø-Monsen, K. Roles in networks. Sci. Comput. Program. 53, 195–214 (2004). 71. Bonacich, P. Power and centrality: A family of measures. Am. J. Sociol. 92, 1170–1182 (1987). 72. Tosadori, G., Bestvina, I., Spoto, F., Laudanna, C. & Scardoni, G. Creating, generating and comparing random network models with NetworkRandomizer. F1000Research 5, 2524 (2017). 73. Erdös, P. & Rényi, A. On the evolution of random graphs. in Te Structure and Dynamics of Networks 38–82 (Princeton University Press, 2011). doi:https://​doi.​org/​10.​1515/​97814​00841​356.​38. 74. Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 1 (2006). 75. Newman, M. E. J. Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103, 8577–8582 (2006). 76. Csardi, C. & Nepusz, T. Te igraph sofware package for complex network research. Int. J. Comp. Syst. 1695, 1–9 (2006). 77. Traag, V. A., Van Dooren, P. & Nesterov, Y. Narrow scope for resolution-limit-free community detection. Phys. Rev. E 84, 1 (2011). 78. Traag, V. A., Krings, G. & Van Dooren, P. Signifcant scales in community structure. Sci. Rep. 3, 1 (2013). 79. Cannistraci, C. V., Alanis-Lobato, G. & Ravasi, T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Sci. Rep. 3, 1 (2013). 80. Sonnhammer, E. L. L. & Östlund, G. InParanoid 8: Orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res. 43, D234–D239 (2015). 81. Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019). Acknowledgements A.F. and S.T. are fnancially supported by the Indian Council of Medical Research under SRF (Senior Research Fellowship). A.A. and S.H. are grateful to Jazan University for providing access to Saudi Digital Library for this study. Author contributions R.I. and A.F. conceived the model. A.F. did the numerical experiments. A.F. and M.M. prepared the fgures of the numerical results. A.F., A.A., S.H., and N.T. analyzed and interpreted the results. A.F. wrote the manuscript. A.F., A.A., S.H., N.T., M.M., S.T., S.A., and R.I. were involved in the study and reviewed the manuscript.

Competing interests Te authors declare no competing interests. Additional information Supplementary Information Te online version contains supplementary material available at https://​doi.​org/​ 10.​1038/​s41598-​021-​90171-0. Correspondence and requests for materials should be addressed to R.I. Reprints and permissions information is available at www.nature.com/reprints. Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access Tis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creat​iveco​mmons.​org/​licen​ses/​by/4.​0/.

© Te Author(s) 2021

Scientifc Reports | (2021) 11:10662 | https://doi.org/10.1038/s41598-021-90171-0 19 Vol.:(0123456789)