CBA HoliRD REPORT: Ellis-Van Creveld Syndrome

Marina Esteban Medina, María Peña-Chilet, Carlos Loucera, Joaquín Dopazo
Clinical Bioinformatics Area - FPS
Sevilla, January 13, 2020

Collaborators: Dr. Víctor Ruiz Pérez's group - U760 CIBERER Research group at Instituto de Investigaciones Biomédicas "Alberto Sols" CSIC-UAM, Madrid.

Objectives and methodology:

The Holistic Rare Disease project (HoliRD) aims to build Disease Maps for as many Rare Diseases as possible and to model them in order to systematize research in drug repurposing. To this end, several databases such as ORPHANET, OMIM, HPO, PubMed, KEGG and STRING, as well as the literature, are used to collect the up-to-date knowledge on the diseases under study and to define a Disease Map that contains the functional relationships among the known disease genes, as well as the functional consequences of their activity.

Then, a mechanistic model that accounts for the activity of this map is used. The HiPathia algorithm, which has been shown to predict cell activities related to cancer hallmarks (Hidalgo et al., Oncotarget 2017; 8:5160-5178; Hidalgo et al., Biol Direct. 2018; 13:16) as well as the effect of protein inhibitions on cell survival (Cubuk et al., Cancer Res. 2018; 78:6059-6072), is used to simulate the activity of the disease map.

Finally, machine learning algorithms are used to find other proteins, already targets of drugs with another indication, that display a potential causal effect on the activity of the previously defined disease map. The drugs that target these proteins are potential candidates for repurposing.

Figure: Schematic representation of the method used.

Examples of the use of this approach can be found in Esteban-Medina et al., BMC Bioinformatics. 2019, 20(1):370.
Report

This report describes the results of the different steps of the HoliRD approach applied to Ellis-Van Creveld Syndrome.

Identification of genes highly related to the rare disease (RD) under study in Orphanet/OMIM:

A total of 2 genes annotated as Ellis-Van Creveld Syndrome (EVC) were found in the OMIM database.

EVC highly related genes

Disease ID     Entrez ID   Gene Symbol
OMIM:225500    132884      EVC2
OMIM:225500    2121        EVC

Identification of HPO terms highly related to the RD under study:

A total of 32 HPO codes associated with EVC with specificity >= 7 were selected.

EVC highly related HPOs

HPO ID        HPO term                                   Specificity level
HP:0000008    Abnormality of female internal genitalia   8
HP:0000028    Cryptorchidism                             10
HP:0000039    Epispadias                                 15
HP:0000047    Hypospadias                                15
HP:0000072    Hydroureter                                8
HP:0000190    Abnormal oral frenulum morphology          9
HP:0000233    Thin vermilion border                      10
HP:0000486    Strabismus                                 7
HP:0000668    Hypodontia                                 12
HP:0000684    Delayed eruption of teeth                  11
HP:0000691    Microdontia                                11
HP:0000774    Narrow chest                               8
HP:0001161    Hand polydactyly                           18
HP:0001241    Capitate-hamate fusion                     21
HP:0001249    Intellectual disability                    7
HP:0001629    Ventricular septal defect                  9
HP:0001631    Atrial septal defect                       9
HP:0001696    Situs inversus totalis                     9
HP:0001800    Hypoplastic toenails                       10
HP:0001829    Foot polydactyly                           16
HP:0002488    Acute leukemia                             12
HP:0002857    Genu valgum                                19
HP:0002967    Cubitus valgus                             10
HP:0002983    Micromelia                                 9
HP:0005048    Synostosis of carpal bones                 19
HP:0006695    Atrioventricular canal defect              7
HP:0008678    Renal hypoplasia/aplasia                   8
HP:0008921    Neonatal short-limb short stature          9
HP:0009882    Short distal phalanx of finger             27
HP:0010306    Short thorax                               7
HP:0011065    Conical incisor                            14
HP:0011830    Abnormal oral mucosa morphology            9

Identification of genes that share a minimum number of RD-HPO codes:

Genes with >= 10 EVC-HPO codes

Gene Symbol   Entrez    Gene Symbol   Entrez    Gene Symbol   Entrez
ARVCF         421       ROR2          4920      CD96          10225
COMT          1312      PTPN11        5781      CPLX1         10815
CTBP1         1487      RAD21         5885      PIGN          23556
DHCR7         1717      RFC2          5982      NIPBL         25836
DVL1          1855      RREB1         6239      SETBP1        26040
DVL3          1857      TBX1          6899      TCTN3         26123
ELN           2006      HIRA          7290      IFT172        26160
EVC           2121      UFD1          7353      TBL2          26608
FANCB         2187      KDM6A         7403      DYNC2LI1      51626
FGFR1         2260      CLIP2         7461      FGFRL1        53834
FGFR3         2261      WHCR          7467      BCOR          54880
FGFR2         2263      NSD2          7468      SETD5         55209
FLNA          2316      WNT5A         7474      HDAC8         55869
GJA1          2697      MKKS          8195      ARID1B        57492
GLI1          2735      SMC1A         8243      WDR35         57539
GLI3          2737      NAA10         8260      WDR19         57728
GP1BB         2812      OFD1          8481      PIEZO2        63895
GTF2I         2969      TP63          8626      NXN           64359
KRAS          3845      BAZ1B         9031      EVC2          132884
LETM1         3954      SMC3          9126      B3GLCT        145173
LIMK1         3984      LONP1         9361      CEP120        153241
SMAD4         4089      GTF2IRD1      9569      UBR1          197131
KMT2A         4297      SEC24C        9632      HYLS1         219844
JMJD1C        221037    KIF7          374654

Genes with >= 13 EVC-HPO codes

Gene Symbol   Entrez    Gene Symbol   Entrez
DHCR7         1717      TP63          8626
EVC           2121      DYNC2LI1      51626
FGFR3         2261      BCOR          54880
FGFR2         2263      WDR35         57539
FLNA          2316      EVC2          132884
GLI1          2735      KIF7          374654
GLI3          2737

Genes with >= 32 (all) EVC-HPO codes

Gene Symbol   Entrez
EVC           2121
GLI1          2735
DYNC2LI1      51626
EVC2          132884

To maintain specificity and avoid over-expanding the Disease Map of action, only genes with >= 13 EVC-HPO codes were selected.

Location of the selected disease-related genes in KEGG pathways to define the Disease Map of action:

After locating the RD-associated genes within KEGG pathways, a total of 101 circuits belonging to 9 KEGG pathways were found to be part of the disease map.

KEGG pathway                                                KEGG-pathway code
MAPK signaling pathway                                      hsa04010
Ras signaling pathway                                       hsa04014
Rap1 signaling pathway                                      hsa04015
cAMP signaling pathway                                      hsa04024
PI3K-Akt signaling pathway                                  hsa04151
Hedgehog signaling pathway                                  hsa04340
Focal adhesion                                              hsa04510
Signaling pathways regulating pluripotency of stem cells    hsa04550
Regulation of actin cytoskeleton                            hsa04810

HiPathia is a signal propagation algorithm that treats pathways as collections of circuits, defined as sub-pathways or sequences of proteins that connect signal receptor proteins to effector proteins.
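
The notion of a circuit as a sub-pathway connecting receptor proteins to an effector protein can be illustrated with a minimal sketch. This is not the HiPathia implementation: the toy pathway, the node names (`R1`, `A`, `E1`, ...) and the function `circuits` are hypothetical, and a circuit is assumed here to be the set of nodes from which a given effector can be reached.

```python
from collections import defaultdict

# Hypothetical toy pathway (not taken from KEGG): (source, target, sign),
# where sign +1 stands for activation and -1 for inhibition.
EDGES = [
    ("R1", "A", +1),
    ("R2", "A", -1),
    ("A", "E1", +1),
    ("A", "B", +1),
    ("B", "E2", +1),
    ("R3", "B", +1),
]


def circuits(edges):
    """Map each effector (node with no outgoing edge) to the sorted list of
    nodes lying upstream of it, the effector itself included."""
    parents = defaultdict(set)
    nodes, has_outgoing = set(), set()
    for src, dst, _sign in edges:
        parents[dst].add(src)
        nodes.update((src, dst))
        has_outgoing.add(src)

    result = {}
    for effector in sorted(nodes - has_outgoing):
        # Walk the graph backwards from the effector to collect its sub-pathway.
        seen, stack = {effector}, [effector]
        while stack:
            node = stack.pop()
            for parent in parents[node]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        result[effector] = sorted(seen)
    return result


for effector, members in circuits(EDGES).items():
    print(effector, members)
```

In this toy example the two effectors yield two overlapping circuits, which shows why a single pathway can contribute several circuits to a disease map.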
HiPathia uses gene expression values as proxies for the level of activation of the corresponding proteins in the circuit. Taking into account the inferred protein activity and the interactions between the proteins (activation or inhibition) defined in the pathway, the level of activity of a circuit is estimated using a signal propagation algorithm. Ultimately, effector proteins are annotated with a cellular function.

To enable a better visualization of the RD Map, the HiPathia viewer has been used. The circuits that define the RD Map are marked in red (please ignore the color legend). The pathways that contain these circuits are highlighted in the right window with a red arrow. The only purpose of this report is to represent the components (genes and interactions) and functions of the circuits that compose the RD Map.

Click to access the RD Map Report

HiPathia uses KEGG pathways for the graphical representation of the circuits. The original pathways can also be visualized in the KEGG repository:
https://www.genome.jp/kegg/pathway.html
Select prefix: hsa (Organism)
Enter keywords: e.g. FoxO signaling pathway (any HiPathia pathway)

Prediction of relevance of gene targets from approved drugs extracted from DRUGBANK database (release 5.1.4):

The HoliRD approach takes the mechanistic model of the disease map as the proxy for the molecular basis of the disease outcome. Then, a Multi-Output Random Forest (MORF) regressor, a machine learning algorithm that predicts the circuit activities across the whole disease map, is trained on GTEx gene expression data to find proteins (which are targets of drugs with indications for other diseases) that correctly predict the behavior of the disease map. The drugs targeting the best predictor proteins are candidates for drug repurposing. The relevance score accounts for the accuracy of the prediction contributed by each individual protein.
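
The idea of a multi-output random forest that ranks candidate targets can be sketched as follows. This is only an illustration under stated assumptions, not the HoliRD code: the data are synthetic stand-ins for GTEx expression and circuit activities, scikit-learn's `RandomForestRegressor` (which accepts a multi-column target) is used as the regressor, and its impurity-based `feature_importances_` is assumed as a stand-in for the relevance score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 200 samples, 6 candidate drug-target
# genes (features) and 3 circuit activities (outputs).
n_samples, n_targets = 200, 6
X = rng.normal(size=(n_samples, n_targets))

# In this toy setup only targets 0 and 1 drive the circuit activities.
Y = np.column_stack([
    2.0 * X[:, 0] + 0.1 * rng.normal(size=n_samples),
    -1.5 * X[:, 0] + 0.1 * rng.normal(size=n_samples),
    1.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples),
])

# A single forest predicts all circuit activities at once (multi-output).
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, Y)

# Importances are non-negative and sum to 1; they carry no information about
# whether a target raises or lowers a circuit activity.
relevance = model.feature_importances_
ranking = np.argsort(relevance)[::-1]

for idx in ranking:
    print(f"target_{idx}: {relevance[idx]:.4f}")
```

Target 0 influences one circuit positively and another negatively, yet it receives a single non-negative importance, which illustrates why such a score cannot express the direction of an effect.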
Relevance scores are absolute values and do not account for the direction of the prediction, that is, whether the interaction is an activation or an inhibition.

From a total of 683 targets of approved drugs (AT) in the DRUGBANK database (release 5.1.4), the machine learning algorithm selected the 51 most relevant ones (top AT).

Entrez   Gene symbol   Relevance score    Entrez   Gene symbol   Relevance score
4921     DDR2          0.1246788241       4915     NTRK2         0.0062247423
558      AXL           0.1055667695       10544    PROCR         0.0062172438
5159     PDGFRB        0.0690853304       6608     SMO           0.0053865627
6560     SLC12A4       0.0465522187       11255    HRH3          0.0049937396
1956     EGFR          0.0431943115       7035     TFPI          0.0045877157
2335     FN1           0.0413354519       2064     ERBB2         0.0044009706
1277     COL1A1        0.0337826148       3351     HTR1B         0.0042685683
3561     IL2RG         0.0317543927       5740     PTGIS         0.0041923015
5627     PROS1         0.0304667474       5156     PDGFRA        0.0041010268
3688     ITGB1         0.025228295        3791     KDR           0.0037065533
6785     ELOVL4        0.0213337609       55800    SCN3B         0.0034422123
775      CACNA1C       0.0210501489       302      ANXA2         0.0030536432
2247     FGF2          0.0186886186       9475     ROCK2         0.0030046783
6093     ROCK1         0.0143943183       6715     SRD5A1        0.0029581293
5916     RARG          0.0139642827       784      CACNB3        0.0028630476
781      CACNA2D1      0.0137847594       8913     CACNA1G       0.0027950197
87       ACTN1         0.0127013274       5241     PGR           0.0026874201
3785     KCNQ2         0.011057306        55       ACPP          0.0026819434
774      CACNA1B       0.0103482963       2152     F3            0.0025191135
2261     FGFR3         0.0088571926       2556     GABRA3        0.0024852433
715      C1R           0.0077938263       64127    NOD2          0.0024850797
5745     PTH1R         0.0072131923       3269     HRH1          0.0024098319
136      ADORA2B       0.0068923465       3459     IFNGR1        0.0023183231
2260     FGFR1         0.0067774556       5578     PRKCA         0.0022983954
716      C1S           0.0066619628       6869     TACR1         0.0022750919
301      ANXA1         0.0062367681

Figure: Relevance plot depicting the 51 most relevant gene targets.

Drugs from DRUGBANK database (release 5.1.4) that target top AT:

The list of drugs that target the 51 most relevant genes follows. Click the hyperlinked Drug ID to see more detailed information about the drug in DrugBank.