A logical functional analysis pipeline

Nicos Angelopoulos and Giorgos Giamas

WCB, 8/9/2014 – p.1 . . . introduction

... it is becoming increasingly clear, that the ability to reason at the highest possible level with the available data has a profound effect on the kind of questions scientists can address. It is indeed the view of experimental data as knowledge which is becoming an important realisation among the scientists in the biological sciences. Traditionally, data analysis has been viewed as an independent exercise it is increasingly becoming apparent though, that data analysis is now taking a more central role with important high-impact papers focusing on evaluating results rather than producing unique datasets.

WCB, 8/9/2014 – p.2 why Prolog?

Clear computational paradigm Knowledge representation and reasoning Robustness Memory management Program and data duality High level of abstraction Calling R functions Indexing Database interactions Operating systems support WCB, 8/9/2014 – p.3 what is lacking ?

Lack of programmers Language development and stability Critical mass Performance

WCB, 8/9/2014 – p.4 Real - Integrative statistics with R

Run R functions on Prolog data.

Reductionist design

? − x ← mean([1, 2, 3]). ? − X ← x. X = 2.0 ? − y ← [1, 2, 3],x ← mean(y).

WCB, 8/9/2014 – p.5 knockouts

Important molecules in signalling.

65 knock outs

WCB, 8/9/2014 – p.6 SYPL1 6 MUC5B FTH1 dysregulation patterns BTF3 TCOF1 5 ERGIC1 FTL CEACAM5 SQRDL S100P 4 RHPN2 QSOX1 CAMSAP2 MUC5AC 3 HUWE1 MAP7 NCOR1 PES1 2 AKR1A1 FHOD1 PXDN DCXR MAP7 1 NEDD8 UBTF RCN2 SUPT16H 0 SSRP1 PHF3 ASMTL DCAF8 H1F0 POLR1A BAG6 PNN IVD OXSR1 ASNS PIK3C2B ADH5 RABL6 HEATR1 AGRN LIMA1 MROH1 PREX1 ASS1 ITGB4 GALNT7 COBLL1 PLCB3 731788 NEDD4 UACA PSME3 ITGB4 ARMCX3 ANO6 ACTN1 BOP1 SCIN ABCA2 PYGM EPB41L1 SLC38A2 GYS1 SCARB1 CELSR2 ACTN4 ZNF185 WDR26 JAK1 PGR TMEM164 SQSTM1 AGL AKR1C2 CD44 CA12 WCB, 8/9/2014 – p.7 ABL2 BTK CSF1R CSK DDR1 EPHA3 EPHA6 EPHA7 EPHB2 EPHB6 ERBB3 FES FGFR1 FLT3 FYN clusters based on dysregulation

EPHA1

HCK MET EGFR NTRK3 NTRK1 SRC PDGFRB PTK6 LMTK2 MERTK IGF1R MST1R INSR EPHB1 RYK EPHB4 EPHA2 FRK FGFR2 ABL1 FLT1 AXL TYRO3 EPHA4 ROR2 EPHB3 ZAP70 ERBB2 MATK ERBB4 CSF1R LMTK3 ABL2 PTK2B EPHA6 NTRK2 BTK RET FGFR1 PTK7 TEC EPHB2 SYK ERBB3 TNK1 EPHA7 ROR1 CSK JAK2 KDR DDR1 YES1 FLT3 TYK2 JAK1

PTK2 EPHA3 FYN TNK2

FES

LYN EPHB6

STYK1

WCB, 8/9/2014 – p.8 cluster populations

15

10 Cluster 1 2 3 4 5 6

Members 7 8 9 10 5

0

1 2 3 4 5 6 7 8 9 10 Clusters

WCB, 8/9/2014 – p.9 level 2 GO biological process

1 2 Cell communication 3 4 5 Cell cycle 6 7 8 Reproduction 9 10 Metabolic process

Transport

Immune system

Growth

Development

Cellular process

Cellular component organization

Apoptosis

Biological regulation

Cell adhesion

WCB, 8/9/2014 – p.10 0 20 40 60 80 4-carbohydrate metabolic process

ARFGEF1 DOLPP1 GBA GLYR1 GNS HK1 ISYNA1 NFKB1 PHKA1 PAPSS2 PGM3 RAE1 SEC24A SLC3A2 SDC1 MUC5B

PGK1 CD44 AGRN GALNT6

PDHB PCK2

PC PFKL

GPD2 OGDH

IDH3A UGP2

DLAT GPI

ENO1 GALK1

GO:0005975, carbohydrate metabolic process. pval:6.531864e-5 count: 32 size: 804.

WCB, 8/9/2014 – p.11 doxorubicin

3 Confers MCF7 resistance 2 sensitivity DU4475 Clusters3 1 0 7 CAL51 0 Clusters2 −1 0 CAL148 2 −2 3 HCC38 7 −3 8 CAL851 Clusters 1 3 HCC2157 4 5 6 HCC1599 7 8

HCC1806 Effect resistant MDAMB468 sensitive IC_50 MDAMB134VI 2.35 −4.45 BT20

HS578T

HCC70

EVSAT

UACC812

MDAMB453

MDAMB415

HCC1395

HCC1143 PACSIN2 RNF213 TOM1 PITPNB AHNAK CAPZB BLVRB GOLGA3 H2AFX HIST1H4A ACAA2 PRCC KPNA2 MYH10 RCN2

WCB, 8/9/2014 – p.12 knockout effects on TKs Color Key and Histogram Count 0 1000 2000 3000 0 0.5 1 1.5 Value

ABL1 ABL2 AXL BTK CSF1R CSK DDR1 EGFR EPHA1 EPHA2 EPHA3 EPHA4 EPHA6 EPHA7 EPHB1 EPHB2 EPHB3 EPHB4 EPHB6 ERBB2 ERBB3 ERBB4 FES FGFR1 FGFR2 FLT1 FLT3 FRK FYN HCK IGF1R INSR JAK1 JAK2 KDR LCK LMTK2 LMTK3 LYN MATK MERTK MET MST1R NTRK1 NTRK2 NTRK3 PDGFRB PTK2 PTK2B PTK6 PTK7 RET ROR1 ROR2 RYK SRC STYK1 SYK TEC TNK1 TNK2 TYK2 TYRO3 YES1 ZAP70 AXL LCK LYN FES BTK FRK FYN TEC RET SYK CSK RYK KDR HCK MET SRC FLT1 FLT3 INSR JAK1 JAK2 ABL1 ABL2 TYK2 PTK2 PTK6 PTK7 TNK1 TNK2 YES1 DDR1 MATK EGFR ROR1 ROR2 IGF1R ZAP70 PTK2B STYK1 LMTK2 LMTK3 FGFR1 FGFR2 CSF1R NTRK1 NTRK2 NTRK3 EPHA1 EPHA2 EPHA3 EPHA4 EPHA6 EPHA7 EPHB1 EPHB2 EPHB3 EPHB4 EPHB6 ERBB2 ERBB3 ERBB4 TYRO3 MST1R MERTK PDGFRB

WCB, 8/9/2014 – p.13 software

db_facts 44/19 db-tables-as-facts & SQL layer proSQLite 109/33 SQLite interface pubmed 14/3 Access pubmed publication records Real 68/11 Integrative statistics with R

mtx working with matrices go_string GO term STRING graphs map_id mapping across nomenclatures func functional analysis pipeline

WCB, 8/9/2014 – p.14