<<

Predicting Responses by Propagating Interactions through Text-Enhanced Drug-Gene Networks

Shiyin Wang Massachusetts Institute of Technology [email protected]

ABSTRACT response analysis. At that time, researchers have already cast inter- Personalized drug response has received public awareness in recent ests in the variation of personal drug response. However, because years. How to combine gene test result and drug sensitivity records of the limitation of data collection and analysis models, precision is regarded as essential in the real-world implementation. Research medicine did not arouse widespread concern until these decades[12]. articles are good sources to train machine predicting, inference, Recent years, integrating big data, including clinical data, genetic reasoning, etc. In this project, we combine the patterns mined from data, genomic data, intervention history, etc., has received expecta- biological research articles and categorical data to construct a drug- tions for facilitating characterization of side effects and predicting gene interaction network. Then we use the cell line experimental drug resistance. records on gene and drug sensitivity to estimate the edge embed- Network Science results are applied to the data of gene expres- dings in the network. Our model provides white-box explainable sion profiles by integrating personalized data to predictive records predictions of drug response based on gene records, which achieve by similarity analysis, which reveal the hidden correlations of un- 94.74% accuracy in binary drug sensitivity prediction task. observed patterns[11, 24]. Intuitive speaking, people sharing sim- ilar genetic patterns and clinical treatments tend to show similar CCS CONCEPTS drug responses. Network-based approaches reveal potential drug- biomarker correlations effectively for some diseases. However, cur- • Information systems → Retrieval tasks and goals; Informa- rent methods in this aspect are limited in the correlation modeling tion extraction; of the network, and the data accessed is limited in the structured data. KEYWORDS Data Mining communities have been trying to discover, identify, network science, data mining, drug response structure, and summary relationships between biological entities through various text mining techniques across open-access bio- 1 INTRODUCTION logical research publications[17, 19, 20, 22]. One of the drawbacks Precision medicine depends on discovering the correlations and ca- of this approach is the accuracy of the results is not high enough sualties between observed data(gene, past clinical records, etc) and to play a deterministic role in the decision process. Instead, they unobserved future performance(side effects, drug effects, immune are provided to the human experts as an additional reference to response, etc). Categorical relations have been collected from experi- fasten the knowledge discovery process. On the other hand, Re- ments, while more complex meta-pattern information is mined from cursive Neural Networks has shown its power to deal with text scientific publications. To bridge the gap between these two types data[9], demonstrating considerable success in numerous tasks such of data collecting sources, we propose a relation embedding method as image captioning[7, 14, 26] and machine translation[2, 5, 10]. to expand the representation space of the relationships between This technique can be applied to convert text meta-patterns into genes and drug responses. To the best of my knowledge, this is the trainable embeddings to represent the relations among entities. first approach to combine text patterns into the existing network. Nodes in the network represent to drug responses(sensitivities, side 3 MODEL effects, etc.) and genes. In this section, we make definitions and formalize the problem. The arXiv:1906.08089v1 [cs.SI] 19 Jun 2019 The model pipeline consists of two parts. First, we construct a whole project is consist of two parts: constructing network skeleton relation network of gene and drug response from existing format- and inference edge embedding. ted data sources. In the same time, we mine data from scientific In the first part, we first collected all the entities Definition. 1 publications automatically, retrieving meta-patterns indicating the from multiple data sources. Then we mined the categorical rela- relations between gene and drug responses. Finally, a novel graph tionships and descriptive relationships Definition.2. Patterns are convolutional neural network based method is applied to balance extracted based on the discovered entities and relationships. After the information collected in the two types of data sources. that, meta-patterns are extracted from descriptive patterns to be a high-level representation of entities relations. Regarding entities 2 RELATED WORKS as nodes in the network, we add edges between two nodes if they Drug discovery plays an essential role in the improvement of hu- appear to have either structured or descriptive relationships. man life[8] as the had become a well-defined and Definition 1 (Entity). Entities, denoted by e, refer to the names respective scientific discipline. Dating back to the 20th century, of chemicals, genes, or diseases which can be indexed by a MESH id or researchers across distinct disciplinary areas, including analytical Entrez id. For example, SAH is a disease entity with MESH id D013345. chemistry and biochemistry, demonstrated the values for the drug Definition 2 (Relation). Relationships, denoted by r, represent Algorithm 1 Representation Learning the latent interactions between two entities ei , ej . Relations can be for Epoch 1, 2,..., N do either categorical(such as “blocker", “binder") or descriptive(such as for Iteration 1, 2,...,T do “With the increase of ei , there is a significantly drop of ej expression Calculate vˆi = G(vi ,bi , λi , Rei,ej ) by Equation. 1 level"). Calculate L by Equation. 5 Update v ← v − α ∂L Definition 3 (Pattern). The pattern, denoted by (r, ei , ej ), is i i ∂vi ∂L a tuple of one relationship and two entities. Patterns are categorical Update bi ← bi − α ∂bi or descriptive concerning the type of the corresponding relationships. ∂L Update λi ← vi − α Generally, patterns are accessed by directly parsing datasets. Meta- ∂λi end for Patterns are the simplified version of descriptive patterns. Calculate vˆi = G(vi ,bi , λi , Rei,ej ) by Equation. 1 In the second part, we estimate the representations of nodes Calculate L by Equation. 5 ∂L and edges Definition. 4. We model the estimated representation of Update R , ← v − α ei ej i ∂R , v is calculated by the average neighbor entities to multiple edge ei ej i end for representations as Equation. 1. Definition 4 (Edge Representation and Node Representa- tion). Edge representation, denoted as Rei,ej ∈ M, is the quantified version of relationship defined on a function family. In this project, and gene druggability information from 30 different sources. Figure ?? shows the frequency of drug-gene interaction types in DGIdb, because of the computation limitation, we set M = Rk×k to be a k showing a long-tailed distribution. The inhibitor occurs 9982 times matrix of size k × k. Then nodes are represented as a vector vi ∈ R , in DGIdb, which is 40.79% of the whole interaction records. The bias bi , and scale λi . The estimated value of vi is given by: second most frequent interaction is , which occurs 5333 λ i Õ times and accounts for 21.79%. There are 30 types of relationships. vˆi = Rei,ej vj + λibi (1) |N (ei )| ej ∈N (ei ) We assume the founded structured relations are correct and precise. Having built the network skeleton, we then need to learn the 4.2 Mining Descriptive Relationship representation Rei,ej , λi , bi , vi of edges and nodes. In the datasets, researchers record TPM (transcripts per million) PubMed abstracts are a good source to obtain descriptive patterns. to quantify genes and use intensity to measure drug sensitivity. We investigated a subset of annotated PubMed data by PubTator[23]. To convert representations into a numeric format, we deploy a We selected all the sentences that contain two annotated chemicals fully connected neural network with one hidden layer of size 2 and or genes. The total size of the file is approximate 2 gigabyte, which sigmoid activation. is too complex to take the original long sentences into the model. The first reason is that many patterns may occur a few timesif ui = S(vi ) (2) we use sentences to model the relations directly. Secondly, our We define the loss function as: computation power does not have enough memory to process large 1 Õ graphs. Therefore, we further process our descriptive patterns to L = (uˆ − u )2 (3) дene N i i short representations, which we call meta-patterns. дene i ∈Gene 1 Õ 2 Lchem = (uˆi − ui ) (4) 4.3 Extract Meta-Patterns Nдene i ∈Chemical For decades, numerous biological researchers describe their find- L = Lдene + βLchem (5) ings on interactions between chemicals, , genes, etc., by natu- ral languages. The adequate biological research literature provide 4 IMPLEMENTATION a good source to extract the information using information re- 4.1 Collecting Structured Data trieval methods[1, 16, 18]. We collected 3614796 abstracts from PubMed and used the PubTator dataset to discover all the named Drug-Gene interactions are well studied in the past biological re- entities in chemicals, diseases, and genes. Meta pattern[1, 21] is searches, such as CancerCommons, CF:Biomarkers, CGI, NCI, Drug- defined as a frequent, informative, and precise subsequence pat- 1 Bank, PharmGKB, DoCM, etc. SIDER provides adverse drug re- tern in certain context. The first step of meta-pattern discovery is actions (ADRs) representing drug responses, which contains 1430 context-aware segmentation, which estimates the contextual bound- drugs, 5880 ADRs and 140,064 drug-ADR pairs. The Cancer Thera- aries of meta-patterns. We use the method developed by Wang[21], 2 peutics Response Portal (CTRP) links genetic, lineage, and other which extracted relationships from a subset of the abstracts on the cellular features of cancer cell lines to small-molecule sensitivity PubMed[6] with the semi-supervise from CTD database3. For ex- with the goal of accelerating the discovery of patient-matched can- ample, the meta-patterns in this sentence from a PubMed abstract cer therapeutics. In this paper, we use the drug-gene interaction are shown in the Figure. 1 database (DGIdb)[4, 13, 15], which contains drug-gene interactions

1sideeffects.embl.de 2portals.broadinstitute.org/ctrp/ 3http://ctdbase.org 2 Figure 1: An example of pattern and meta-pattern extraction process from a sentence in a PubMed paper.

Patients with aneurysmal (SAH) often develop patients with h_1 hydrocephalus requiring an external ventricular drain (EVD).

often develop h_2 MESH:D006849 MESH:D013345

Patients with aneurysmal subarachnoid hemorrhage (SAH) often develop hydrocephalus requiring an external ventricular drain (EVD). requiring an external ventricular drain (EVD) h_3

aneurysmal subarachnoid hemorrhage (SAH) develop hydrocephalus A develop B h_meta

Table 1: Datasets summary 4.5 Inference Individual Drug Responses To determine the parameters in the model, we acquire cell line Dataset # Gene # Drug # Relations Format experiment records of 58035 gene expression on 934 human cancer DGIdb 36815 9370 42727 Categorical cell lines [3] and 3723759 drug sensitivity test records on 1065 cell PubTator 36815 9370 42727 Descriptive lines[25]. We assume the same cell lines perform similarly in these PubMed4 2575 6199 10530 MetaPattern two data sources. Then we can align these two data sources by cell RNA-seq[3] 58035 - - CL Records line names. The data summary is shown in Table. 1. GDSC[25] - 16 - CL Records Because we have to use experimental data to train the represen- Skeleton 335 587 3321 - tations, we need to shrink the graph, deleting useless nodes. We pick the nodes from observable entities(red) with distance 1 or 2 to the observable entities. The results are shown on Figure. 5a, 5b, 5c, 5d. 4.4 Linking Categorical Relations and We set the embedding dimension to be 4. The learning rate of Descriptive Relations gene optimizer is 0.1 so that it can converge quickly in 1. The One of the challenges in this project is the entity linking across learning rate of chemical is 0.001. multiple datasets, which refer to the alignment of entity mentions to We have done experiments on the network of Figure. 5b. The its entity id. This process is very tedious because of the messy index binary classification accuracy is 91.15%. Because network data is format. For example, chemicals(drugs) have CID, PMID, Chembl, not structured, it is hard to draw a ROC or some other training Cas Number, etc. We use the entity name as a validation criterion. visualization. Though the same entity may have different names in different data sources, those names are similar with high probability. Therefore, we can measure the correctness of our entity linking algorithm by 5 EVALUATION comparing merged names. The summary of all the data sources used in this project are listed in Table. 1 2. We test our model on the extracted 539 drug-gene records linked by cell-line. The data contain test records of 7 drugs and 21 genes, and about 42.12% cells are missing. The inadequacy of data limits Table 2: Datasets the performance of complex models. We compare with Logistic Regression and Support Vector Machine. We use sklearn package Dataset URL for Logistic Regression, with the L1 penalty and max interation 10000. We use radial basis function kernel in SVM model with auto DGIdb http://www.dgidb.org gamma. Our method outperforms these two baselines. PubTator https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/ Our model can be explainable because it is based on the repre- Demo/PubTator sentation of categorical and descriptive relations. Human experts PubMed https://www.ncbi.nlm.nih.gov/pubmed can guide the model by changing the network skeleton. Therefore, RNA- https://www.ebi.ac.uk/gxa/experiments/E- it can potentially avoid the risk of black-box decision and provide seq[3] MTAB-2770/Results an interpretation of its decisions. GDSC[25] https://www.cancerrxgene.org

3 Figure 2: The visualization of all mined interactions among genes and drugs. The label of the genes are their Entrez ID. The label of chemicals are their names. (# drugs=587, # genes=335, # interactions=3321)

argatrobanGENE_2147 GENE_1111 obeticholic_acid disulfiram GENE_217 folic_acid

menadione enoxacin GENE_9971 GENE_5467 GENE_316 GENE_488 GENE_2348 GENE_1181

tretinoin GENE_1524 GENE_462 capecitabine GENE_5916 GENE_5604 adapalene GENE_9002 GENE_5915 selegiline GENE_5606 rasagiline tazarotene GENE_5914 GENE_5599 GENE_7298 GENE_1558 GENE_3708 GENE_4128 GENE_7166 GENE_1573 leucovorin GENE_5338 GENE_4129 GENE_1312entacaponelansoprazolebaclofen hexamethoniumtropisetron tamibarotene hexane dicumarol GENE_7535 thioperamidetriprolidinenisoxetine GENE_3643 sperminedesloratadineloratadine GENE_6850GENE_1612GENE_6416varenicline tripelennamineacarbosetianeptinequineloranemetoclopramidebupropion pramipexolemethoctraminecitric_acidfluorouracil diphenhydramineketotifen arecolinebethanechol GENE_6788tolvaptan phenelzinedihydroergotaminesibutramineGENE_268 scopolaminebrucinealcuronium oleic_acid GENE_478 pirenperoneecopipampalmitic_acidhydrochloric_acidnitroprusside GENE_5608 GENE_5601 GENE_3716 solifenacinvincamineacetohexamide echothiophate doxepindapoxetinevenlafaxine GENE_3717 oxotremorineGENE_3620 GENE_3718 acetonesertraline GENE_6531tolcaponepirenzepineGENE_9 strychnine GENE_590 GENE_1138promethazinenafadotrideGENE_59340 oxytocin GENE_6258 GENE_4160physostigmine GENE_146GENE_6257 astemizolemazindolGENE_6580 GENE_7297 lurasidonefluphenazinequetiapineGENE_3756zotepineloxapinedesipramineepibatidineGENE_3767 GENE_1577 GENE_3768 formaldehydepyrilamine GENE_1075 GENE_6833 GENE_163702 sumatriptanpaliperidoneziprasidonefolinic_acidsertindole staurosporineGENE_1137GENE_79001 GENE_8850rivastigmine GENE_846GENE_3269methysergideGENE_6530 levodopaoxybutynin pilocarpine GENE_3290lopinavir GENE_5734 GENE_3362thioridazineamitriptylineGENE_1813quinpirole GENE_3710 racloprideGENE_1128tolterodineGENE_554 GENE_6536 tacrine ergotaminemetergolinechlorpromazineGENE_6532histamineGENE_1814methamphetaminecitalopram methimazoleGENE_728 econazoleparoxetinesarpogrelateclomipramineGENE_12aripiprazoleGENE_1815 coenzyme_aGENE_9970corticotropinGENE_43 treprostinilGENE_6256 norfluoxetine olanzapineterguride glyburidearachidonic_acid plumbagin octanol nortriptylinesulpiriderisperidoneGENE_3358apomorphineperospironetrazodoneserotoninacetylcholinevasopressinreserpine GENE_481 GENE_55244magnesiumprocainamidemianserinmaprotilineGENE_3356 ketanserinmethylphenidatetolbutamideGENE_3757 garcinolGENE_2268 doxazosinacetic_acid malathion carbenoxolonemetronidazoleGENE_2707 metoprinerauwolscineritanserinGENE_147GENE_1816doxylamine GENE_7054 GENE_3055 GENE_5731GENE_2709dihydrotetrabenazine GENE_3350haloperidolfamotidineimipramineprazosin_hydrochlorideGENE_1576salicylic_acidethacrynic_acidritonavirdonepezilGENE_7173GENE_3172GENE_7525 GENE_2697 ca2 GENE_148amphetamine phenylarsine_oxide GENE_6714 hydrochlorothiazide GENE_3783bexarotene clonidine GENE_3064GENE_25GENE_640 GENE_6559GENE_5732GENE_2701 pindololGENE_7226GENE_150GENE_3357dopamine rottlerinGENE_338442 GENE_2706riluzole atropinenicotinewarfarin GENE_5591GENE_4067 GENE_6716 cimetidinenoradrenalinefluoxetine GENE_2033GENE_4158mevastatinGENE_7225 amprenavir terbutaline_sulfateGENE_59341iloprost clenbuterolprazosinmiconazolenefazodonephenylephrine streptozotocin GENE_55107modafinil yohimbinebuspironeGENE_6571GENE_240GENE_5295 parathion lamotriginepractolol GENE_155 GENE_5293GENE_5294GENE_2475GENE_5590GENE_5296dasatinib GENE_1436GENE_613 GENE_839 bendroflumethiazideabacavirpyrimethamineranolazinechlorthalidone clotrimazoleGENE_5291linoleic_acid GENE_7498 GENE_23476 flufenamic_acidtolmetinGENE_759 diazepamGENE_5290rosuvastatin GENE_5587GENE_5289GENE_27 vemurafenibstavudine GENE_24145 carvedilolepinephrinecarbamazepineketoprofenzonisamideniacin tyrosinelovastatinGENE_5579bosentanimatinib_mesylatethymidineGENE_1910 GENE_1235 acetazolamidedobutaminechlorzoxazoneGENE_6915 aspirinmelatoninpropylthiouracilwortmanninamoxicillinGENE_5580 GENE_834nelfinavir dabrafenib terutrobanfumagillinflecainideGENE_84649alprenolol GENE_771propranololmesalamine GENE_5578cerivastatin GENE_673GENE_1909 isosorbide_dinitrateGENE_2492 pirinixic_acidmetoprololGENE_153acetaminophen quercetinGENE_3156allopurinolmercaptopurineenalaprilat suraminprocaterolfenofibric_acidverapamilgabapentinGENE_6261 fenofibraterosiglitazoneisoniazidtelmisartan GENE_4544isoprenalineGENE_6331niflumic_acidGENE_5465ibuprofenGENE_1080 losartan erlotinibGENE_10663 GENE_760ruthenium_red GENE_154 amiodaronemethohexitalhyperforin GENE_2066 dantroleneGENE_1180xylazinenaveglitazarGENE_5743caffeineGENE_5468pioglitazonegallic_acidpravastatinoxazepam cilazaprilmidostaurin GENE_5155 bisoprololnifedipineindomethacin GENE_5159nilotinib oxcarbazepinealbuterol ryanodineGENE_1719ethinyl_estradiolmetformin phenobarbitalsimvastatingemcitabineetoposidegefitinibGENE_3815GENE_2065GENE_837 endothelin_2 flurbiprofenGENE_2166diclofenactroglitazonemethotrexateangiotensin_iiflavoneenalapril sitagliptinpazopanib GENE_5154 GENE_2890 olodaterollumiracoxibnaloxonediflunisal testosteronegemfibrozil galanginvalproic_acidGENE_5156 icotinibsunitinib phosphoenolpyruvate GENE_2931 GENE_29110 procaineoleoylethanolamideaconitineGENE_3784diclofenac_sodium atenololcapsaicin sulfasalazineclofibrategenisteincurcuminvincristinechrysincandesartanimidapril GENE_2321 rimonabant naproxennimesulideketoconazolecelecoxibnebivololtamoxifenresveratrolbenzbromaronesirolimusramiprilpirfenidone betaxololGENE_2562 bezafibratemorphinedoxorubicinatorvastatin sorafenib GENE_5894 GENE_9290 lidocaineoxaprozin GENE_3791GENE_1956paricalcitol GENE_5979 GENE_3551primidoneadenosine_triphosphateGENE_775ciprofibratezileutonciglitazoneraloxifenediethylstilbestrolpaclitaxelrifampicin everolimus GENE_207GENE_2324 fenoterol piroxicamrofecoxibdexamethasone colchicineGENE_203068 GENE_9734 irinotecancholesterolvinblastineGENE_4023 GENE_2064GENE_1636GENE_4363 GENE_2260 salmeterol terbutalinephenylbutazonespironolactoneestradiol GENE_6241GENE_9759 lithium_carbonate GENE_7442bardoxolone_methylhalothanepregnenolonetheophylline GENE_185arsenic_trioxidelapatinibclofarabineGENE_5244GENE_1543GENE_55869nitroglycerin felbamatedesfluranediltiazem GENE_6554hydroxyflutamidefluticasonesulindacbisphenol_adaidzeinbortezomibzoledronic_acidGENE_2101GENE_2322 hydroxyureasulfinpyrazone formoterolmexiletineflutamidemeloxicam mitoxantrone perindoprilGENE_8841 GENE_1803 vildagliptinGENE_116085 anandamide GENE_2852progesteroneGENE_135aldosteroneestrone docetaxelGENE_4233lonidamineperifosineGENE_3066 GENE_1268GENE_2902 probenecidnimodipinetoremifenepentobarbitalGENE_2099 anastrozolethalidomideGENE_5950captopril GENE_3091 GENE_3459triazolam GENE_238 GENE_367fulvestrantGENE_120892estriolequolGENE_28234 calcitriol GENE_6774 GENE_2149capsazepinebicalutamidefelodipineGENE_6573 hesperetinGENE_2103GENE_7153valsartanGENE_3065GENE_1545benazepril isradipineresiniferatoxincodeineGENE_4988GENE_4306hydrocortisoneGENE_10599eplerenone GENE_6579 GENE_2932 GENE_9356 GENE_729230 montelukastbetamethasonerimexolone ezetimibelithocholic_acid GENE_7421trichostatinapicidin propofolneca GENE_2908 GENE_10381sodium_butyrateGENE_10376GENE_4843 azacitidine GENE_9817montelukast_sodiumprobucol pyrazinamide GENE_10000 GENE_55867GENE_2936sorbinil medroxyprogesterone_acetatecyproterone_acetateraloxifene_hydrochlorideidoxifeneGENE_81027aminoglutethimidenaringinGENE_596GENE_5705GENE_50484sildenafilcalcitoninbelinostat GENE_5682 GENE_10057 GENE_2740 GENE_7124 vorinostatlinagliptin decitabine GENE_5027GENE_9376GENE_4598fluticasone_propionatenandrolonebazedoxifeneirbesartan pentoxifyllineGENE_581GENE_5700GENE_208 GENE_1555 GENE_10498beclomethasone_dipropionatebudesonideGENE_9636mifepristonedesmosterolGENE_2159furosemide GENE_7157GENE_5698GENE_1588fludarabine cephalothin testosterone_propionateacolbifenedrospirenoneGENE_4791glycyrrhizincyclosporine GENE_887tramadollasofoxifenechlorotrianiseneGENE_2100GENE_1586 GENE_7296 GENE_10013GENE_1788 enzalutamidediarylpropionitrile GENE_2357 GENE_2645 cholecalciferol naloxone_hydrochloridedadlesufentanilestradiol_valerateGENE_5241hexestrolmitotanemetyrapone ebselen fentanylprednisoloneolmesartan lisinopril GENE_1786 bicarbonate naltrexone GENE_1017GENE_1642GENE_2224GENE_7422 GENE_231 triamcinolone_acetonidegeneticinGENE_1021GENE_1019GENE_1022 GENE_19GENE_1240GENE_10800naltrindolemorin GENE_8694 1400w GENE_2833 mometasone_furoateflunisolide adenosineprednisone GENE_79885 inositol carmustine GENE_151306GENE_5478carfilzomib GENE_5347 farnesol GENE_8647GENE_3579 GENE_1583GENE_29881doxycycline GENE_23410 GENE_213 albendazole dioscin chelerythrine GENE_3577 mebendazole GENE_1234 niclosamide GENE_1584GENE_9429fadrozole deoxycholic_acid juglone levonorgestrelGENE_7376GENE_10062GENE_2859octopamine pentagastrin norethindronedydrogesteronenorethynodrel sodium_lauryl_sulfate GENE_1232 betulinic_acid terameprocol GENE_332 cholic_acid GENE_5540 GENE_4316GENE_1585 guanine GENE_23411 GENE_22933 naphthalene hypoxanthine marimastat splitomicin GENE_142 GENE_308 GENE_6093 GENE_4313 sodium_stibogluconate methionine_sulfoxide GENE_5777 GENE_7852 GENE_4312 neurotensin GENE_4923

hydroxocobalamin iodoacetamide acetohydroxamic_acid GENE_4594 GENE_5609 GENE_59350 GENE_9475 GENE_3973

benserazide leflunomide cilastatin GENE_1644 GENE_1723 GENE_1800

(a) Display entity names (b) Not display entity names

Figure 4: 28 entities are contained in cell line records, which are colored with red. We extracted them and their neighbors within distance k. Then we pruned unimportant unobserved entities.

GENE_4160 nilotinib formaldehyde octanol imatinib finasteride corticotropin GENE_5159 terfenadine GENE_5979 dutasteride nelfinavir GENE_4594 GENE_9429 GENE_9 hydroxocobalamin GENE_673 venlafaxinefluvoxaminedapoxetine oxytocin GENE_5979 GENE_5156 GENE_5478 pirenperoneecopipam letrozole GENE_6716 GENE_3791 GENE_3756tolcapone norfluoxetine thymidine GENE_2322 GENE_3710 folinic_acidquetiapine palmitic_acid thymidine GENE_3791 sorafenib GENE_3815 GENE_6580 ergotaminezotepineziprasidone GENE_2321 sunitinib GENE_59340 metergolinefluphenazinepaliperidonepyrilaminelurasidone GENE_24145GENE_2709GENE_2707GENE_2697GENE_2706 GENE_5894 GENE_3362 GENE_146 GENE_2701 sorafenib GENE_163702 perifosine GENE_846 GENE_3269methysergide cyproheptadine vasopressin paroxetineamitriptyline chlorpromazinesertindole irbesartan GENE_4544 finasteride imatinib levodopa sarpogrelate pazopanibsunitinib GENE_2260 sufentanil trazodonethioridazine GENE_6532 phentolamine hydroxyurea GENE_338442 donepezilnortriptyline terbutaline_sulfate fluticasone_propionate GENE_5155 GENE_2322 octanol GENE_6530 GENE_12 GENE_338442GENE_6261 nilotinib GENE_5156 histamine GENE_1813apomorphine olodaterol xylazine terguridemianserin naloxone GENE_6716 cyclosporine GENE_1815GENE_1814GENE_3358 acetic_acid GENE_5734 albuterol ca2GENE_2562 metoprine GENE_3356 ca2 GENE_5159 GENE_2260 GENE_2707 GENE_554 GENE_3815 GENE_2321 GENE_2324 fenoterol beclomethasone_dipropionatecyclosporine GENE_5154 GENE_6554 GENE_1075 terbutaline_sulfate formoterol GENE_2357 GENE_2324 GENE_2706 GENE_3783 acetylcholine formaldehyde GENE_554 mercaptopurine carbenoxolone maprotilinehaloperidol GENE_3350methamphetamine GENE_6554salmeterolflufenamic_acid GENE_1816ketanserinGENE_150GENE_147pindolol GENE_59341GENE_55244 erlotinib GENE_2697 GENE_1128 clozapine procainamide GENE_3579 GENE_3357 betaxololGENE_3783 GENE_185 niacinecopipam valproic_acid GENE_2701 GENE_148 GENE_5731 loxapine tryptophan clenbuterol fumagillin olanzapine practolol nebivolol fludarabine palmitic_acid propofol GENE_7226 GENE_55107 erlotinib GENE_2709 famotidine dobutamine GENE_7226 GENE_3577 GENE_6331 melatonin procaterol sumatriptan dopamineclonidine noradrenalinecimetidine medroxyprogesterone_acetate loxapine diazepamnefazodone alprenolol GENE_846 metoprolol ethacrynic_acid GENE_3362methysergide GENE_81027 phenylephrineepinephrine practolol GENE_2852 GENE_7153 GENE_2357 salicylic_acid mirtazapine vasopressin GENE_2908 GENE_154 quinpirole GENE_10376 diflunisalterbutalinefluticasone fluphenazine GENE_2166 pyrimethamine clenbuterol GENE_55107 GENE_6241 treprostinil flufenamic_acid propranololtolmetin dobutamine isoprenaline GENE_3362 GENE_6261pentobarbital atropine metoprolol albuterol clomipramine caffeine GENE_2099 fluoxetine GENE_10381 GENE_24145 GENE_6571 GENE_1816 GENE_10599 GENE_5293 GENE_153GENE_154 ethinyl_estradiol GENE_155 GENE_84649irbesartan GENE_28234 xylazinemethamphetamine cyproheptadinehaloperidol GENE_5291 zoledronic_acid haloperidolserotonin GENE_153 metforminnaproxen gemcitabine thioridazine niflumic_acid GENE_5290 olodaterol sertraline atropine risperidone propylthiouracil GENE_1719 prazosin_hydrochloride angiotensin_ii caffeine acetaminophen GENE_5294 isoprenaline oxytocin clotrimazole GENE_2740 ergotamine glycyrrhizin clozapineserotonin metergoline clozapine iloprost GENE_5731 atenolol doxorubicinbisphenol_avincristineestrone GENE_1956 clonidine olanzapineamitriptyline hyperforinethacrynic_acid carvedilol GENE_155 GENE_5468 phentolamineapomorphine GENE_203068 aldosterone atenololGENE_6915 hydrocortisoneGENE_10599 ritanserin ibuprofenquercetin GENE_5743 telmisartan phenylephrine niacin octanol chlorpromazine gemcitabine ethinyl_estradiol losartan betaxolol GENE_6530 GENE_367 ketanserinmianserin cyproheptadine lithocholic_acid phenytoin GENE_1180 imipramine losartan mitoxantrone fenoterol quinpirole ca2 GENE_3357 indomethacin estradiol GENE_28234 isoprenaline chrysinGENE_3156 GENE_5732 GENE_6532 phenylephrine ritanserinrisperidone melatonin phenobarbitalmorphine testosteronetroglitazone doxazosin methylphenidate epinephrine GENE_203068 nebivololsalmeterol fluoxetinetryptophan indomethacin sodium_butyrate noradrenaline GENE_338442 aspirin GENE_613 GENE_6579 spironolactonerofecoxib olodaterol GENE_150 methotrexatediclofenacgemfibrozil amitriptyline GENE_1128 icotinib procaterol norfluoxetine methotrexate pregnenolone GENE_2475bisoprololfenoterol arsenic_trioxide GENE_81027 GENE_1816 GENE_27 GENE_9 desipramine noradrenaline daidzeinlapatinib clenbuterol GENE_147rauwolscine niacin prazosin_hydrochloride tamoxifen spironolactonecelecoxib angiotensin_ii GENE_338442 rauwolscine buspirone GENE_5743 gallic_acid GENE_3357 trazodone rauwolscine GENE_55869 GENE_185 fluvoxamine etoposide terbutaline GENE_2709 GENE_1719 progesterone nimesulide GENE_5465fluticasone ergotamine testosteronetroglitazone GENE_2697 pyrimethamine apigenin terbutaline formoterol GENE_3459 paclitaxel terbutaline_sulfate epinephrine sarpogrelate irinotecan GENE_25GENE_8841 diethylstilbestrol dexamethasone mirtazapine GENE_10381 dobutamine terguride fluoxetine yohimbine vinblastine melatoninniflumic_acid pindolol ritonavir gallic_acid piroxicam metergoline ritanserin prazosin clonidine yohimbine acetic_acid vincristine tamoxifen bezafibratemetformin serotonin cholesterol GENE_4160 formoterol GENE_3066 valproic_acidamiodarone resveratrol zileuton GENE_3756 methysergide aspirin gefitinib GENE_147 docetaxel capsaicin resveratrol GENE_154 folinic_acid GENE_5468 GENE_775 sarpogrelate GENE_3350GENE_3357 genistein vinblastine clofarabine GENE_1585 dopamine GENE_6573 estradiol miconazole GENE_10376GENE_775 salmeterol GENE_153 prazosin GENE_1436GENE_9759 raloxifene irinotecan GENE_3356 bortezomib flufenamic_acid bisphenol_a GENE_1816 methotrexate fumagillin bisoprolol carvedilol GENE_3815 clofibratecurcumin dutasteride GENE_147 rifampicin doxazosin carbenoxolone phentolamine GENE_150 doxorubicin GENE_5159 gemcitabine fenofibric_acid isradipine risperidone salicylic_acid GENE_9734 etoposide meloxicamflutamide GENE_3358 GENE_148 morphine arsenic_trioxide GENE_2099 GENE_367 sumatriptan folinic_acid acetylcholine prazosin_hydrochloride imatinib_mesylate GENE_5156 paclitaxeldoxorubicin zotepine corticotropin mirtazapine GENE_3064imatinibnilotinib colchicine olanzapine betaxolol metoprolol etoposide gefitinibhydrocortisone sulindac thioridazineGENE_150 hydroxyflutamide thalidomide GENE_4843 atenolol aspirin prazosin fluticasone_propionate bezafibrateapigenincolchicine GENE_1588 albuterol mirtazapine indomethacin dabrafenib phenylbutazoneGENE_2908 terguride tyrosine doxazosinpindolol GENE_2740 GENE_3362apomorphine carbamazepine nifedipine GENE_207 testosterone medroxyprogesterone_acetateestronemercaptopurine acetaminophenpropylthiouracil everolimus noradrenaline GENE_673 bortezomib vinblastine neca felodipine dopamine isoniazid GENE_3065 clozapine simvastatin fulvestrant budesonide yohimbine GENE_4158 nebivololalprenolol propranolol phenylephrine equol arsenic_trioxidemitoxantronetoremifene chlorpromazineGENE_12quinpirole beclomethasone_dipropionate ketanserin sirolimus clonidine terfenadinelurasidonecyproheptadine GENE_4023 GENE_3065 colchicine epinephrine sodium_butyrate lapatinib GENE_2852 calcitriol perifosine tamoxifen estradiol trichostatin pirfenidone estriol vincristineGENE_120892 thymidine GENE_3269 GENE_2322 carvedilol nimesulide pazopanibsunitinib GENE_1813 indomethacin GENE_10376 anastrozole GENE_10381 fluphenazine rosuvastatin practolol GENE_7153 vemurafenib GENE_3791 phentolamine mianserinGENE_1814 famotidine ketoconazolecapsaicin albendazole bicalutamideGENE_10376 GENE_1815 GENE_4023 GENE_7153 GENE_5294 letrozole exemestane bisoprolol hesperetin everolimus bisoprololacetic_acid phenytoinGENE_3156 gemcitabine GENE_81027 sorafenib loxapine prazosin_hydrochloride GENE_10381 GENE_5155 idoxifeneclofarabine GENE_203068GENE_81027 promethazine GENE_5291GENE_2475 propranolol GENE_4843 GENE_9636 resveratrol isoprenaline GENE_153 midostaurinperifosine nelfinavir GENE_6554 paclitaxel raloxifene_hydrochlorideperindoprildocetaxel histamine amphetamine GENE_5290 GENE_5154 GENE_5293zonisamide fenofibrate zoledronic_acid methamphetamineraclopride irinotecan mebendazole clenbuterol GENE_154 valproic_acid nimesulide atenolol GENE_2260 GENE_2322aminoglutethimide lonidamine GENE_6241 GENE_5465 ciprofibrate GENE_3791 trichostatin GENE_1585 GENE_1719 GENE_203068sirolimus dobutamine sulpiride GENE_6571 cyclosporine doxorubicin fadrozole nelfinavir GENE_4158 vincristine metoprolol GENE_2321 letrozole erlotinib GENE_9734 GENE_5894 GENE_1956 GENE_5156 practolol GENE_2859 GENE_6331GENE_1180 gemfibrozil arsenic_trioxide everolimus albuterol hydroxyurea finasteride tolcaponeGENE_59340levodopa GENE_5159 etoposide GENE_5979 icotinib GENE_4313 procainamide GENE_6915GENE_2166 GENE_3815 sorafenibGENE_8841 irbesartan vinblastine GENE_2324 GENE_207 pyrilamine sunitinib GENE_10599 methotrexate alprenolol mebendazole dasatinib erlotinib GENE_7296 xylazine nilotinib docetaxel formoterol GENE_1588 midostaurin mercaptopurine GENE_10000 pentobarbital GENE_9759 salmeterolterbutaline GENE_6554 fludarabine albendazole GENE_7153 betaxolol GENE_5734 metoprine GENE_2260 glycyrrhizin zoledronic_acid GENE_208 amprenavir dutasteride GENE_55869 meloxicamfurosemide GENE_4594 nebivolol imatinib GENE_2324 olodaterol reserpine pazopanib hydroxocobalamin fluticasone GENE_2224 GENE_3066 GENE_5979 fenoterol octanol GENE_2562 iloprost azacitidine pyrimethamine pirfenidone GENE_1719 ca2 stavudine exemestane marimastat fenofibric_acid rifampicin corticotropin GENE_673 GENE_2321 fludarabine GENE_4312 nitroglycerin GENE_163702 perindopril GENE_27 GENE_6716 propofol GENE_25 GENE_5894 GENE_1956 GENE_55244 GENE_1436imatinib_mesylate cyclosporine fadrozole treprostinil GENE_613 pyrimethamine GENE_2709 GENE_4312 GENE_2697 GENE_163702 marimastat GENE_4316 GENE_4160 GENE_3577 GENE_5731GENE_5732 propofol vemurafenib GENE_55244 GENE_2740 flufenamic_acid GENE_1180 isradipine GENE_10599 carbenoxolone GENE_1585 GENE_1584 sufentanil GENE_3577

(a) k=1 (b) k=1, pruned (c) k=2 (d) k=2, pruned

Table 3: Model Performance Attention Mechanism. Because of the limitation of gene-drug records, we do not apply sophisticated design in this model. Intu- Method Logistic Reg. SVM Ours itively, the drug responses will be better modeled if we allow the Accuracy 93.75% 90.14% 94.74% weight of different relations to change according to their current Explainable ✗ ✗ ✓ value.

7 CONCLUSION Research articles are excellent sources for machines to predict, infer, 6 FUTURE WORK reasoning, etc. By aggregating multiple data sources, we study and Collect More Records. The experiment records are not adequate analyze the association of drug and gene by constructing networks to test our model in scale. Patient profiles will be a good data source with categorical relations and descriptive relations. The resulted for our model. But due to the privacy restrictions, we do not have network skeleton presents the relationships between entities. We them right now. The application of our model on clinical data can model the relations as a kernel function between entities. After also help to provide precise and interpretable patient profiles. that, we use cell line experiment records to estimate the parameters 4 of our model, which achieves 94.74% accuracy on predicting the 2017). arXiv:cs.CL/1702.04457v2 drug sensitivity for cell lines. [17] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. 2018. Automated phrase mining from massive text corpora. IEEE Transactions on Knowledge and Data Engineering 30, 10 (2018), 1825–1837. ACKNOWLEDGMENTS [18] Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Hetero- Thank Dr. Qi Li for supporting meta-pattern results on PubTator. geneous Information Networks. arXiv.org (Oct. 2016). arXiv:cs.SI/1610.09769v1 [19] Dibakar Sigdel, Vincent Kyi, Aiden Zhang, Shaun P Setty, David A Liem, Yu Shi, Xuan Wang, Jiaming Shen, Wei Wang, JiaWei Han, et al. 2019. Cloud-Based REFERENCES Phrase Mining and Analysis of User-Defined Phrase-Category Association in [1] Meng Jiang 0001, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M Kaplan, Biomedical Publications. JoVE (Journal of Visualized Experiments) 144 (2019), Timothy P Hanratty, and Jiawei Han 0001. 2017. MetaPAD - Meta Pattern e59108. Discovery from Massive Text Corpora. CoRR cs.CL (2017). [20] Xuan Wang, Yu Zhang, Qi Li, Yinyin Chen, and Jiawei Han. 2018. Open in- [2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural ma- formation extraction with meta-pattern discovery in biomedical literature. In chine translation by jointly learning to align and translate. arXiv preprint Proceedings of the 2018 ACM International Conference on Bioinformatics, Compu- arXiv:1409.0473 (2014). tational Biology, and Health Informatics. ACM, 291–300. [3] Jordi Barretina, Giordano Caponigro, Nicolas Stransky, Kavitha Venkatesan, [21] Xuan Wang, Yu Zhang, Qi Li, Yinyin Chen, and Jiawei Han. 2018. Open Informa- Adam A Margolin, Sungjoon Kim, Christopher J Wilson, Joseph Lehár, Gre- tion Extraction with Meta-pattern Discovery in Biomedical Literature. ACM, New gory V Kryukov, Dmitriy Sonkin, Anupama Reddy, Manway Liu, Lauren Murray, York, New York, USA. Michael F Berger, John E Monahan, Paula Morais, Jodi Meltzer, Adam Korejwa, [22] Xuan Wang, Yu Zhang, Qi Li, Cathy H Wu, and Jiawei Han. 2018. PENNER: Judit Jané-Valbuena, Felipa A Mapa, Joseph Thibault, Eva Bric-Furlong, Pichai Pattern-enhanced Nested Named Entity Recognition in Biomedical Literature. Raman, Aaron Shipway, Ingo H Engels, Jill Cheng, Guoying K Yu, Jianjun Yu, In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Peter Aspesi, Melanie de Silva, Kalpana Jagtap, Michael D Jones, Li Wang, Charles IEEE, 540–547. Hatton, Emanuele Palescandolo, Supriya Gupta, Scott Mahan, Carrie Sougnez, [23] Chih-Hsuan Wei, Hung-Yu Kao, and Zhiyong Lu. 2013. PubTator: a web-based Robert C Onofrio, Ted Liefeld, Laura MacConaill, Wendy Winckler, Michael Reich, text mining tool for assisting biocuration. Nucleic research 41, W1 (2013), Nanxin Li, Jill P Mesirov, Stacey B Gabriel, Gad Getz, Kristin Ardlie, Vivien Chan, W518–W522. Vic E Myer, Barbara L Weber, Jeff Porter, Markus Warmuth, Peter Finan, Jennifer L [24] Haixiu Yang, Yunpeng Zhang, Jiasheng Wang, Tan Wu, Siyao Liu, Yanjun Xu, and Harris, Matthew Meyerson, Todd R Golub, Michael P Morrissey, William R Sellers, Desi Shang. 2018. Global view of a drug-sensitivity gene network. Oncotarget 9, Robert Schlegel, and Levi A Garraway. 2012. The Cancer Cell Line Encyclopedia 3 (Jan. 2018), 3254–3266. enables predictive modelling of anticancer drug sensitivity. Nature 483, 7391 [25] Wanjuan Yang, Jorge Soares, Patricia Greninger, Elena J Edelman, Howard Light- (March 2012), 603–607. foot, Simon Forbes, Nidhi Bindal, Dave Beare, James A Smith, I Richard Thompson, [4] A. Basu, N. E. Bodycombe, J. H. Cheah, E. V. Price, K. Liu, G. I. Schaefer, R. Y. Sridhar Ramaswamy, P Andrew Futreal, Daniel A Haber, Michael R Stratton, Ebright, M. L. Stewart, D. Ito, S. Wang, A. L. Bracha, T. Liefeld, M. Wawer, J. C. Cyril Benes, Ultan McDermott, and Mathew J Garnett. 2013. Genomics of Drug Gilbert, A. J. Wilson, N. Stransky, G. V. Kryukov, V. Dancik, J. Barretina, L. A. Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in Garraway, C. S. Hon, B. Munoz, J. A. Bittker, B. R. Stockwell, D. Khabele, A. M. cancer cells. Nucleic Acids Research 41, Database issue (Jan. 2013), D955–61. Stern, P. A. Clemons, A. F. Shamji, and S. L. Schreiber. 2013. An interactive [26] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. resource to identify cancer genetic and lineage dependencies targeted by small Image captioning with semantic attention. In Proceedings of the IEEE conference molecules. Cell 154, 5 (Aug 2013), 1151–1161. on computer vision and pattern recognition. 4651–4659. [5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). [6] Allan Peter Davis, Cynthia J Grondin, Robin J Johnson, Daniela Sciaky, Benjamin L King, Roy McMorran, Jolene Wiegers, Thomas C Wiegers, and Carolyn J Mat- tingly. 2017. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Research 45, D1 (Jan. 2017), D972–D978. [7] Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geof- frey Zweig, and Margaret Mitchell. 2015. Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809 (2015). [8] Jürgen Drews. 2000. Drug discovery: a historical perspective. Science 287, 5460 (2000), 1960–1964. [9] Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015). [10] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effec- tive approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015). [11] Jörg Menche, Emre Guney, Amitabh Sharma, Patrick J Branigan, Matthew J Loza, Frédéric Baribaud, Radu Dobrin, and Albert-László Barabási. 2017. Integrating personalized gene expression profiles into predictive disease-associated gene pools. npj Systems Biology and Applications 3, 1 (March 2017), 10. [12] Reza Mirnezami, Jeremy Nicholson, and Ara Darzi. 2012. Preparing for precision medicine. New England Journal of Medicine 366, 6 (2012), 489–491. [13] M. G. Rees, B. Seashore-Ludlow, J. H. Cheah, D. J. Adams, E. V. Price, S. Gill, S. Javaid, M. E. Coletti, V. L. Jones, N. E. Bodycombe, C. K. Soule, B. Alexander, A. Li, P. Montgomery, J. D. Kotz, C. S. Hon, B. Munoz, T. Liefeld, V. Dan?ik, D. A. Haber, C. B. Clish, J. A. Bittker, M. Palmer, B. K. Wagner, P. A. Clemons, A. F. Shamji, and S. L. Schreiber. 2016. Correlating chemical sensitivity and basal gene expression reveals . Nat. Chem. Biol. 12, 2 (Feb 2016), 109–116. [14] Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In CVPR, Vol. 1. 3. [15] Brinton Seashore-Ludlow, Matthew G Rees, Jaime H Cheah, Murat Cokol, Ed- mund V Price, Matthew E Coletti, Victor Jones, Nicole E Bodycombe, Christian K Soule, Joshua Gould, et al. 2015. Harnessing connectivity in a large-scale small- molecule sensitivity dataset. Cancer discovery 5, 11 (2015), 1210–1223. [16] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. 2017. Automated Phrase Mining from Massive Text Corpora. arXiv.org (Feb. 5