Predicting Drug Responses by Propagating Interactions through Text-Enhanced Drug-Gene Networks
Shiyin Wang Massachusetts Institute of Technology [email protected]
ABSTRACT response analysis. At that time, researchers have already cast inter- Personalized drug response has received public awareness in recent ests in the variation of personal drug response. However, because years. How to combine gene test result and drug sensitivity records of the limitation of data collection and analysis models, precision is regarded as essential in the real-world implementation. Research medicine did not arouse widespread concern until these decades[12]. articles are good sources to train machine predicting, inference, Recent years, integrating big data, including clinical data, genetic reasoning, etc. In this project, we combine the patterns mined from data, genomic data, intervention history, etc., has received expecta- biological research articles and categorical data to construct a drug- tions for facilitating characterization of side effects and predicting gene interaction network. Then we use the cell line experimental drug resistance. records on gene and drug sensitivity to estimate the edge embed- Network Science results are applied to the data of gene expres- dings in the network. Our model provides white-box explainable sion profiles by integrating personalized data to predictive records predictions of drug response based on gene records, which achieve by similarity analysis, which reveal the hidden correlations of un- 94.74% accuracy in binary drug sensitivity prediction task. observed patterns[11, 24]. Intuitive speaking, people sharing sim- ilar genetic patterns and clinical treatments tend to show similar CCS CONCEPTS drug responses. Network-based approaches reveal potential drug- biomarker correlations effectively for some diseases. However, cur- • Information systems → Retrieval tasks and goals; Informa- rent methods in this aspect are limited in the correlation modeling tion extraction; of the network, and the data accessed is limited in the structured data. KEYWORDS Data Mining communities have been trying to discover, identify, network science, data mining, drug response structure, and summary relationships between biological entities through various text mining techniques across open-access bio- 1 INTRODUCTION logical research publications[17, 19, 20, 22]. One of the drawbacks Precision medicine depends on discovering the correlations and ca- of this approach is the accuracy of the results is not high enough sualties between observed data(gene, past clinical records, etc) and to play a deterministic role in the decision process. Instead, they unobserved future performance(side effects, drug effects, immune are provided to the human experts as an additional reference to response, etc). Categorical relations have been collected from experi- fasten the knowledge discovery process. On the other hand, Re- ments, while more complex meta-pattern information is mined from cursive Neural Networks has shown its power to deal with text scientific publications. To bridge the gap between these two types data[9], demonstrating considerable success in numerous tasks such of data collecting sources, we propose a relation embedding method as image captioning[7, 14, 26] and machine translation[2, 5, 10]. to expand the representation space of the relationships between This technique can be applied to convert text meta-patterns into genes and drug responses. To the best of my knowledge, this is the trainable embeddings to represent the relations among entities. first approach to combine text patterns into the existing network. Nodes in the network represent to drug responses(sensitivities, side 3 MODEL effects, etc.) and genes. In this section, we make definitions and formalize the problem. The arXiv:1906.08089v1 [cs.SI] 19 Jun 2019 The model pipeline consists of two parts. First, we construct a whole project is consist of two parts: constructing network skeleton relation network of gene and drug response from existing format- and inference edge embedding. ted data sources. In the same time, we mine data from scientific In the first part, we first collected all the entities Definition. 1 publications automatically, retrieving meta-patterns indicating the from multiple data sources. Then we mined the categorical rela- relations between gene and drug responses. Finally, a novel graph tionships and descriptive relationships Definition.2. Patterns are convolutional neural network based method is applied to balance extracted based on the discovered entities and relationships. After the information collected in the two types of data sources. that, meta-patterns are extracted from descriptive patterns to be a high-level representation of entities relations. Regarding entities 2 RELATED WORKS as nodes in the network, we add edges between two nodes if they Drug discovery plays an essential role in the improvement of hu- appear to have either structured or descriptive relationships. man life[8] as the pharmacology had become a well-defined and Definition 1 (Entity). Entities, denoted by e, refer to the names respective scientific discipline. Dating back to the 20th century, of chemicals, genes, or diseases which can be indexed by a MESH id or researchers across distinct disciplinary areas, including analytical Entrez id. For example, SAH is a disease entity with MESH id D013345. chemistry and biochemistry, demonstrated the values for the drug Definition 2 (Relation). Relationships, denoted by r, represent Algorithm 1 Representation Learning the latent interactions between two entities ei , ej . Relations can be for Epoch 1, 2,..., N do either categorical(such as “blocker", “binder") or descriptive(such as for Iteration 1, 2,...,T do “With the increase of ei , there is a significantly drop of ej expression Calculate vˆi = G(vi ,bi , λi , Rei,ej ) by Equation. 1 level"). Calculate L by Equation. 5 Update v ← v − α ∂L Definition 3 (Pattern). The pattern, denoted by (r, ei , ej ), is i i ∂vi ∂L a tuple of one relationship and two entities. Patterns are categorical Update bi ← bi − α ∂bi or descriptive concerning the type of the corresponding relationships. ∂L Update λi ← vi − α Generally, patterns are accessed by directly parsing datasets. Meta- ∂λi end for Patterns are the simplified version of descriptive patterns. Calculate vˆi = G(vi ,bi , λi , Rei,ej ) by Equation. 1 In the second part, we estimate the representations of nodes Calculate L by Equation. 5 ∂L and edges Definition. 4. We model the estimated representation of Update R , ← v − α ei ej i ∂R , v is calculated by the average neighbor entities to multiple edge ei ej i end for representations as Equation. 1. Definition 4 (Edge Representation and Node Representa- tion). Edge representation, denoted as Rei,ej ∈ M, is the quantified version of relationship defined on a function family. In this project, and gene druggability information from 30 different sources. Figure ?? shows the frequency of drug-gene interaction types in DGIdb, because of the computation limitation, we set M = Rk×k to be a k showing a long-tailed distribution. The inhibitor occurs 9982 times matrix of size k × k. Then nodes are represented as a vector vi ∈ R , in DGIdb, which is 40.79% of the whole interaction records. The bias bi , and scale λi . The estimated value of vi is given by: second most frequent interaction is agonist, which occurs 5333 λ i Õ times and accounts for 21.79%. There are 30 types of relationships. vˆi = Rei,ej vj + λibi (1) |N (ei )| ej ∈N (ei ) We assume the founded structured relations are correct and precise. Having built the network skeleton, we then need to learn the 4.2 Mining Descriptive Relationship representation Rei,ej , λi , bi , vi of edges and nodes. In the datasets, researchers record TPM (transcripts per million) PubMed abstracts are a good source to obtain descriptive patterns. to quantify genes and use intensity to measure drug sensitivity. We investigated a subset of annotated PubMed data by PubTator[23]. To convert representations into a numeric format, we deploy a We selected all the sentences that contain two annotated chemicals fully connected neural network with one hidden layer of size 2 and or genes. The total size of the file is approximate 2 gigabyte, which sigmoid activation. is too complex to take the original long sentences into the model. The first reason is that many patterns may occur a few timesif ui = S(vi ) (2) we use sentences to model the relations directly. Secondly, our We define the loss function as: computation power does not have enough memory to process large 1 Õ graphs. Therefore, we further process our descriptive patterns to L = (uˆ − u )2 (3) дene N i i short representations, which we call meta-patterns. дene i ∈Gene 1 Õ 2 Lchem = (uˆi − ui ) (4) 4.3 Extract Meta-Patterns Nдene i ∈Chemical For decades, numerous biological researchers describe their find- L = Lдene + βLchem (5) ings on interactions between chemicals, drugs, genes, etc., by natu- ral languages. The adequate biological research literature provide 4 IMPLEMENTATION a good source to extract the information using information re- 4.1 Collecting Structured Data trieval methods[1, 16, 18]. We collected 3614796 abstracts from PubMed and used the PubTator dataset to discover all the named Drug-Gene interactions are well studied in the past biological re- entities in chemicals, diseases, and genes. Meta pattern[1, 21] is searches, such as CancerCommons, CF:Biomarkers, CGI, NCI, Drug- defined as a frequent, informative, and precise subsequence pat- 1 Bank, PharmGKB, DoCM, etc. SIDER provides adverse drug re- tern in certain context. The first step of meta-pattern discovery is actions (ADRs) representing drug responses, which contains 1430 context-aware segmentation, which estimates the contextual bound- drugs, 5880 ADRs and 140,064 drug-ADR pairs. The Cancer Thera- aries of meta-patterns. We use the method developed by Wang[21], 2 peutics Response Portal (CTRP) links genetic, lineage, and other which extracted relationships from a subset of the abstracts on the cellular features of cancer cell lines to small-molecule sensitivity PubMed[6] with the semi-supervise from CTD database3. For ex- with the goal of accelerating the discovery of patient-matched can- ample, the meta-patterns in this sentence from a PubMed abstract cer therapeutics. In this paper, we use the drug-gene interaction are shown in the Figure. 1 database (DGIdb)[4, 13, 15], which contains drug-gene interactions
1sideeffects.embl.de 2portals.broadinstitute.org/ctrp/ 3http://ctdbase.org 2 Figure 1: An example of pattern and meta-pattern extraction process from a sentence in a PubMed paper.
Patients with aneurysmal subarachnoid hemorrhage (SAH) often develop patients with h_1 hydrocephalus requiring an external ventricular drain (EVD).
often develop h_2 MESH:D006849 MESH:D013345
Patients with aneurysmal subarachnoid hemorrhage (SAH) often develop hydrocephalus requiring an external ventricular drain (EVD). requiring an external ventricular drain (EVD) h_3
aneurysmal subarachnoid hemorrhage (SAH) develop hydrocephalus A develop B h_meta
Table 1: Datasets summary 4.5 Inference Individual Drug Responses To determine the parameters in the model, we acquire cell line Dataset # Gene # Drug # Relations Format experiment records of 58035 gene expression on 934 human cancer DGIdb 36815 9370 42727 Categorical cell lines [3] and 3723759 drug sensitivity test records on 1065 cell PubTator 36815 9370 42727 Descriptive lines[25]. We assume the same cell lines perform similarly in these PubMed4 2575 6199 10530 MetaPattern two data sources. Then we can align these two data sources by cell RNA-seq[3] 58035 - - CL Records line names. The data summary is shown in Table. 1. GDSC[25] - 16 - CL Records Because we have to use experimental data to train the represen- Skeleton 335 587 3321 - tations, we need to shrink the graph, deleting useless nodes. We pick the nodes from observable entities(red) with distance 1 or 2 to the observable entities. The results are shown on Figure. 5a, 5b, 5c, 5d. 4.4 Linking Categorical Relations and We set the embedding dimension to be 4. The learning rate of Descriptive Relations gene optimizer is 0.1 so that it can converge quickly in 1. The One of the challenges in this project is the entity linking across learning rate of chemical is 0.001. multiple datasets, which refer to the alignment of entity mentions to We have done experiments on the network of Figure. 5b. The its entity id. This process is very tedious because of the messy index binary classification accuracy is 91.15%. Because network data is format. For example, chemicals(drugs) have CID, PMID, Chembl, not structured, it is hard to draw a ROC or some other training Cas Number, etc. We use the entity name as a validation criterion. visualization. Though the same entity may have different names in different data sources, those names are similar with high probability. Therefore, we can measure the correctness of our entity linking algorithm by 5 EVALUATION comparing merged names. The summary of all the data sources used in this project are listed in Table. 1 2. We test our model on the extracted 539 drug-gene records linked by cell-line. The data contain test records of 7 drugs and 21 genes, and about 42.12% cells are missing. The inadequacy of data limits Table 2: Datasets the performance of complex models. We compare with Logistic Regression and Support Vector Machine. We use sklearn package Dataset URL for Logistic Regression, with the L1 penalty and max interation 10000. We use radial basis function kernel in SVM model with auto DGIdb http://www.dgidb.org gamma. Our method outperforms these two baselines. PubTator https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/ Our model can be explainable because it is based on the repre- Demo/PubTator sentation of categorical and descriptive relations. Human experts PubMed https://www.ncbi.nlm.nih.gov/pubmed can guide the model by changing the network skeleton. Therefore, RNA- https://www.ebi.ac.uk/gxa/experiments/E- it can potentially avoid the risk of black-box decision and provide seq[3] MTAB-2770/Results an interpretation of its decisions. GDSC[25] https://www.cancerrxgene.org
3 Figure 2: The visualization of all mined interactions among genes and drugs. The label of the genes are their Entrez ID. The label of chemicals are their names. (# drugs=587, # genes=335, # interactions=3321)
argatrobanGENE_2147 ximelagatran GENE_1111 gw501516 lepirudin bivalirudin obeticholic_acid disulfiram GENE_217 folic_acid
menadione enoxacin GENE_9971 GENE_5467 GENE_316 GENE_488 heparin lubiprostone GENE_2348 GENE_1181
guggulsterone lanosterol tretinoin GENE_1524 GENE_462 capecitabine GENE_5916 fenfluramine tranylcypromine GENE_5604 adapalene GENE_9002 GENE_5915 selegiline GENE_5606 rasagiline tazarotene GENE_5914 GENE_5599 GENE_7298 GENE_1558 GENE_3708 isotretinoin GENE_4128 GENE_7166 GENE_1573 leucovorin GENE_5338 GENE_4129 GENE_1312entacaponelansoprazolebaclofen hexamethoniumtropisetron tamibarotene hexane dicumarol GENE_7535 thioperamidetriprolidinenisoxetine GENE_3643 tofacitinib sperminedesloratadineloratadine GENE_6850GENE_1612GENE_6416varenicline tripelennamineacarbosetianeptinequineloranemetoclopramidebupropion dizocilpine pramipexolemethoctraminecitric_acidfluorouracil diphenhydramineketotifen arecolinebethanechol GENE_6788tolvaptan finasteride phenelzinedihydroergotaminesibutramineGENE_268 scopolaminebrucinealcuronium oleic_acid GENE_478 fexofenadine pirenperoneecopipampalmitic_acidhydrochloric_acidnitroprusside GENE_5608 GENE_5601 GENE_3716 fluvoxamine solifenacinvincamineacetohexamide meclizine echothiophate doxepindapoxetinevenlafaxine ropinirole cisapride GENE_3717 trifluoperazine terfenadine oxotremorineGENE_3620 GENE_3718 acetonesertraline GENE_6531tolcaponepirenzepineGENE_9 strychnine allylestrenol GENE_590 GENE_1138promethazinenafadotrideGENE_59340 oxytocin GENE_6258 GENE_4160physostigmine GENE_146GENE_6257 astemizolemazindolGENE_6580 GENE_7297 lurasidonefluphenazinequetiapineGENE_3756zotepineloxapinedesipramineepibatidineGENE_3767 GENE_1577 digitoxin GENE_3768 formaldehydepyrilamine cyproheptadine GENE_1075 GENE_6833 GENE_163702 sumatriptanpaliperidoneziprasidonefolinic_acidsertindole staurosporineGENE_1137GENE_79001 GENE_8850rivastigmine GENE_846GENE_3269methysergideGENE_6530 levodopaoxybutynin pilocarpine GENE_3290lopinavir GENE_5734 GENE_3362thioridazineamitriptylineGENE_1813quinpirole darifenacin GENE_3710 phentolamine racloprideGENE_1128tolterodineGENE_554 GENE_6536 tacrine ergotaminemetergolinechlorpromazineGENE_6532histamineGENE_1814methamphetaminecitalopram methimazoleGENE_728 econazoleparoxetinesarpogrelateclomipramineGENE_12aripiprazoleGENE_1815 coenzyme_aGENE_9970corticotropinGENE_43 treprostinilGENE_6256 norfluoxetine olanzapineterguride glyburidearachidonic_acid plumbagin octanol nortriptylinesulpiriderisperidoneGENE_3358apomorphineperospironetrazodoneserotoninacetylcholinevasopressinreserpine GENE_481 GENE_55244magnesiumprocainamidemianserinmaprotilineGENE_3356 ketanserinmethylphenidatetolbutamideGENE_3757 garcinolGENE_2268 doxazosinacetic_acid clozapine tryptophan malathion carbenoxolonemetronidazoleGENE_2707 metoprinerauwolscineritanserinGENE_147GENE_1816doxylamine GENE_7054 GENE_3055 GENE_5731GENE_2709dihydrotetrabenazine GENE_3350haloperidolfamotidineimipramineprazosin_hydrochlorideGENE_1576salicylic_acidethacrynic_acidritonavirdonepezilGENE_7173GENE_3172GENE_7525 GENE_2697 ca2 GENE_148amphetamine phenylarsine_oxide GENE_6714 hydrochlorothiazide GENE_3783bexarotene clonidine GENE_3064GENE_25GENE_640 GENE_6559GENE_5732GENE_2701 pindololGENE_7226GENE_150GENE_3357dopamine rottlerinGENE_338442 GENE_2706riluzole atropinenicotinewarfarin GENE_5591GENE_4067 GENE_6716 cimetidinenoradrenalinefluoxetine GENE_2033GENE_4158mevastatinGENE_7225 amprenavir terbutaline_sulfateGENE_59341iloprost clenbuterolprazosinmiconazolenefazodonephenylephrine streptozotocin GENE_55107modafinil yohimbinebuspironeGENE_6571GENE_240GENE_5295 parathion lamotriginepractolol GENE_155 GENE_5293GENE_5294GENE_2475GENE_5590GENE_5296dasatinib GENE_1436GENE_613 GENE_839 bendroflumethiazideabacavirpyrimethamineranolazinechlorthalidone mirtazapine clotrimazoleGENE_5291linoleic_acid GENE_7498 GENE_23476 flufenamic_acidtolmetinGENE_759 diazepamGENE_5290rosuvastatin GENE_5587GENE_5289GENE_27 vemurafenibstavudine GENE_24145 carvedilolepinephrinecarbamazepineketoprofenzonisamideniacin tyrosinelovastatinGENE_5579bosentanimatinib_mesylatethymidineGENE_1910 GENE_1235 acetazolamidedobutaminechlorzoxazoneGENE_6915 aspirinmelatoninpropylthiouracilwortmanninamoxicillinGENE_5580 GENE_834nelfinavir dabrafenib terutrobanfumagillinflecainideGENE_84649alprenolol GENE_771propranololmesalamine GENE_5578cerivastatin GENE_673GENE_1909 isosorbide_dinitrateGENE_2492 pirinixic_acidmetoprololGENE_153acetaminophen quercetinGENE_3156allopurinolmercaptopurineenalaprilat macitentan suraminprocaterolfenofibric_acidverapamilgabapentinGENE_6261 fenofibraterosiglitazoneisoniazidtelmisartan imatinib tetrodotoxin GENE_4544isoprenalineGENE_6331niflumic_acidGENE_5465ibuprofenGENE_1080 losartan erlotinibGENE_10663 GENE_760ruthenium_red GENE_154 phenytoin amiodaronemethohexitalhyperforin GENE_2066 dantroleneGENE_1180xylazinenaveglitazarGENE_5743caffeineGENE_5468pioglitazonegallic_acidpravastatinoxazepam cilazaprilmidostaurin GENE_5155 topiramate bisoprololnifedipineindomethacin apigenin GENE_5159nilotinib oxcarbazepinealbuterol ryanodineGENE_1719ethinyl_estradiolmetformin phenobarbitalsimvastatingemcitabineetoposidegefitinibGENE_3815GENE_2065GENE_837 endothelin_2 mibefradil flurbiprofenGENE_2166diclofenactroglitazonemethotrexateangiotensin_iiflavoneenalapril sitagliptinpazopanib GENE_5154 GENE_2890 olodaterollumiracoxibnaloxonediflunisal testosteronegemfibrozil galanginvalproic_acidGENE_5156 icotinibsunitinib phosphoenolpyruvate GENE_2931 GENE_29110 procaineoleoylethanolamideaconitineGENE_3784diclofenac_sodium atenololcapsaicin sulfasalazineclofibrategenisteincurcuminvincristinechrysincandesartanimidapril GENE_2321 rimonabant mephenytoin naproxennimesulideketoconazolecelecoxibnebivololtamoxifenresveratrolbenzbromaronesirolimusramiprilpirfenidone betaxololGENE_2562 bezafibratemorphinedoxorubicinatorvastatin sorafenib GENE_5894 GENE_9290 lidocaineoxaprozin GENE_3791GENE_1956paricalcitol GENE_5979 GENE_3551primidoneadenosine_triphosphateGENE_775ciprofibratezileutonciglitazoneraloxifenediethylstilbestrolpaclitaxelrifampicin everolimus GENE_207GENE_2324 cannabidiol isoflurane fenoterol piroxicamrofecoxibdexamethasone colchicineGENE_203068 GENE_9734 dutasteride irinotecancholesterolvinblastineGENE_4023 GENE_2064GENE_1636GENE_4363 GENE_2260 salmeterol terbutalinephenylbutazonespironolactoneestradiol GENE_6241GENE_9759 lithium_carbonate GENE_7442bardoxolone_methylhalothanepregnenolonetheophylline GENE_185arsenic_trioxidelapatinibclofarabineGENE_5244GENE_1543GENE_55869nitroglycerin felbamatedesfluranediltiazem GENE_6554hydroxyflutamidefluticasonesulindacbisphenol_adaidzeinbortezomibzoledronic_acidGENE_2101GENE_2322 hydroxyureasulfinpyrazone lorazepam formoterolmexiletineflutamidemeloxicam mitoxantrone perindoprilGENE_8841 GENE_1803 vildagliptinGENE_116085 anandamide sevoflurane GENE_2852progesteroneGENE_135aldosteroneestrone docetaxelGENE_4233lonidamineperifosineGENE_3066 GENE_1268GENE_2902 probenecidnimodipinetoremifenepentobarbitalGENE_2099 anastrozolethalidomideGENE_5950captopril GENE_3091 GENE_3459triazolam GENE_238 GENE_367fulvestrantGENE_120892estriolequolGENE_28234 calcitriol GENE_6774 GENE_2149capsazepinebicalutamidefelodipineGENE_6573 hesperetinGENE_2103GENE_7153valsartanGENE_3065GENE_1545benazepril isradipineresiniferatoxincodeineGENE_4988GENE_4306hydrocortisoneGENE_10599eplerenone letrozole methadone GENE_6579 GENE_2932 GENE_9356 GENE_729230 montelukastbetamethasonerimexolone ezetimibelithocholic_acid GENE_7421trichostatinapicidin piperine propofolneca GENE_2908 GENE_10381sodium_butyrateGENE_10376GENE_4843 azacitidine GENE_9817montelukast_sodiumprobucol pyrazinamide GENE_10000 GENE_55867GENE_2936sorbinil medroxyprogesterone_acetatecyproterone_acetateraloxifene_hydrochlorideidoxifeneGENE_81027aminoglutethimidenaringinGENE_596GENE_5705GENE_50484sildenafilcalcitoninbelinostat methyltestosterone GENE_5682 GENE_10057 GENE_2740 mibolerone danazol GENE_7124 vorinostatlinagliptin decitabine ticlopidine GENE_5027GENE_9376GENE_4598fluticasone_propionatenandrolonebazedoxifeneirbesartan pentoxifyllineGENE_581GENE_5700GENE_208 GENE_1555 GENE_10498beclomethasone_dipropionatebudesonideGENE_9636mifepristonedesmosterolGENE_2159furosemide GENE_7157GENE_5698GENE_1588fludarabine amylin cephalothin testosterone_propionateacolbifenedrospirenoneGENE_4791glycyrrhizincyclosporine GENE_887tramadollasofoxifenechlorotrianiseneGENE_2100GENE_1586 GENE_7296 GENE_10013GENE_1788 enzalutamidediarylpropionitrile GENE_2357 GENE_2645 cholecalciferol naloxone_hydrochloridedadlesufentanilestradiol_valerateGENE_5241hexestrolmitotanemetyrapone exemestane ebselen fentanylprednisoloneolmesartan lisinopril GENE_1786 bicarbonate naltrexone GENE_1017GENE_1642GENE_2224GENE_7422 GENE_231 triamcinolone_acetonidegeneticinGENE_1021GENE_1019GENE_1022 GENE_19GENE_1240GENE_10800naltrindolemorin GENE_8694 1400w GENE_2833 mometasone_furoateflunisolide adenosineprednisone GENE_79885 inositol carmustine methylprednisolone GENE_151306GENE_5478carfilzomib GENE_5347 farnesol GENE_8647GENE_3579 GENE_1583GENE_29881doxycycline GENE_23410 GENE_213 albendazole dioscin chelerythrine GENE_3577 mebendazole GENE_1234 niclosamide GENE_1584GENE_9429fadrozole deoxycholic_acid juglone levonorgestrelGENE_7376GENE_10062GENE_2859octopamine pentagastrin norgestimate norethindronedydrogesteronenorethynodrel sodium_lauryl_sulfate GENE_1232 betulinic_acid terameprocol GENE_332 cholic_acid GENE_5540 inosine GENE_4316GENE_1585 guanine GENE_23411 GENE_22933 naphthalene bumetanide hypoxanthine marimastat splitomicin GENE_142 GENE_308 GENE_6093 GENE_4313 sodium_stibogluconate methionine_sulfoxide GENE_5777 GENE_7852 GENE_4312 fasudil neurotensin GENE_4923
hydroxocobalamin iodoacetamide acetohydroxamic_acid GENE_4594 GENE_5609 GENE_59350 GENE_9475 itraconazole GENE_3973
benserazide leflunomide cilastatin GENE_1644 GENE_1723 GENE_1800
(a) Display entity names (b) Not display entity names
Figure 4: 28 entities are contained in cell line records, which are colored with red. We extracted them and their neighbors within distance k. Then we pruned unimportant unobserved entities.
GENE_4160 nilotinib formaldehyde octanol imatinib finasteride promethazine carbenoxolone corticotropin GENE_5159 terfenadine GENE_5979 dutasteride sertraline nelfinavir GENE_4594 GENE_9429 GENE_9 hydroxocobalamin GENE_673 venlafaxinefluvoxaminedapoxetine oxytocin GENE_5979 sumatriptan GENE_5156 GENE_5478 pirenperoneecopipam letrozole sulpiride GENE_6716 GENE_3791 GENE_3756tolcapone norfluoxetine thymidine GENE_2322 GENE_3710 procainamide folinic_acidquetiapine palmitic_acid thymidine GENE_3791 sorafenib GENE_3815 GENE_6580 ergotaminezotepineziprasidone GENE_2321 sunitinib GENE_59340 metergolinefluphenazinepaliperidonepyrilaminelurasidone loxapine GENE_24145GENE_2709GENE_2707GENE_2697GENE_2706 GENE_5894 GENE_3362 GENE_146 GENE_2701 sorafenib desipramine GENE_163702 perifosine GENE_846 GENE_3269methysergide cyproheptadine vasopressin paroxetineamitriptyline chlorpromazinesertindole rauwolscine irbesartan pazopanib GENE_4544 finasteride imatinib levodopa sarpogrelate pazopanibsunitinib GENE_2260 sufentanil clomipramine trazodonethioridazine GENE_6532 quinpirole phentolamine hydroxyurea GENE_338442 donepezilnortriptyline ritanserin terbutaline_sulfate fluticasone_propionate GENE_5155 GENE_2322 octanol GENE_6530 olanzapine GENE_12 GENE_338442GENE_6261 nilotinib GENE_5156 histamine GENE_1813apomorphine olodaterol xylazine terguridemianserin naloxone budesonide GENE_6716 cyclosporine GENE_1815GENE_1814GENE_3358 acetic_acid GENE_5734 albuterol methylphenidate raclopride fentanyl ca2GENE_2562 metoprine GENE_3356 doxazosin ca2 GENE_5159 GENE_2260 GENE_2707 GENE_554 serotonin risperidone GENE_3815 GENE_2321 GENE_2324 fenoterol beclomethasone_dipropionatecyclosporine GENE_5154 amitriptyline GENE_6554 GENE_1075 terbutaline_sulfate formoterol GENE_2357 GENE_2324 GENE_2706 GENE_3783 acetylcholine formaldehyde GENE_554 mercaptopurine carbenoxolone imipramine maprotilinehaloperidol GENE_3350methamphetamine GENE_6554salmeterolflufenamic_acid GENE_1816ketanserinGENE_150GENE_147pindolol GENE_59341GENE_55244 erlotinib GENE_2697 GENE_1128 clozapine procainamide GENE_3579 GENE_3357 betaxololGENE_3783 GENE_185 niacinecopipam valproic_acid GENE_2701 GENE_148 flutamide GENE_5731 iloprost zotepine loxapine ergotamine tryptophan clenbuterol fumagillin fluphenazine olanzapine fluoxetine practolol alprenolol nebivolol idoxifene fludarabine palmitic_acid propofol GENE_7226 GENE_55107 erlotinib GENE_2709 famotidine prazosin dobutamine GENE_7226 thioridazine GENE_3577 GENE_6331 melatonin procaterol meloxicam glycyrrhizin sumatriptan dopamineclonidine noradrenalinecimetidine toremifene medroxyprogesterone_acetate loxapine metergoline miconazole diazepamnefazodone yohimbine alprenolol treprostinil GENE_846 metoprolol ethacrynic_acid GENE_3362methysergide GENE_81027 amphetamine phenylephrineepinephrine practolol GENE_2852 GENE_7153 GENE_2357 salicylic_acid buspirone mirtazapine carvedilol vasopressin GENE_2908 GENE_154 quinpirole GENE_10376 telmisartan diflunisalterbutalinefluticasone fluphenazine GENE_2166 pyrimethamine clenbuterol GENE_55107 GENE_6241 treprostinil flufenamic_acid carbamazepine propranololtolmetin dobutamine isoprenaline pregnenolone GENE_3362 GENE_6261pentobarbital zonisamide atropine metoprolol albuterol clomipramine caffeine GENE_2099 fulvestrant chlorpromazine fluoxetine GENE_10381 GENE_24145 GENE_6571 nortriptyline phenylbutazone docetaxel lurasidone GENE_1816 GENE_10599 GENE_5293 GENE_153GENE_154 ethinyl_estradiol estriol estradiol GENE_155 GENE_84649irbesartan paroxetine maprotiline sulindac GENE_28234 xylazinemethamphetamine cyproheptadinehaloperidol testosterone GENE_5291 zoledronic_acid felodipine methysergide haloperidolserotonin clotrimazole rosuvastatin niacin GENE_153 metforminnaproxen gemcitabine thioridazine niflumic_acid GENE_5290 olodaterol sertraline atropine aldosterone lonidamine dopamine risperidone colchicine propylthiouracil aspirin GENE_1719 prazosin_hydrochloride angiotensin_ii raloxifene caffeine acetaminophen GENE_5294 isoprenaline oxytocin clotrimazole GENE_2740 ergotamine mianserin glycyrrhizin propranolol clozapineserotonin metergoline clozapine tyrosine iloprost GENE_5731 venlafaxine atenolol doxorubicinbisphenol_avincristineestrone GENE_1956 clonidine olanzapineamitriptyline resveratrol hyperforinethacrynic_acid carvedilol GENE_155 GENE_5468 phentolamineapomorphine GENE_203068 aldosterone atenololGENE_6915 pindolol ibuprofen nimesulide hydrocortisoneGENE_10599 ritanserin ibuprofenquercetin GENE_5743 telmisartan phenylephrine niacin progesterone octanol chlorpromazine gemcitabine ethinyl_estradiol losartan betaxolol GENE_6530 GENE_367 ketanserinmianserin cyproheptadine lithocholic_acid phenytoin GENE_1180 imipramine losartan mitoxantrone fenoterol quinpirole ca2 GENE_3357 indomethacin trazodone estradiol GENE_28234 isoprenaline chrysinGENE_3156 simvastatin reserpine GENE_5732 GENE_6532 irinotecan phenylephrine ritanserinrisperidone melatonin phenobarbitalmorphine testosteronetroglitazone nifedipine doxazosin methylphenidate epinephrine diclofenac GENE_203068 ketanserin paclitaxel nebivololsalmeterol fluoxetinetryptophan indomethacin sodium_butyrate isradipine noradrenaline GENE_338442 aspirin GENE_613 GENE_6579 celecoxib spironolactonerofecoxib olodaterol GENE_150 rifampicin methotrexatediclofenacgemfibrozil diflunisal amitriptyline GENE_1128 equol icotinib procaterol norfluoxetine apomorphine methotrexate pregnenolone GENE_2475bisoprololfenoterol arsenic_trioxide GENE_81027 GENE_1816 GENE_27 dasatinib cholesterol GENE_9 desipramine noradrenaline tamoxifen daidzeinlapatinib clenbuterol GENE_147rauwolscine niacin prazosin_hydrochloride tamoxifen spironolactonecelecoxib angiotensin_ii GENE_338442 rauwolscine buspirone GENE_5743 gallic_acid GENE_3357 trazodone rauwolscine GENE_55869 naproxen GENE_185 fluvoxamine etoposide terbutaline GENE_2709 GENE_1719 progesterone nimesulide GENE_5465fluticasone ergotamine testosteronetroglitazone diethylstilbestrol GENE_2697 terguride pyrimethamine apigenin terbutaline formoterol GENE_3459 dexamethasone paclitaxel terbutaline_sulfate epinephrine sarpogrelate irinotecan GENE_25GENE_8841 diethylstilbestrol dexamethasone mirtazapine GENE_10381 dobutamine terguride fluoxetine yohimbine vinblastine hydroxyflutamide genistein melatoninniflumic_acid piroxicam ritonavir pindolol ritonavir gallic_acid ketoconazole piroxicam metergoline ritanserin prazosin clonidine curcumin yohimbine acetic_acid vincristine isoniazid tamoxifen bezafibratemetformin serotonin cholesterol GENE_4160 formoterol GENE_3066 valproic_acidamiodarone resveratrol zileuton GENE_3756 methysergide aspirin gefitinib methamphetamine GENE_147 docetaxel capsaicin nefazodone resveratrol GENE_154 folinic_acid GENE_5468 GENE_775 sarpogrelate GENE_3350GENE_3357 genistein vinblastine clofarabine GENE_1585 dopamine GENE_6573 estradiol miconazole GENE_10376GENE_775 salmeterol GENE_153 prazosin GENE_1436GENE_9759 raloxifene irinotecan GENE_3356 bortezomib flufenamic_acid bisphenol_a GENE_1816 methotrexate fumagillin bisoprolol carvedilol GENE_3815 clofibratecurcumin dutasteride ziprasidone GENE_147 rifampicin anastrozole fadrozole doxazosin carbenoxolone phentolamine GENE_150 doxorubicin GENE_5159 gemcitabine fenofibric_acid isradipine dapoxetine risperidone salicylic_acid GENE_9734 etoposide meloxicamflutamide quetiapine GENE_3358 GENE_148 morphine arsenic_trioxide GENE_2099 GENE_367 sumatriptan folinic_acid acetylcholine quercetin prazosin_hydrochloride imatinib_mesylate GENE_5156 thalidomide paclitaxeldoxorubicin zotepine gemfibrozil corticotropin fluticasone mirtazapine GENE_3064imatinibnilotinib rofecoxib colchicine olanzapine betaxolol metoprolol etoposide gefitinibhydrocortisone sulindac ciprofibrate thioridazineGENE_150 hydroxyflutamide thalidomide GENE_4843 atenolol aspirin prazosin daidzein fluticasone_propionate haloperidol bezafibrateapigenincolchicine GENE_1588 albuterol mirtazapine indomethacin dabrafenib phenylbutazoneGENE_2908 terguride tyrosine doxazosinpindolol GENE_2740 GENE_3362apomorphine carbamazepine nifedipine GENE_207 testosterone medroxyprogesterone_acetateestronemercaptopurine paliperidone acetaminophenpropylthiouracil everolimus noradrenaline GENE_673 bortezomib vinblastine neca felodipine dopamine isoniazid GENE_3065 fenofibrate sirolimus clozapine simvastatin fulvestrant budesonide yohimbine chrysin GENE_4158 nebivololalprenolol propranolol phenylephrine equol arsenic_trioxidemitoxantronetoremifene chlorpromazineGENE_12quinpirole beclomethasone_dipropionate ketanserin sirolimus clonidine terfenadinelurasidonecyproheptadine clofibrate GENE_4023 GENE_3065 colchicine epinephrine sodium_butyrate lapatinib GENE_2852 calcitriol sertindole tolmetin perifosine tamoxifen estradiol trichostatin pirfenidone estriol vincristineGENE_120892 thymidine GENE_3269 GENE_2322 carvedilol nimesulide pazopanibsunitinib GENE_1813 phenobarbital indomethacin GENE_10376 anastrozole GENE_10381 fluphenazine rosuvastatin practolol GENE_7153 vemurafenib GENE_3791 phentolamine mianserinGENE_1814 famotidine ketoconazolecapsaicin amiodarone albendazole bicalutamideGENE_10376 GENE_1815 hyperforin GENE_4023 GENE_7153 furosemide GENE_5294 letrozole exemestane bisoprolol hesperetin everolimus bisoprololacetic_acid diazepam phenytoinGENE_3156 gemcitabine GENE_81027 sorafenib loxapine prazosin_hydrochloride GENE_10381 GENE_5155 idoxifeneclofarabine GENE_203068GENE_81027 promethazine GENE_5291GENE_2475 propranolol GENE_4843 GENE_9636 resveratrol isoprenaline GENE_153 midostaurinperifosine nelfinavir GENE_6554 paclitaxel raloxifene_hydrochlorideperindoprildocetaxel histamine amphetamine GENE_5290 aminoglutethimide GENE_5154 GENE_5293zonisamide fenofibrate zoledronic_acid methamphetamineraclopride irinotecan mebendazole clenbuterol GENE_154 valproic_acid nimesulide atenolol GENE_2260 GENE_2322aminoglutethimide lonidamine GENE_6241 pirenperone GENE_5465 ciprofibrate GENE_3791 trichostatin GENE_1585 GENE_1719 GENE_203068sirolimus dobutamine sulpiride GENE_6571 cyclosporine doxorubicin fadrozole nelfinavir GENE_4158 vincristine metoprolol GENE_2321 letrozole erlotinib bicalutamide GENE_9734 GENE_5894 GENE_1956 GENE_5156 practolol GENE_2859 GENE_6331GENE_1180 gemfibrozil arsenic_trioxide everolimus albuterol hydroxyurea finasteride tolcaponeGENE_59340levodopa GENE_5159 etoposide GENE_5979 icotinib GENE_4313 procainamide cimetidine GENE_6915GENE_2166 GENE_3815 sorafenibGENE_8841 irbesartan vinblastine GENE_2324 GENE_207 pyrilamine sunitinib GENE_10599 methotrexate alprenolol mebendazole dasatinib erlotinib GENE_7296 xylazine nilotinib docetaxel formoterol GENE_1588 midostaurin mercaptopurine GENE_10000 pentobarbital GENE_9759 salmeterolterbutaline GENE_6554 fludarabine albendazole GENE_7153 betaxolol GENE_5734 metoprine GENE_2260 glycyrrhizin zoledronic_acid GENE_208 amprenavir dutasteride GENE_55869 meloxicamfurosemide GENE_4594 nebivolol imatinib GENE_2324 olodaterol reserpine pazopanib hydroxocobalamin fluticasone GENE_2224 GENE_3066 GENE_5979 fenoterol octanol GENE_2562 iloprost azacitidine pyrimethamine pirfenidone GENE_1719 ca2 stavudine exemestane marimastat fenofibric_acid rifampicin corticotropin GENE_673 GENE_2321 fludarabine linagliptin GENE_4312 nitroglycerin GENE_163702 perindopril GENE_27 GENE_6716 propofol GENE_25 GENE_5894 GENE_1956 GENE_55244 GENE_1436imatinib_mesylate cyclosporine fadrozole treprostinil GENE_613 pyrimethamine GENE_2709 GENE_4312 GENE_2697 GENE_163702 marimastat GENE_4316 GENE_4160 GENE_3577 GENE_5731GENE_5732 propofol vemurafenib GENE_55244 GENE_2740 flufenamic_acid GENE_1180 isradipine GENE_10599 carbenoxolone GENE_1585 GENE_1584 sufentanil GENE_3577
(a) k=1 (b) k=1, pruned (c) k=2 (d) k=2, pruned
Table 3: Model Performance Attention Mechanism. Because of the limitation of gene-drug records, we do not apply sophisticated design in this model. Intu- Method Logistic Reg. SVM Ours itively, the drug responses will be better modeled if we allow the Accuracy 93.75% 90.14% 94.74% weight of different relations to change according to their current Explainable ✗ ✗ ✓ value.
7 CONCLUSION Research articles are excellent sources for machines to predict, infer, 6 FUTURE WORK reasoning, etc. By aggregating multiple data sources, we study and Collect More Records. The experiment records are not adequate analyze the association of drug and gene by constructing networks to test our model in scale. Patient profiles will be a good data source with categorical relations and descriptive relations. The resulted for our model. But due to the privacy restrictions, we do not have network skeleton presents the relationships between entities. We them right now. The application of our model on clinical data can model the relations as a kernel function between entities. After also help to provide precise and interpretable patient profiles. that, we use cell line experiment records to estimate the parameters 4 of our model, which achieves 94.74% accuracy on predicting the 2017). arXiv:cs.CL/1702.04457v2 drug sensitivity for cell lines. [17] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. 2018. Automated phrase mining from massive text corpora. IEEE Transactions on Knowledge and Data Engineering 30, 10 (2018), 1825–1837. ACKNOWLEDGMENTS [18] Jingbo Shang, Meng Qu, Jialu Liu, Lance M Kaplan, Jiawei Han, and Jian Peng. 2016. Meta-Path Guided Embedding for Similarity Search in Large-Scale Hetero- Thank Dr. Qi Li for supporting meta-pattern results on PubTator. geneous Information Networks. arXiv.org (Oct. 2016). arXiv:cs.SI/1610.09769v1 [19] Dibakar Sigdel, Vincent Kyi, Aiden Zhang, Shaun P Setty, David A Liem, Yu Shi, Xuan Wang, Jiaming Shen, Wei Wang, JiaWei Han, et al. 2019. Cloud-Based REFERENCES Phrase Mining and Analysis of User-Defined Phrase-Category Association in [1] Meng Jiang 0001, Jingbo Shang, Taylor Cassidy, Xiang Ren, Lance M Kaplan, Biomedical Publications. JoVE (Journal of Visualized Experiments) 144 (2019), Timothy P Hanratty, and Jiawei Han 0001. 2017. MetaPAD - Meta Pattern e59108. Discovery from Massive Text Corpora. CoRR cs.CL (2017). [20] Xuan Wang, Yu Zhang, Qi Li, Yinyin Chen, and Jiawei Han. 2018. Open in- [2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural ma- formation extraction with meta-pattern discovery in biomedical literature. In chine translation by jointly learning to align and translate. arXiv preprint Proceedings of the 2018 ACM International Conference on Bioinformatics, Compu- arXiv:1409.0473 (2014). tational Biology, and Health Informatics. ACM, 291–300. [3] Jordi Barretina, Giordano Caponigro, Nicolas Stransky, Kavitha Venkatesan, [21] Xuan Wang, Yu Zhang, Qi Li, Yinyin Chen, and Jiawei Han. 2018. Open Informa- Adam A Margolin, Sungjoon Kim, Christopher J Wilson, Joseph Lehár, Gre- tion Extraction with Meta-pattern Discovery in Biomedical Literature. ACM, New gory V Kryukov, Dmitriy Sonkin, Anupama Reddy, Manway Liu, Lauren Murray, York, New York, USA. Michael F Berger, John E Monahan, Paula Morais, Jodi Meltzer, Adam Korejwa, [22] Xuan Wang, Yu Zhang, Qi Li, Cathy H Wu, and Jiawei Han. 2018. PENNER: Judit Jané-Valbuena, Felipa A Mapa, Joseph Thibault, Eva Bric-Furlong, Pichai Pattern-enhanced Nested Named Entity Recognition in Biomedical Literature. Raman, Aaron Shipway, Ingo H Engels, Jill Cheng, Guoying K Yu, Jianjun Yu, In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Peter Aspesi, Melanie de Silva, Kalpana Jagtap, Michael D Jones, Li Wang, Charles IEEE, 540–547. Hatton, Emanuele Palescandolo, Supriya Gupta, Scott Mahan, Carrie Sougnez, [23] Chih-Hsuan Wei, Hung-Yu Kao, and Zhiyong Lu. 2013. PubTator: a web-based Robert C Onofrio, Ted Liefeld, Laura MacConaill, Wendy Winckler, Michael Reich, text mining tool for assisting biocuration. Nucleic acids research 41, W1 (2013), Nanxin Li, Jill P Mesirov, Stacey B Gabriel, Gad Getz, Kristin Ardlie, Vivien Chan, W518–W522. Vic E Myer, Barbara L Weber, Jeff Porter, Markus Warmuth, Peter Finan, Jennifer L [24] Haixiu Yang, Yunpeng Zhang, Jiasheng Wang, Tan Wu, Siyao Liu, Yanjun Xu, and Harris, Matthew Meyerson, Todd R Golub, Michael P Morrissey, William R Sellers, Desi Shang. 2018. Global view of a drug-sensitivity gene network. Oncotarget 9, Robert Schlegel, and Levi A Garraway. 2012. The Cancer Cell Line Encyclopedia 3 (Jan. 2018), 3254–3266. enables predictive modelling of anticancer drug sensitivity. Nature 483, 7391 [25] Wanjuan Yang, Jorge Soares, Patricia Greninger, Elena J Edelman, Howard Light- (March 2012), 603–607. foot, Simon Forbes, Nidhi Bindal, Dave Beare, James A Smith, I Richard Thompson, [4] A. Basu, N. E. Bodycombe, J. H. Cheah, E. V. Price, K. Liu, G. I. Schaefer, R. Y. Sridhar Ramaswamy, P Andrew Futreal, Daniel A Haber, Michael R Stratton, Ebright, M. L. Stewart, D. Ito, S. Wang, A. L. Bracha, T. Liefeld, M. Wawer, J. C. Cyril Benes, Ultan McDermott, and Mathew J Garnett. 2013. Genomics of Drug Gilbert, A. J. Wilson, N. Stransky, G. V. Kryukov, V. Dancik, J. Barretina, L. A. Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in Garraway, C. S. Hon, B. Munoz, J. A. Bittker, B. R. Stockwell, D. Khabele, A. M. cancer cells. Nucleic Acids Research 41, Database issue (Jan. 2013), D955–61. Stern, P. A. Clemons, A. F. Shamji, and S. L. Schreiber. 2013. An interactive [26] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. resource to identify cancer genetic and lineage dependencies targeted by small Image captioning with semantic attention. In Proceedings of the IEEE conference molecules. Cell 154, 5 (Aug 2013), 1151–1161. on computer vision and pattern recognition. 4651–4659. [5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014). [6] Allan Peter Davis, Cynthia J Grondin, Robin J Johnson, Daniela Sciaky, Benjamin L King, Roy McMorran, Jolene Wiegers, Thomas C Wiegers, and Carolyn J Mat- tingly. 2017. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Research 45, D1 (Jan. 2017), D972–D978. [7] Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong He, Geof- frey Zweig, and Margaret Mitchell. 2015. Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809 (2015). [8] Jürgen Drews. 2000. Drug discovery: a historical perspective. Science 287, 5460 (2000), 1960–1964. [9] Zachary C Lipton, John Berkowitz, and Charles Elkan. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015). [10] Minh-Thang Luong, Hieu Pham, and Christopher D Manning. 2015. Effec- tive approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025 (2015). [11] Jörg Menche, Emre Guney, Amitabh Sharma, Patrick J Branigan, Matthew J Loza, Frédéric Baribaud, Radu Dobrin, and Albert-László Barabási. 2017. Integrating personalized gene expression profiles into predictive disease-associated gene pools. npj Systems Biology and Applications 3, 1 (March 2017), 10. [12] Reza Mirnezami, Jeremy Nicholson, and Ara Darzi. 2012. Preparing for precision medicine. New England Journal of Medicine 366, 6 (2012), 489–491. [13] M. G. Rees, B. Seashore-Ludlow, J. H. Cheah, D. J. Adams, E. V. Price, S. Gill, S. Javaid, M. E. Coletti, V. L. Jones, N. E. Bodycombe, C. K. Soule, B. Alexander, A. Li, P. Montgomery, J. D. Kotz, C. S. Hon, B. Munoz, T. Liefeld, V. Dan?ik, D. A. Haber, C. B. Clish, J. A. Bittker, M. Palmer, B. K. Wagner, P. A. Clemons, A. F. Shamji, and S. L. Schreiber. 2016. Correlating chemical sensitivity and basal gene expression reveals mechanism of action. Nat. Chem. Biol. 12, 2 (Feb 2016), 109–116. [14] Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In CVPR, Vol. 1. 3. [15] Brinton Seashore-Ludlow, Matthew G Rees, Jaime H Cheah, Murat Cokol, Ed- mund V Price, Matthew E Coletti, Victor Jones, Nicole E Bodycombe, Christian K Soule, Joshua Gould, et al. 2015. Harnessing connectivity in a large-scale small- molecule sensitivity dataset. Cancer discovery 5, 11 (2015), 1210–1223. [16] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. 2017. Automated Phrase Mining from Massive Text Corpora. arXiv.org (Feb. 5