ICBO 2014 Proceedings

A Nanopublication Framework for Biological Networks using Cytoscape.js

James P. McCusker,1,3 Rui Yan,1 Kusum Solanki,2 John Erickson,1 Cynthia Chang,1 Michel Dumontier,4 Jonathan S. Dordick,2 Deborah L. McGuinness1

Abstract: We leverage semantic technologies and Cytoscape.js to create a provenance-aware, probabilistic analysis platform for systems biology and evaluate its usefulness in discovering links between drugs and diseases. In our efforts to create a systematic approach to discovering new uses for existing drugs, we have developed Repurposing Drugs with Semantics (ReDrugS). ReDrugS is a data curation and publication framework that accepts data from nearly any database containing biological or chemical entity interactions and produces visualizations using Cytoscape.js. A semantic web service API, built on the SADI web service framework and Nanopublications, supports search and traversal and provides composite probabilities for the resulting graph of biological entities. We show how the associations of a positive control, topiramate, allow us to independently reconstruct its known associations with epilepsy and migraine, along with potential consequences for bone health.

1 Department of Computer Science, 2 Department of Chemical & Biological Engineering, Rensselaer Polytechnic Institute, Troy, NY, http://www.rpi.edu
3 5AM Solutions, Inc, Rockville, MD, http://5amsolutions.com
4 Stanford University, Stanford, CA, http://stanford.edu

I. INTRODUCTION

Drug repurposing can often lead to effective new treatments for diseases. The ReDrugS system we are developing can assist in this process by integrating multiple systems biology, pharmacology, disease association, and gene expression databases into a coherent repository of individually supported assertions, each of which can be assigned its own probability. We have developed an initial database that includes drug/protein, protein/protein, and protein/biological process associations, and it is giving us a view into how drugs have the effects that they do.

II. METHODS

We deployed an instance of the RPI semantic web toolsuite, Prizms, at http://redrugs.tw.rpi.edu and the Comprehensive Knowledge Archive Network (CKAN) at http://data.melagrid.org to catalog the available datasets [1]. Cataloging is an ongoing process, but initial datasets were added to the catalog, initializing the Prizms conversion process. We were then able to use the Prizms infrastructure to generate RDF for publication to our SPARQL endpoint. We used the BigData RDF store with named graph and text indexing support enabled.

A. Inferring Probabilities

Two molecular biology and biochemistry experts, Michel Dumontier and Pascale Gaudet, assigned a confidence score from 1 (low) to 3 (high) to each evidence type and/or technique associated with an interaction. The confidence measure was based on comparative analyses of techniques [2], [3] and on the experts' experience in reviewing data of this kind. The confidence assignment takes into account a number of factors, including the degree of indirection in the assay, the sensitivity and specificity of the approach, and the reproducibility of results under different conditions. The confidence scores from both experts were encoded as classes of evidence, where each experimental method class was assigned two superclasses, one for each expert. This ontology was created from a spreadsheet and expanded to full inferences using Pellet [4]. At the same time, SPARQL-based reasoning is used to classify nanopublication assertions by their available evidence, and thereby assign a confidence class to each assertion.

B. SADI Web Service Interface

We developed four Semantic Automated Discovery and Integration (SADI) web services in Python1 to support easy access to the nanopublications. We use SADI to provide a discoverable, consistent API that can be re-used in other applications or directly consumed by analytical tools.

The services perform computational tasks that would otherwise be difficult to perform with SPARQL queries. The services return only one interaction for each triple (source, interaction type, target), but there can be multiple probabilities per interaction and more than one interaction per interaction type. This is because the interaction may have been recorded in multiple databases, based on different experimental methods. To provide a single probability score for each triple, the interactions are combined. This is done to indicate that multiple experiments that produce the same results reinforce each other, and should therefore give a higher overall probability than would be indicated by taking their mean:

$$P(x_{1 \ldots n}) = \mathrm{CDF}\left( \sum_{i=1}^{n} \mathrm{CDF}^{-1}\big(P(x_i)\big) \right)$$

Here $\mathrm{CDF}^{-1}$ converts each probability $P(x_i)$ to a z-score, the z-scores are summed, and $\mathrm{CDF}$, the cumulative distribution function, converts the composite z-score back to a probability.

1 For further information on developing web services in Python using SADI, see this tutorial: https://code.google.com/p/sadi/wiki/BuildingServicesInPython
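
The combination rule in Section II-B can be illustrated with a short Python sketch. This is not the ReDrugS service code. It assumes that CDF is the standard normal cumulative distribution function (the source only speaks of z-scores), and the function name `composite_probability` is hypothetical.

```python
from statistics import NormalDist

# Assumption: CDF is the standard normal CDF (mean 0, standard deviation 1).
# The source does not name the distribution explicitly.
_STD_NORMAL = NormalDist()


def composite_probability(probabilities):
    """Combine the probabilities reported for one (source, interaction type,
    target) triple: P = CDF(sum_i CDF^-1(P(x_i))).

    `composite_probability` is a hypothetical name used for illustration.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    z_total = 0.0
    for p in probabilities:
        # inv_cdf is only defined on the open interval (0, 1).
        if not 0.0 < p < 1.0:
            raise ValueError(f"probability must lie strictly between 0 and 1: {p}")
        z_total += _STD_NORMAL.inv_cdf(p)  # probability -> z-score
    return _STD_NORMAL.cdf(z_total)  # composite z-score -> probability


if __name__ == "__main__":
    one_source = composite_probability([0.95])
    two_sources = composite_probability([0.95, 0.95])
    print(f"one source:  {one_source:.4f}")
    print(f"two sources: {two_sources:.4f}")
```

Under this assumption, two independent records at 0.95 combine to a value above 0.95, which matches the stated intent that agreeing experiments reinforce each other rather than average out. A probability of 0.5 contributes a z-score of zero and leaves the composite unchanged, while probabilities below 0.5 pull it down.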

C. User Interface

Users can search for biological entities and processes, which can then be autocompleted to specific entities that are in the ReDrugS graph. Users can then add those entities and processes to the displayed graph, retrieve upstream and downstream connections, and link out to more details for every entity. Cytoscape.js is used as the main rendering and network tool, and provides node and edge rendering, layout, and network analysis capabilities.

III. EVALUATION

In order to evaluate this knowledge base, we developed a demonstration web interface2 based on Cytoscape.js3. It lets users enter biological entity names, and as the user types, the text is resolved to a list of entities to select from. After that, the selected entity is submitted to all three SADI services via a basic JavaScript SADI client.4 The resulting interactions and nodes are added to the Cytoscape.js graph, which can be laid out according to a number of algorithms. Users are also able to select nodes and populate upstream or downstream connections. An example of this is shown in Figure 1. This figure was obtained by entering “Topiramate” as a query in the search box, which returned all of the biological entities that topiramate is directly associated with. We then expanded the network downstream to see what biological entities are affected by topiramate’s targets.

2 http://lod.melagrid.org/redrugs
3 http://cytoscape.github.io/cytoscape.js
4 https://sadi.googlecode.com/svn/trunk/javascript/sadi.js

Fig. 1. The ReDrugS user interface allows users to build networks of drugs, proteins, and diseases based on provenance-driven data from iRefIndex, DrugBank, UniProt Gene Ontology Annotations, and Online Mendelian Inheritance in Man (OMIM). Users can select entities and add entities that affect or are affected by the selected entities. They can also search for entities by name (here Topiramate was used).

IV. DISCUSSION AND FUTURE WORK

We are able to successfully navigate a protein-drug-disease interaction graph that is a consensus of 16 diverse sources, to infer prior probabilities for more than three million individual assertions using their provenance and experts’ confidence in different experimental methods, and to find drug/disease associations that are not directly expressed by any one database. We plan to add further data sources, especially those that provide direct experimental results that predict protein (or gene)/disease associations, like the Gene Expression Atlas (in progress) [5]. Further, we are very interested in integrating the newest version of the Connectivity Map dataset [6], as it provides gene expression signature similarities for a large number of chemical and genetic perturbations. Finally, as we develop new hypotheses about potential new drug effects, we plan to test them using a new three-dimensional cellular microarray to perform high throughput drug screening [7] with reference samples.

V. CONCLUSION

We have developed a framework for collecting, searching, analyzing, and visualizing important components of biological systems. We were able to build this by converting existing databases into a common nanopublication structure that uses the provenance of the database records to determine the quality of any given piece of information through the methods used to provide it. We use the Semantic Automated Discovery and Integration framework to provide simple access to data, and can visualize results using an existing interaction graph tool. The resulting application makes it easy to search for biological entities and see how they interact. We have already found some hypotheses of proteins through which drugs influence disease conditions. We plan to expand the loaded set of data with protein/disease associations as well as gene expression profiles, and will be using ReDrugS to produce prospective testable hypotheses.

ACKNOWLEDGMENTS

A special thanks to Pascale Gaudet, who, with Michel Dumontier, evaluated the experimental methods and evidence codes listed in the Protein/Protein Interaction Ontology and Gene Ontology.

REFERENCES

[1] J. P. McCusker, T. Lebo, M. Krauthammer, and D. L. McGuinness, “Next Generation Cancer Data Discovery, Access, and Integration Using Prizms and Nanopublications,” in Data Integration in the Life Sciences. Springer, 2013, pp. 105–112.
[2] J. C. Obenauer and M. B. Yaffe, “Computational prediction of protein-protein interactions,” in Protein-Protein Interactions. Springer, 2004, pp. 445–467.
[3] E. Sprinzak, S. Sattath, and H. Margalit, “How reliable are experimental protein–protein interaction data?” Journal of Molecular Biology, vol. 327, no. 5, pp. 919–923, 2003.
[4] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz, “Pellet: A practical OWL-DL reasoner,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 5, no. 2, pp. 51–53, 2007.
[5] R. Petryszak, T. Burdett, B. Fiorelli, N. A. Fonseca, M. Gonzalez-Porta, E. Hastings, W. Huber, S. Jupp, M. Keays, N. Kryvych, et al., “Expression Atlas update–a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments,” Nucleic Acids Research, vol. 42, no. D1, pp. D926–D932, Jan 2014. [Online]. Available: http://dx.doi.org/10.1093/nar/gkt1270
[6] J. Lamb, E. D. Crawford, D. Peck, J. W. Modell, I. C. Blat, M. J. Wrobel, J. Lerner, J.-P. Brunet, A. Subramanian, K. N. Ross, et al., “The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease,” Science, vol. 313, no. 5795, pp. 1929–1935, 2006.
[7] M.-Y. Lee, R. A. Kumar, S. M. Sukumaran, M. G. Hogg, D. S. Clark, and J. S. Dordick, “Three-dimensional cellular microarray for high-throughput toxicology assays,” Proceedings of the National Academy of Sciences, vol. 105, no. 1, pp. 59–63, 2008.

A Nanopublication Framework for Biological Networks using Cytoscape.js

James P. McCusker,1,2 Rui Yan,1 Kusum Solanki,1 John Erickson,1 Cynthia Chang,1 Michel Dumontier,3 Jonathan Dordick,1 and Deborah McGuinness1

1 Rensselaer Polytechnic Institute, Troy, NY
2 5AM Solutions, Inc., Rockville, MD
3 Stanford University, Stanford, CA

Availability: http://redrugs.tw.rpi.edu

We leverage semantic technologies and Cytoscape.js to create a provenance-aware, probabilistic analysis platform for systems biology and evaluate its usefulness in discovering links between drugs and diseases. A number of databases have been developed that serve as a patchwork across the landscape of systems biology, each focused on different experimental methods, many species, and a wide diversity of inclusion criteria. Systems biology has been used in the past to generate hypotheses for drug effects, but has become fragmented under the large number of disparate and disconnected databases. In our efforts to create a systematic approach to discovering new uses for existing drugs, we have developed Repurposing Drugs with Semantics (ReDrugS). ReDrugS is a data curation and publication framework that can take data from nearly any database containing biological or chemical entity interactions and display it using Cytoscape.js. ReDrugS is able to infer the probability of assertions based on their provenance, using experimental methods and data sources. A semantic web service API is provided that can search, traverse, and provide composite probabilities for the resulting graph of biological entities using the SADI web service framework and Nanopublications. We show how the associations of a positive control, topiramate, allow us to independently reconstruct its known associations with epilepsy and migraine, along with potential consequences for bone health. Future work will incorporate additional protein/disease associations, enabling hypothesis generation on indirect drug targets, and leading to testing the resulting hypotheses using high throughput drug screening.

[Figure: example nanopublication NanoPub_501799, shown as RDF viewer (VRAER) panels for the assertion, its supporting provenance, and its attribution. The assertion is a direct interaction (MI_0407) with has-target SLC4A8 and has-participant CA2. It wasGeneratedBy NanoPub_501799_Assertion_Activity, a pull down (MI_0096); wasQuotedFrom MI_0463; and hadPrimarySource pubmed:14736710. The attribution gives the assertion a probability-value attribute with has-value 0.95.]

Class: GeneratedBy_MI_0096
  EquivalentClass: wasGeneratedBy some ‘pull down’
  SubClassOf: Confidence2

Class: Confidence2
  EquivalentClass: ‘has attribute’ min 1 (‘probability value’ and (‘has value’ value 0.95))

[Figure: Flow of data through the ReDrugS pipeline. Labels recoverable from the diagram: Ontological Resources (Protein/Protein Interaction Ontology, Semanticscience Integrated Ontology, Gene Ontology) providing vocabularies and relationships; iRefIndex Protein/Protein Interaction Database, Protein/Biological Process and Protein/Disease Associations, and Drug/Protein Interactions, each converted to nanopubs; ReDrugS Quad Store; Experimental Method Assessment producing confidence scores; a reasoner that condenses evidence to probability; ReDrugS API (SADI-based API provides interaction network search and expansion based on consensus of probabilities); Analytical Tools (Cytoscape, R, Python, etc.) and the ReDrugS Cytoscape.js App, which query the graph; Researchers, who pose questions, view data, and explore the network.]

Different databases can provide the same assertions. This might be experimental replication! We model this with composite z-scores, where F(x) is the cumulative distribution function (converts z-scores to probabilities).

The ReDrugS framework allows users to explore protein/protein, protein/disease, and drug/protein interactions using full text search, network expansion, and statistical aggregation. The edges are rendered with their width mapped to the probability that the link exists. These statin and topiramate networks were built using a “disease finder” network expander.
