Machine Reading for Cancer Biology

Sophia Ananiadou
National Centre for Text Mining, School of Computer Science

End-to-end text mining system

• Machine reading for pathways
  – Event extraction
  – Uncertainty detection for ranking
• Integration of machine reading tools
  – Argo and OpenMinTeD platforms
• Interactive visual analytics
  – LitPathExplorer

Motivation

To support pathway construction and design of experiments
• extract evidence from literature
• events, entities, contextual interpretation

For these, we need to

• understand pathway representations
• bridge the gap between knowledge and text
• read against models (deep reading)

From concepts to events

1. Concept recognition

2. Interaction recognition

3. Concept and interaction identification

DrugBank:DB06712 DrugBank:DB00682 DrugBank:DB04610

The Big Mechanism: reading, assembly, experiments

Courtesy: Paul Cohen
http://nactem.ac.uk/big_mechanism/

Tools for Event Extraction

EventMine
http://www.nactem.ac.uk/EventMine/

• EventMine: a machine learning pipeline event extraction system
  – Several parse results, dictionaries
  – Coreference resolution, domain adaptation

Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13)

Miwa, M. and Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics, 16(10), S7

Linking interactions (events) to pathways

1. The mitotic arrest-deficient protein Mad1 forms a complex with Mad2, which is required for imposing mitotic arrest on cells in which the spindle assembly is perturbed. PMID: 18981471
2. Mad1, an upstream regulator of Mad2, forms a tight core complex with Mad2 and facilitates Mad2 binding to Cdc20. PMID: 18318601

2013 Beyond linking reactions to documents at coarse level

Event interpretation

[Figure: event annotation of the sentence "Results suggest that BRAF is not required for MUC1 binding to PKM2 in RAS". Labels: entity types (Chemical, Protein); event types, each with an event trigger (Regulation, Binding); event arguments (Cause, Theme, Theme 1, Theme 2); panel labels SIMPLE EVENT and COMPLEX EVENT.]

* Complex events have at least one argument that is an event on its own.

Textual Mentions in Context

• Our results prove that BRAF is required for MUC1 binding to PKM2
  – Strong certainty
• Our results suggest that BRAF is required for MUC1 binding to PKM2
  – Some hedging/speculation
• Our results indicate that BRAF may be required for MUC1 binding to PKM2
  – Strong hedging/speculation
• There is scarce evidence that BRAF is required for MUC1 binding to PKM2
  – Hedging
• We are going to test whether BRAF is required for MUC1 binding to PKM2
  – Investigation
• Often BRAF is required for MUC1 binding to PKM2
  – Frequency/time limitation
• Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work
  – Admission of lack of knowledge

Uncertainty cues

BioNLP-ST, GENIA-MK

Hybrid approach

Machine Learner (Random Forest)
1. Lexical (e.g. cues, POS tags, event-trigger surface form)
2. Syntactic (e.g. shortest path, dependency cue-trigger)
3. Semantic (e.g. event type, argument type/role)

Automated Rule Induction (from corpus)
1. EventMine (to identify event triggers)
2. Deep parsing (to identify dependencies)
3. Cue lists

Dependency relations between cues and event triggers
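A minimal sketch of the cue-list side of such a hybrid approach, for illustration only. It assumes a small hand-written cue list (taken from the example sentences above, not from the system's actual lists) and replaces the dependency check between cue and trigger with a much cruder test: a cue counts if it appears anywhere before the event trigger. The names `UNCERTAINTY_CUES` and `find_cues` are hypothetical, and no Random Forest is involved here.

```python
import re

# Hypothetical cue list built from the example sentences on the slides;
# the cue lists used by the real system are not given in this deck.
UNCERTAINTY_CUES = {
    "suggest": "speculation",
    "may": "speculation",
    "scarce evidence": "hedging",
    "whether": "investigation",
    "often": "frequency",
}


def find_cues(sentence: str, trigger: str) -> list[tuple[str, str]]:
    """Return (cue, category) pairs that occur before the event trigger.

    Simplification: a cue is assumed to scope over the trigger whenever it
    precedes it; a real system would check the dependency path instead.
    """
    lowered = sentence.lower()
    trigger_pos = lowered.find(trigger.lower())
    if trigger_pos < 0:
        return []  # trigger not present in this sentence
    prefix = lowered[:trigger_pos]

    found = []
    for cue, category in UNCERTAINTY_CUES.items():
        # \w* lets "suggest" also match inflected forms such as "suggests".
        match = re.search(r"\b" + re.escape(cue) + r"\w*", prefix)
        if match:
            found.append((match.start(), cue, category))

    found.sort()  # report cues in sentence order
    return [(cue, category) for _, cue, category in found]
```

Calling `find_cues(sentence, "required")` on the example sentences returns an empty list for the "prove" sentence and a single cue for the "may be required" sentence. In a fuller pipeline, such matches would become features for the learner rather than final decisions.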

Multiple mentions of the same event

• Our results prove that BRAF is required for MUC1 binding to PKM2
• Our results suggest that BRAF is required for MUC1 binding to PKM2
• Our results indicate that BRAF may be required for MUC1 binding to PKM2
• There is scarce evidence that BRAF is required for MUC1 binding to PKM2
• We are going to test whether BRAF is required for MUC1 binding to PKM2
• Often BRAF is required for MUC1 binding to PKM2
• Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work

The same interaction can be mentioned:
• In multiple sentences of the same paper
• In multiple papers
• With different levels of certainty in each mention

We need to consolidate different uncertainty values from each mention to one “confidence” score
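A minimal sketch of how per-mention values could be consolidated into one score. It assumes each mention is encoded as a subjective-logic opinion (belief, disbelief, uncertainty, base rate) and that opinions are combined with Jøsang's cumulative fusion operator. The choice of operator, the shared base rate of 0.5, and the names `Opinion`, `fuse` and `consolidate` are assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class Opinion:
    """Binomial subjective-logic opinion; belief + disbelief + uncertainty = 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed prior, shared by all mentions

    def expected(self) -> float:
        """Projected probability: belief plus the prior's share of uncertainty."""
        return self.belief + self.base_rate * self.uncertainty


def fuse(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two opinions about the same event."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0:
        # Both opinions are dogmatic (uncertainty 0): average them.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0,
                       a.base_rate)
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom,
        (a.uncertainty * b.uncertainty) / denom,
        a.base_rate,
    )


def consolidate(opinions: Iterable[Opinion]) -> Opinion:
    """Fuse the opinions from all mentions (at least one is required)."""
    return reduce(fuse, opinions)
```

With this operator, adding further mentions never increases the fused uncertainty, which matches the intuition that more evidence should give a firmer confidence score; `consolidate(mentions).expected()` then yields a single number per interaction.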

Consolidation over several mentions

Adapting the subjective logic framework: for an event $E_x$, each opinion is

$\omega_x = (b_x, d_x, u_x, \alpha)$

• $b_x$: belief
• $d_x$: disbelief (from negation identification)
• $u_x$: uncertainty (from uncertainty identification)
• $\alpha$: base rate (from prior probabilities)

Each mention of $E_x$ mapped to a pathway is considered as the subjective opinion of the author on the interaction described by $E_x$.

Evaluation - pathway models

• B-cell acute lymphoblastic leukemia model (Pathway Studio)
  – 72 interactions, 260 evidence passages manually selected, 1-20 sentences per interaction
  – 12% flagged uncertain by our system

Zerva, C., Batista-Navarro, R., Day, P. and Ananiadou, S. (2017) Using uncertainty to link and rank evidence from biomedical literature for model reconstruction. Bioinformatics

Results

• Leukemia Pathway (7 annotators) ~ Pathway Studio
• Average accuracy on sentence level: 0.96
• Average accuracy on interaction level: 0.87
  – 1-20 sentences per interaction

Event interpretation

• Uncertainty scoring as an expressive confidence measure
• Hybrid framework
• Value for each event mentioned in a sentence
  – Consolidated uncertainty values from different papers
• Aims to reduce manual effort and select more certain events

openminted.eu

• Web-based, graphical TM workbench
• Unstructured Information Management Architecture (UIMA) standard
• Rich library of TM components
• Allows Cloud and high-performance computing
• Straightforward integration of TM analytics
  – modular, extensible, reconfigurable, reusable workflows

Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014) Text mining-assisted biocuration workflows in Argo. Database: The Journal of Biological Databases and Curation

Source: LEGO DUPLO

Workflow Designer

Sample workflow (Cancer Mechanisms)

[Figure legend: existing components; custom components; existing components supplied with custom resources]

Sample machine reading workflow

✓ highly extensible
✓ can be optimised by interchanging components

Annotation Viewer/Editor

Ensemble reading through federation

• What do we gain by combining text mining tools from different groups?
  – enriching/updating results
  – comparison
  – merging annotations, e.g. by taking the union, intersection, majority vote
  – taking advantage of best-of-breed tools
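The three merging strategies named above can be sketched as follows. This is an illustrative simplification, not the federated system's implementation: it assumes each tool returns a set of `(start, end, label)` tuples and that two annotations agree only when they match exactly. The `Annotation` alias and the `merge` function are hypothetical names.

```python
from collections import Counter

# Assumed representation: (start offset, end offset, label).
Annotation = tuple[int, int, str]


def merge(tool_outputs: list[set[Annotation]],
          strategy: str = "majority") -> set[Annotation]:
    """Combine annotation sets produced by several text mining tools."""
    if not tool_outputs:
        return set()
    if strategy == "union":
        # Keep everything any tool found (favours recall).
        return set().union(*tool_outputs)
    if strategy == "intersection":
        # Keep only what every tool found (favours precision).
        return set.intersection(*tool_outputs)
    if strategy == "majority":
        # Keep what more than half of the tools found.
        counts = Counter(ann for output in tool_outputs for ann in output)
        return {ann for ann, n in counts.items() if n > len(tool_outputs) / 2}
    raise ValueError(f"unknown strategy: {strategy}")
```

Exact matching is the strictest possible agreement test; a practical merger would probably also need to reconcile overlapping spans and differing label sets across tools.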

Federated system

[Figure: federated architecture. Labels: text mining tool developer; text mining tool as a web service; descriptor file; registry of services; IASON; text mining workflow.]

Deep Reading: Integrating uncertainty

• LitPathExplorer
  – Visual analytics tool; maps events from literature to pathway interactions
  – Includes uncertainty measure

Soto, A., Zerva, C., Batista-Navarro, R. and Ananiadou, S. (2017) LitPathExplorer. Bioinformatics

LitPathExplorer: A Visual Tool for Exploring Literature-Enriched Pathway Models

• Due to their size and complexity, pathway models are typically neither complete nor error-free
• Revising, updating models
• Curators and consumers of these models need to contrast and revise large collections of research articles
  – Costly, time consuming

LitPathExplorer: a confidence-based tool for exploring pathway models

1. Enable flexible search and exploration of biomolecular pathway networks
   – different views of the data
   – various interactive functionalities

2. Provide a means for making existing evidence in the scientific literature available to support corroboration and quantify confidence in the events

3. Facilitate the discovery of new interactions that are not yet part of a given model

4. Allow the user to become an active participant in the analytical process

Soto, A. et al. (2017) Bioinformatics

1. Search

• A pathway model can be searched by providing:
  – event types,
  – entities,
  – and/or roles for each entity in the reaction
• Multiple queries can be combined in a Boolean search

2. Network viewer

Reading against the model

Entities

Reactions/Events
• Colour encodes event type
• Size encodes confidence

3. Inspector, event confidence computation

[Figure labels: mapping IDs for entities and events; overall event confidence]

3. Inspector, quantifying the confidence

Confidence breakdown

Adjusting event confidence

4. Text Analyzer – Articles & sentences

Article-level language confidence

Sentence-level language confidence

Network Viewer: Discovery mode

Extending the model with events found in the literature

Discovery mode

Difficult to explore when too many candidate events are found

Verifying mentions in text

Summary

• Text mining important to overcome silos, fragmentation of information
• Complex events in context and visualisation can corroborate and extend models (deep reading)
• Text mining infrastructure supporting customisation
• Towards mechanisms and system understanding

National Centre for Text Mining

• 1st publicly funded national text mining centre
• Location: Manchester Institute of Biotechnology, University of Manchester
• Since 2004
• Fully sustainable since 2011
• Biology, Medicine, Biodiversity

NaCTeM, The National Centre for Text Mining
www.nactem.ac.uk
