Machine Reading for Cancer Biology
Sophia Ananiadou
National Centre for Text Mining
School of Computer Science
University of Manchester

End-to-end text mining system
• Machine reading for pathways
  – Event extraction
  – Uncertainty detection for ranking
• Integration of machine reading tools
  – Argo and OpenMinTeD platforms
• Interactive visual analytics
  – LitPathExplorer

Motivation
To support pathway construction and design of experiments
• extract evidence from literature
• events, entities, contextual interpretation
For these, we need to:
• understand pathway representations
• bridge the gap between knowledge and text
• read against models (deep reading)

From concepts to events
1. Concept recognition
2. Interaction recognition
3. Concept and interaction identification
   (example identifiers from the figure: DrugBank:DB06712, DrugBank:DB00682, DrugBank:DB04610)

The Big Mechanism: reading, assembly, experiments
Courtesy: Paul Cohen
http://nactem.ac.uk/big_mechanism/

Tools for Event Extraction
EventMine
http://www.nactem.ac.uk/EventMine/
• EventMine: a machine-learning pipeline system for event extraction
  – Several parse results, dictionaries
  – Coreference resolution, domain adaptation
Miwa, M., Thompson, P. and Ananiadou, S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13)
Miwa, M. and Ananiadou, S. (2015) Adaptable, high recall, event extraction system with minimal configuration. BMC Bioinformatics, 16(10), S7

Linking interactions (events) to pathways
1. The mitotic arrest-deficient protein Mad1 forms a complex with Mad2, which is required for imposing mitotic arrest on cells in which the spindle assembly is perturbed. PMID: 18981471
2. Mad1, an upstream regulator of Mad2, forms a tight core complex with Mad2 and facilitates Mad2 binding to Cdc20. PMID: 18318601
2013
Beyond linking reactions to documents at coarse level

Event interpretation
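(Figure: event annotation of the sentence "Results suggest that BRAF is not required for MUC1 binding to PKM2 in RAS")
• Entity and event-type labels, in reading order: Chemical, Regulation, Protein, Binding, Protein
• Event triggers: one for the Regulation event, one for the Binding event
• Event arguments: Cause and Theme (Regulation); Theme 1 and Theme 2 (Binding)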
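The annotated sentence can be pictured as a small nested data structure. The sketch below is only an illustration: the class names, the role keys, and the assignment of figure labels to particular words (for example "required" as the Regulation trigger) are assumptions read off the label order, not EventMine's actual output format.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass(frozen=True)
class Entity:
    text: str   # surface form in the sentence
    type: str   # entity type label, e.g. "Protein"


@dataclass
class Event:
    type: str      # event type label, e.g. "Binding"
    trigger: str   # word that signals the event
    # Role name -> argument; an argument is an Entity or another Event.
    args: Dict[str, Union[Entity, "Event"]] = field(default_factory=dict)

    def is_complex(self) -> bool:
        # An event is complex if at least one argument is itself an event.
        return any(isinstance(arg, Event) for arg in self.args.values())


# Assumed reading of the figure labels (hypothetical mapping).
braf = Entity("BRAF", "Chemical")
muc1 = Entity("MUC1", "Protein")
pkm2 = Entity("PKM2", "Protein")

binding = Event("Binding", "binding", {"Theme1": muc1, "Theme2": pkm2})
regulation = Event("Regulation", "required", {"Cause": braf, "Theme": binding})

print(binding.is_complex(), regulation.is_complex())
```

Here the Binding event takes only entities as arguments, while the Regulation event takes the Binding event as its Theme.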
* Complex events have at least one argument that is an event in its own right. (Figure labels: simple event, complex event.)

Textual Mentions in Context
• Our results prove that BRAF is required for MUC1 binding to PKM2
  – Strong certainty
• Our results suggest that BRAF is required for MUC1 binding to PKM2
  – Some hedging/speculation
• Our results indicate that BRAF may be required for MUC1 binding to PKM2
  – Strong hedging/speculation
• There is scarce evidence that BRAF is required for MUC1 binding to PKM2
  – Hedging
• We are going to test whether BRAF is required for MUC1 binding to PKM2
  – Investigation
• Often BRAF is required for MUC1 binding to PKM2
  – Frequency/time limitation
• Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work
  – Admission of lack of knowledge

Uncertainty cues
Corpora: BioNLP-ST, GENIA-MK

Hybrid approach
Machine learner (Random Forest)
1. Lexical (e.g. cues, POS tags, event-trigger surface form)
2. Syntactic (e.g. shortest path, cue-trigger dependency)
3. Semantic (e.g. event type, argument type/role)
Automated rule induction (from corpus)
1. EventMine (to identify event triggers)
2. Deep parsing (to identify dependencies)
3. Cue lists
Dependency relations between cues and event triggers
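The rule-based side of the hybrid approach combines cue lists with a relation between a cue and an event trigger. The sketch below is a much simplified stand-in, not the system's method: it assumes a tiny hand-written cue list and uses token distance in place of the dependency relations that deep parsing would provide. The cue words, the window size, and the function name are all hypothetical.

```python
import re

# Hypothetical cue list; the approach on the slide uses cue lists and rules induced from corpora.
UNCERTAINTY_CUES = {"suggest", "suggests", "may", "might", "whether", "scarce", "often"}


def find_uncertainty_cues(sentence, trigger, window=8):
    """Return cue words found within `window` tokens of the event trigger.

    Token distance is a crude proxy for a cue-trigger dependency path.
    """
    tokens = re.findall(r"\w+", sentence.lower())
    trigger = trigger.lower()
    if trigger not in tokens:
        return []
    trigger_index = tokens.index(trigger)
    return [
        token
        for index, token in enumerate(tokens)
        if token in UNCERTAINTY_CUES and abs(index - trigger_index) <= window
    ]


certain = find_uncertainty_cues(
    "Our results prove that BRAF is required for MUC1 binding to PKM2", "required")
hedged = find_uncertainty_cues(
    "Our results suggest that BRAF is required for MUC1 binding to PKM2", "required")
print(certain, hedged)
```

A mention with no cue near the trigger is treated as certain here; a real system would also need features such as those listed for the machine learner.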

Multiple mentions of the same event
The same interaction can be mentioned:
• In multiple sentences of the same paper
• In multiple papers
• With different levels of certainty in each mention
Example mentions:
• Our results prove that BRAF is required for MUC1 binding to PKM2
• Our results suggest that BRAF is required for MUC1 binding to PKM2
• Our results indicate that BRAF may be required for MUC1 binding to PKM2
• There is scarce evidence that BRAF is required for MUC1 binding to PKM2
• We are going to test whether BRAF is required for MUC1 binding to PKM2
• Often BRAF is required for MUC1 binding to PKM2
• Whether BRAF is required for MUC1 binding to PKM2 is out of the scope of this work
We need to consolidate the different uncertainty values from each mention into one "confidence" score.

Consolidation over several mentions
Adapting the subjective logic framework:
$\omega_x^{E_x} = (b_x^{E_x}, d_x^{E_x}, u_x^{E_x}, \alpha)$
• $b_x^{E_x}$: belief
• $d_x^{E_x}$: disbelief
• $u_x^{E_x}$: uncertainty
• $\alpha$: base rate
(Figure annotations on the components: negation identification, prior probabilities, uncertainty identification, and an open "?")
Each mention of $E_x$ mapped to a pathway is considered as the subjective opinion of the author on the interaction described by $E_x$.

Evaluation - pathway models
• B-cell acute lymphoblastic leukemia model (Pathway Studio)
  – 72 interactions, 260 manually selected evidence passages, 1-20 sentences per interaction
  – 12% flagged uncertain by our system
Zerva, C., Batista-Navarro, R., Day, P. and Ananiadou, S. (2017) Using uncertainty to link and rank evidence from biomedical literature for model reconstruction. Bioinformatics

Results
• Leukemia Pathway (7 annotators) ~ Pathway Studio
• Average accuracy on sentence level: 0.96
• Average accuracy on interaction level: 0.87
  – 1-20 sentences per interaction

Event interpretation
• Uncertainty scoring as an expressive confidence measure
• Hybrid framework
• Value for each event mentioned in a sentence
  – Consolidated uncertainty values from different papers
• Aim: reduce manual effort and select more certain events
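Consolidating the per-mention opinions into one confidence value can be sketched with standard operators from subjective logic. This is a minimal sketch, assuming binomial opinions with a shared base rate, the textbook cumulative fusion operator, and the projected probability as the final score. Whether the system uses exactly these operators is not stated here, and the class name, function names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5   # assumed prior; belief + disbelief + uncertainty = 1

    def projected_probability(self) -> float:
        # Uncertainty mass is distributed according to the base rate.
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    """Fuse two opinions about the same event (base rate taken from `a`)."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    if denom == 0.0:
        # Both opinions have zero uncertainty: fall back to a plain average.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2, 0.0, a.base_rate)
    belief = (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom
    disbelief = (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom
    uncertainty = (a.uncertainty * b.uncertainty) / denom
    return Opinion(belief, disbelief, uncertainty, a.base_rate)


def consolidate(opinions):
    """Fold all mention-level opinions into a single opinion."""
    return reduce(cumulative_fusion, opinions)


# Two hypothetical mentions of the same interaction: one fairly certain, one hedged.
mentions = [Opinion(0.8, 0.0, 0.2), Opinion(0.4, 0.0, 0.6)]
fused = consolidate(mentions)
print(round(fused.projected_probability(), 3))
```

With cumulative fusion, each additional mention reduces the remaining uncertainty, so an interaction supported by several sentences ends up with a more decisive opinion than any single mention.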

openminted.eu
• Web-based, graphical text mining (TM) workbench
• Unstructured Information Management Architecture (UIMA) standard
• Rich library of TM components
• Allows cloud and high-performance computing
• Straightforward integration of TM analytics
  – modular, extensible, reconfigurable, reusable workflows
Rak, R., Batista-Navarro, R. T. B., Rowley, A., Carter, J. and Ananiadou, S. (2014) Text mining-assisted biocuration workflows in Argo. Database: The Journal of Biological Databases and Curation
Source: LEGO DUPLO

Workflow Designer

Sample workflow (Cancer Mechanisms)
(Figure legend: existing components; custom components; existing components supplied with custom resources)

Sample machine reading workflow
✓ highly extensible
✓ can be optimised by interchanging components

Annotation Viewer/Editor

Ensemble reading through federation
• What do we gain by combining text mining tools from different groups?
  – enriching/updating results
  – comparison
  – merging annotations, e.g. by taking the union, intersection or majority vote
  – taking advantage of best-of-breed tools
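The three merging strategies named above can be sketched as set operations. This is a minimal sketch, assuming each tool's output is a collection of (start, end, label) tuples and that two annotations match only when all three fields are identical. The function name and the sample offsets are hypothetical.

```python
from collections import Counter


def merge_annotations(tool_outputs, strategy="majority"):
    """Merge annotation sets from several tools.

    tool_outputs: list of iterables of (start, end, label) tuples, one per tool.
    strategy: "union", "intersection" or "majority".
    """
    sets = [set(output) for output in tool_outputs]
    if not sets:
        return set()
    if strategy == "union":
        return set().union(*sets)
    if strategy == "intersection":
        return set.intersection(*sets)
    if strategy == "majority":
        # Keep annotations produced by more than half of the tools.
        counts = Counter(annotation for s in sets for annotation in s)
        return {a for a, n in counts.items() if n > len(sets) / 2}
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical outputs of three tools on the same sentence.
tool_a = {(0, 4, "Protein"), (20, 27, "Binding")}
tool_b = {(0, 4, "Protein"), (31, 35, "Protein")}
tool_c = {(0, 4, "Protein"), (20, 27, "Binding"), (40, 44, "Protein")}

for name in ("union", "intersection", "majority"):
    print(name, sorted(merge_annotations([tool_a, tool_b, tool_c], name)))
```

Union favours recall, intersection favours precision, and majority vote sits between the two. Exact matching is the simplest criterion; overlapping spans would need a looser comparison.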

Federated system
(Figure components: text mining workflow; IASON; registry of services; descriptor file; text mining tool as a web service; text mining tool developer)

Deep Reading: Integrating uncertainty
• LitPathExplorer
  – Visual analytics tool; maps events from literature to pathway interactions
  – Includes uncertainty measure
Soto, A., Zerva, C., Batista-Navarro, R. and Ananiadou, S. (2017) LitPathExplorer. Bioinformatics

LitPathExplorer: A Visual Tool for Exploring Literature-Enriched Pathway Models
• Due to their size and complexity, pathway models are typically neither complete nor error-free
• Revising, updating models
• Curators and consumers of these models need to contrast and revise large collections of research articles
  – Costly, time-consuming

LitPathExplorer: a confidence-based tool for exploring pathway models
1. Enable flexible search and exploration of biomolecular pathway networks
   – different views of the data
   – various interactive functionalities
2. Provide a means for making existing evidence in the scientific literature available to
   – support corroboration
   – quantify confidence in the events
3. Facilitate the discovery of new interactions that are not yet part of a given model
4. Allow the user to become an active participant in the analytical process
Soto, A. et al. (2017) Bioinformatics

1. Search
• A pathway model can be searched by providing:
  – event types,
  – entities,
  – and/or roles for each entity in the reaction
• Multiple queries can be combined in a Boolean search

2. Network viewer
Reading against the model
• Entities
• Reactions/Events
  – Colour encodes event type
  – Size encodes confidence

3. Inspector, event confidence computation
(Figure labels: mapping IDs for entities and events; overall event confidence)

3. Inspector, quantifying the confidence
(Figure label: confidence breakdown)

Adjusting event confidence

4. Text Analyzer – Articles & sentences
(Figure labels: article-level language confidence; sentence-level language confidence)

Network Viewer: Discovery mode
Extending the model with events found in the literature

Discovery mode
Difficult to explore when too many candidate events are found

Verifying mentions in text

Summary
• Text mining is important for overcoming silos and fragmentation of information
• Complex events in context, together with visualisation, can corroborate and extend models (deep reading)
• Text mining infrastructure supporting customisation
• Towards mechanisms and system understanding

National Centre for Text Mining
• 1st publicly funded national text mining centre
• Location: Manchester Institute of Biotechnology, University of Manchester
• Since 2004
• Fully sustainable since 2011
• Biology, Medicine, Biodiversity
NaCTeM: The National Centre for Text Mining, www.nactem.ac.uk