Detecting Anatomical and Functional Connectivity Relations in Biomedical Literature via Language Representation Models

Ibrahim Burak Ozyurt (FDI Lab, Dept. of Neuroscience, UCSD, La Jolla, USA)
Joseph Menke (Scicrunch.com, San Diego, USA)
Anita Bandrowski (FDI Lab, Dept. of Neuroscience, UCSD, La Jolla, USA)
Maryann E. Martone (FDI Lab, Dept. of Neuroscience, UCSD, La Jolla, USA)

Proceedings of the Second Workshop on Scholarly Document Processing, pages 27-35, June 10, 2021. ©2021 Association for Computational Linguistics

Abstract

Understanding of nerve-organ interactions is crucial to facilitate the development of effective bioelectronic treatments. Toward the goal of developing a systematized and computable wiring diagram of the autonomic nervous system (ANS), we introduce a curated ANS connectivity corpus together with several connectivity relation extraction systems based on neural language representation models. We also show that active learning guided curation for labeled corpus expansion significantly outperforms random selection of connectivity relation candidates, minimizing curation effort. Our final relation extraction system achieves F1 = 72.8% on anatomical connectivity and F1 = 74.6% on functional connectivity relation extraction.

1 Introduction

The NIH Common Fund's Stimulating Peripheral Activity to Relieve Conditions (SPARC) program aims to transform our understanding of nerve-organ interactions to help spur the development of effective bioelectronic treatments. Bioelectronic medicine represents the convergence of molecular medicine, neuroscience, engineering and computing to develop devices to diagnose and treat diseases (Olofsson and Tracey, 2017). One of the projects within this large consortium is to create a systematized and computable wiring diagram of the autonomic nervous system, a part of the "wiring system" that travels throughout the body transmitting messages between the peripheral organs and the brain or spinal cord. While diagrams of nerves are currently available in medical texts (Standring and Gray, 2008), the SPARC program seeks to map these connections at higher levels of detail and with greater accuracy. Additionally, the diagrams in these medical texts are not generally queryable, nor are they sufficiently detailed to include the granular paths that these nerves travel. Such information would be needed, for example, to understand where reliable access points to a particular nerve might be so that stimulation only affects the most relevant nerve, or to understand the mechanisms behind stimulation applied at particular locations. Many scientific studies contain information about individual nerves and at times the paths they traverse, but to our knowledge, no systematic approach has been attempted to bring these large quantities of information together into a computationally accessible format.

The SPARC project is building a cross-species connectivity knowledge base that contains detailed information about individual nerves, their pathways, cells of origin and synaptic targets. To date, this knowledge base has been populated through the development of detailed models of circuitry by experts funded through the SPARC project using the ApiNATOMY platform (Kokash and de Bono, 2021). ApiNATOMY provides a modeling language for representing the complexity of functional and anatomical circuitry in a standardized form. The circuitry contained in these models represents expert knowledge derived from the synthesis of the experts' own work and, in some cases, of hundreds of scientific publications. However, to ensure that the information in the SPARC knowledge base is comprehensive and up to date, i.e., that it represents the current state of knowledge about autonomic nervous system (ANS) connectivity, we sought to augment the expert-based model approach with experimental information derived from the primary scientific literature. As there are thousands of papers and additional sources such as textbooks, we utilized natural language processing to identify sentences that contained information on neuronal connectivity in the ANS.

The task was approached by first gathering the relevant scientific literature by matching bodily structures at a variety of anatomical levels (e.g., gasserian ganglion, vagus nerve, brainstem) from a constructed vocabulary at the sentence level. Then, annotators classified each structure-to-structure relationship using only the information provided within the sentence, based on the connectivity types defined in our annotation guideline. These structured data were then used to train our connectivity relation models. Data from two curators were used to assess inter-curator agreement, to determine whether the annotation guidelines are sufficient to "teach" the task to humans. We classified connectivity statements into several types, including anatomical connectivity, functional connectivity, structural connectivity, topological connectivity and general connectivity, as well as no connectivity. The general connectivity and no connectivity categories can be thought of as statements that are too vague to be of much direct use for our use case. The most important statements are anatomical connectivity, elucidating which parts are connected physically, and functional connectivity, elucidating which parts are connected functionally. A definition and an example for each connectivity type used for annotation is shown in Table 1. Of course, with single sentences it is difficult to establish a direct functional relationship, which typically rests on the latency with which a signal is detected between two elements (Bennett, 2001). However, statements about latency are very rare in the subset of the peripheral nervous system literature, whereas somewhat more general statements about functional relationships that, for example, describe damage to one area and altered functioning in another, are more abundant. We hypothesize that when such statements are reasonably abundant, a detection classifier will be easier to train.

In relation extraction, long-range relations are usually handled using dependency parse tree information. In traditional feature-based models, paths in the dependency parse tree between entities are used as features (Kambhatla, 2004), which suffered from the sparsity of the feature patterns. More recently, neural models using vectorized word embeddings are increasingly employed for relation extraction instead of feature engineering. The dependency information is represented as computation graphs along the parse tree (Zhang et al., 2018). Sequence models, on the other hand, work at the surface level and represent long-distance relationships via either convolutional or recurrent neural networks and an attention mechanism (Zhang et al., 2017).

In the biomedical domain, relation extraction work has traditionally focused on protein-protein, gene-disease or protein-chemical interactions. Several labeled datasets, such as GAD (Bravo et al., 2015), a gene-disease relation dataset, and CHEMPROT (Krallinger et al., 2017), a protein-chemical multi-relation dataset, are publicly available. Neural sequence models have also been applied to the protein-chemical relation extraction task (Lim and Kang, 2018).

Recently, sentence-level transformer-based language representation models such as BERT (Devlin et al., 2019) have shown superior downstream performance on many NLP tasks. A biomedical domain adapted version of BERT called BioBERT (Lee et al., 2019) has shown state-of-the-art performance on several biomedical relation extraction tasks. While most transformer-based language representation models are pretrained on sentences where a predefined percentage of the tokens are masked and the model learns to predict the masked tokens, a recently introduced language representation model, ELECTRA (Clark et al., 2020), learns to discriminate whether a token in the original input has been replaced by a language generator model. The generator is a BERT-like generative model that is co-trained with the discriminative model.

While there are efforts to extract brain connectivity information from the neuroscience literature (Richardet et al., 2015), their focus is on the cognitive parts of the brain rather than the ANS. In this paper, we introduce a labeled ANS connectivity corpus, together with four biomedical domain adapted ELECTRA models, that we have used to develop an anatomical and functional connectivity relation extraction system that outperforms BioBERT.

2 Methods

2.1 Vocabulary

In order to better structure information from papers, anatomical structure labels were drawn from a set of relevant ontologies approved for use by the SPARC project. These ontology terms primarily include FMA (RRID:SCR_003379), UBERON (RRID:SCR_010668), and NIFSTD (RRID:SCR_005414) terms, and they are listed

Table 1:
Relation | Definition | Example
functional | a relationship was determined to exist between two structures using physiological techniques | The HB reflex is a reflex initiated by lung inflation, which excited the myelinated fibers of vagus nerve, pulmonary stretch receptors [11,19].
anatomical | a physical synaptic relationship was observed between two structures using anatomical techniques such as tract tracing | Only the most prominent nervous connections, such as the penis nerve cord (pnc, Fig. 8a), connecting the ventral ganglion to the penis ganglion can be detected.
structural | a relationship
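The candidate-gathering step described in the Introduction, matching anatomical vocabulary terms at the sentence level and pairing co-occurring structures for annotation, can be sketched as follows. This is a minimal illustration only: the term list and function names are hypothetical, and the actual pipeline draws its vocabulary from the FMA, UBERON, and NIFSTD ontologies.

```python
import itertools
import re

# Hypothetical mini-vocabulary for illustration; the real vocabulary
# comes from FMA, UBERON, and NIFSTD ontology terms.
ANATOMY_TERMS = ["vagus nerve", "brainstem", "gasserian ganglion", "penis ganglion"]

def candidate_pairs(sentence, terms=ANATOMY_TERMS):
    """Return all unordered pairs of anatomical terms found in a sentence.

    Each returned pair is a candidate structure-to-structure relationship
    for connectivity-type annotation.
    """
    found = [t for t in terms
             if re.search(r"\b" + re.escape(t) + r"\b", sentence, re.IGNORECASE)]
    return list(itertools.combinations(found, 2))

pairs = candidate_pairs("Stimulation of the vagus nerve altered activity in the brainstem.")
```

Sentences yielding at least one pair would then be routed to annotators, who label each pair with one of the connectivity types in Table 1 using only in-sentence evidence.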
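ELECTRA's replaced-token-detection objective, mentioned above, can be illustrated with a toy example of how discriminator targets are constructed. The trivial stand-in generator below is purely an assumption for illustration; in ELECTRA the generator is a small masked language model co-trained with the discriminator, and its plausible replacements make the detection task non-trivial.

```python
import random

def make_rtd_example(tokens, mask_prob=0.15, generator=None, seed=0):
    """Build a toy replaced-token-detection example in the spirit of ELECTRA.

    A fraction of positions is sampled for corruption; a generator proposes
    replacement tokens, and the discriminator's per-token targets mark
    whether each position is original (0) or replaced (1).
    """
    rng = random.Random(seed)
    # Stand-in generator for illustration only (not ELECTRA's actual MLM).
    generator = generator or (lambda tok: "[REPLACED]")
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            new_tok = generator(tok)
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels

corrupted, labels = make_rtd_example(["the", "vagus", "nerve"], mask_prob=1.0)
```

Because every input position receives a binary target (rather than only the ~15% of masked positions in BERT-style pretraining), the discriminator learns from all tokens, which is the efficiency argument made by Clark et al. (2020).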
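The active learning guided curation mentioned in the abstract can be approximated by uncertainty sampling: rather than labeling randomly chosen candidates, curators are shown the sentences about which the current classifier is least certain. The sketch below assumes a binary relation classifier exposing a positive-class probability; the paper's exact selection criterion and function names are not given here, so this is illustrative only.

```python
def select_for_curation(candidates, predict_proba, k=2):
    """Uncertainty sampling: pick the k candidates whose predicted
    positive-class probability is closest to 0.5, i.e. where the
    current relation classifier is least certain.

    `predict_proba` stands in for the trained model's probability estimate.
    """
    return sorted(candidates, key=lambda s: abs(predict_proba(s) - 0.5))[:k]

# Toy probability table for illustration only.
probs = {"s1": 0.95, "s2": 0.52, "s3": 0.10, "s4": 0.48}
chosen = select_for_curation(list(probs), probs.get, k=2)
```

Labeling the selected items, retraining, and repeating concentrates curation effort near the decision boundary, which is consistent with the reported result that active learning guided expansion outperformed random candidate selection.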