RESEARCH ARTICLE

Extracting microRNA-gene relations from biomedical literature using distant supervision

Andre Lamurias1*, Luka A. Clarke2, Francisco M. Couto1

1 LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal, 2 BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal

* [email protected]

OPEN ACCESS

Citation: Lamurias A, Clarke LA, Couto FM (2017) Extracting microRNA-gene relations from biomedical literature using distant supervision. PLoS ONE 12(3): e0171929. doi:10.1371/journal.pone.0171929

Editor: Quan Zou, Tianjin University, CHINA

Received: September 22, 2016

Accepted: January 29, 2017

Abstract

Many biomedical relation extraction approaches are based on supervised machine learning, requiring an annotated corpus. Distant supervision aims at training a classifier by combining a knowledge base with a corpus, reducing the amount of manual effort necessary. This is particularly useful for biomedicine because many databases and ontologies have been made available for many biological processes, while the availability of annotated corpora is still limited. We studied the extraction of microRNA-gene relations from text. MicroRNA regulation is an important biological process due to its close association with human diseases. The proposed method, IBRel, is based on distantly supervised multi-instance learning. We evaluated IBRel on three datasets, and the results were compared with a co-occurrence approach as well as a supervised machine learning algorithm. While supervised learning outperformed on two of those datasets, IBRel obtained an F-score 28.3 percentage points higher on the dataset for which there was no training set developed specifically. To demonstrate the applicability of IBRel, we used it to extract 27 miRNA-gene relations from recently published papers about cystic fibrosis.