
Using Computational Linguistics to Understand Text for Pharmacovigilance: A Systematic Review

Patrick Pilipiec

Computer Science and Engineering, master's level (60 credits) 2021

Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering

Abstract

Background: Pharmacovigilance is a science that involves the ongoing monitoring of adverse reactions to existing medicines. Its primary purpose is to sustain and improve public health. The existing systems that apply the science of pharmacovigilance in practice are, however, not only expensive and time-consuming, but they also fail to include the experiences of many users. The application of computational linguistics to user-generated text is hypothesized to be a pro-active and effective supplemental source of evidence.

Objective: To review the existing evidence on the effectiveness of computational linguistics to understand user-generated text for the purpose of pharmacovigilance.

Methodology: A broad and multi-disciplinary systematic literature search was conducted that involved four databases. Studies were considered relevant if they reported on the application of computational linguistics to understand text for pharmacovigilance. Both peer-reviewed journal articles and conference materials were included. The PRISMA guidelines were used to evaluate the quality of this systematic review.

Results: A total of 16 relevant publications were included in this systematic review. All studies were evaluated to have medium reliability and validity. Despite this moderate quality, and across all types of drugs studied, the vast majority of publications reported positive findings with respect to the identification of adverse drug reactions. The remaining two studies reported rather neutral results but acknowledged the potential of computational linguistics for pharmacovigilance.

Conclusions: There exists consistent evidence that computational linguistics can be used effectively and accurately on user-generated textual content published to the Internet to identify adverse drug reactions for the purpose of pharmacovigilance. The evidence suggests that the analysis of textual data has the potential to complement the traditional system of pharmacovigilance. Recommendations are offered for researchers and practitioners of computational linguistics, policy makers, and users of drugs.

Keywords: Adverse drug reactions, Computational linguistics, Machine learning, Pharmacovigilance, Social media, User-generated content

Table of Contents

1 Introduction ...... 5

1.1 Research Problem ...... 5

1.2 Background and Relevance ...... 5

1.3 Research Gap ...... 6

1.4 Research Objective ...... 7

2 Literature Review ...... 8

2.1 Process from Drug Development to Market Authorization ...... 8

2.2 Need to Monitor Drug Safety for Adverse Drug Reactions ...... 9

2.3 Pharmacovigilance ...... 9

2.4 Emergence of Machine Learning ...... 10

2.5 Computational Linguistics ...... 11

2.6 Applications of Computational Linguistics ...... 12

3 Methodology ...... 14

3.1 Information Sources ...... 14

3.2 Search Strategy ...... 14

3.3 Study Selection ...... 15

3.4 Inclusion and Exclusion Criteria ...... 16

3.5 Data Analysis ...... 16

4 Results ...... 18

4.1 Literature Search and Study Selection ...... 18

4.2 Study Characteristics ...... 20

4.3 Effectiveness of Computational Linguistics to Identify Adverse Drug Reactions .. 23

5 Discussion ...... 25

5.1 Discussion ...... 25

5.2 Limitations ...... 26

5.3 Recommendations ...... 27

6 Conclusion ...... 29

References ...... 30

Appendices ...... 50

Appendix 1. Search Strategies ...... 50

Appendix 2. Characteristics of Publications ...... 52

Appendix 3. PRISMA Checklist ...... 57

1 Introduction

This chapter first introduces the research problem. Subsequently, the background and relevance of this study are discussed. Furthermore, the research gap that this study addresses is identified. This chapter concludes with specifying the research objective.

1.1 Research Problem

This study investigates the problem that there exists a paucity of evidence on the application of machine learning, and in particular computational linguistics, to understand user-generated textual content for the purpose of pharmacovigilance.

1.2 Background and Relevance

In drug development, there exists a strong tension between accessibility and safety. While drugs can effectively cure diseases and improve someone’s life (Santoro, Genov, Spooner, Raine, & Arlett, 2017), the required process of research and development of drugs is expensive, and not surprisingly, pharmaceutical companies have a high stake in yielding a profit on their investment (Barlow, 2017). This increases the urgency to make effective drugs available to the public swiftly. In contrast, medicines can also induce adverse drug reactions that may result in death. Identifying such reactions demands thorough and time-consuming testing of a drug’s safety, which drastically increases the time-to-market of new drugs (World Health Organization, 2004).

In fact, the potential consequences of adverse drug reactions are significant. In the European Union (EU), five per cent of hospital admissions and almost 200,000 deaths were caused by adverse drug reactions in 2008, and the associated societal cost totaled 79 billion Euros (European Commission, 2008).

The system that applies tools and practices from the research field of pharmacovigilance was introduced to alleviate this tension (Beninger, 2018). This system performs ongoing monitoring of adverse drug reactions of existing drugs (Beninger, 2018). It ensures that the time-to-market of effective drugs is minimized, while their long-term safety continues to be examined after market authorization (European Commission, 2020). The worldwide need for the swift development and public availability of a new effective drug was

also noticeable for COVID-19, which demonstrates the tension between reducing time-to-market and maintaining drug safety. Overall, pharmacovigilance is the cornerstone in the regulation of drugs (Santoro et al., 2017).

The traditional system that applies pharmacovigilance is, however, suboptimal, mainly because it is very expensive and often fails to monitor adverse drug reactions experienced by users if these are not reported to the authorities, pharmaceutical companies, or to medical professionals (European Commission, 2020; Schmider et al., 2019). The reporting of these adverse drug reactions is important because it may help to protect public health (Santoro et al., 2017).

In today’s society, many people share personal content on social media (Kaplan & Haenlein, 2010; Monkman, Kaiser, & Hyder, 2018; Moro, Rita, & Vala, 2016). Accordingly, an abundance of studies has already demonstrated that user-generated content can be used accurately for remote sensing, among other purposes to gauge public health. This raises the question of whether user-generated textual content can also be analyzed for the purpose of pharmacovigilance. Such automated analysis may provide a cheaper and more timely supplement to the expensive and time-consuming traditional methods for pharmacovigilance, and it may also include first-hand experiences of adverse drug reactions from users that were not reported to the authorities, pharmaceutical companies, or medical professionals.

1.3 Research Gap

We observe a paucity of evidence on the application of machine learning, and in particular computational linguistics, to understand user-generated textual content for the purpose of pharmacovigilance. Although various studies have investigated the suitability and effectiveness of computational linguistics, to our knowledge, no systematic review has yet aggregated the reported evidence and assessed the quality of those studies.

Therefore, it may be worthwhile to analyze user-generated content that has already been published to the Internet to pro-actively and automatically identify adverse drug reactions, without relying on users to actively report those cases to the authorities, pharmaceutical companies, or medical professionals.

1.4 Research Objective

To address the research gap, the purpose of this study is to review the existing evidence on the effectiveness of machine learning, and in particular computational linguistics, to understand user-generated content for the purpose of pharmacovigilance.

2 Literature Review

This chapter starts with an introduction to the process from drug development to market authorization. It continues by discussing the need to monitor the safety of medicines that already exist on the market, and the traditional system of pharmacovigilance to surveil adverse drug reactions. Subsequently, the emergence of machine learning and computational linguistics is elaborated. This chapter concludes with a discussion of the applications of computational linguistics.

2.1 Process from Drug Development to Market Authorization

In the last century, researchers and the pharmaceutical industry have made significant progress in the development of drugs that can cure a wide range of diseases. Among others, these novel drugs can decrease the suffering of people, reduce disease-related mortality, and prolong human life (Santoro et al., 2017).

The process of research and development of drugs is not only expensive and risky (Barlow, 2017), but most importantly it is also highly regulated (Santoro et al., 2017). Traditionally, drugs are developed in three subsequent phases, also referred to as hurdles (Rawlins, 2012).

1. In the first phase, pre-clinical studies that involve in vitro experiments are performed using test tubes or laboratory studies, to identify promising chemical substances that react with a disease (Angell, 2004). A limited number of these drug-candidates is then tested in animals until a few most promising drug-candidates remain (Angell, 2004).

2. In the second phase, clinical studies are performed that involve experimental testing of these drug-candidates in humans, where the conditions of these clinical experiments are highly controlled and supervised (Angell, 2004). These clinical studies are intended to confirm the quality, optimal dosage, safety, and side-effects of the most preferred drug.

3. In the third phase, the pharmaceutical company can apply for market authorization with the European Medicines Agency in the EU or with the Food and Drug Administration in the United States (Angell, 2004). Once market authorization is approved, the now registered drug can be prescribed by medical doctors and sold to the general public (Walker, Sculpher, Claxton, & Palmer, 2012).

2.2 Need to Monitor Drug Safety for Adverse Drug Reactions

A severe limitation in the process of bringing new drugs to the market is the potential of drugs to cause adverse drug reactions. This should not be underestimated, as these adverse drug reactions are a significant cause of illnesses and deaths (Santoro et al., 2017). For example, it was reported that in the EU, five per cent of the hospital admissions and almost 200,000 deaths were caused by adverse drug reactions in 2008 (European Commission, 2008). That year, adverse drug reactions also had a societal cost of 79 billion Euros (European Commission, 2008). Adverse drug reactions are also a primary cause of death in various other countries (World Health Organization, 2004).

Although the pre-clinical and clinical studies comprise testing for drug safety and potential adverse drug reactions, only a few hundred to a few thousand participants are included in these studies (Santoro et al., 2017). In addition, these studies are performed under highly controlled clinical conditions that are not representative of usage in real-world circumstances (Santoro et al., 2017). For example, people may consume various drugs simultaneously, and the interaction between chemical substances may result in harmful reactions. Therefore, the findings do not have high external validity, and not all adverse drug reactions may have been identified prior to making the drug generally available (World Health Organization, 2004). As long as the benefits outweigh the potential costs, it is generally considered unethical to withhold an effective drug from the general public at this stage, and it is therefore accepted that some persons may develop adverse drug reactions in the future.

To counteract this limitation, existing drugs on the market are constantly being monitored for safety and adverse drug reactions (World Health Organization, 2004). The long-term monitoring of existing drugs is crucial, because potential adverse drug reactions, interactions, and other factors may only emerge many years to even decades after the drug initially received market authorization (World Health Organization, 2004).

2.3 Pharmacovigilance

The long-term monitoring of drug safety beyond market authorization is named pharmacovigilance (Beninger, 2018). Pharmacovigilance is defined by the World Health Organization as “the science and activities relating to the detection, assessment,

understanding and prevention of adverse effects or any other medicine-related problem” (World Health Organization, 2004, p. 1).

As such, the application of tools and practices from pharmacovigilance by public health authorities results in a pro-active system that is intended to promote and protect public health (Santoro et al., 2017). It involves a wide array of activities, including the collection of data related to drug safety, obligating pharmaceutical companies and medical professionals to report adverse drug reactions, inviting users to report experiences with drugs, and the detection of signals that may indicate drug safety issues (European Commission, 2020). This makes pharmacovigilance an essential pillar in the regulation of drugs (Santoro et al., 2017).

There are, however, significant costs associated with the processing and administration of the reported cases concerning adverse drug reactions (Schmider et al., 2019). In addition, the current system of collecting data to monitor drug safety is suboptimal because end-users are not obliged to report cases of adverse drug reactions (European Commission, 2020).

2.4 Emergence of Machine Learning

Over the past 15 years, many technological innovations have enabled the storage, processing, and analysis of big data (Mikalef, Pappas, Krogstie, & Pavlou, 2020; Tekin, Etlioğlu, Koyuncuoğlu, & Tekin, 2019; Wall & Krummel, 2020). In particular, with the emergence of Web 2.0 and social media platforms, there has been a significant increase in user-generated content that is published to the Internet (Kaplan & Haenlein, 2010; Monkman et al., 2018; Moro et al., 2016). Among others, vast amounts of textual data are generated on blogs, call center logs, documents, emails, forums, news, social media sites, and surveys (Gandomi & Haider, 2015).

Similarly, there have been significant developments in machine learning that resulted in powerful methods and algorithms for computational linguistics, which enabled the processing and understanding of human language (Gandomi & Haider, 2015; Iglesias, Tiemblo, Ledezma, & Sanchis, 2016).

Consequently, mining social media and computational linguistics have received much interest in the literature (Kennedy, Elgesem, & Miguel, 2015).

2.5 Computational Linguistics

In recent years, the field of computational linguistics, also referred to as Natural Language Processing (NLP), has experienced significant innovations (Szpakowicz, Feldman, & Kazantseva, 2018). NLP is an important field of research within the realm of artificial intelligence (Zeng, Shi, Wu, & Hong, 2015). It uses algorithms and techniques from machine learning to analyze and understand natural language (Zeng et al., 2015).

Text mining is frequently defined as the analysis of textual data, such as unstructured or semi-structured text, with the purpose of extracting hidden patterns and information (Gandomi & Haider, 2015). As such, it combines techniques from data mining with NLP (Tekin et al., 2019). The key difference between data mining and text mining is that the former extracts hidden patterns from structured data, while the latter is used to analyze semi-structured and unstructured data with the objective of obtaining structured data in the end (Tekin et al., 2019).

Text mining has firm roots in artificial intelligence, data mining, databases, linguistics, and statistics (Iglesias et al., 2016). It has emerged from a need to analyze large amounts of texts that contain human language, which can be mined for insights that facilitate data-driven decision making (Gandomi & Haider, 2015).

Data mining techniques can, however, not be applied to unstructured data. Therefore, text mining is applied as pre-processing for unstructured data. The pre-processing of text contains various steps, such as (Jurafsky & Martin, 2019; Rizk & Elragal, 2020):

§ Tokenization: Splitting a document of text into a sequence of tokens, where each token represents a single word.
§ Transformation of cases: Converting all uppercase characters to lowercase, to avoid duplicate words caused by inconsistent capitalization.
§ Stop word removal: Stop words are matched with a dictionary and removed, because these individual tokens do not hold much meaning.
§ Stemming: This transforms all tokens to their stem, such that the words “walking” and “walked” are converted to their stem “walk”.
§ Filtering of tokens: Specified tokens may be eliminated because they can skew the distribution of words.
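As an illustration, the steps above can be sketched in a few lines of Python. The stop-word list and the suffix-stripping stemmer below are deliberately naive stand-ins for the dictionary-based and algorithmic implementations used in practice (e.g., Porter stemming), so the sketch shows only the shape of the pipeline:

```python
import re

# Toy stop-word list; real pipelines match against a full dictionary.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "was"}

def tokenize(text):
    """Split a document of text into a sequence of word tokens."""
    return re.findall(r"[A-Za-z']+", text)

def to_lowercase(tokens):
    """Collapse case variants such as 'Drug' and 'drug' into one token."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Drop tokens that hold little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(tokens):
    """Naive suffix stripping: 'walking' and 'walked' both become 'walk'."""
    out = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out

def preprocess(text, filtered=()):
    """Apply all steps; `filtered` lists tokens to eliminate afterwards."""
    tokens = stem(remove_stop_words(to_lowercase(tokenize(text))))
    return [t for t in tokens if t not in filtered]

tokens = preprocess("The patient was walking and walked to the clinic")
# tokens -> ['patient', 'walk', 'walk', 'clinic']
```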

Once text mining has been applied to extract structured data from a semi-structured or unstructured source, the conventional data mining algorithms can, if required, subsequently be used to process and analyze these structured data further to yield the valued insights (Iglesias et al., 2016). The complexity involved with analyzing unstructured textual data, and in particular its irregularities, therefore makes the process of text mining far more difficult than data mining (Liao, Chu, & Hsiao, 2012).

The examples of text mining are numerous and include, among others (Tekin et al., 2019), the discovery of associations (Sunikka & Bragge, 2012), the summarization of documents (Tekin et al., 2019), the clustering of texts (Aggarwal & Zhai, 2012; Lu, Zhang, Liu, Li, & Deng, 2013), text classification (Sadiq & Abdullah, 2014; Tan, 1999), entity relationship modeling (Tekin et al., 2019), the creation of a granular taxonomy (Tekin et al., 2019), prediction (Nassirtoussi, Aghabozorgi, Wah, & Ngo, 2014; Oberreuter & Velsquez, 2013), and the extraction of concepts and entities (Tekin et al., 2019).
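One of these applications, text classification, can be illustrated with a minimal multinomial Naive Bayes classifier written in plain Python. The training posts and the ADR/no-ADR labels below are invented for illustration only and do not come from any of the reviewed studies:

```python
import math
from collections import Counter

class NaiveBayesTextClassifier:
    """Minimal multinomial Naive Bayes for text classification."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            # Start from the log prior of the class.
            score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            total = sum(self.word_counts[c].values())
            for w in doc.lower().split():
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

# Hypothetical training posts, invented purely for illustration.
docs = ["this drug gave me a terrible headache",
        "severe nausea after taking the medication",
        "the medication worked fine for me",
        "great results no problems at all"]
labels = ["ADR", "ADR", "no-ADR", "no-ADR"]

clf = NaiveBayesTextClassifier()
clf.fit(docs, labels)
```

With realistic data volumes, feature engineering and smoothing matter far more, and a library implementation would normally be preferred; the sketch only shows the principle of scoring a post against each class.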

2.6 Applications of Computational Linguistics

An abundance of studies conducted in many domains has demonstrated the potential of computational linguistics to analyze user-generated content written in human language.

For example, in environmental studies that use remote sensing, social media content was used to estimate seabass fishing as a reliable substitute for on-site surveillance (Monkman et al., 2018). Similarly, the content from these platforms was also used to investigate air pollution and air quality (Hswen, Qin, Brownstein, & Hawkins, 2019; Jiang, Sun, Wang, & Young, 2019; Riga & Karatzas, 2014; Wang, Paul, & Dredze, 2015).

In other studies, text mining was used in the domain of financial investments, where financial news was mined to predict the stock market (Chung, 2014). In fact, the mining of web news has branched off into web news mining, because it requires novel methods and technologies to process and analyze a vast amount of news sources that are continuously being updated at a high velocity (Iglesias et al., 2016; Sayyadi, Salehi, & AbolHassani, 2007).

In addition, social media were often analyzed to identify sentiments and discussions about vaccination (Bahk et al., 2016; Becker et al., 2016; D'Andrea, Ducange, & Marcelloni, 2017;

Du, Xu, Song, & Tao, 2017; Dunn, Leask, Zhou, Mandl, & Coiera, 2015; Powell et al., 2016; Straton, Ng, Jang, Vatrapu, & Mukkamala, 2019; Tavoschi et al., 2020; Zhou et al., 2015). Social media and web searches were also used to accurately predict disease outbreaks (Yang, Horneffer, & DiLisio, 2013), gauge food quality and food-borne illnesses (Kate, Negi, & Kalagnanam, 2014), and to detect various diseases (Edo-Osagie, Smith, Lake, Edeghere, & De La Iglesia, 2019; Tufts et al., 2018).

Furthermore, computational linguistics was used to monitor public health, such as surveilling allergies (Lee, Agrawal, & Choudhary, 2015; Nargund & Natarajan, 2016; Rong, Michalska, Subramani, Du, & Wang, 2019), depression (Leis, Ronzano, Mayer, Furlong, & Sanz, 2019; Mowery et al., 2017; Zucco, Calabrese, & Cannataro, 2017), suicidal thoughts and suicide-related conversations (Brown et al., 2019; Kavuluru et al., 2016; Song, Song, Seo, & Jin, 2016), obesity (Cesare, Dwivedi, Nguyen, & Nsoesie, 2019; Kent et al., 2016), marijuana and drug abuse (Cavazos-Rehg, Zewdie, Krauss, & Sowles, 2018; Phan, Chun, Bhole, Geller, & Ieee, 2017; Tran, Nguyen, Nguyen, & Golen, 2018; Ward et al., 2019), tobacco and e-cigarettes (Cole-Lewis et al., 2015; Kim, Miano, Chew, Eggers, & Nonnemaker, 2017; Kostygina, Tran, Shi, Kim, & Emery, 2016; Myslin, Zhu, Chapman, & Conway, 2013), and to gauge public health concerns (Ji, Chun, Wei, & Geller, 2015).

3 Methodology

This chapter first discusses the information sources and search strategy. Thereafter, the selection of studies, as well as the inclusion and exclusion criteria are elaborated. This chapter concludes with outlining the procedure for data analysis.

3.1 Information Sources

To cover all related disciplines, a broad selection of databases was made that included PubMed, Web of Science, IEEE Xplore, and ACM Digital Library. These databases were selected because they index studies in various fields. Specifically, PubMed was included because it predominantly indexes research in the fields of public health, healthcare, and medicine. IEEE Xplore and ACM Digital Library were searched because these databases index publications in information technology and information management. Web of Science was included because it is a very large database that indexes studies in various disciplines; because of this multidisciplinary nature, there is a consensus among researchers that including it in systematic reviews is good practice.

We recognize that Google Scholar is increasingly used as a source for systematic reviews, but that there exists a debate among scientists about the appropriateness of using Google Scholar (Giustini & Boulos, 2013). The quality of systematic reviews is partly determined by whether a systematic literature search can be replicated and validated (Giustini & Boulos, 2013). Since Google Scholar frequently, even daily, updates its algorithms for search and the ranking of content, replication of the systematic literature search may yield unpredictable results (Giustini & Boulos, 2013). Therefore, we have elected to exclude Google Scholar as an information source in this review.

3.2 Search Strategy

For each of the included databases, an optimized search strategy was formulated (see Appendix 1). The search query was constructed from two blocks. The first block addresses the concept of computational linguistics and the second block includes search terms related to health surveillance. Each block included a broad list of related search terms and synonyms that were combined using the logical OR-operator. Furthermore, the queries for each block were combined

using the logical AND-operator. Additional database-specific filters were applied to further optimize the search results.
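The block structure of such a query can be sketched as follows; the search terms shown are illustrative placeholders only, as the actual database-specific strategies are listed in Appendix 1:

```python
def build_query(block_a, block_b):
    """OR the synonyms within each concept block, then AND the two blocks."""
    def or_join(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return f"{or_join(block_a)} AND {or_join(block_b)}"

# Illustrative terms only; see Appendix 1 for the real search strategies.
linguistics_block = ["computational linguistics", "natural language processing", "text mining"]
surveillance_block = ["pharmacovigilance", "adverse drug reaction", "health surveillance"]

query = build_query(linguistics_block, surveillance_block)
# ("computational linguistics" OR ...) AND ("pharmacovigilance" OR ...)
```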

The systematic literature search was performed in March 2020. After the databases were searched and the results were imported into an EndNote Library, the method for de-duplication by Bramer et al. (2016) was performed to identify and remove duplicate studies.
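The de-duplication method of Bramer et al. (2016) compares several bibliographic fields within EndNote; a much-simplified sketch of the underlying idea, matching records on a normalized title, might look as follows (the records shown are hypothetical):

```python
import re

def normalize_title(title):
    """Lowercase and strip punctuation so formatting differences between
    databases do not hide duplicate records."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep only the first record seen for each normalized title."""
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical records, as they might arrive from different databases.
records = [
    {"title": "Mining Twitter for Adverse Drug Reactions", "db": "PubMed"},
    {"title": "Mining Twitter for adverse drug reactions.", "db": "Web of Science"},
    {"title": "NLP for Pharmacovigilance", "db": "IEEE Xplore"},
]
unique = deduplicate(records)
# The first two records collapse into one; two unique records remain.
```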

3.3 Study Selection

Studies that are eligible for this systematic review were selected in three subsequent phases.

In the first phase, the titles were screened for the presence of subjects related to public health monitoring or public health surveillance. The screening was intended to be very broad, to reduce the likelihood that studies were excluded incorrectly. Therefore, not only titles containing terms such as “adverse drug reactions” were considered relevant, but also titles containing more indirect terms such as “medication outcomes”. In addition, if it was ambiguous whether a study was relevant, it was still included for further screening in the next phase. Studies that were not relevant were omitted from the library.

In the second phase, the abstracts were screened for information related to computational linguistics, public health monitoring, public health surveillance, and pharmacovigilance. In addition, the keywords that are provided with the manuscript for indexing purposes were screened for these concepts. This screening phase was also intended to be broad. For example, abstracts were considered relevant if they contained terms directly related to pharmacovigilance, such as “adverse effects of drug treatment”, but also rather indirectly related terms such as “drug reviews”. Publications were still included if their relevance was considered ambiguous, for further screening in the next phase. Irrelevant manuscripts were removed.

In the third phase, the full text of the remaining studies was downloaded and read. Studies were considered relevant if they investigated the application of computational linguistics to understand text, with the purpose of public health monitoring or public health surveillance within the discipline of pharmacovigilance. Therefore, studies were considered relevant if they aimed to identify adverse drug reactions using computational linguistics. Publications were only included in this systematic review when they met these criteria.

3.4 Inclusion and Exclusion Criteria

Studies were considered eligible when they investigated the application of computational linguistics to understand text, with the purpose of public health monitoring or public health surveillance within the discipline of pharmacovigilance. Therefore, eligible studies reported on the application and results of using computational linguistics to identify adverse drug reactions from textual sources, such as forums, patient records, and social media. Furthermore, because it is common in information technology and computational linguistics that research is not published in peer-reviewed journals but appears only in conference proceedings or conference papers, both journal articles and conference proceedings were included in this systematic review. This was not expected to affect the reliability of studies, because conference proceedings and conference papers are also subject to a peer-review process.

Publications were excluded if they only reported on a framework instead of the actual application. For example, authors may suggest a process to investigate adverse drug reactions using computational linguistics without actually applying it and evaluating the results. Likewise, studies were excluded if they were published in a language other than English. Furthermore, if the same publication was published in different formats, for example as both a conference proceeding and a journal article, only one format of the publication was retained.

3.5 Data Analysis

The included publications were evaluated on their quality by assessing their reliability and validity. This assessment was performed by following the strategy of Kampmeijer et al. (2016): A publication was evaluated as reliable if it reported a thorough and repeatable description of the performed process, methods, data collection, and data analysis (Kampmeijer et al., 2016). A publication was evaluated as valid if the reported findings are logically the result of the described process, methods, data, and analyses that were used to find that result (Kampmeijer et al., 2016). Although the quality assessment was rigorous and based on scientific standards, all identified publications were included in the systematic review.

Thematic analysis was used to analyze the included publications (Hsieh & Shannon, 2005). The themes were defined by the objectives of the present systematic review. Therefore, the following themes were extracted from the full text of the included publications: Authors, year of publication, type of drugs, data source, sample size, users, unique users, origin of users, average number of followers, years of data collection, horizon of data collection, software used, techniques and classifiers used, outcome, drugs studied, result, and a description of the result.

For each publication, the extracted themes were processed into an extraction matrix. This matrix was used to synthesize and narratively present the extracted information by theme. The results are summarized and presented using tables.

This systematic review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Liberati et al., 2009; Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009). The quality of this systematic review was also evaluated using the PRISMA Checklist in Appendix 3.

4 Results

First, this chapter summarizes the studies that were identified via the literature search and it visualizes the study selection using a flow diagram. Second, this chapter describes and discusses the included publications in further detail. Finally, this chapter concludes with the presentation of the effectiveness of computational linguistics to understand text in the domain of pharmacovigilance to identify adverse drug reactions.

4.1 Literature Search and Study Selection

The procedure that was followed for the selection of studies is presented in Figure 1. The literature search involved four databases (see Appendix 1) and yielded 5,318 initial records. These records were assessed for the presence of duplicate publications. Consequently, 744 duplicate results were identified and omitted. Therefore, the literature search yielded 4,574 unique studies.


Figure 1. Flow diagram for literature search and study selection

The study selection was performed in three phases. In the first phase, studies were screened on title. A total of 4,347 studies were considered irrelevant and were excluded. A reason that these studies were excluded is that they did not mention “adverse drug reactions” or other related but rather general terms such as “medication outcomes” in the title. In the second phase, the remaining 227 results were screened by reading the abstract; 206 irrelevant studies were omitted. For example, studies were excluded when not mentioning “adverse effects of drug treatment” or other related but rather general terms such as “drug reviews” in the abstract. In the third phase, the full text of the remaining 21 publications was read. Five studies were considered irrelevant because they did not investigate the application of computational linguistics to understand text, with the purpose of public health monitoring or

public health surveillance within the discipline of pharmacovigilance. These irrelevant studies were, therefore, removed.
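The counts in the flow diagram are internally consistent, which can be verified with simple arithmetic:

```python
# Numbers reported in the flow diagram (Figure 1).
initial_records = 5318
duplicates = 744
excluded_on_title = 4347
excluded_on_abstract = 206
excluded_on_full_text = 5

unique_records = initial_records - duplicates                      # 4,574 unique studies
after_title_screen = unique_records - excluded_on_title            # 227 remain
after_abstract_screen = after_title_screen - excluded_on_abstract  # 21 remain
included = after_abstract_screen - excluded_on_full_text           # 16 included
```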

Overall, the literature search and selection of studies yielded 16 publications that were considered relevant and were included in this systematic review. A detailed overview of the characteristics of the included studies is presented in Appendix 2. A summary is provided in Table 1 and will be further elaborated in this chapter.

[Insert Table 1 here]

4.2 Study Characteristics

A general description of the publications included in the analysis is provided in Table 2.

[Insert Table 2 here]

All studies were published between 2009 and 2019. Most of the studies (68.75%) were published in the last five years (2015-2019).

The included studies investigated reported adverse drug reactions for a variety of drugs. Of the studies that specified the type of drugs under study, the publications involved drugs to treat depression (5.26%), asthma (5.26%), cystic fibrosis (5.26%), HIV (5.26%), cancer (10.53%), rheumatoid arthritis (5.26%), and type 2 diabetes (5.26%). In the majority of studies (57.89%), the type of drugs was not specified.

Publications used data from four sources. A majority of studies (55.56%) used textual information from social media to extract adverse drug reactions. Drug reviews were also a popular source of unstructured data (22.22%). Forums (16.67%) and electronic health records (5.56%) were used less often.

There was a wide diversity in the sample size of the posts that were included, which ranged from 1,245 (Lin et al., 2015) to more than two billion (Bian, Topaloglu, & Yu, 2012) tweets. Three studies (18.75%) included fewer than 5,000 posts, six publications (37.50%) used at least 5,000 but fewer than 20,000 posts, and six studies (37.50%) were performed using more than 20,000 posts. The sample size was not reported in one study (6.25%).

Only one study (6.25%) provided contextual information about the background of the publishers of the included posts. In that study, the users were identified as HIV-infected and undergoing drug treatment (Adrover, Bodnar, Huang, Telenti, & Salathé, 2015). In the remaining studies (93.75%), the background of these users was not disclosed.

The vast majority of studies (87.50%) did not provide information about the number of unique users who published the analyzed content. In only two studies (12.50%), it was disclosed that fewer than 5,000 users had published the posts.

One study discussed the geographical location of the users who published the included posts. In that study, it was reported that the users were from Canada, South Africa, the United Kingdom, or the United States (Adrover et al., 2015). The remaining 15 studies did not discuss the geographical location of users.

Similarly, only one study (6.25%) reported that the users had, on average, fewer than 5,000 followers (Adrover et al., 2015). The remaining studies (93.75%) provided no information about this theme.

In the studies that presented the date of publication of the included textual samples (74.19%), the posts were published between 2004 and 2015. Content published since 2010 was included in more studies than content published before 2010. The remaining 25.81% of studies did not discuss when the posts were published.

For the studies that reported the date of publication of the included content (50.00%), the time horizon of the data collection was examined. In 12.50% of studies, this horizon was one calendar year. In 6.25% of the studies, this horizon was between two and five years. In another 6.25%, the horizon ranged between six and ten years. In four studies (25.00%), the time horizon could not be computed because the data were published within the same calendar year. The remaining studies (50.00%) did not present the date on which the included data were published, and, therefore, the horizon of data collection could not be computed.

The studies differed widely in the software that was used. In total, 27 software products were discussed. Studies often used alternative products of the same type. For example, although some studies used Tweepy (2.86%), Twitter REST API (2.86%), or Twitter4J (2.86%) to retrieve data from Twitter in different ways, these software products can be aggregated under the type of Twitter API. By frequency, the spelling checker was used most often (8.57%). Notably, three studies (8.57%) did not present the software that was used.

Likewise, a total of 21 different techniques and classifiers was reported. NLP was used in five studies (11.63%). Similarly, sentiment analysis was used in five studies (11.63%). Two studies (4.65%) and one study (2.33%) reported the use of term frequency-inverse document frequency (TF-IDF) and a term-document matrix (TDM), respectively. By frequency, Support Vector Machines (SVM) were used most often, in eight studies (18.60%), to analyze the quantitative features that were extracted from unstructured data. The remaining 4.65% of the studies did not disclose the techniques and classifiers that were used.
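The included studies did not share a common implementation of these techniques, but the general idea behind TF-IDF weighting, which two studies reported using, can be illustrated with a minimal sketch. The toy posts and terms below are hypothetical, and the pure-Python implementation does not correspond to any specific library used in the included studies:

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weights for a list of tokenized posts.

    TF is the relative term frequency within a post; IDF is
    log(N / df), where df is the number of posts containing the term.
    """
    n_docs = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Three hypothetical posts mentioning a fictitious drug and side effects.
posts = [
    "drug x gave me a headache".split(),
    "no side effects with drug x".split(),
    "severe headache and nausea after drug x".split(),
]
weights = tf_idf(posts)
# Terms that occur in every post (e.g. "drug", "x") receive weight 0,
# while rarer, more distinctive terms such as "nausea" are weighted higher.
```

The resulting weight vectors are the kind of quantitative features that a classifier such as an SVM can then be trained on to separate posts that report an adverse reaction from posts that do not.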

Although all included studies investigated how computational linguistics can be applied to understand text for pharmacovigilance, and thus all studies investigated reported adverse drug reactions, some studies were more explicit than others in discussing their outcome. For example, several studies explicitly disclosed that the outcome was reported adverse drug reactions for cancer (5.56%), reported adverse drug reactions for HIV treatment (5.56%), user sentiment on depression drugs (5.56%), and user sentiment on cancer drugs (5.56%). The remaining studies were more ambiguous, presenting the outcome as patient satisfaction with drugs (5.56%), patient-reported medication outcomes (5.56%), reported adverse drug reactions (61.11%), and reported effectiveness of drugs (5.56%).

The studies also differed with respect to the number of drugs for which adverse drug reactions were investigated. Most studies (31.25%) included posts concerning 20 or more drugs, followed by 25.00% of publications that studied between five and nine drugs. Two studies (12.50%) included between 10 and 14 drugs, while only one study (6.25%) addressed fewer than five drugs. No studies included between 15 and 19 drugs. The remaining 25.00% of studies did not disclose the number of drugs that were investigated.

The studies reported consistent evidence that computational linguistics can be successfully used to understand text for the purpose of pharmacovigilance. A vast majority of studies (87.50%) presented positive results. These studies observed that adverse drug reactions could indeed be extracted accurately and reliably from content published by patients. These studies often compared the accuracy of the adverse effects that were extracted from posts against a list of known adverse drug reactions, for example from the medical package insert or from other reliable sources. Only 12.50% of the studies reported neutral findings. No studies reported a negative result.
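The included studies describe various evaluation set-ups rather than one shared procedure. As a minimal, hypothetical sketch of such a comparison (the term lists below are invented for illustration and are not data from any included study), precision and recall of extracted reactions against a reference list can be computed as follows:

```python
def evaluate_extraction(extracted, known):
    """Compare extracted adverse-reaction terms against a reference
    list of known reactions (e.g. from the medical package insert)."""
    extracted, known = set(extracted), set(known)
    true_positives = extracted & known
    precision = len(true_positives) / len(extracted) if extracted else 0.0
    recall = len(true_positives) / len(known) if known else 0.0
    return precision, recall

# Hypothetical example: terms mined from posts vs. the package insert.
mined = {"headache", "nausea", "insomnia"}
package_insert = {"headache", "nausea", "dizziness", "fatigue"}
precision, recall = evaluate_extraction(mined, package_insert)
# precision = 2/3: two of the three mined terms are known reactions
# recall = 0.5: two of the four known reactions were recovered
```

Note that a mined term absent from the reference list, such as "insomnia" here, lowers precision but may also represent a genuinely novel adverse reaction, which is why several studies verified such terms against additional reliable sources.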

In this systematic review, rigorous medical standards were used to assess the quality of the included studies. The reliability and validity of all studies were assessed as medium. One reason for this is that, although all studies were performed reasonably well, they failed to be entirely transparent about their process, methodology, software, and the techniques and classifiers that were used. As is presented in Appendix 2, no study disclosed a full overview of this crucial information.

4.3 Effectiveness of Computational Linguistics to Identify Adverse Drug Reactions

The reported findings on the effectiveness of computational linguistics to identify adverse drug reactions from text were analyzed. To identify differences between themes, various characteristics of these publications were compared with the reported results of using computational linguistics to identify adverse drug reactions. The observed findings are presented in Table 3.

[Insert Table 3 here]

As most studies reported positive findings, the results that follow are skewed towards a positive effectiveness.

Positive results were reported for utilizing computational linguistics to identify adverse drug reactions. For all types of drugs that were investigated, positive results were found. Among the studies that named the diseases, only one study observed a neutral effectiveness, for oncological drugs (Bian et al., 2012). Not all publications included sufficient details on the type of drugs that were included in the analyses. Of the studies that did not disclose the type of drugs, 90.91% of publications reported a positive effect, while only 9.09% found a neutral effectiveness for identifying adverse drug reactions.

Most of the studies (62.50%) included unstructured data from social media. In particular, all of these studies included data from Twitter, while one study additionally used content from YouTube (Ru, Harris, & Yao, 2015). The majority of these studies reported that adverse drug reactions could indeed be extracted from textual content that was published to social media. In addition, drug reviews (25.00%) and content from forums (18.75%) were studied frequently. For both data sources, adverse drug reactions were identified correctly. The least studied source was electronic health records, for which positive results were also reported.

There were no significant inconsistencies in the effectiveness of computational linguistics to identify adverse drug reactions with respect to the outcome that was under investigation in each study. For a vast majority of the outcomes, adverse drug reactions could indeed be established. For the outcome of reported adverse drug reactions for cancer, only a neutral effectiveness was reported (Bian et al., 2012). Likewise, although most of the studies that investigated the outcome of reported adverse drug reactions observed positive findings, one study instead found a neutral result (Dai & Wang, 2019).

There were no notable differences in the effectiveness of computational linguistics with respect to the number of drugs that were considered in the publications.

Lastly, no studies had either a low or high reliability or validity. Irrespective of having a medium reliability and validity, a majority of publications (87.50%) reported a positive effectiveness of using computational linguistics to identify adverse drug reactions. In only two studies (12.50%), this effectiveness remained ambiguous and was, therefore, rated as neutral.

5 Discussion

This chapter starts with a discussion of the findings. Subsequently, it presents and elaborates on the limitations of this systematic review. It concludes with suggested recommendations for researchers and practitioners of computational linguistics, policy makers, and users.

5.1 Discussion

The purpose of this study was to review the existing evidence on the effectiveness of machine learning, and in particular computational linguistics, to understand user-generated textual content for the purpose of pharmacovigilance.

A main finding of this systematic review is that the potential of applying computational linguistics for pharmacovigilance is very promising. Studies included in this systematic review consistently reported positive results on the effectiveness and accuracy of applying computational linguistics to user-generated digital content to identify adverse drug reactions. For all diseases investigated, a vast majority of studies reported that the identified adverse drug reactions were consistent with the information provided on the medical package insert.

This finding suggests that the user-generated textual content that drug users share on the Internet may have the potential to augment or enhance the reactive, expensive, and time-consuming traditional system of pharmacovigilance. Computational linguistics may thus be used to automate the monitoring of adverse drug reactions using content that users publish to social media and other digital platforms (Ru et al., 2015). This novel tool may not only contribute to improving public health and the quality of healthcare, but it could potentially also reduce the costs and processing time that are associated with conducting pharmacovigilance. Therefore, this tool may be a viable solution that addresses two of the most prominent challenges of traditional pharmacovigilance, namely the reduction of the high associated costs (Schmider et al., 2019) and the inclusion of adverse drug reactions as experienced by the end-users (European Commission, 2020).

A second main finding of this systematic review is that some studies also correctly identified adverse drug reactions that were previously unknown. This suggests that computational linguistics may also be used to identify novel adverse drug reactions. Therefore, it may serve as a suitable tool for pro-active and real-time health surveillance using remote sensing (Sampathkumar, Chen, & Luo, 2014). As such, this automated system may identify trends and periodically report novel insights to policy makers and public health professionals, and it may support and enable these professionals to initiate timely interventions to protect public health and to maintain, and perhaps even further increase, the quality of healthcare (Akay, Dragomir, Erlandsson, & Akay, 2016).

Although this systematic review finds that the application of computational linguistics may be at least as reliable as the traditional system of pharmacovigilance, it does not suggest that the traditional system is obsolete and should be replaced by computational linguistics. Instead, it may be worthwhile to apply computational linguistics as a complementary tool to retrieve and process adverse drug reactions that end-users share on the Internet. These insights may be combined with the adverse drug reactions that are reported by medical professionals, to achieve a more complete overview of adverse drug reactions. Similarly, computational linguistics may be a suitable tool for the real-time monitoring of adverse drug reactions, or for discovering novel adverse drug reactions. Indeed, once this tool has been developed, calibrated, and optimized, it would function fully automatically and require very limited human input.

Overall, the evidence about the effectiveness of computational linguistics is also consistent with the findings of other researchers, who reported a similar accuracy of machine learning to understand text in various other domains. This suggests that computational linguistics may have the potential to innovate and revolutionize many other industries as well. Future research is required to identify the industries, and their prominent problems, that can be revolutionized using machine learning.

5.2 Limitations

This systematic review has four noteworthy limitations.

First, due to practical considerations, the systematic literature search and study selection were performed by only one researcher. Therefore, it was not possible to establish inter-rater reliability. As a result, it is possible that this has introduced selection bias, but this could not be verified.

Second, it was observed that studies often failed to report information on all themes that were used to extract relevant information (see Appendix 2). Consequently, the absence of these data limited the analyses of the studies with respect to their methodology, sample characteristics, and the utilized techniques. In addition, various publications failed to disclose for which diseases the adverse drug reactions were identified. Failure to report such essential details limits the replicability of these studies.

Third, all studies that were included in this systematic review were found to have a medium quality. Quality was operationalized using reliability and validity. Indeed, rigorous scientific standards were used to rate this quality. The quality was, however, assessed by only one researcher, which impedes the measurement of inter-rater reliability. As was already introduced in the second limitation, the quality of publications could overall be improved by disclosing more details on aspects such as the methodology, sample characteristics, used techniques, and the investigated drugs and related diseases.

Fourth, because findings in the fields of information technology and computational linguistics are not always published in peer-reviewed journals, but often only in conference proceedings or conference papers, both types of publications were included in this systematic review. This is important, because the process of peer review may be more rigorous when performed by journals than by conferences. Indeed, a significant number of the included publications were not journal articles.

5.3 Recommendations

In the following, three recommendations are suggested to various audiences, namely researchers and practitioners in the field of computational linguistics, policy makers, and drug users.

First, researchers and practitioners of computational linguistics ought to be more transparent and disclose more details about the methods, sample characteristics, used technologies, and the drugs and diseases that were investigated. This information has great value, among others to establish the quality of these studies and for the purpose of replicability. Furthermore, although conferences are an important platform to share knowledge and findings in the field of information technology and computational linguistics, it is suggested that these studies are also more often published in peer-reviewed journals.

Second, it may be relevant for policy makers to consider the automated analysis of user-generated textual content for the purpose of pharmacovigilance. The findings indicate strong evidence that computational linguistics can effectively and accurately identify adverse drug reactions using content that drug users have published to the Internet. These artificial intelligence technologies may complement information that is derived from traditional pharmacovigilance. Indeed, this systematic review suggests that policy makers can pro-actively gain greater control over pharmacovigilance. It is strongly suggested that policy makers use these powerful technologies ethically, responsibly, and with great respect for the privacy and anonymity of drug users.

Third, it is suggested that drug users who share their experiences of adverse drug reactions on the Internet publish such content intentionally and thoughtfully. Although the published information is valuable and has the potential to protect public health, it can also easily be misused, for example by creating a profile of these users that includes various types of personal, geographical, and medical information. In that sense, such information can be used or misused for commercial or illegal purposes. Users should be aware of the potential consequences, both positive and negative, of disclosing such information publicly on the Internet.

6 Conclusion

This systematic review has demonstrated that there exists consistent evidence that computational linguistics can be used effectively and accurately to identify adverse drug reactions that are published to the Internet by the users of these drugs. In addition, some studies correctly identified new adverse drug reactions that were previously unknown. The effectiveness and reliability of computational linguistics to understand user-generated textual content for the purpose of pharmacovigilance is also consistent with findings in other domains. This suggests that computational linguistics has the potential to innovate in other industries as well. For pharmacovigilance, computational linguistics has the potential to complement traditional pharmacovigilance.

References

Adrover, C., Bodnar, T., Huang, Z., Telenti, A., & Salathé, M. (2015). Identifying Adverse Effects of HIV Drug Treatment and Associated Sentiments Using Twitter. JMIR Public Health and Surveillance, 1(2), 10.

Aggarwal, C. C., & Zhai, C. (2012). A Survey of Text Clustering Algorithms. In Mining Text Data (pp. 77-128). New York City, NY, USA: Springer.

Akay, A., Dragomir, A., Erlandsson, B.-E., & Akay, M. (2016). Assessing anti-depressants using intelligent data monitoring and mining of online fora. IEEE Journal of Biomedical and Health Informatics, 20(4), 977-986.

Angell, M. (2004). The Truth About the Drug Companies: How They Deceive Us and What to Do About It. New York, NY, USA: Random House.

Bahk, C. Y., Cumming, M., Paushter, L., Madoff, L. C., Thomson, A., & Brownstein, J. S. (2016). Publicly Available Online Tool Facilitates Real-Time Monitoring Of Vaccine Conversations And Sentiments. Health Affairs, 35(2), 341-347.

Barlow, J. (2017). Managing Innovation in Healthcare. London, UK: World Scientific.

Becker, B. F. H., Larson, H. J., Bonhoeffer, J., Van Mulligen, E. M., Kors, J. A., & Sturkenboom, M. C. J. M. (2016). Evaluation of a multinational, multilingual vaccine debate on Twitter. Vaccine, 34(50), 6166-6171.

Beninger, P. (2018). Pharmacovigilance: An Overview. Clinical Therapeutics, 30(12), 1991-2004.

Bian, J., Topaloglu, U., & Yu, F. (2012). Towards Large-scale Twitter Mining for Drug- related Adverse Events. Paper presented at the Proceedings of the 2012 international workshop on Smart health and wellbeing, Maui, Hawaii, USA.

Bramer, W. M., Giustini, D., de Jonge, G. B., Holland, L., & Bekhuis, T. (2016). De- duplication of database search results for systematic reviews in EndNote. Journal of the Medical Library Association: JMLA, 104(3), 240-243.

Brown, R. C., Bendig, E., Fischer, T., Goldwich, A. D., Baumeister, H., & Plener, P. L. (2019). Can acute suicidality be predicted by Instagram data? Results from qualitative and quantitative language analyses. PLoS One, 14(9), e0220623.

Cavazos-Rehg, P. A., Zewdie, K., Krauss, M. J., & Sowles, S. J. (2018). "No High Like a Brownie High": A Content Analysis of Edible Marijuana Tweets. American Journal of , 32(4), 880-886.

Cesare, N., Dwivedi, P., Nguyen, Q. C., & Nsoesie, E. O. (2019). Use of social media, search queries, and demographic data to assess obesity prevalence in the United States. Palgrave Communications, 5, 9.

Chung, W. (2014). BizPro: Extracting and categorizing business intelligence factors from textual news articles. International Journal of Information Management, 34(2), 272-284.

Cole-Lewis, H., Pugatch, J., Sanders, A., Varghese, A., Posada, S., Yun, C., . . . Augustson, E. (2015). Social Listening: A Content Analysis of E-Cigarette Discussions on Twitter. Journal of Medical Internet Research, 17(10), 14.

D'Andrea, E., Ducange, P., & Marcelloni, F. (2017). Monitoring Negative Opinion about Vaccines from Tweets Analysis. Paper presented at the 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), Kolkata, India.

Dai, H.-J., Touray, M., Jonnagaddala, J., & Syed-Abdul, S. (2016). Feature Engineering for Recognizing Adverse Drug Reactions from Twitter Posts. Information, 7(2), 20.

Dai, H.-J., & Wang, C.-K. (2019). Classifying Adverse Drug Reactions from Imbalanced Twitter Data. International Journal of Medical Informatics, 129, 122-132.

Du, J., Xu, J., Song, H.-Y., & Tao, C. (2017). Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data. BMC Medical Informatics and Decision Making, 17, 8.

Dunn, A. G., Leask, J., Zhou, X., Mandl, K. D., & Coiera, E. (2015). Associations Between Exposure to and Expression of Negative Opinions About Human Papillomavirus Vaccines on Social Media: An Observational Study. Journal of Medical Internet Research, 17(6), e144.

Edo-Osagie, O., Smith, G., Lake, I., Edeghere, O., & De La Iglesia, B. (2019). Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance. PLoS One, 14(7), 29.

European Commission. (2008). Strengthening pharmacovigilance to reduce adverse effects of medicines. Brussels, Belgium: European Commission.

European Commission. (2020). Pharmacovigilance. Retrieved from https://ec.europa.eu/health/human-use/pharmacovigilance_en

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Ginn, R., Pimpalkhute, P., Nikfarjam, A., Patki, A., O'Connor, K., Sarker, A., . . . Gonzalez, G. (2014). Mining Twitter for Adverse Drug Reaction Mentions: A Corpus and Classification Benchmark. Paper presented at the LREC 2014 - Ninth International Conference on Language Resources and Evaluation, Paris.

Giustini, D., & Boulos, M. N. K. (2013). Google Scholar is not enough to be used alone for systematic reviews. Online Journal of Public Health Informatics, 5(2).

Gräßer, F., Kallumadi, S., Malberg, H., & Zaunseder, S. (2018). Aspect-Based Sentiment Analysis of Drug Reviews Applying Cross-Domain and Cross-Data Learning. Paper presented at the Proceedings of the 2018 International Conference on Digital Health, New York.

Hsieh, H.-F., & Shannon, S. E. (2005). Three Approaches to Qualitative Content Analysis. Qualitative Health Research, 15(9), 1277-1288.

Hswen, Y. L., Qin, Q. Y., Brownstein, J. S., & Hawkins, J. B. (2019). Feasibility of using social media to monitor outdoor air pollution in London, England. Preventive Medicine, 121, 86-93.

Iglesias, J. A., Tiemblo, A., Ledezma, A., & Sanchis, A. (2016). Web news mining in an evolving framework. Information Fusion, 28, 90-98.

Ji, X., Chun, S. A., Wei, Z., & Geller, J. (2015). Twitter sentiment classification for measuring public health concerns. Social Network Analysis and Mining, 5.

Jiang, J. Y., Sun, X., Wang, W., & Young, S. (2019). Enhancing Air Quality Prediction with Social Media and Natural Language Processing. Stroudsburg: Assoc Computational Linguistics-Acl.

Jurafsky, D., & Martin, J. H. (2019). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (3rd ed.). Stanford, CA, USA: Stanford University.

Kampmeijer, R., Pavlova, M., Tambor, M., Golinowska, S., & Groot, W. (2016). The use of e-health and m-health tools in health promotion and primary prevention among older adults: a systematic literature review. BMC Health Services Research, 16(Suppl 5), 290.

Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53(1), 59-68.

Kate, K., Negi, S., & Kalagnanam, J. (2014). Monitoring violation reports from internet forums. Studies in Health Technology and Informatics, 205, 1090-1094.

Katragadda, S., Karnati, H., Pusala, M., Raghavan, V., & Benton, R. (2015, 9-12 Nov. 2015). Detecting Adverse Drug Effects Using Link Classification on Twitter Data. Paper presented at the 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Washington, DC, USA.

Kavuluru, R., Ramos-Morales, M., Holaday, T., Williams, A. G., Haye, L., & Cerel, J. (2016). Classification of Helpful Comments on Online Suicide Watch Forums. New York: Assoc Computing Machinery.

Kennedy, H., Elgesem, D., & Miguel, C. (2015). On fairness: User perspectives on social media data mining. Convergence: The International Journal of Research into New Media Technologies, 23(3), 270–288.

Kent, E. E., Prestin, A., Gaysynsky, A., Galica, K., Rinker, R., Graff, K., & Chou, W. Y. S. (2016). "Obesity is the New Major Cause of Cancer": Connections Between Obesity and Cancer on Facebook and Twitter. Journal of Cancer Education, 31(3), 453-459.

Kim, A., Miano, T., Chew, R., Eggers, M., & Nonnemaker, J. (2017). Classification of Twitter Users Who Tweet About E-Cigarettes. JMIR Public Health and Surveillance, 3(3), e63.

Kostygina, G., Tran, H., Shi, Y., Kim, Y., & Emery, S. (2016). "Sweeter Than a Swisher": amount and themes of little cigar and cigarillo content on Twitter. Tobacco Control, 25, i75-i82.

Lee, K., Agrawal, A., & Choudhary, A. (2015, 25-28 Aug. 2015). Mining social media streams to improve public health allergy surveillance. Paper presented at the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

Leis, A., Ronzano, F., Mayer, M. A., Furlong, L. I., & Sanz, F. (2019). Detecting Signs of Depression in Tweets in Spanish: Behavioral and Linguistic Analysis. Journal of Medical Internet Research, 21(6), 16.

Liao, S.-H., Chu, P.-H., & Hsiao, P.-Y. (2012). Data mining techniques and applications – A decade review from 2000 to 2011. Expert Systems with Applications, 39, 11303-11311.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., . . . Moher, D. (2009). The PRISMA statement for reporting systematic reviews and meta- analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ, 339, b2700.

Lin, W.-S., Dai, H.-J., Jonnagaddala, J., Chang, N.-W., Jue, T. R., Iqbal, U., . . . Li, Y.-C. (2015, 20-22 Nov. 2015). Utilizing different word representation methods for twitter data in adverse drug reactions extraction. Paper presented at the 2015 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Tainan, Taiwan.

Lu, Y., Zhang, P., Liu, J., Li, J., & Deng, S. (2013). Health-Related Hot Topic Detection in Online Communities Using Text Clustering. PLoS One, 8(2), e56221.

Mikalef, P., Pappas, I. O., Krogstie, J., & Pavlou, P. A. (2020). Big data and business analytics: A research agenda for realizing business value. Information & Management, 57(1), 103237.

Mishra, A., Malviya, A., & Aggarwal, S. (2015, 14-17 Nov. 2015). Towards Automatic Pharmacovigilance: Analysing Patient Reviews and Sentiment on Oncological Drugs. Paper presented at the 2015 IEEE 15th International Conference on Data Mining Workshops, Atlantic City, NJ, USA.

Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & The PRISMA Group. (2009). Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLOS Medicine, 6(7), e1000097.

Monkman, G. F., Kaiser, M. J., & Hyder, K. (2018). Text and data mining of social media to map wildlife recreation activity. Biological Conservation, 228, 89-99.

Moro, S., Rita, P., & Vala, B. (2016). Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach. Journal of Business Research, 69(9), 3341-3351.

Mowery, D., Smith, H., Cheney, T., Stoddard, G., Coppersmith, G., Bryan, C., & Conway, M. (2017). Understanding Depressive Symptoms and Psychosocial Stressors on Twitter: A Corpus-Based Study. Journal of Medical Internet Research, 19(2), e48.

Myslin, M., Zhu, S. H., Chapman, W., & Conway, M. (2013). Using twitter to examine smoking behavior and perceptions of emerging tobacco products. Journal of Medical Internet Research, 15(8), e174.

Nargund, K., & Natarajan, S. (2016, 21-24 Sept. 2016). Public health allergy surveillance using micro-blogs. Paper presented at the 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI).

Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(16), 7653-7670.

Nikfarjam, A., Sarker, A., O'Connor, K., Ginn, R., & Gonzalez, G. (2015). Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with cluster features. Journal of the American Medical Informatics Association, 22(3), 671-681.

Oberreuter, G., & Velsquez, J. D. (2013). Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Expert Systems with Applications, 40(9), 3756-3763.

Phan, N., Chun, S. A., Bhole, M., & Geller, J. (2017). Enabling Real-Time Drug Abuse Detection in Tweets. In 2017 IEEE 33rd International Conference on Data Engineering (pp. 1510-1514). New York: IEEE.

Powell, G. A., Zinszer, K., Verma, A., Bahk, C., Madoff, L., Brownstein, J., & Buckeridge, D. (2016). Media content about vaccines in the United States and Canada, 2012-2014: An analysis using data from the Vaccine Sentimeter. Vaccine, 34(50), 6229-6235.

Rawlins, M. D. (2012). Crossing the fourth hurdle. British Journal of Clinical , 73(6), 855-860.

Riga, M., & Karatzas, K. (2014). Investigating the Relationship between Social Media Content and Real-time Observations for Urban Air Quality and Public Health. New York: Assoc Computing Machinery.

Rizk, A., & Elragal, A. (2020). Data science: developing theoretical contributions in information systems via text analytics. Journal of Big Data, 7.

Rong, J., Michalska, S., Subramani, S., Du, J., & Wang, H. (2019). Deep learning for pollen allergy surveillance from twitter in Australia. BMC Medical Informatics and Decision Making, 19(1), 208.

Ru, B., Harris, K., & Yao, L. (2015). A Content Analysis of Patient-Reported Medication Outcomes on Social Media. Atlantic City, NJ, USA: IEEE.

Sadiq, A. T., & Abdullah, S. M. (2014). Hybrid Intelligent Techniques for Text Categorization. International Journal of Advanced Computer Science and Applications, 2, 23-40.

Sampathkumar, H., Chen, X.-W., & Luo, B. (2014). Mining Adverse Drug Reactions from online healthcare forums using Hidden Markov Model. BMC Medical Informatics and Decision Making, 14(91), 18.

Santoro, A., Genov, G., Spooner, A., Raine, J., & Arlett, P. (2017). Promoting and Protecting Public Health: How the European Union Pharmacovigilance System Works. Drug Safety, 40, 855-869.

Sayyadi, H., Salehi, S., & AbolHassani, H. (2007). Survey on News Mining Tasks. In T. Sobh (Ed.), Innovations and Advanced Techniques in Computer and Information Sciences and Engineering (pp. 219-224). Dordrecht, The Netherlands: Springer.

Schmider, J., Kumar, K., LaForest, C., Swankoski, B., Naim, K., & Caubel, P. M. (2019). Innovation in Pharmacovigilance: Use of Artificial Intelligence in Case Processing. Clinical Pharmacology & Therapeutics, 105(4), 954-961.

Song, J. Y., Song, T. M., Seo, D. C., & Jin, J. H. (2016). Data Mining of Web-Based Documents on Social Networking Sites That Included Suicide-Related Words Among Korean Adolescents. Journal of Adolescent Health, 59(6), 668-673.

Straton, N., Ng, R., Jang, H., Vatrapu, R., & Mukkamala, R. R. (2019, November 18-21). Predictive modelling of stigmatized behaviour in vaccination discussions on Facebook. Paper presented at the 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA.

Sunikka, A., & Bragge, J. (2012). Applying text-mining to personalization and customization research literature – Who, what and where? Expert Systems with Applications, 39(11), 10049-10058.

Szpakowicz, S., Feldman, A., & Kazantseva, A. (2018). Editorial: Computational Linguistics and Literature. Frontiers in Digital Humanities, 5, 1-2.

Tan, A. (1999). Text Mining: The State of the Art and the Challenges. Paper presented at the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases.

Tavoschi, L., Quattrone, F., D’Andrea, E., Ducange, P., Vabanesi, M., Marcelloni, F., & Lopalco, P. L. (2020). Twitter as a sentinel tool to monitor public opinion on vaccination: an opinion mining analysis from September 2016 to August 2017 in Italy. Human Vaccines & Immunotherapeutics, 8.

Tekin, M., Etlioğlu, M., Koyuncuoğlu, Ö., & Tekin, E. (2019). Data Mining in Digital Marketing. Paper presented at the Proceedings of the International Symposium for Production Research 2018, Cham.

Tran, T., Nguyen, D., Nguyen, A., & Golen, E. (2018, May 20-24). Sentiment Analysis of Marijuana Content via Facebook Emoji-Based Reactions. Paper presented at the 2018 IEEE International Conference on Communications (ICC).

Tufts, C., Polsky, D., Volpp, K. G., Groeneveld, P. W., Ungar, L., Merchant, R. M., & Pelullo, A. P. (2018). Characterizing Tweet Volume and Content About Common Health Conditions Across Pennsylvania: Retrospective Analysis. JMIR Public Health Surveill, 4(4), e10834.

Walker, S., Sculpher, M., Claxton, K., & Palmer, S. (2012). Coverage with Evidence Development, Only in Research, Risk Sharing, or Patient Access Scheme? A Framework for Coverage Decisions. Value in Health, 15(3), 570-579.

Wall, J., & Krummel, T. (2020). The digital surgeon: How big data, automation, and artificial intelligence will change surgical practice. Journal of Pediatric Surgery, 55, 47-50.

Wang, S., Paul, M. J., & Dredze, M. (2015). Social media as a sensor of air quality and public response in China. Journal of Medical Internet Research, 17(3), e22.

Wang, X., Hripcsak, G., Markatou, M., & Friedman, C. (2009). Active Computerized Pharmacovigilance Using Natural Language Processing, Statistics, and Electronic Health Records: A Feasibility Study. Journal of the American Medical Informatics Association, 16(3), 328-337.

Ward, P. J., Rock, P. J., Slavova, S., Young, A. M., Bunn, T. L., & Kavuluru, R. (2019). Enhancing timeliness of drug overdose mortality surveillance: A machine learning approach. PLoS One, 14(10), e0223318.

World Health Organization. (2004). WHO Policy Perspectives on Medicines - Pharmacovigilance: ensuring the safe use of medicines. Geneva, Switzerland: World Health Organization.

Wu, L., Moh, T.-S., & Khuri, N. (2015, October 29-November 1). Twitter Opinion Mining for Adverse Drug Reactions. Paper presented at the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA.

Yang, C. C., Yang, H., Jiang, L., & Zhang, M. (2012). Media Mining for Drug Safety Signal Detection. Paper presented at the Proceedings of the 2012 international workshop on Smart health and wellbeing, Maui, Hawaii, USA.

Yang, Y. T., Horneffer, M., & DiLisio, N. (2013). Mining social media and web searches for disease detection. Journal of Public Health Research, 2(1), 17-21.

Zeng, Z., Shi, H., Wu, Y., & Hong, Z. (2015). Survey on Natural Language Processing Techniques in Bioinformatics. Computational and Mathematical Methods in Medicine, 1-10.

Zhou, X., Coiera, E., Tsafnat, G., Arachi, D., Ong, M.-S., & Dunn, A. G. (2015). Using social connection information to improve opinion mining: Identifying negative sentiment about HPV vaccines on Twitter. Studies in Health Technology and Informatics, 216, 761-765.

Zucco, C., Calabrese, B., & Cannataro, M. (2017, November 13-16). Sentiment analysis and affective computing for depression monitoring. Paper presented at the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

Table 1. Summary of Characteristics of Publications Included in the Analysis (16 Publications Reviewed)

(Adrover et al., 2015)
  Data source: Social media
  Sample size: 1,642 tweets
  Horizon of data collection: 3 years
  Software used: Toolkit for Multivariate Analysis
  Techniques and classifiers used: Artificial Neural Networks (ANN); Boosted Decision Trees with AdaBoost (BDT); Boosted Decision Trees with Bagging (BDTG); Sentiment Analysis; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions for HIV treatment
  Result: Positive
  Description of result: Reported adverse effects are consistent with well-recognized toxicities.

(Akay et al., 2016)
  Data source: Forums
  Sample size: 7,726 posts
  Horizon of data collection: 10 years
  Software used: General Architecture for Text Engineering (GATE); NLTK toolkit within MATLAB; RapidMiner
  Techniques and classifiers used: Hyperlink-Induced Topic Search (HITS); k-Means Clustering; Network Analysis; Term-frequency-inverse document frequency (TF-IDF)
  Outcome: User sentiment on depression drugs
  Result: Positive
  Description of result: Natural language processing is suitable to extract information on adverse drug reactions concerning depression.

(Bian et al., 2012)
  Data source: Social media
  Sample size: 2,102,176,189 tweets
  Horizon of data collection: 1 year
  Software used: Apache Lucene
  Techniques and classifiers used: MetaMap; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions for cancer
  Result: Neutral
  Description of result: Classification models had limited performance. Adverse events related to cancer drugs can potentially be extracted from tweets.

(Dai, Touray, Jonnagaddala, & Syed-Abdul, 2016)
  Data source: Social media
  Sample size: 6,528 tweets
  Horizon of data collection: Unknown
  Software used: GENIA tagger; Hunspell; Snowball stemmer; Stanford Topic Modelling Toolbox; Twokenizer
  Techniques and classifiers used: Backward/forward sequential feature selection (BSFS/FSFS) algorithm; k-Means Clustering; Sentiment Analysis; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Adverse drug reactions were identified reasonably well.

(Dai & Wang, 2019)
  Data source: Social media
  Sample size: 32,670 tweets
  Horizon of data collection: Unknown
  Software used: Hunspell; Twitter tokenizer
  Techniques and classifiers used: Support Vector Machines (SVM); Term Frequency-Inverse Document Frequency (TF-IDF)
  Outcome: Reported adverse drug reactions
  Result: Neutral
  Description of result: Adverse drug reactions were not identified very well.

(Ginn et al., 2014)
  Data source: Social media
  Sample size: 10,822 tweets
  Horizon of data collection: Unknown
  Software used: Unknown
  Techniques and classifiers used: Naïve Bayes (NB); Natural Language Processing (NLP); Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Adverse drug reactions were identified well.

(Gräßer, Kallumadi, Malberg, & Zaunseder, 2018)
  Data source: Drug reviews
  Sample size: 218,614 reviews
  Horizon of data collection: Unknown
  Software used: BeautifulSoup
  Techniques and classifiers used: Logistic Regression; Sentiment Analysis
  Outcome: Patient satisfaction with drugs; Reported adverse drug reactions; Reported effectiveness of drugs
  Result: Positive
  Description of result: Classification results were very good.

(Katragadda, Karnati, Pusala, Raghavan, & Benton, 2015)
  Data source: Social media
  Sample size: 172,800 tweets
  Horizon of data collection: 1 year
  Software used: Twitter4J
  Techniques and classifiers used: Decision Trees; Medical Profile Graph; Natural Language Processing (NLP)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Building a medical profile of users enables the accurate detection of adverse drug events.

(Lin et al., 2015)
  Data source: Social media
  Sample size: 1,245 tweets
  Horizon of data collection: Unknown
  Software used: CRF++ toolkit; GENIA tagger; Hunspell; Twitter REST API; Twokenizer
  Techniques and classifiers used: Natural Language Processing (NLP)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Adverse drug reactions were identified reasonably well.

(Mishra, Malviya, & Aggarwal, 2015)
  Data source: Drug reviews
  Sample size: Unknown
  Horizon of data collection: Unknown
  Software used: SentiWordNet; WordNet
  Techniques and classifiers used: Sentiment Analysis; Support Vector Machines (SVM); Term Document Matrix (TDM)
  Outcome: User sentiment on cancer drugs
  Result: Positive
  Description of result: Sentiment on adverse drug reactions was identified reasonably well.

(Nikfarjam, Sarker, O'Connor, Ginn, & Gonzalez, 2015)
  Data source: Drug reviews; Social media
  Sample size: DailyStrength.org: 6,279 reviews; Twitter.com: 1,784 tweets
  Horizon of data collection: Unknown
  Software used: Unknown
  Techniques and classifiers used: ARDMine; Lexicon-based; MetaMap; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Adverse drug reactions were identified very well.

(Ru et al., 2015)
  Data source: Drug reviews; Social media
  Sample size: PatientsLikeMe.com: 796 reviews; Twitter.com: 39,127 tweets; WebMD.com: 2,567 reviews; YouTube.com: 42,544 comments
  Horizon of data collection: Not applicable
  Software used: Deeply Moving
  Techniques and classifiers used: Unknown
  Outcome: Patient-reported medication outcomes
  Result: Positive
  Description of result: Social media serves as a new data source to extract patient-reported medication outcomes.

(Sampathkumar et al., 2014)
  Data source: Forums
  Sample size: .com: 8,065 posts; SteadyHealth.com: 11,878 posts
  Horizon of data collection: Not applicable
  Software used: Java Hidden Markov Model library; jsoup
  Techniques and classifiers used: Hidden Markov Model (HMM); Natural Language Processing (NLP)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Reported adverse effects are consistent with well-recognized side-effects.

(Wang, Hripcsak, Markatou, & Friedman, 2009)
  Data source: Electronic Health Record (EHR)
  Sample size: 25,074 discharge summaries
  Horizon of data collection: Not applicable
  Software used: MedLEE
  Techniques and classifiers used: Unknown
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Reported adverse effects are consistent with well-recognized toxicities (recall: 75%; precision: 31%).

(Wu, Moh, & Khuri, 2015)
  Data source: Social media
  Sample size: 3,251 tweets
  Horizon of data collection: Not applicable
  Software used: AFINN; Bing Liu sentiment words; Multi-Perspective Question Answering (MPQA); SentiWordNet; TextBlob; Tweepy; WEKA
  Techniques and classifiers used: MetaMap; Naïve Bayes (NB); Natural Language Processing (NLP); Sentiment Analysis; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Several well-known adverse drug reactions were identified.

(Yang, Yang, Jiang, & Zhang, 2012)
  Data source: Forums
  Sample size: 6,244 discussion threads
  Horizon of data collection: Unknown
  Software used: Unknown
  Techniques and classifiers used: Association Mining
  Outcome: Reported adverse drug reactions
  Result: Positive
  Description of result: Adverse drug reactions were identified.

Table 2. General Description of Publications Included in the Analysis (16 Publications Reviewed)

Each row is listed as: Sub-category: n (%) Publications.

Year of publication
  2009: 1 (6.25) (Wang et al., 2009)
  2010: 0 (0.00)
  2011: 0 (0.00)
  2012: 2 (12.50) (Bian et al., 2012; Yang et al., 2012)
  2013: 0 (0.00)
  2014: 2 (12.50) (Ginn et al., 2014; Sampathkumar et al., 2014)
  2015: 7 (43.75) (Adrover et al., 2015; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Wu et al., 2015)
  2016: 2 (12.50) (Akay et al., 2016; Dai et al., 2016)
  2017: 0 (0.00)
  2018: 1 (6.25) (Gräßer et al., 2018)
  2019: 1 (6.25) (Dai & Wang, 2019)

Type of drugs
  Asthma: 1 (5.26) (Ru et al., 2015)
  Cancer: 2 (10.53) (Bian et al., 2012; Mishra et al., 2015)
  Cystic fibrosis: 1 (5.26) (Ru et al., 2015)
  Depression: 1 (5.26) (Akay et al., 2016)
  HIV: 1 (5.26) (Adrover et al., 2015)
  Rheumatoid arthritis: 1 (5.26) (Ru et al., 2015)
  Type 2 diabetes: 1 (5.26) (Ru et al., 2015)
  Unknown: 11 (57.89) (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Data source
  Drug reviews: 4 (22.22) (Gräßer et al., 2018; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015)
  Electronic Health Record (EHR): 1 (5.56) (Wang et al., 2009)
  Forums: 3 (16.67) (Akay et al., 2016; Sampathkumar et al., 2014; Yang et al., 2012)
  Social media: 10 (55.56) (Adrover et al., 2015; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Wu et al., 2015)

Sample size
  Less than 5,000: 3 (18.75) (Adrover et al., 2015; Lin et al., 2015; Wu et al., 2015)
  5,000 to 9,999: 4 (25.00) (Akay et al., 2016; Dai et al., 2016; Nikfarjam et al., 2015; Yang et al., 2012)
  10,000 to 14,999: 1 (6.25) (Ginn et al., 2014)
  15,000 to 19,999: 1 (6.25) (Sampathkumar et al., 2014)
  20,000 or more: 6 (37.50) (Bian et al., 2012; Dai & Wang, 2019; Gräßer et al., 2018; Katragadda et al., 2015; Ru et al., 2015; Wang et al., 2009)
  Unknown: 1 (6.25) (Mishra et al., 2015)

Users
  HIV-infected persons undergoing drug treatment: 1 (6.25) (Adrover et al., 2015)
  Unknown: 15 (93.75) (Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Unique users
  Less than 5,000: 2 (12.50) (Adrover et al., 2015; Katragadda et al., 2015)
  5,000 to 9,999: 0 (0.00)
  10,000 to 14,999: 0 (0.00)
  15,000 to 19,999: 0 (0.00)
  20,000 or more: 0 (0.00)
  Unknown: 14 (87.50) (Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Origin of users
  Canada: 1 (5.26) (Adrover et al., 2015)
  South Africa: 1 (5.26) (Adrover et al., 2015)
  United Kingdom: 1 (5.26) (Adrover et al., 2015)
  United States: 1 (5.26) (Adrover et al., 2015)
  Unknown: 15 (78.95) (Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Average number of followers
  Less than 5,000: 1 (6.25) (Adrover et al., 2015)
  5,000 to 9,999: 0 (0.00)
  10,000 to 14,999: 0 (0.00)
  15,000 to 19,999: 0 (0.00)
  20,000 or more: 0 (0.00)
  Unknown: 15 (93.75) (Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Years of data collection
  2004: 2 (6.45) (Akay et al., 2016; Wang et al., 2009)
  2005: 1 (3.23) (Akay et al., 2016)
  2006: 1 (3.23) (Akay et al., 2016)
  2007: 1 (3.23) (Akay et al., 2016)
  2008: 1 (3.23) (Akay et al., 2016)
  2009: 2 (6.45) (Akay et al., 2016; Bian et al., 2012)
  2010: 3 (9.68) (Adrover et al., 2015; Akay et al., 2016; Bian et al., 2012)
  2011: 2 (6.45) (Adrover et al., 2015; Akay et al., 2016)
  2012: 3 (9.68) (Adrover et al., 2015; Akay et al., 2016; Sampathkumar et al., 2014)
  2013: 2 (6.45) (Adrover et al., 2015; Akay et al., 2016)
  2014: 3 (9.68) (Akay et al., 2016; Katragadda et al., 2015; Ru et al., 2015)
  2015: 2 (6.45) (Katragadda et al., 2015; Wu et al., 2015)
  Unknown: 8 (25.81) (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Yang et al., 2012)

Horizon of data collection
  1 year: 2 (12.50) (Bian et al., 2012; Katragadda et al., 2015)
  2 to 5 years: 1 (6.25) (Adrover et al., 2015)
  6 to 10 years: 1 (6.25) (Akay et al., 2016)
  Not applicable: 4 (25.00) (Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015)
  Unknown: 8 (50.00) (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Yang et al., 2012)

Software used
  AFINN: 1 (2.86) (Wu et al., 2015)
  Apache Lucene: 1 (2.86) (Bian et al., 2012)
  BeautifulSoup: 1 (2.86) (Gräßer et al., 2018)
  Bing Liu sentiment words: 1 (2.86) (Wu et al., 2015)
  CRF++ toolkit: 1 (2.86) (Lin et al., 2015)
  Deeply Moving: 1 (2.86) (Ru et al., 2015)
  General Architecture for Text Engineering (GATE): 1 (2.86) (Akay et al., 2016)
  GENIA tagger: 2 (5.71) (Dai et al., 2016; Lin et al., 2015)
  Hunspell: 3 (8.57) (Dai et al., 2016; Dai & Wang, 2019; Lin et al., 2015)
  Java Hidden Markov Model library: 1 (2.86) (Sampathkumar et al., 2014)
  jsoup: 1 (2.86) (Sampathkumar et al., 2014)
  MedLEE: 1 (2.86) (Wang et al., 2009)
  Multi-Perspective Question Answering (MPQA): 1 (2.86) (Wu et al., 2015)
  NLTK toolkit within MATLAB: 1 (2.86) (Akay et al., 2016)
  RapidMiner: 1 (2.86) (Akay et al., 2016)
  SentiWordNet: 2 (5.71) (Mishra et al., 2015; Wu et al., 2015)
  Snowball stemmer: 1 (2.86) (Dai et al., 2016)
  Stanford Topic Modelling Toolbox: 1 (2.86) (Dai et al., 2016)
  TextBlob: 1 (2.86) (Wu et al., 2015)
  Toolkit for Multivariate Analysis: 1 (2.86) (Adrover et al., 2015)
  Tweepy: 1 (2.86) (Wu et al., 2015)
  Twitter REST API: 1 (2.86) (Lin et al., 2015)
  Twitter tokenizer: 1 (2.86) (Dai & Wang, 2019)
  Twitter4J: 1 (2.86) (Katragadda et al., 2015)
  Twokenizer: 2 (5.71) (Dai et al., 2016; Lin et al., 2015)
  Unknown: 3 (8.57) (Ginn et al., 2014; Nikfarjam et al., 2015; Yang et al., 2012)
  WEKA: 1 (2.86) (Wu et al., 2015)
  WordNet: 1 (2.86) (Mishra et al., 2015)

Techniques and classifiers used
  ARDMine: 1 (2.33) (Nikfarjam et al., 2015)
  Artificial Neural Networks (ANN): 1 (2.33) (Adrover et al., 2015)
  Association Mining: 1 (2.33) (Yang et al., 2012)
  Backward/forward sequential feature selection (BSFS/FSFS) algorithm: 1 (2.33) (Dai et al., 2016)
  Boosted Decision Trees with AdaBoost (BDT): 1 (2.33) (Adrover et al., 2015)
  Boosted Decision Trees with Bagging (BDTG): 1 (2.33) (Adrover et al., 2015)
  Decision Trees: 1 (2.33) (Katragadda et al., 2015)
  Hidden Markov Model (HMM): 1 (2.33) (Sampathkumar et al., 2014)
  Hyperlink-Induced Topic Search (HITS): 1 (2.33) (Akay et al., 2016)
  k-Means Clustering: 2 (4.65) (Akay et al., 2016; Dai et al., 2016)
  Lexicon-based: 1 (2.33) (Nikfarjam et al., 2015)
  Logistic Regression: 1 (2.33) (Gräßer et al., 2018)
  Medical Profile Graph: 1 (2.33) (Katragadda et al., 2015)
  MetaMap: 3 (6.98) (Bian et al., 2012; Nikfarjam et al., 2015; Wu et al., 2015)
  Naïve Bayes (NB): 2 (4.65) (Ginn et al., 2014; Wu et al., 2015)
  Natural Language Processing (NLP): 5 (11.63) (Ginn et al., 2014; Katragadda et al., 2015; Lin et al., 2015; Sampathkumar et al., 2014; Wu et al., 2015)
  Network Analysis: 1 (2.33) (Akay et al., 2016)
  Sentiment Analysis: 5 (11.63) (Adrover et al., 2015; Dai et al., 2016; Gräßer et al., 2018; Mishra et al., 2015; Wu et al., 2015)
  Support Vector Machines (SVM): 8 (18.60) (Adrover et al., 2015; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Mishra et al., 2015; Nikfarjam et al., 2015; Wu et al., 2015)
  Term Document Matrix (TDM): 1 (2.33) (Mishra et al., 2015)
  Term-frequency-inverse document frequency (TF-IDF): 2 (4.65) (Akay et al., 2016; Dai & Wang, 2019)
  Unknown: 2 (4.65) (Ru et al., 2015; Wang et al., 2009)

Outcome
  Patient satisfaction with drugs: 1 (5.56) (Gräßer et al., 2018)
  Patient-reported medication outcomes: 1 (5.56) (Ru et al., 2015)
  Reported adverse drug reactions: 11 (61.11) (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  Reported adverse drug reactions for cancer: 1 (5.56) (Bian et al., 2012)
  Reported adverse drug reactions for HIV treatment: 1 (5.56) (Adrover et al., 2015)
  Reported effectiveness of drugs: 1 (5.56) (Gräßer et al., 2018)
  User sentiment on depression drugs: 1 (5.56) (Akay et al., 2016)
  User sentiment on cancer drugs: 1 (5.56) (Mishra et al., 2015)

Drugs studied
  Less than 5: 1 (6.25) (Adrover et al., 2015)
  5 to 9: 4 (25.00) (Akay et al., 2016; Bian et al., 2012; Wang et al., 2009; Wu et al., 2015)
  10 to 14: 2 (12.50) (Ru et al., 2015; Yang et al., 2012)
  15 to 19: 0 (0.00)
  20 or more: 5 (31.25) (Ginn et al., 2014; Katragadda et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014)
  Unknown: 4 (25.00) (Dai et al., 2016; Dai & Wang, 2019; Gräßer et al., 2018; Lin et al., 2015)

Result
  Positive: 14 (87.50) (Adrover et al., 2015; Akay et al., 2016; Dai et al., 2016; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  Neutral: 2 (12.50) (Bian et al., 2012; Dai & Wang, 2019)
  Negative: 0 (0.00)

Reliability
  Low: 0 (0.00)
  Medium: 16 (100.00) (Adrover et al., 2015; Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  High: 0 (0.00)

Validity
  Low: 0 (0.00)
  Medium: 16 (100.00) (Adrover et al., 2015; Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  High: 0 (0.00)

Table 3. Publications by Classification Category and Result (16 Publications Reviewed)

Each row is listed as: Sub-category: Positive n (%); Neutral n (%); Negative n (%); Publications.

Type of drugs
  Asthma: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015)
  Cancer: Positive 1 (5.26); Neutral 1 (5.26); Negative 0 (0.00); (Bian et al., 2012; Mishra et al., 2015)
  Cystic fibrosis: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015)
  Depression: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Akay et al., 2016)
  HIV: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  Rheumatoid arthritis: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015)
  Type 2 diabetes: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015)
  Unknown: Positive 10 (52.63); Neutral 1 (5.26); Negative 0 (0.00); (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Data source
  Drug reviews: Positive 4 (22.22); Neutral 0 (0.00); Negative 0 (0.00); (Gräßer et al., 2018; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015)
  Electronic Health Record (EHR): Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Wang et al., 2009)
  Forums: Positive 3 (16.67); Neutral 0 (0.00); Negative 0 (0.00); (Akay et al., 2016; Sampathkumar et al., 2014; Yang et al., 2012)
  Social media: Positive 8 (44.44); Neutral 2 (11.11); Negative 0 (0.00); (Adrover et al., 2015; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Wu et al., 2015)

Origin of users
  Canada: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  South Africa: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  United Kingdom: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  United States: Positive 1 (5.26); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  Unknown: Positive 13 (68.42); Neutral 2 (10.53); Negative 0 (0.00); (Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)

Horizon of data collection
  1 year: Positive 1 (6.25); Neutral 1 (6.25); Negative 0 (0.00); (Bian et al., 2012; Katragadda et al., 2015)
  2 to 5 years: Positive 1 (6.25); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  6 to 10 years: Positive 1 (6.25); Neutral 0 (0.00); Negative 0 (0.00); (Akay et al., 2016)
  Not applicable: Positive 4 (25.00); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015)
  Unknown: Positive 7 (43.75); Neutral 1 (6.25); Negative 0 (0.00); (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Yang et al., 2012)

Outcome
  Patient satisfaction with drugs: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Gräßer et al., 2018)
  Patient-reported medication outcomes: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015)
  Reported adverse drug reactions: Positive 10 (55.56); Neutral 1 (5.56); Negative 0 (0.00); (Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  Reported adverse drug reactions for cancer: Positive 0 (0.00); Neutral 1 (5.56); Negative 0 (0.00); (Bian et al., 2012)
  Reported adverse drug reactions for HIV treatment: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  Reported effectiveness of drugs: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Gräßer et al., 2018)
  User sentiment on depression drugs: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Akay et al., 2016)
  User sentiment on cancer drugs: Positive 1 (5.56); Neutral 0 (0.00); Negative 0 (0.00); (Mishra et al., 2015)

Drugs studied
  Less than 5: Positive 1 (6.25); Neutral 0 (0.00); Negative 0 (0.00); (Adrover et al., 2015)
  5 to 9: Positive 3 (18.75); Neutral 1 (6.25); Negative 0 (0.00); (Akay et al., 2016; Bian et al., 2012; Wang et al., 2009; Wu et al., 2015)
  10 to 14: Positive 2 (12.50); Neutral 0 (0.00); Negative 0 (0.00); (Ru et al., 2015; Yang et al., 2012)
  15 to 19: Positive 0 (0.00); Neutral 0 (0.00); Negative 0 (0.00)
  20 or more: Positive 5 (31.25); Neutral 0 (0.00); Negative 0 (0.00); (Ginn et al., 2014; Katragadda et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Sampathkumar et al., 2014)
  Unknown: Positive 3 (18.75); Neutral 1 (6.25); Negative 0 (0.00); (Dai et al., 2016; Dai & Wang, 2019; Gräßer et al., 2018; Lin et al., 2015)

Reliability
  Low: Positive 0 (0.00); Neutral 0 (0.00); Negative 0 (0.00)
  Medium: Positive 14 (87.50); Neutral 2 (12.50); Negative 0 (0.00); (Adrover et al., 2015; Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  High: Positive 0 (0.00); Neutral 0 (0.00); Negative 0 (0.00)

Validity
  Low: Positive 0 (0.00); Neutral 0 (0.00); Negative 0 (0.00)
  Medium: Positive 14 (87.50); Neutral 2 (12.50); Negative 0 (0.00); (Adrover et al., 2015; Akay et al., 2016; Bian et al., 2012; Dai et al., 2016; Dai & Wang, 2019; Ginn et al., 2014; Gräßer et al., 2018; Katragadda et al., 2015; Lin et al., 2015; Mishra et al., 2015; Nikfarjam et al., 2015; Ru et al., 2015; Sampathkumar et al., 2014; Wang et al., 2009; Wu et al., 2015; Yang et al., 2012)
  High: Positive 0 (0.00); Neutral 0 (0.00); Negative 0 (0.00)

Appendices

Appendix 1. Search Strategies

Table 4. Search strategy for PubMed

PubMed

Block 1: Computational Linguistics artificial intelligence[Title/Abstract] OR Artificial Intelligence[MeSH Terms] OR machine learning[Title/Abstract] OR Machine Learning[MeSH Terms] OR text mining[Title/Abstract] OR computational linguistics[Title/Abstract] OR natural language processing[Title/Abstract] OR Natural Language Processing[MeSH Terms] OR nlp[Title/Abstract] OR sentiment analysis[Title/Abstract] OR word embedding*[Title/Abstract] OR natural language toolkit[Title/Abstract] OR nltk[Title/Abstract]

Block 2: Health Surveillance public health surveillance[Title/Abstract] OR Public Health Surveillance[MeSH Terms] OR health surveillance[Title/Abstract] OR public health monitoring[Title/Abstract] OR health monitoring[Title/Abstract]

Filters:
- Article types: Journal Article
- Languages: English

Table 5. Search strategy for Web of Science

Web of Science

Block 1: Computational Linguistics TS=(artificial intelligence) OR TS=(machine learning) OR TS=(text mining) OR TS=(computational linguistics) OR TS=(natural language processing) OR TS=(nlp) OR TS=(sentiment analysis) OR TS=(word embedding*) OR TS=(natural language toolkit) OR TS=(nltk)

Block 2: Health Surveillance TS=(public health surveillance) OR TS=(health surveillance) OR TS=(public health monitoring) OR TS=(health monitoring)

Filters:
- Document types: Article, Proceedings Paper
- Languages: English


Table 6. Search strategy for IEEE Xplore

IEEE Xplore

Block 1: Computational Linguistics "All Metadata":artificial intelligence OR "All Metadata":machine learning OR "All Metadata":text mining OR "All Metadata":computational linguistics OR "All Metadata":natural language processing OR "All Metadata":nlp OR "All Metadata":sentiment analysis OR "All Metadata":word embedding* OR "All Metadata":natural language toolkit OR "All Metadata":nltk

Block 2: Health Surveillance "All Metadata":public health surveillance OR "All Metadata":health surveillance OR "All Metadata":public health monitoring OR "All Metadata":health monitoring

Filters:
- Document types: Conferences, Journals

Table 7. Search strategy for ACM Digital Library

ACM Digital Library

Block 1: Computational Linguistics artificial intelligence OR machine learning OR text mining OR computational linguistics OR natural language processing OR nlp OR sentiment analysis OR word embedding* OR natural language toolkit OR nltk

Block 2: Health Surveillance public health surveillance OR health surveillance OR public health monitoring OR health monitoring

Filters:
- Searched within: Abstract, Author Keyword, Title
- All publications: Proceedings, Research Article
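Block-based strategies like those in Tables 4-7 can also be assembled programmatically, which makes the search reproducible across databases. The sketch below is illustrative only: the `build_query` helper and the quoting convention are assumptions for illustration, not part of the review's methodology; the term lists are copied from Block 1 and Block 2.

```python
# Illustrative sketch: assembling a two-block Boolean search string of the
# kind shown in Tables 4-7. The helper and quoting style are assumptions.

BLOCK_1 = [  # Block 1: Computational Linguistics
    "artificial intelligence", "machine learning", "text mining",
    "computational linguistics", "natural language processing", "nlp",
    "sentiment analysis", "word embedding*", "natural language toolkit",
    "nltk",
]

BLOCK_2 = [  # Block 2: Health Surveillance
    "public health surveillance", "health surveillance",
    "public health monitoring", "health monitoring",
]

def build_query(block, field=""):
    """OR-join the terms of one block, optionally prefixing a field tag."""
    prefix = f"{field}:" if field else ""
    return "(" + " OR ".join(f'{prefix}"{term}"' for term in block) + ")"

# The two concept blocks are combined with AND, so a record must match at
# least one term from each block to be retrieved.
query = build_query(BLOCK_1) + " AND " + build_query(BLOCK_2)
print(query)
```

The same helper can emit database-specific variants by passing a field tag, for example `build_query(BLOCK_1, field="All Metadata")` for an IEEE Xplore-style prefix.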

Appendix 2. Characteristics of Publications

Table 8. Characteristics of Publications Included in the Analysis (16 Publications Reviewed)

(Adrover et al., 2015)
  Type of drugs: HIV
  Data source: Social media (Twitter.com)
  Sample size: 1,642 tweets
  Users: HIV-infected persons undergoing drug treatment
  Unique users: 512 (male: 247; female: 83; unknown: 182)
  Origin of users: Canada; South Africa; United Kingdom; United States (New York City, San Francisco)
  Average number of followers: 2,300
  Years of data collection: 2010, 2011, 2012, 2013
  Horizon of data collection: 3 years
  Software used: Toolkit for Multivariate Analysis
  Techniques and classifiers used: Artificial Neural Networks (ANN); Boosted Decision Trees with AdaBoost (BDT); Boosted Decision Trees with Bagging (BDTG); Sentiment Analysis; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions for HIV treatment
  Drugs studied: Atripla; Sustiva; Truvada
  Result: Positive
  Description of result: Reported adverse effects are consistent with well-recognized toxicities.
  Reliability: Medium
  Validity: Medium

(Akay et al., 2016)
  Type of drugs: Depression
  Data source: Forums (DepressionForums.org)
  Sample size: 7,726 posts
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
  Horizon of data collection: 10 years
  Software used: General Architecture for Text Engineering (GATE); NLTK toolkit within MATLAB; RapidMiner
  Techniques and classifiers used: Hyperlink-Induced Topic Search (HITS); k-Means Clustering; Network Analysis; Term-frequency-inverse document frequency (TF-IDF)
  Outcome: User sentiment on depression drugs
  Drugs studied: Citalopram; Chlorpromazine; Cyclobenzaprine; Duloxetine; Promethazine
  Result: Positive
  Description of result: Natural language processing is suitable to extract information on adverse drug reactions concerning depression.
  Reliability: Medium
  Validity: Medium

(Bian et al., 2012)
  Type of drugs: Cancer
  Data source: Social media (Twitter.com)
  Sample size: 2,102,176,189 tweets
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: 2009, 2010
  Horizon of data collection: 1 year
  Software used: Apache Lucene
  Techniques and classifiers used: MetaMap; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions for cancer
  Drugs studied: Avastin; Melphalan; Rupatadin; Tamoxifen; Taxotere
  Result: Neutral
  Description of result: Classification models had limited performance. Adverse events related to cancer drugs can potentially be extracted from tweets.
  Reliability: Medium
  Validity: Medium

(Dai et al., 2016)
  Type of drugs: Unknown
  Data source: Social media (Twitter.com)
  Sample size: 6,528 tweets
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: Unknown
  Horizon of data collection: Unknown
  Software used: GENIA tagger; Hunspell; Snowball stemmer; Stanford Topic Modelling Toolbox; Twokenizer
  Techniques and classifiers used: Backward/forward sequential feature selection (BSFS/FSFS) algorithm; k-Means Clustering; Sentiment Analysis; Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Drugs studied: Unknown
  Result: Positive
  Description of result: Adverse drug reactions were identified reasonably well.
  Reliability: Medium
  Validity: Medium

(Dai & Wang, 2019)
  Type of drugs: Unknown
  Data source: Social media (Twitter.com)
  Sample size: 32,670 tweets
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: Unknown
  Horizon of data collection: Unknown
  Software used: Hunspell; Twitter tokenizer
  Techniques and classifiers used: Support Vector Machines (SVM); Term Frequency-Inverse Document Frequency (TF-IDF)
  Outcome: Reported adverse drug reactions
  Drugs studied: Unknown
  Result: Neutral
  Description of result: Adverse drug reactions were not identified very well.
  Reliability: Medium
  Validity: Medium

(Ginn et al., 2014)
  Type of drugs: Unknown
  Data source: Social media (Twitter.com)
  Sample size: 10,822 tweets
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: Unknown
  Horizon of data collection: Unknown
  Software used: Unknown
  Techniques and classifiers used: Naïve Bayes (NB); Natural Language Processing (NLP); Support Vector Machines (SVM)
  Outcome: Reported adverse drug reactions
  Drugs studied: 65 drugs
  Result: Positive
  Description of result: Adverse drug reactions were identified well.
  Reliability: Medium
  Validity: Medium

(Gräßer et al., 2018)
  Type of drugs: Unknown
  Data source: Drug reviews (Drugs.com; Drugslib.com)
  Sample size: 218,614 reviews
  Users: Unknown
  Unique users: Unknown
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: Unknown
  Horizon of data collection: Unknown
  Software used: BeautifulSoup
  Techniques and classifiers used: Logistic Regression; Sentiment Analysis
  Outcome: Patient satisfaction with drugs; Reported adverse drug reactions; Reported effectiveness of drugs
  Drugs studied: Unknown
  Result: Positive
  Description of result: Classification results were very good.
  Reliability: Medium
  Validity: Medium

(Katragadda et al., 2015)
  Type of drugs: Unknown
  Data source: Social media (Twitter.com)
  Sample size: 172,800 tweets
  Users: Unknown
  Unique users: 864
  Origin of users: Unknown
  Average number of followers: Unknown
  Years of data collection: 2014, 2015
  Horizon of data collection: 1 year
  Software used: Twitter4J
  Techniques and classifiers used: Decision Trees; Medical Profile Graph; Natural Language Processing (NLP)
  Outcome: Reported adverse drug reactions
  Drugs studied: 103 drugs
  Result: Positive
  Description of result: Building a medical profile of users enables the accurate detection of adverse drug events.
  Reliability: Medium
  Validity: Medium

(Lin et al., Unknown Social media: 1,245 tweets Unknown Unknow Unknown Unknow Unknown Unknown - CRF++ - Natural Reported Unknown Positiv Adverse Medium Mediu 2015) - Twitter.com n n toolkit Language adverse e drug m - GENIA Processing (NLP) drug reactions tagger reactions were - Hunspell identified - Twitter reasonably REST API well. - Twokenizer (Mishra et al., Cancer Drug reviews: Unknown Unknown Unknow Unknown Unknow Unknown Unknown - - Sentiment User 146 drugs Positiv Sentiment on Medium Mediu 2015) - WebMD.com n n SentiWordNe Analysis sentiment e adverse drug m t - Support Vector on cancer reactions - WordNet Machines (SVM) drugs was - Term Document identified Matrix (TDM) reasonably well.

(Nikfarjam et Unknown Drug reviews: - DailyStrength.org: Unknown Unknow Unknown Unknow Unknown Unknown Unknown - ARDMine Reported 81 drugs Positiv Adverse Medium Mediu al., 2015) - DailyStrength.org 6,279 reviews n n - Lexicon-based adverse e drug m - Twitter.com: 1,784 - MetaMap drug reactions Social media: tweets - Support Vector reactions were - Twitter.com Machines (SVM) identified very well.

54 (Ru et al., - Asthma Drug reviews: - Unknown Unknow Unknown Unknow 2014 Not - Deeply Unknown Patient- - Albuterol Positiv Social media Medium Mediu 2015) - Cystic - PatientsLikeMe.com PatientsLikeMe.com n n applicabl Moving reported - Azithromycin e serves as a m fibrosis - WebMD.com : 796 reviews e medication - Bromocriptine new data - - Twitter.com: outcomes - Insulin source to Rheumatoi Social media: 39,127 tweets - Ipratropium extract d arthritis - Twitter.com - WebMD.com: - Ivacaftor patient- - Type 2 - YouTube.com 2,567 reviews - Meloxicam reported diabetes - YouTube.com: - Metformin medication 42,544 comments - Prednisone outcomes. - Sulfasalazine

(Sampathkuma Unknown Forums: - Medications.com: Unknown Unknow Unknown Unknow 2012 Not - Java Hidden - Hidden Markov Reported - Positiv Reported Medium Mediu r et al., 2014) - Medications.com 8,065 posts n n applicabl Markov Model (HMM) adverse Medications.com: e adverse m - SteadyHealth.com - SteadyHealth.com: e Model library - Natural drug 168 drugs effects are 11,878 - jsoup Language reactions - consistent Processing (NLP) SteadyHealth.com with well- : 316 drugs recognized side-effects.

(Wang et al., Unknown Electronic Health 25,074 discharge Unknown Unknow Unknown Unknow 2004 Not MedLEE Unknown Reported - ACE inhibitors Positiv Reported Medium Mediu 2009) Record (EHR) summaries n n applicabl adverse - Bupropion e adverse m e drug - Ibuprofen effects are reactions - Morphine consistent - Paroxetine with well- - Rosiglitazone recognized - Warfarin toxicities (recall: 75%; precision: 31%). (Wu et al., Unknown Social media: 3,251 tweets Unknown Unknow Unknown Unknow 2015 Not - AFINN - MetaMap Reported - Baclofen Positiv Several well- Medium Mediu 2015) - Twitter.com n n applicabl - Bing Liu - Naïve Bayes adverse - Duloxetine e known m e sentiment (NB) drug - Gabapentin adverse drug words - Natural reactions - Glatiramer reactions - Multi- Language - Pregabalin were Perspective Processing (NLP) identified. Question - Sentiment Answering Analysis (MPQA) - Support Vector - Machines (SVM) SentiWordNe t

55 - TextBlob - Tweepy - WEKA

(Yang et al., Unknown Forums: 6,244 discussion Unknown Unknow Unknown Unknow Unknown Unknown Unknown - Association Reported - Adenosine Positiv Adverse Medium Mediu 2012) - MedHelp.org threads n n Mining adverse - Biaxin e drug m drug - Cialis reactions reactions - Elidel were - Lansoprazole identified. - Lantus - Luvox - Prozac - Tacrolimus - Vyvanse

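Many of the publications in Table 8 pair a bag-of-words representation such as TF-IDF with a standard classifier such as a Support Vector Machine to flag posts that mention adverse drug reactions. The sketch below illustrates that general pipeline only; it is not taken from any reviewed study, and the texts, labels, and model configuration are invented for demonstration, assuming scikit-learn is available.

```python
# Illustrative TF-IDF + linear SVM pipeline for adverse drug reaction (ADR)
# detection. The tiny corpus below is invented for demonstration; real studies
# used thousands of hand-labelled posts or tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = [
    "this drug gave me terrible headaches and nausea",
    "felt dizzy and sick after the second dose",
    "works great, my symptoms are gone",
    "no side effects so far, very happy with it",
]
labels = [1, 1, 0, 0]  # 1 = mentions an adverse reaction, 0 = does not

# Unigrams and bigrams are weighted by TF-IDF, then classified with a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(posts, labels)

print(model.predict(["severe nausea since I started taking it"]))
```

In practice, the reviewed studies extend this skeleton with domain resources such as MetaMap concept mapping or sentiment lexicons, and evaluate with precision and recall on held-out annotated data rather than on the training set.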
Appendix 3. PRISMA Checklist

Table 9. PRISMA Checklist

TITLE
- 1. Title: Identify the report as a systematic review, meta-analysis, or both. (Reported on page 1)

ABSTRACT
- 2. Structured summary: Provide a structured summary including, as applicable: background; objectives; data sources; study eligibility criteria, participants, and interventions; study appraisal and synthesis methods; results; limitations; conclusions and implications of key findings; systematic review registration number. (Reported on page 2)

INTRODUCTION
- 3. Rationale: Describe the rationale for the review in the context of what is already known. (Reported on page 5)
- 4. Objectives: Provide an explicit statement of questions being addressed with reference to participants, interventions, comparisons, outcomes, and study design (PICOS). (Reported on page 7)

METHODS
- 5. Protocol and registration: Indicate if a review protocol exists, if and where it can be accessed (e.g., Web address), and, if available, provide registration information including registration number. (Not reported)
- 6. Eligibility criteria: Specify study characteristics (e.g., PICOS, length of follow-up) and report characteristics (e.g., years considered, language, publication status) used as criteria for eligibility, giving rationale. (Reported on page 16)
- 7. Information sources: Describe all information sources (e.g., databases with dates of coverage, contact with study authors to identify additional studies) in the search and date last searched. (Reported on page 14)
- 8. Search: Present full electronic search strategy for at least one database, including any limits used, such that it could be repeated. (Reported on page 14)
- 9. Study selection: State the process for selecting studies (i.e., screening, eligibility, included in systematic review, and, if applicable, included in the meta-analysis). (Reported on page 15)
- 10. Data collection process: Describe method of data extraction from reports (e.g., piloted forms, independently, in duplicate) and any processes for obtaining and confirming data from investigators. (Reported on page 17)
- 11. Data items: List and define all variables for which data were sought (e.g., PICOS, funding sources) and any assumptions and simplifications made. (Reported on page 17)
- 12. Risk of bias in individual studies: Describe methods used for assessing risk of bias of individual studies (including specification of whether this was done at the study or outcome level), and how this information is to be used in any data synthesis. (Reported on page 16)
- 13. Summary measures: State the principal summary measures (e.g., risk ratio, difference in means). (Not reported)
- 14. Synthesis of results: Describe the methods of handling data and combining results of studies, if done, including measures of consistency (e.g., I²) for each meta-analysis. (Reported on page 17)
- 15. Risk of bias across studies: Specify any assessment of risk of bias that may affect the cumulative evidence (e.g., publication bias, selective reporting within studies). (Reported on page 16)
- 16. Additional analyses: Describe methods of additional analyses (e.g., sensitivity or subgroup analyses, meta-regression), if done, indicating which were pre-specified. (Reported on page 17)

RESULTS
- 17. Study selection: Give numbers of studies screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally with a flow diagram. (Reported on page 18)
- 18. Study characteristics: For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. (Reported on page 20)
- 19. Risk of bias within studies: Present data on risk of bias of each study and, if available, any outcome-level assessment (see item 12). (Reported on page 23)
- 20. Results of individual studies: For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group, and (b) effect estimates and confidence intervals, ideally with a forest plot. (Reported on page 23)
- 21. Synthesis of results: Present results of each meta-analysis done, including confidence intervals and measures of consistency. (Reported on page 23)
- 22. Risk of bias across studies: Present results of any assessment of risk of bias across studies (see item 15). (Reported on page 23)
- 23. Additional analysis: Give results of additional analyses, if done (e.g., sensitivity or subgroup analyses, meta-regression [see item 16]). (Not reported)

DISCUSSION
- 24. Summary of evidence: Summarize the main findings including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policy makers). (Reported on page 25)
- 25. Limitations: Discuss limitations at study and outcome level (e.g., risk of bias), and at review level (e.g., incomplete retrieval of identified research, reporting bias). (Reported on page 26)
- 26. Conclusions: Provide a general interpretation of the results in the context of other evidence, and implications for future research. (Reported on page 29)

FUNDING
- 27. Funding: Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review. (Not reported)

Adopted and modified from Moher et al. (2009)

For more information, visit: www.prisma-statement.org.
