Data Analytics Subgroup Report
Paolo Alcini - EMA (subgroup lead)
Gianmario Candore - EMA (subgroup lead)
Marek Lehmann - EMA
Luis Pinheiro - EMA
Antti Hyvärinen - Fimea
Hans Ovelgonne - CBG-MED
Mateja Sajovic - JAZMP
Panagiotis Telonis - EMA
Kevin Horan - HPRA (until May 2018)
Massimiliano Falcinelli - EMA (from September 2018)

See websites for contact details
Heads of Medicines Agencies www.hma.eu
European Medicines Agency www.ema.europa.eu
An agency of the European Union

Table of contents

Executive summary
    Data Standardisation
    Information technology
    Data manipulation
    Artificial intelligence
    Conclusions
    Acknowledgments

Data Analytics - Standardisation
1. Standardisation
    1.1. Why
    1.2. Objectives
    1.3. Defining the main concepts
    1.4. Overview
    1.5. Opportunities (or use) in regulatory activities
        1.5.1. Clinical trial domain
        1.5.2. Genomics domain
        1.5.3. Bioanalytics Omics domain
        1.5.4. Social media domain
        1.5.5. Observational data/Real World evidence (RWE) domain
        1.5.6. Spontaneous ADR
    1.6. Challenges in regulatory activities
    1.7. Regulatory implications
    1.8. Conclusions
    1.9. Recommendations
        1.9.1. Subgroups recommendations supporting the needs or standardisation
        1.9.2. Useful references

Data Analytics - Information Technology for Big Data
2. Information Technology
    2.1. Why
    2.2. Objectives
    2.3. Main concepts
        2.3.1. Big data
        2.3.2. Big data sources
        2.3.3. Big data formats
        2.3.4. Data analytics models
    2.4. Overview
        2.4.1. Data storage technologies
        2.4.2. Hadoop ecosystem
        2.4.3. Cloud big data storage
        2.4.4. Data integration technologies
        2.4.5. Data warehouses and data lakes
        2.4.6. Architecture
        2.4.7. Related concepts and technologies
    2.5. Opportunities
    2.6. Challenges
    2.7. Recommendations

Data Analytics - Data manipulation
3. Data manipulation
    3.1. Why
    3.2. Objectives
    3.3. Main concepts
    3.4. Glossary
    3.5. Overview
        3.5.1. Data types
        3.5.2. Reshaping Data
        3.5.3. Transforming Data
        3.5.4. Dealing with missing data
        3.5.5. Dealing with incorrect data
        3.5.6. Metadata
    3.6. Opportunities in regulatory activities
    3.7. Challenges in regulatory activities
    3.8. Recommendations
    3.9. Resources

Data Analytics - The impact of artificial intelligence on analytics in the regulatory setting
4. The impact of artificial intelligence on analytics in the regulatory setting
    4.1. Why
    4.2. Objectives
    4.3. Introduction
        4.3.1. Defining the main concepts
        4.3.2. Two approaches to AI
        4.3.3. Machine learning
        4.3.4. Deep learning
        4.3.5. Natural language processing
        4.3.6. Why AI is becoming popular
        4.3.7. Aim of the AI algorithms
        4.3.8. Which AI algorithm to use
        4.3.9. Summary of the main points
    4.4. Opportunities in regulatory activities
        4.4.1. Efficiency and automation