Genomics Genetics, Transcriptomics and Epigenetics Subgroup Report

Didier Meulendijks, NL
Dieter Deforce, BE
Hans Ovelgönne, NL
Renate König, DE
Marjon Pasmooij, NL (subgroup lead)

See websites for contact details
Heads of Medicines Agencies www.hma.eu
European Medicines Agency www.ema.europa.eu
An agency of the European Union

Contents

1. Summary
2. Background
2.1. Genomics
2.2. Genetics
2.3. Transcriptomics
2.4. Epigenetics
2.5. Microbiomics
3. Objectives
4. Methods
5. Results of the data characterisation
5.1. General overview and history
5.2. European regulatory scientific guidelines
5.3. U.S. Food and Drug Administration (FDA)
5.4. Data sources
5.4.1. Public data sources
5.4.2. Genomics initiatives
5.4.3. Conclusions on Data Sources
5.5. Volume
5.5.1. Size of the data source
5.5.2. Structure (terminology, structured vs. unstructured data)
5.6. Veracity
5.6.1. Data provenance
5.6.2. Data Quality
5.6.3. Completeness (opportunity to capture the data)
5.6.4. Representativeness
5.6.5. Analytical tests / variability
5.6.6. Validation
5.7. Variability
5.7.1. Data heterogeneity
5.7.2. Data standards
5.7.3. Data processing
5.8. Velocity
5.8.1. Speed of change
5.8.2. Rate of accumulation
5.9. Value
5.9.1. Usability of genomics big data
5.9.2. Identify any uncertainties or unknowns which require further exploration
5.9.3. Possible gaps in current European guidance
5.9.4. Data Accessibility - consider privacy and governance challenges and the limitations to access as this will affect the value from a regulatory context
5.9.5. Data analytics - discuss current and potential new approaches
5.9.6. Regulatory challenges
6. Conclusions
6.1. Summarised key messages
6.2. Specific recommendations from the analysis
7. References
Appendix 1: Definitions
Appendix 2: Genomics initiatives

1. Summary

This report is a deliverable from the HMA/EMA Big Data Task Force, ‘Genomics’ subgroup, and focuses on the characterisation and mapping of one type of ‘big data’, namely genomics data, including epigenetics and transcriptomics data. In addition, this report describes the potential usability and applicability of genomics data in regulatory processes, and gives recommendations from the HMA/EMA Big Data Task Force on how to optimise the future use of genomics (big) data in regulatory processes.

There are different public data sources (databases) where genomics data can be uploaded and freely accessed. Most publicly available genomics data sources are derived from investigator-initiated (non-industry-driven) initiatives and contain only genomics data, without phenotypic or clinical outcome data linked to the genomics data. Some data sources do contain phenotypic data, mainly data on the presence or absence of genetic/hereditary diseases. Other phenotypic data, in particular clinical outcome data (e.g. data on efficacy or safety of treatments), are currently found only sporadically in available public databases (e.g. PharmGKB, https://www.pharmgkb.org/), although a large number of initiatives now under way do link genomics data to clinical data.

Most genomics data generated by the pharmaceutical industry, which are often linked to clinical outcomes (e.g. from clinical trials), are not publicly available. These data would be of interest to regulators because they could be used for regulatory purposes (e.g. pharmacovigilance, identification of biomarkers for efficacy, etc.). Incentives to make genomics data available to regulators and/or to the public could be beneficial for the regulatory system; for example, they could facilitate benefit/risk (B/R) assessment based more on the individual patient, as opposed to population-based B/R assessment. Companies could therefore be asked, upon submitting a marketing authorisation application, to make genomics data linked to clinical data from the pivotal clinical trials available in public databases or to the EMA. The data would then be accessible for further analyses for academic purposes and potentially for regulatory purposes. To facilitate data sharing in a secure way, it could be explored whether the EMA should provide a central platform for sharing of clinical trial data, or whether the EMA could provide a portal linking to industry-owned data. To make optimal use of available data, it would be important to link the most important parameters related to phenotype and/or treatment outcome (e.g. adverse events, primary efficacy outcomes) to the genomics data.

Importantly, various privacy, security and ethical issues need to be addressed before genomics data sharing can become common practice. For example, informed consent would have to adequately cover data sharing. Furthermore, to link data from different sources, a system of patient identifiers would be needed that ensures both adequate linking of data and patient privacy. Linking clinical and phenotype data across datasets would empower precision medicine but would also introduce new privacy risks. These risks are of particular concern for rare diseases, where there are sometimes only a few patients worldwide with a specific mutation.

Genomics data quality can be improved by improving standardisation (e.g. making use of standard operating procedures), by requiring raw data to be shared in addition to processed data, by requiring metadata to be attached

