
Personal Genomes: Accessing, Sharing and Interpretation

Wellcome Genome Campus Conference Centre, Hinxton, Cambridge, UK 11-12 April 2019

Scientific Programme Committee:

Stephan Beck University College London, UK

Mad Price Ball Open Humans Foundation, USA

Manuel Corpas Cambridge Precision Medicine, UK

Mahsa Shabani University of Leuven, Belgium

Tweet about it: #PersGen19

@ACSCevents /ACSCevents /c/WellcomeGenomeCampusCoursesandConferences


Wellcome Genome Campus Scientific Conferences Team:

Rebecca Twells, Head of Advanced Courses and Scientific Conferences

Treasa Creavin, Scientific Programme Manager

Nicole Schatlowski, Scientific Programme Officer

Jemma Beard, Conference and Events Manager

Lucy Criddle, Conference and Events Organiser

Laura Hubbard, Conference and Events Organiser

Sarah Offord, Conference and Events Office Administrator

Zoey Willard, Conference and Events Organiser


Dear colleague,

I would like to offer you a warm welcome to the Wellcome Genome Campus Advanced Courses and Scientific Conferences: Personal Genomes: Accessing, Sharing and Interpretation. I hope you will find the talks interesting and stimulating, and find opportunities for networking throughout the schedule.

The Wellcome Genome Campus Advanced Courses and Scientific Conferences programme is run on a not-for-profit basis, heavily subsidised by the Wellcome Trust.

We organise around 50 events a year on the latest biomedical science for research, diagnostics and therapeutic applications for human and animal health, with world-renowned scientists and clinicians involved as scientific programme committees, speakers and instructors.

We offer a range of conferences and laboratory-, IT- and discussion-based courses, which enable the dissemination of knowledge and discussion in an intimate setting. We also organise invitation-only retreats for high-level discussion on emerging science, technologies and strategic direction for select groups and policy makers. If you have any suggestions for events, please contact me at the email address below.

The Wellcome Genome Campus Scientific Conferences team are here to help this meeting run smoothly, and at least one member will be at the registration desk between sessions, so please do come and ask us if you have any queries. We also appreciate your feedback and look forward to your comments to continually improve the programme.

Best wishes,

Dr Rebecca Twells Head of Advanced Courses and Scientific Conferences [email protected]


General Information

Conference Badges Please wear your name badge at all times to promote networking and to assist staff in identifying you.

Scientific Session Protocol Photography, audio or video recording of the scientific sessions, including the poster session, is not permitted.

Social Media Policy To encourage the open communication of science, we would like to support the use of social media at this year’s conference. Please use the conference hashtag #PersGen19. You will be notified at the start of a talk if a speaker does not wish their talk to be open. For posters, please check with the presenter to obtain permission.

Internet Access Wifi access instructions:
- Join the ‘ConferenceGuest’ network
- Enter your name and email address to register
- Click ‘continue’ to send an email to the registered email address
- Open the registration email, follow the link ‘click here’ and confirm the address is valid
- Enjoy seven days’ free internet access!
- Repeat these steps on up to 5 devices to link them to your registered email address

Presentations Please provide an electronic copy of your talk to a member of the AV team who will be based in the meeting room.

Poster Sessions Posters will be displayed throughout the conference. Please display your poster in the Conference Centre on arrival. There will be one poster session during the conference, taking place on Thursday, 11 April at 18:30-19:30.

The abstract page number indicates your assigned poster board number. An index of poster numbers appears in the back of this book.

Conference Meals and Social Events Lunch and dinner will be served in the Hall, apart from lunch on Thursday 11 April when it will be served in the Conference Centre. Please refer to the conference programme in this book as times will vary based on the daily scientific presentations. Please note there are no lunch or dinner facilities available outside of the conference times.

All conference meals and social events are for registered delegates. Please inform the conference organiser if you are unable to attend dinner.

The Hall Bar (cash bar) will be open from 19:00 – 23:00 on Thursday, 11 April.

Dietary Requirements If you have advised us of any dietary requirements, you will find a coloured dot on your badge. Please make yourself known to the catering team and they will assist you with your meal request.

If you have a gluten allergy, please note that we are unable to guarantee the absence of gluten in dishes, even where gluten-containing products are not used as a direct ingredient. This is because ingredients containing gluten are used in the kitchen.


For Wellcome Genome Campus Conference Centre Guests Check in If you are staying on site at the Wellcome Genome Campus Conference Centre, you may check into your room from 14:00. The Conference Centre reception is open 24 hours.

Breakfast Your breakfast will be served in the Hall restaurant from 07:30 – 09:00.

Telephone If you are staying on-site and would like to use the telephone in your room, you will need to contact the Reception desk (Ext. 5000) to have your phone line activated - they will require your credit card number and expiry date to do so.

Departures You must vacate your room by 10:00 on the day of your departure. Please ask at reception for assistance with luggage storage in the Conference Centre.

Taxis Please find a list of local taxi numbers on our website. The conference centre reception will also be happy to book a taxi on your behalf.

Return Ground Transport Complimentary return transport has been arranged for 18:30 on Friday, 12 April to Cambridge station and city centre (Downing Street), and Stansted and Heathrow airports.

A sign-up sheet will be available at the conference registration desk from 15:30 on Thursday, 11 April. Places are limited so you are advised to book early.

Please allow a 30-minute journey time to both Cambridge and Stansted Airport, and two and a half hours to Heathrow.

Messages and Miscellaneous Lockers are located outside the Conference Centre toilets and are free of charge.

All messages will be posted on the registration desk in the Conference Centre.

A number of toiletry and stationery items are available for purchase at the Conference Centre reception. Cards for our self-service laundry are also available.

Certificate of Attendance A certificate of attendance can be provided. Please request one from the conference organiser based at the registration desk.

Contact numbers Wellcome Genome Campus Conference Centre – 01223 495000 (or Ext. 5000) Wellcome Genome Campus Conference Organiser (Jemma) – 07771 666665

If you have any queries or comments, please do not hesitate to contact a member of staff who will be pleased to help you.


Conference Summary

Thursday 11 April
11:30-12:30 Registration with lunch
12:30-12:40 Welcome and Introductions
12:40-13:30 Keynote lecture: George Church
13:30-15:30 Session 1: Personal genetic testing: opportunities and limitations
15:30-16:00 Afternoon Tea
16:00-17:30 Session 2: Interpretation of personal genomes
17:30-18:30 Panel discussion: benefits from testing and sharing personal genome data
18:30-19:30 Poster Session with drinks reception
19:30 Dinner

Friday 12 April
09:00-10:30 Session 3: Citizen science and personal genomics: users, customers and patients
10:30-11:00 Morning Coffee
11:00-12:30 Session 4: Return of data to research participants and personal data access
12:30-14:00 Lunch
14:00-15:30 Session 5: Society challenges: data protection and privacy and the ethics of data sharing
15:30-16:00 Afternoon Tea
16:00-17:00 Session 6: Clinical perspective – from patients to the public
17:00-18:00 Keynote Lecture: Yaniv Erlich
18:00-18:15 Concluding Remarks
18:30 Coaches depart to Cambridge City Centre and Train Station, Heathrow Airport via Stansted Airport


Conference Sponsors

www.cpm.onl


Personal Genomes: accessing, sharing and interpretation

Wellcome Genome Campus Conference Centre, Hinxton, Cambridge

11-12 April

Lectures to be held in the Francis Crick Auditorium
Lunch and dinner to be held in the Hall Restaurant
Poster session to be held in the Conference Centre

Spoken presentations - If you are an invited speaker, or your abstract has been selected for a spoken presentation, please give an electronic version of your talk to the AV technician.

Poster presentations – If your abstract has been selected for a poster, please display this in the Conference Centre on arrival.

Conference programme

Thursday, 11 April

11:30-12:30 Registration with lunch

12:30-12:40 Welcome and Introductions Stephan Beck, University College London, UK

12:40-13:30 Keynote Lecture: New Technologies & Sharing Comprehensive Personal Precision-Medicine Data George Church, Harvard University, USA

13:30-15:30 Session 1: Personal genetic testing: opportunities and limitations Chair: Manuel Corpas

13:30 Personal Genomes and beyond for the Indian population Anu Acharya Mapmygenome, India

14:00 Codigo46: Bridging the gap between the promises and realities of personalized medicine in Mexico Lorenza Haddad Codigo46, Mexico

14:30 From Biobanking to Precision Medicine Andres Metspalu Estonian Genome Centre, Estonia


15:00 The Personal Genome Project Canada: findings from whole genome sequences of the inaugural cohort Naveed Aziz CGEn, Canada

15:15 Korean Personal Genome project Sungwon Jeon Ulsan National Institute of Science and Technology, South Korea

15:30-16:00 Afternoon Tea

16:00-17:30 Session 2: Interpretation of personal genomes Chair: Mad Price Ball

16:00 Analyzing personal genomes, phenomes and electronic health records at scale Gustavo Glusman Institute for Systems Biology, USA

16:30 Empowering the public to use personal genomic information: a genetic counsellor perspective Nicki Taverner Cardiff University, UK

17:00 Using personal genomes to calculate and interpret polygenic risk scores in preparation for genomic medicine Cathryn Lewis King's College London, UK

17:15 Leveraging phenome-wide information to improve accuracy and applicability of genetic risk predictions for complex traits Vincent Plagnol GenomicsPlc, UK

17:30-18:30 Panel discussion: benefits from testing and sharing personal genome data Chair: Mahsa Shabani

Fiona Nielsen Repositive, UK

Tom Stubbs Chronomics, UK

18:30-19:30 Poster Session with Drinks Reception

19:30 Dinner

Friday, 12 April

09:00-10:30 Session 3: Citizen science and personal genomics: users, customers and patients Chair: Stephan Beck

09:00 Harnessing the power of open crowdsourcing for personal genomics Bastian Greshake Tzovaras Open Humans, USA


09:30 Open donation of personal genome sequence data from the perspective of a hybrid scientist/citizen scientist Colin Smith University of Brighton, UK

10:00 Genomics Aotearoa, a platform for best practice genomic science in partnership with indigenous people Ben Te Aika Genomics Aotearoa, New Zealand

10:15 The rise of genetic genealogy as a citizen science Maurice Gleeson International Society of Genetic Genealogy

10:30-11:00 Morning Coffee

11:00-12:30 Session 4: Return of data to research participants and personal data access Chair: Fiona Nielsen

11:00 What is the behavioral impact of personal genomic information, and why does it matter? Saskia Sanderson University College London, UK

11:30 Genomics England - treating data with care Joanne Hackett Genomics England, UK

12:00 Health data sharing and data protection law in Africa: a South African perspective Ciara Staunton Middlesex University, UK

12:15 Evaluating utility of patient-centered deep phenotyping Monica Munoz-Torres Oregon State University, USA

12:30-14:00 Lunch

14:00-15:30 Session 5: Society challenges: data protection and privacy and the ethics of data sharing Chair: Mahsa Shabani

14:00 Personal genomes and the police: public opinion and ethical considerations Christi Guerrini Baylor College of Medicine, USA

14:30 Access, Storage, and Sharing of personal genomic information Pascal Borry University of Leuven, Belgium


15:00 Accessing 1M Genomes transnationally across Europe by 2022 Gary Saunders ELIXIR Europe, UK

15:15 Genomics as a personalized medicine approach in disease risk prediction - P5.fi FinHealth Heidi Marjonen National Institute for Health and Welfare, Finland

15:30-16:00 Afternoon Tea

16:00-17:00 Session 6: Clinical perspective – from patients to the public Chair: Johan den Dunnen

16:00 Knowns and unknowns in genomic testing; a clinician’s eye view Frances Elmslie St George's, University of London

16:30 Embedding genomics into routine health care Reecha Sofat University College London, UK

17:00-18:00 Keynote Lecture: Genetic privacy: friend or foe? Yaniv Erlich MyHeritage, Israel

18:00-18:15 Concluding remarks Programme committee

18:30 Coaches depart to Cambridge City Centre and Train Station, Heathrow Airport via Stansted Airport


These abstracts should not be cited in bibliographies. Materials contained herein should be treated as personal communication and should be cited as such only with consent of the author.


Spoken Presentations

New Technologies & Sharing Comprehensive Personal Precision-Medicine Data

George Church Harvard University, USA

PersonalGenomes.org (PGP) is a unique international cohort (US, UK, Canada, Austria, China) with fully open consent (IRB, LREC, REB ethics approved) for genomes, many other omes, imaging and medical records. PGP has enabled: (1) NIST+FDA Genome in a Bottle (GIAB) diverse trios, (2) High-Quality Reference Genomes (HQRG) of the GIAB samples, hopefully soon filling all sequence gaps, (3) differentiation factor libraries, (4) ENCODE isogenic multiple cell types, (5) in situ barcoding of RNA, DNA and protein at conventional and super-resolution (20 nm), (6) Critical Assessment of Genome Interpretation (CAGI), (7) re-identification tests, (8) tests of avoiding identification, including Nebula.org homomorphic encryption queries, and (9) full individual comprehensive precision medicine datasets. All of these benefit from fully shareable cells and data without restrictions (analogous to Wikipedia). We also describe new "omic" reading, 3D-imaging and editing technologies developed using these cells.

S1


Personal Genomes and beyond for the Indian population

Anuradha Acharya1

1Mapmygenome India Limited, Hyderabad, Telangana, India

Indians currently make up ~20% of the global population, a number projected to touch 1.5 billion by the year 2030. However, if we look at global genetic information databases, Indian data accounts for less than 0.2% of the total data. Today, genomic data holds tremendous potential in improving healthcare strategies across various dimensions - be it disease prevention, enhanced diagnosis, optimised treatment or optimal drug development. The efforts of genetic and medical researchers are constantly driven towards utilization of this potential and its translation into actionable information and clinical applications thereof. The two biggest hurdles facing the medical and research community today are the lack of genotype-phenotype correlations for Indians at a population-wide and individual level, and the inefficient translation of genomic information into the decision making process in traditional medical practice. Population-wide sequencing projects for Indian genomes help overcome these hurdles. By creating a centralized database of Indian genomic data (anonymized and de-identified), analytical efforts can be made to identify biomarkers (and their clinically significant ranges) specific to health conditions and traits, via case-control associations and bioinformatics studies. These findings, when integrated with biological data points (electronic health records, medical reports, family history), can be used to tailor an individual’s healthcare plan (personalised), for disease prevention (predictive), and for continuous monitoring and feedback (participatory). The database would be constantly updated with inputs (from wearable devices, health apps and medical concierge services) for active tracking of individual health status, whilst performing trend analysis, making it truly dynamic. 
Machine-learning (ML)/Artificial Intelligence (AI) integration would enable high-end analytics and automation with the benefit of deep learning algorithms for pattern recognition and enhanced predictions. The benefits of personal genomics spread across many verticals in the health and wellness industry: nutritional intervention and therapy (nutrigenomics), personalized medicine by drug-response profiling (pharmacogenomics), sports and exercise genomics, reproductive medicine (carrier screening), lifetime disease risk assessment and mitigation, etc. These will be elaborated upon in the form of case studies during the presentation.

S3


Codigo46: Bridging the gap between the promises and realities of personalized medicine in Mexico

Lorenza Haddad Talancon1 1Codigo 46

Precision medicine is supposed to be the future of healthcare. Genetics is just one component needed to truly individualize medicine, but its promises fall short for two main reasons: we need more research, and for some populations the current tools for precision medicine might not be accurate. Codigo 46’s goals are both to provide current genetics applications for health in an accessible way and to generate knowledge from within those understudied populations in order to make precision medicine a reality for everyone. We are far away from what science fiction depicts, but there are real applications for where science is today, for example in prescribing certain medications and the correct dosage for patients, or in helping doctors recognize a patient’s risk for a disease and plan prevention strategies or testing. The problem is that these have been developed using mostly people of European descent as reference, excluding understudied and underprivileged populations like the Mexican one, and not prioritizing the diseases or risk factors, both genetic and environmental, that these populations face. Codigo 46 is already building its offerings based on research for these populations, as well as a database to promote the creation of knowledge for Mexicans.

S5


From Biobanking to Precision Medicine

Andres Metspalu

The Estonian Genome Center, Institute of Genomics, University of Tartu, Estonia

The Estonian Biobank was founded in 2000 as a population-based biobank. Nineteen years later, the biobank holds health and genetic data on around 156 000 people, and by the end of 2019 it will have grown to 200 000, or approximately 20% of the adult population. All participants have donated blood samples for purification of DNA and plasma. The whole cohort of 200 000 will be genotyped with the Illumina GSA array (currently 152 000). The Human Genes Research Act (from 2000) allows regular updating of data through linkage to national registries, enabling long-term follow-up of the cohort and re-contact of the gene donors; the changes needed for GDPR were mostly cosmetic. WGS has been performed on 2600 and WES on 2500 genomes, allowing these to be used as a population-based reference for imputation. In the past few years, increasing attention has been paid to translating the results of genetic research to improve public health. A nationwide technical infrastructure (X-road) for the secure electronic exchange of medical data has also been established and is maintained by the state. This allows disease (or life!) trajectories to be created for all gene donors in the Estonian Biobank from birth, covering all contacts with the medical system, including ICD-10 diagnoses, prescriptions, lab data and EMR. Recently, we have completed deep sequencing (~30X coverage, PCR-free) of the whole genomes of 3,000 gene donors and, in addition, 2500 whole exomes. Using these data, we have demonstrated in the case of familial hypercholesterolemia (FH) that the “genetics first” approach can discover many new FH patients not previously seen by the medical system, and in over 50% of cases the treatment was changed. We are conducting several pilot projects to work out the best ways to return health-related research data, such as genetic risk scores (GRS), to biobank participants who ask for it.
This is an instrument for early prediction and prevention of disease. For that purpose, we have developed decision support tools for several major diseases and applications, such as CAD, T2D, breast cancer and pharmacogenomics. During the first contact with the genetic counsellor and/or medical geneticist, the report will be explained and, if needed, further recommendations given. It will be transferred to the medical system in the next few years, and together with the RITA program on personalized medicine in the two largest hospitals in Estonia, personalized medicine as 4P medicine (personal, predictive, preventive and participatory) has reached the point of no return.

S7


The Personal Genome Project Canada: findings from whole genome sequences of the inaugural cohort

Stephen W. Scherer, Naveed Aziz

CGEn - National Platform for Genome Sequencing & Analysis

Rapid technological advances are enabling a view of human genetic variation in ever-increasing detail and at plummeting costs. Until recently, analysis has been targeted largely to defined genes, but pan-genomic approaches, such as microarrays, gene-panel testing and exome sequencing, have become mainstream. Now, whole genome sequencing can capture all of the genes (about 1% of the whole genome) and most of the rest of the genome in a single experiment, with the potential to recognize all types of genetic variation and thereby usurp the less comprehensive technologies. Information from whole genome sequencing can already identify the molecular causes of suspected heritable conditions and cancer; however, we anticipate that genomic analysis will become a standard component of proactive health care, given its potential to identify predisposition to medically actionable conditions, explain uncharacterized disease and reveal carriers for recessive disorders and predictors of medication safety and response. Interpretation of sequence data remains challenging, with unknown clinical utility and predictive value among the general population.

The Personal Genome Project Canada was launched in 2007 and shares the guiding principles and open consent policy of the parent project in the United States. It aims to develop a public data set of fully annotated genomic information, connected with human trait information. It can provide control data for other studies, but it also aims to forecast effects of integrating DNA-derived knowledge into routine clinical practice. The project will evaluate the utility of such information, and how best to gather and apply it within Canada's provincially administered, publicly funded health care system. Participants in this ongoing project are highly motivated to promote genomic research and explicitly forego privacy commitments. We report the data and experiences from whole genome sequencing and medical annotation of genomes of the first 56 participants in the Personal Genome Project Canada.

S9


Korean Personal Genome Project

Sungwon Jeon1, Asta Blazyte1, Sungwoong Jho2, Jungeun Kim2, Jong Bhak1,2

1KOGIC, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea 2Personal Genomics Institute, Genome Research Foundation, Cheongju 28160, Republic of Korea [email protected]

KPGP, or PGP-Korea, is the first and longest-lasting Korean genome project. It was initiated by the Korean Bioinformation Center (KOBIC) in 2006 to characterize the ethnicity-relevant variome of Koreans. It has three major goals. The first is to provide personal genome data to the public by democratizing genomic information in Korea. The second is to build a unique Korean reference genome (KOREF) that has both a single genome assembly (KOREF_S) and a population consensus assembly (KOREF_C) to sufficiently represent the population as a whole. The third is to develop a Korean population variome, KoVariome, which aims to provide as much genomic information as openly and freely as possible. Since KPGP published the first Korean genome data in 2009, the number of freely accessible complete genomes in the KPGP database has reached 111 personal genomes as of 2018. Moreover, this database was used to construct the first consensus Korean Reference genome standard (KOREF_C) and KoVariome. These genome data were the first of their kind to be generated under a standard reference construction protocol, as a joint project with the National Center for Standard Reference Data of Korea. In addition, we have collected 2,400 Korean genomes over the last two years through the 10,000 Korean genome project (KU10K). We also conducted the Personal Welfare Genome project in Korea, which recruited 1,000 healthy Koreans over three years, providing a free health check-up and genetic counselling to Ulsan citizens through a private hospital. While the collected genomic data are used to expand KoVariome, the participants are provided with our personal genome research report, which contains relevant information such as ancestry analysis results and allele information related to certain phenotypes and diseases.

S11


Analyzing personal genomes, phenomes and electronic health records at scale

Gustavo Glusman Institute for Systems Biology, USA

Soon, millions of individual human genomes with rich phenotype data will be available for analysis, posing a data management challenge and offering significant discovery opportunities. Rich genomic and phenomic knowledge will help improve our understanding of genome structure, function and evolution, and will translate into actionable opportunities for improving health and wellness. We have developed several algorithms and methods for studying and visualizing personal genome data in family, cohort and population context. In particular, our 'genome fingerprinting' method enables ultrafast and private genome comparisons in very large cohorts, and our 'data fingerprinting' method offers a fast, semantically and structurally agnostic method for analyzing electronic health records (e.g., in FHIR format). Our locality-sensitive hashing strategies summarize complex data into highly compressed representations which cannot recreate details in the data, yet simplify and greatly accelerate the comparison and clustering of data records by preserving similarity relationships. Applications include detection of duplicates, clustering and classification, which support higher goals including summarizing large and complex data sets, analyzing cohort structure, quality assessment, evaluating methods for generating simulated patient data, and data mining. Beyond genomes and electronic health records, our approach is applicable to any domain in which semi-structured data (e.g., in JSON or XML formats) are commonly used.
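
As a rough illustration of the general idea of locality-sensitive hashing on semi-structured records, the Python sketch below computes a SimHash-style fingerprint of a JSON-like record. This is not the authors' 'data fingerprinting' algorithm, whose details are not given in the abstract; SimHash is swapped in here as a well-known LSH technique, and the helper names (`flatten`, `simhash`, `hamming`) and the example record are hypothetical.

```python
import hashlib


def flatten(obj, prefix=""):
    """Yield 'path=value' tokens for every leaf of a JSON-like object."""
    if isinstance(obj, dict):
        for key in sorted(obj):
            yield from flatten(obj[key], f"{prefix}/{key}")
    elif isinstance(obj, list):
        for item in obj:
            yield from flatten(item, prefix)
    else:
        yield f"{prefix}={obj}"


def simhash(tokens, bits=64):
    """Collapse a set of tokens into one compact integer fingerprint.

    Each token votes +1 or -1 on every bit position; the sign of the
    total decides the bit. Records sharing many tokens tend to get
    fingerprints that differ in few bits, and the fingerprint alone
    cannot be used to recreate the original record.
    """
    votes = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            votes[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical record, loosely shaped like a FHIR Patient resource.
record = {"resourceType": "Patient", "gender": "female", "birthDate": "1980-01-01"}
fingerprint = simhash(flatten(record))
print(hamming(fingerprint, fingerprint))  # identical records are 0 bits apart
```

Comparing two records then reduces to a Hamming distance between two integers, which is what makes duplicate detection and clustering fast at cohort scale.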

S13


Empowering the public to use personal genomic information: a genetic counsellor perspective

Nicki Taverner Cardiff University and All Wales Medical Genetics Service

The rapid development of genomic testing provides real opportunities for individuals to learn more about their genetic health risks, and this genetic information can be empowering, enabling them to make informed decisions to manage their health. However, interpreting risk information is challenging, and many individuals will need support to understand and use this information. The rapid upscaling of testing means that we are not yet able to understand the implications of much of the genetic variation that we identify: we do not yet fully understand the exome, which is only around 1-1.5% of the genome. Genomic testing is important for us to develop this knowledge but, in the interim, it is important for individuals to understand what personal genomes can and cannot tell them. Genomic screening can also have significant psychosocial implications: examples in the news include individuals finding out about non-paternity or considering termination of pregnancy based on inaccurate information but, more commonly, individuals may be told that they have increased health risks and need support to manage the emotional impact of this. These impacts have significant implications for the NHS in the UK: testing may be carried out privately but individuals then seek the support of the NHS to understand and interpret their results, a significant burden both in terms of bioinformatic interpretation and provision of support. We have also seen examples where results of private testing have not been reproduced within NHS laboratories, highlighting the importance of laboratory standards. Genomic testing holds great promise but, in the medium term, there is a need for education and support to ensure that individuals are able to benefit from the interpretation of their personal genomes.

S15


Using personal genomes to calculate and interpret polygenic risk scores in preparation for genomic medicine

Cathryn Lewis, Lasse Folkersen

King’s College London, UK
Sankt Hans Hospital, Denmark

Interpretation of personal genomes has focused on single variants conferring disease risk, but most disorders of major public concern are polygenic. Polygenic risk scores (PRS) give a single measure of disease liability by summarising disease risk across hundreds of thousands of genetic variants. They can be calculated in any genome-wide genotype data source, using a prediction model based on genome-wide summary statistics from external studies. As genome-wide association studies increase in power, the predictive ability for disease risk will also increase. While PRS are unlikely ever to be fully diagnostic, they may give valuable medical information for risk stratification, prognosis, or treatment response prediction. Public engagement on the potential use and acceptability of PRS is therefore becoming important. The current public perception of genetics is as a 'Yes/No' test. This model is only true for exceptionally strong effects, such as rare genetic disease or breast cancer mutations: variants that are FDA approved for reporting in consumer genetics. Meanwhile, unregulated third-party apps are being developed to satisfy consumer demand for information on lower risk variants and for common diseases that are highly polygenic. Many apps report results from single SNPs, with little regard to effect size, which is inappropriate for common, complex disorders where everybody carries risk alleles. Consequently, sites such as Promethease and Codegen.eu enable users to highlight (false) genetic causes for any disease: good business, but poor science. Communication tools are therefore needed to aid our understanding of genetic predisposition as a continuous trait, where a genetic liability confers risk for disease. Impute.me is one such tool, whose focus is on education and explanation of polygenic disorders.
Its research-focused open-source website allows users to upload consumer genetics data and obtain pseudo-PRS, with results reported on a population-level normal distribution. Diseases can be browsed by ICD-10 chapter location or alphabetically but are never sorted by worst risk score. This encourages the user to consider genetic risk scores by medical context, not by risk. Clinical research studies indicate that PRS may already have predictive utility for coronary artery disease and breast cancer, despite problems in their interpretation across ancestry and the incomplete information captured. Personal genomes can play an important role in preparing for implementation of PRS in genomic medicine.
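
To make the calculation described above concrete, the Python sketch below computes a polygenic score as a weighted sum of risk-allele dosages and places it on a population-level normal distribution. It is a minimal illustration under stated assumptions, not the Impute.me implementation: the function names, variant identifiers, effect sizes and the population mean and standard deviation are all hypothetical.

```python
from statistics import NormalDist


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    `weights` stands in for per-variant effect sizes taken from external
    genome-wide summary statistics. Variants missing from either input
    are skipped, which is one simple (assumed) way to handle gaps.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[variant] * dosages[variant] for variant in shared)


def population_percentile(score, pop_mean, pop_sd):
    """Position of a score on a normal distribution of population scores."""
    return 100 * NormalDist(pop_mean, pop_sd).cdf(score)


# Hypothetical genotype and effect sizes, for illustration only.
dosages = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
weights = {"variant_a": 0.10, "variant_b": 0.30, "variant_c": 0.25, "variant_d": 0.50}

score = polygenic_score(dosages, weights)
percentile = population_percentile(score, pop_mean=0.40, pop_sd=0.20)
print(f"score={score:.2f}, percentile={percentile:.0f}")
```

Reporting a percentile rather than a yes/no call reflects the point made in the abstract: genetic liability for polygenic disorders is a continuous trait, and every individual carries some risk alleles.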

Leveraging phenome-wide information to improve accuracy and applicability of genetic risk predictions for complex traits

Vincent Plagnol, Eva Kraphol, Peter Sorensen, Chris Spencer, Peter Donnelly

Genomics plc

The broadening availability of population cohorts combining genetic data and health records is bringing into focus the potential value of genetic risk prediction in health care. However, these opportunities also raise a range of concerns, in particular the limited predictive ability of these approaches and their much poorer performance outside populations with European ancestry.

Motivated by these limitations, Genomics plc has gathered, curated, and aligned genome-wide summary statistics from over 10,000 association studies. This allows us simultaneously to develop genetic-based risk predictions across many common human diseases and traits. Additionally, by developing sophisticated statistical methods to leverage cross-trait information we can also considerably improve risk prediction for individual traits. Firstly, joint colocalisation across all 10,000 studies substantially improves fine-mapping resolution, and hence power to identify true causal variants. This not only improves the overall prediction ability, but critically it increases the utility of these predictions to populations outside of the groups in which the association studies were undertaken. It also substantially improves the ability to make predictions from complex regions such as the major histocompatibility complex (MHC), with major implications for autoimmune diseases. Secondly, compared with analyses of individual traits, statistical approaches which model and utilise the correlations between related traits can lead to more accurate estimates of effect sizes, which also improves prediction accuracy. Finally, we show that cross-ethnic association studies are sufficiently consistent across populations to improve effect size estimates and therefore prediction accuracy when jointly analysed with studies from different ancestry groups.

Together, these strategies improve the resolution at established loci and better define a truly polygenic component based on a large number of loci with effect sizes that are individually small, but collectively meaningful. We illustrate the value of this phenome-wide approach to individual risk prediction on a range of traits, including coronary artery disease and immune related traits.
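
The abstract does not specify the statistical methods used. As a generic illustration of why jointly analysing consistent studies sharpens effect size estimates, the sketch below applies a standard fixed-effect, inverse-variance-weighted meta-analysis to one variant measured in two studies. This is a textbook technique swapped in for illustration, not the authors' method; the function name and the numbers are assumptions.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    Each study's effect estimate is weighted by 1 / SE^2, so more precise
    studies contribute more. Returns the pooled effect and its standard error.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical effect estimates for one variant from two ancestry groups.
beta, se = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
print(f"pooled beta={beta:.3f}, pooled SE={se:.4f}")
```

The pooled standard error is smaller than either input standard error, which is the sense in which combining consistent studies gives more accurate effect sizes. The approach is only appropriate when the true effect is shared across the studies being combined.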

Harnessing the power of open crowdsourcing for personal genomics

Bastian Greshake Tzovaras1,2, Philipp E. Bayer3, Helge Rausch

1 Lawrence Berkeley National Laboratory, Berkeley, USA 2 Open Humans Foundation, USA 3 The University of Western Australia, Crawley, Australia

Direct-To-Consumer (DTC) genetic testing is rapidly gaining traction, with exponential growth and well over 10 million people already having been genotyped. This enormous collection of genetic data has the potential to enable wide-reaching genetic research, especially for researchers who lack funding to genotype large cohorts themselves. The potential use of these data is demonstrated by the over 100 publications that 23andMe alone has published using data from their customers. The full potential of these data is hard to realise, though, as they usually stay in the hands of the DTC testing companies, with only limited access for individual researchers. We built openSNP to enable wider access to DTC genotyping data. OpenSNP is a crowdsourced repository for genotyping data through which individuals who participated in DTC genotyping can deposit their own genotypes into the public domain, along with their own, crowdsourced, phenotypic annotation. As all the data are in the public domain, no access or data usage restrictions apply to them, making them well suited to re-use in various research contexts. Since its launch in late 2011, nearly 5,000 personal genotype and exome data sets have been donated into the public domain through openSNP, making it one of the world’s largest repositories of its kind. As a community-driven project, openSNP is fully open source and funded through crowdfunding by the community. Data provided through openSNP have been used in community-driven as well as academic projects. For example, the data have been used in a CrowdAI machine learning competition, evaluating the best methods to predict height from genotyping data. Other examples include psychological studies that associate genotypes with exploration/exploitation behaviour and research into genomic privacy and re-identification. Together, these examples highlight how a crowdsourced open data platform such as openSNP can facilitate new kinds of research and fuel scientific progress.

Open donation of personal genome sequence data from the perspective of a hybrid scientist/citizen scientist

Colin P. Smith

School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, BN2 4GJ, UK

I was fortunate to get the opportunity to have my whole genome sequenced in 2013 as a participant in one of Illumina’s Understand Your Genome symposia. Since then I have been keen to investigate the genome sequence in more detail and to exploit it as the subject for outreach activities and debates on personal data sharing. My introduction to personal genetic testing was quite extreme as my father had inherited an autosomal dominant mutation for an untreatable fatal condition and I went through the lengthy and unsettling process of being tested for the mutation. By contrast, having a full genome sequence report for (principally) single nucleotide variants was relatively stress free! I am a strong advocate for open sharing of personal genome sequence data and became an ambassador for the Personal Genome Project UK (PGP-UK), led by Stephan Beck at University College London. In this role I participated in the development of ‘GenoME’, an educational personal genomics iPad app designed for the general public. I had the opportunity to enrol in the PGP-UK programme in 2015 as the first ‘donor’ of a whole genome sequence, shortly after they had received ethical approval to receive donations of whole sequences under open consent, from individuals who had had their genomes sequenced independently of the PGP family of organisations. Participation with the PGP-UK team has been very rewarding, providing an approachable, engaging and interactive opportunity for updating interpretation of sequence variants and for engaging in additional genomic studies, such as the epigenomic analysis and reporting that is also conducted by PGP-UK.

Genomics Aotearoa, a platform for best practice genomic science in partnership with indigenous people.

Ben Te Aika, Prof Peter Dearden

University of Otago, University of Auckland

New Zealand has a unique and diverse bio-heritage landscape. The indigenous people (Māori) of New Zealand have an extensive and closely knit relationship with this landscape and are positioned centrally within it. Unfortunately, the role of Māori within genomic, as well as other, science has been largely limited to that of research subject. This research subject role is at odds with the Treaty-based relationship that exists between the New Zealand Government and Māori. This relationship obligates the government in a number of ways, including providing partnership in decision making, the active protection of Māori culture, and roles and engagement in how the country is governed and administered. International human rights agreements, such as the UN Covenant on Economic, Social and Cultural Rights and the UN Convention on Biological Diversity, also set out significant responsibilities for Māori and the Crown. The Government inquiry into its obligations to protect Māori interests in fauna and flora, cultural and intellectual property has identified significant issues and suggested outcomes. These obligations shape what is required and what is possible in New Zealand genomic science. Genomics Aotearoa has recently been established as a nationally significant data repository and research platform. It brings high-quality research and the role of Māori forward into a 21st century practice, where western science and indigenous people come together in an inclusive, proactive and mutually enriching approach to engagement, data management and genomic research. Genomics Aotearoa seeks to improve ethical and professional research practices by adopting new research principles, guidelines and practices relating to its indigenous peoples. Data storage is managed within a Māori-values framework underpinned by new research and data-storage guidelines. 
Genomics Aotearoa is embedded with Māori personnel, expert in their relevant fields, and researchers work closely with the Māori community, with the goal of research being far more relevant to the Māori people and New Zealand as a whole. Programs include increasing the involvement of Māori in the science through education. Twelve months after establishment, Genomics Aotearoa is experiencing increasing inquiries from Māori about ethical storage of data. Māori organisations and businesses are expressing interest in knowing the benefits of the science and partnership opportunities. Researchers, too, are becoming increasingly interested in knowing more about Māori culture and best practices for engaging and consulting groups outside of established western norms. International opportunities are emerging for researchers with increasingly diverse engagement skills.

The Rise of Genetic Genealogy as a Citizen Science

Maurice Gleeson

Education Ambassador, International Society of Genetic Genealogy

Ever since commercial DTC (direct to consumer) DNA tests became available in 2003, people have been using these tests to run their own DNA projects. The first tests to become available were Y-DNA and mitochondrial DNA tests. The Y-DNA tests were STR-based and started with only 12 STR markers but quickly evolved to 111 markers. As the Y chromosome and inherited surnames are passed down along the same direct male line, the Y-DNA test lent itself to the study of surnames and many surname studies emerged. The company FamilyTreeDNA (FTDNA) created an infrastructure to allow ordinary people to run their own projects, and there are currently more than 10,000 such studies hosted on their website. In addition haplogroup projects emerged which rapidly took over from the academic researchers in helping to build the human Y-Haplotree (the Tree of Mankind). This research was further boosted by the introduction of Y-SNP testing using Next Generation Sequencing. To date more than 100,000 SNPs have been discovered on the Y-chromosome. Similar research is ongoing with mitochondrial DNA and the construction of the Tree of Womankind (the mitochondrial Haplotree). Geographic projects helped characterize the distribution of both Y-DNA and mitochondrial DNA signatures in particular geographic locations. Some studies have used Y-DNA to explore the accuracy (or otherwise) of the Ancient Annals of Ireland and the ancient genealogies of Scotland. The introduction of autosomal DNA tests in 2007 by 23andMe added a new segment to the customer base that was primarily interested in the health-related aspects of DNA testing. Few of these people were interested in genealogy. Subsequently, the largest of the commercial companies (Ancestry) launched their autosomal DNA test (a SNP-based microarray test assessing some 700,000 SNPs). 
Additional “specialist groups” have emerged since then including those primarily interested in adoptee research, and more recently the rise of forensic genetic genealogy and the ability to use the databases to discover the identity of unknown persons. This latter development may evolve further as it has potential applications in mass grave situations such as the mass grave at Fromelles which contains the remains of 250 WWI soldiers and the mass grave at Tuam, Co. Galway where 800 children are believed to have been interred. This presentation explores how the advent of DTC DNA testing has turned ordinary citizens into Citizen Scientists.

What is the behavioural impact of personal genomic information, and why does it matter?

Saskia C Sanderson

UCL

Research participants, as well as patients and consumers, arguably have the right to access their own personal genomic data and to receive health-related information arising from the data, including that about complex diseases. But what does this really mean, if anything, for how complex diseases such as cancer and heart disease might be prevented, detected and/or managed? One answer to this important question is that identifying people in the general population who are at increased disease risk early on in their lives will lead to much-needed improvements in primary prevention by empowering patients and their doctors to take action to reduce their risk. Another answer is that using personal genomic information alongside other ‘omic information about a person’s condition may usefully inform clinical recommendations in secondary care. Whether these hopes are borne out will rely in large part on human behaviour: using genomics to identify “high risk” people or to inform management strategies will only lead to improved disease prevention and treatment if the patients and/or their clinicians believe this information is worth acting on, and if they are subsequently able to make the necessary behavioural changes to reduce their risk. These behaviours include medication initiation and adherence, screening, other tests and procedures, as well as diet, exercise and other lifestyle factors. This talk covers what is currently known about the impact of personal genomic information on people’s behaviours based on existing empirical research (the ‘known knowns’) and what is likely to be learned over the next few years (the ‘known unknowns’). 
This includes the behavioural impact of personal genomic information based on (1) a single or a few common DNA variants with weak effects on disease risk, (2) a single or a few rare DNA variants with strong or moderately strong effects on disease risk, and (3) hundreds or thousands of common DNA variants that, together with other ‘omics and health information, may have strong or moderately strong effects on disease risk. The talk also explores how psychological and behavioural theories can be applied to suggest possible explanations for past research findings, and pose testable hypotheses for future studies. Given that many of the potential benefits (and potential harms) of providing people with access to their personal genomic data are psychological and behavioural in nature, the talk concludes with discussion of the need for truly interdisciplinary, collaborative and large-scale research on these questions going forward.

Genomics England - treating data with care

Joanne M. Hackett

Genomics England

Genomics England has always placed the participant at the heart of everything it does. This includes working closely with the Participants Panel to help them understand the journey from whole genome sequencing to the return of results. Every step must be treated with care, but the data also need to be accessed. Creating a Research Environment that is ‘read only’ is one way to ensure the genomic and clinical data are safely stored yet readily accessible to qualified academics, clinicians and industry.

Health Data sharing and Data Protection Law in Africa: A South African perspective

Ciara Staunton, Nóra Ni Loideain, Jantina de Vries

School of Law, Middlesex University London & Centre for Biomedicine, EURAC Italy

In recent years there has been exponential growth in genomic research in Africa. Fuelled by initiatives such as MalariaGEN, H3Africa and B3Africa, this has led to a dramatic increase in genomic data sharing between African countries and other international collaborators. South Africa has been a key player in many of these initiatives, and genomic research, bioinformatics and open science have been identified in many government reports as key drivers of innovation and improved health outcomes. These policymaking developments include the National Development Plan, the Bioeconomy Report, the Draft White Paper on Science, Technology and Innovation, and the Academy of Science of South Africa’s consensus study on Human Genetics and Genomics in South Africa: Ethical, Legal and Social Implications. The current governance of genomic data sharing in South Africa involves navigating a complex patchwork of laws comprising the Constitution, various legislation, regulations, and guidelines. The Protection of Personal Information Act [No.4 of 2013] (POPIA) is the first comprehensive data protection regulation to be passed in South Africa and seeks to give effect to the constitutional right to privacy, although it is not due to come into force until 2020. POPIA draws largely from an early draft version of the EU General Data Protection Regulation (GDPR) (enacted in 2016 with entry into force in 2018). Critically, however, unlike the GDPR, POPIA lacks any special provisions for research. Rather than clarify the regulation of genomic data, it introduces certain ambiguities. To be an effective and key international player in genomic research, a coordinated response between government, industry and academia within South Africa - that recognises international legal norms and best practice - is essential. In February 2019, a two-day workshop was convened in Cape Town, South Africa to discuss the governance of genomic data in South Africa. 
Approximately 30 legal, ethical and scientific experts were in attendance and were drawn from academia, industry and government. This interdisciplinary group identified challenges and gaps in the current regulatory framework and pertinent issues that must be addressed, notably around broad consent, institutional oversight, compliance and enforcement, and alignment with international standards. A Position Paper from this workshop addresses the challenges and issues identified and sets out recommendations to ensure that regulations in South Africa foster the sharing of genomic data.

Evaluating utility of patient-centered deep phenotyping

Monica Munoz-Torres, Melissa Haendel (1, 2), Nicole A. Vasilevsky (2), Julie A. McMurry (1), Chris Mungall (3), Catherine Brownstein (4), Ingrid Holm (4), Kent Shefchek (1), Sebastian Köhler (5), Peter N. Robinson (6).

(1) Oregon State University, Corvallis, OR, 97331, USA; (2) Oregon Health and Science University, Portland, OR, 97239, USA; (3) Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA; (4) Boston Children's Hospital, Boston, MA 02115, USA; (5) Charité University Hospital, 10117 Berlin, Germany; (6) The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.

The Human Phenotype Ontology (HPO) is the de facto terminology for clinical 'deep phenotyping' in humans. The HPO enables non-exact matching of sets of phenotypic features (phenotype profile) against known diseases, other patients, and model organisms and is a flagship of the Monarch Initiative. Algorithms based on HPO have been implemented into many diagnostic and variant prioritization tools and are used by the 100,000 Genomes project, the NIH Undiagnosed Diseases Program/Network, and thousands of other clinics, labs, tools, and databases worldwide.

Patients are an eager and untapped source of accurate information about phenotypes - some of which may go unnoticed by the clinician. However, medical terminology is often perplexing to patients, making it difficult to use resources like HPO. To support use of HPO by patients, we created a 'layperson' translation of HPO. To determine the feasibility of using lay-HPO phenotyping in diagnostic tools, we evaluated lay-HPO in recalling correct diseases in phenotype comparison algorithms. We created synthetic profiles ('slim annotations') for each disease and compared them against the gold standard curated set. We also permuted these profiles by adding or removing annotations at random to determine how robust the lay annotation profiles might be in the face of missing or noisy data from patients. We then measured the semantic similarity between HPO gold standard annotations and the derived profiles (with and without noise added). 57% of profiles scored 80% similarity or higher, and 75% of profiles scored 70% similarity or higher.

Preliminary analyses suggest that lay-HPO has the features required to be useful in a diagnostic setting, in that lay terms are: a) sufficiently specific and b) well-represented in our disease-to-phenotype database that is utilized by the aforementioned tools for differential diagnostics. This patient-friendly version of HPO uses the same infrastructure as the primary HPO, so patient-generated phenotyping data can readily be combined with clinical phenotyping data to improve variant prioritization.

New patient-centered tools are being developed to help patients assist clinicians in creating robust computational phenotype profiles; improving these profiles can empower patients to be active participants in their diagnostic odysseys, potentially improving the accuracy and speed of their diagnosis. Finally, such tools can also assist in creation of patient generated phenotypic profiles for sharing in patient registries, forums, and on the Web for cohort definitions and community formation.
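
The robustness test described in this abstract (perturbing a phenotype profile by randomly removing and adding annotations, then scoring it against the gold standard) can be sketched as follows. This is a simplified illustration only: it uses plain Jaccard set overlap as a stand-in for the ontology-aware semantic similarity measure the authors used, and the term labels, vocabulary and function names are hypothetical.

```python
import random


def jaccard(profile_a, profile_b):
    """Set overlap between two phenotype profiles (1.0 = identical)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def perturb(profile, vocabulary, n_remove, n_add, rng):
    """Drop `n_remove` random terms and add `n_add` random terms that
    were not in the original profile, to mimic missing or noisy input."""
    kept = sorted(profile)
    rng.shuffle(kept)
    kept = kept[n_remove:]
    candidates = sorted(set(vocabulary) - set(profile))
    added = rng.sample(candidates, min(n_add, len(candidates)))
    return set(kept) | set(added)


# Hypothetical term labels standing in for real ontology identifiers.
vocabulary = [f"term_{i}" for i in range(20)]
gold = {"term_0", "term_1", "term_2", "term_3"}

rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = perturb(gold, vocabulary, n_remove=1, n_add=1, rng=rng)
print(f"similarity to gold standard: {jaccard(gold, noisy):.2f}")
```

A real evaluation would replace `jaccard` with a measure that credits closely related terms in the ontology, which plain set overlap cannot do.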

Personal genomes and the police: public opinion and ethical considerations

Christi J. Guerrini, JD, MPH1; Amy L. McGuire, JD, PhD1; Jill O. Robinson, MA1; Devan Petersen, MPH1

1 Baylor College of Medicine, Center for Medical Ethics and Health Policy, Houston, Texas USA

On April 24, 2018, a suspect in California’s notorious Golden State Killer cases was arrested after decades of hiding in plain sight. Using a novel forensic approach, investigators identified the suspect (who was wanted for murdering at least a dozen individuals and raping at least 50 women) by first identifying his relatives using a free, online genetic database populated by individuals researching their family trees. The technique has since been used by U.S. law enforcement to identify dozens of other criminal suspects. Yet concerns persist that police use of genetic genealogy databases might violate the privacy rights or expectations of their contributors (and family members). Public opinion is a critical but thus far underdeveloped input to policy discussions regarding whether the police should be permitted to use genetic genealogy databases to generate investigative leads. To fill this gap, we conducted a survey of 1,587 individuals in May 2018 to assess their perspectives on forensic use of genetic genealogy databases. This presentation will report the survey’s finding of strong support for police access when the purpose is to identify violent criminals; this support was not predicted by age, race, ethnicity, annual household income, criminal experiences, or law enforcement employment. These findings will be discussed in the context of recent legal and ethical scholarship at the intersection of privacy and public safety. Finally, implications of these findings for policymakers will be suggested.

Access, Storage, and Sharing of personal genomic information

Pascal Borry

Centre for Biomedical Ethics and Law, Department of Public Health and Primary Care, KU Leuven, Belgium

The increasing availability of genomic information, within and outside the context of the traditional healthcare system, provides new opportunities for individuals to engage with this information. Individuals are now able to have their own genetic data interpreted by all kinds of third-party interpretation services, outside of a clinical context. Healthcare professionals will increasingly be challenged by requests from individuals to help interpret genetic information that was obtained outside a traditional context. The emerging possibilities for obtaining and storing genomic information and making it available to individuals raise novel challenges with regard to data privacy, storage and processing. In particular, processing genomic data may raise informational risks for the data subjects, their family members or specific groups. Previous studies have demonstrated that individuals are willing to share their personal data for research purposes. Data sharing by patients will be beneficial for both clinical and research purposes, and will optimize the use of the massive raw data that are generated. Nevertheless, the ethical and legal grounds of emerging (for-profit and non-profit) web platforms for personal data sharing should be a subject for ongoing scrutiny.

Accessing 1M Genomes transnationally across Europe by 2022

Gary Saunders, Thomas Keane, Jordi Rambla, Ilkka Lappalainen, and Serena Scollen

ELIXIR

Over the last forty years, we have seen the emergence of large cohorts of human samples from research and national healthcare initiatives. Many countries in Europe now have nascent personalised medicine programmes meaning that human genomics is undergoing a step change from being a predominantly research-driven activity to one funded through healthcare. This is evidenced by the recent Declaration of 19 European countries to sequence and share transnationally at least 1M human genomes by 2022. This initiative will catalyse the transition of genomics from the bench to bedside in Europe. We envisage that a significant subset of these data will be made available for secondary research. However genetic data generated through healthcare is not likely to be shared as widely as research data. Healthcare is subject to national laws, and it is often unacceptable for health data from one country to be exported outside regional or national jurisdictions. Our vision for the ELIXIR Federated Human Data Community is to create a federated ecosystem of interoperable services that enables population scale genomic and biomolecular data to be accessible across international borders accelerating research and improving the health of individuals resident across Europe.

In this presentation we shall describe our work within the ELIXIR Federated Human Data Community which coordinates the delivery of FAIR compliant metadata standards, interfaces, and reference implementation to support the federated ELIXIR network of human data resources. The overall goal is to provide secure, standardized, documented and interoperable services under the framework of the European Genome-phenome Archive (EGA). We will describe our structured roadmap for the ELIXIR Nodes to join the EGA federated network by providing the necessary technical, logistical, and training coordination across the network.

This project builds on earlier work in the ELIXIR-EXCELERATE, CORBEL, and Tryggve projects. It will be led by the European Genome-phenome Archive (EGA) to ensure work described in this proposal is aligned with the policies, legal agreements, and governance model for establishing the Federated EGA. We will also describe how this work builds on work in EXCELERATE to create a reference software implementation, the Local EGA, that Nodes can use to operate their federated node for the secure archival and for providing access to sensitive human research data. The result will be a coordinated bioinformatics infrastructure across Europe that enables the transnational access for approved researchers to 1M genomes by 2022.

Genomics as a personalized medicine approach in disease risk prediction - P5.fi FinHealth

Heidi Marjonen, Minttu Marttila1, Teemu Paajanen1, Niko Kallio1, Ari Haukkala2, Helena Kääriäinen1, Kati Kristiansson1, Markus Perola1,3

1National Institute for Health and Welfare, Helsinki, Finland 2Faculty of Social Sciences, University of Helsinki, Finland 3Research Program for Clinical and Molecular Metabolism, Faculty of Medicine, University of Helsinki, Finland

In P5.fi we utilize polygenic risk scores to provide personalized information on the individual disease risk related to three common diseases (coronary heart disease, type 2 diabetes and venous thromboembolism) for 3,400 volunteering participants. We hypothesize that genetic risk information would improve prevention, diagnosis and treatment.

We validated the polygenic risk scores in whole-genome genotyped, population-based FINRISK cohorts (N=20,000) using Cox regression models. Follow-up data from national health care registers allowed us to model the impact of genetic and traditional risk factors such as smoking, cholesterol, blood pressure and BMI on a person's risk of disease within the next 10 years.

We observed that the type 2 diabetes (T2DM) PRS is significantly associated with T2DM disease risk (HR: 1.5 per 1 SD of PRS, p-value < 2×10^-16). In addition, the top 8% of the FINRISK population who had inherited the highest PRS had a fourfold increased risk for T2DM. Moreover, people with BMI > 35 and the highest PRS tend to get diabetes at a younger age.

By combining the systemic genetic analyses with more traditional disease risk factors in the FINRISK cohort, we produced estimates on the impact of PRS and selected covariates on risk of T2DM. We use these estimates to assess the future risk of T2DM in P5.fi FinHealth participants, who will receive this disease risk information, including genetic risk, via a web portal. Our approach enables us to identify the individuals at highest genetic risk and those with pre-disease symptoms. We will monitor the reception of the information by questionnaires and follow the participants for disease end points using registry data.
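
To illustrate how a per-standard-deviation hazard ratio such as the one reported above is read, the short sketch below scales it to other PRS values. It assumes the usual log-linear form of a Cox model, in which the hazard multiplies by the same factor for every additional standard deviation, and it ignores the other covariates; the function name is an assumption made for the example and the code is not taken from the study.

```python
def relative_hazard(hr_per_sd, z):
    """Hazard relative to a person with an average score (z = 0),
    assuming log-hazard is linear in the standardised score."""
    return hr_per_sd ** z


# Using the reported HR of 1.5 per 1 SD of PRS:
for z in (-1, 0, 1, 2):
    print(f"PRS z-score {z:+d}: relative hazard {relative_hazard(1.5, z):.2f}")
```

Under this assumption a person two standard deviations above the mean has 1.5 × 1.5 = 2.25 times the hazard of a person with an average score. The fourfold figure quoted for the top 8% is an observed group comparison and is not derived from this formula.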

Knowns and unknowns in genomic testing; a clinician’s eye view.

Dr Frances Elmslie

Consultant Clinical Geneticist, President, UK Clinical Genetics Society, South West Thames Regional Genetics Service, St George’s, University of London

Clinical Genetics Services have traditionally focussed on the diagnosis and management of rare diseases. With increasing reliance on whole genome and whole exome sequencing, genetics clinicians have become more closely integrated with scientific staff to aid in the interpretation of genomic data. Even in patients with a clear phenotype, correct interpretation of variants can present significant challenges.

The availability of direct to consumer genomic testing has led to an influx of enquiries from primary care or even from the consumers themselves into genetics services, requesting help in the interpretation of genomic data. The agreed approach has been to prioritise those in whom a clearly pathogenic variant in a known disease-causing gene has been identified through these analyses, but as a result there will be individuals concerned about their genomic information who are unable to access support. Do publicly funded genetics services have a duty of care to these individuals? If so, how do services need to change?

In my talk, I will present a number of case studies that illustrate the challenges and benefits of access to the new genomic technologies, and will consider how these can be safely integrated into routine clinical care.

Embedding genomics into routine health care

Reecha Sofat

University College London, UK

Rapid advances have been made in platform -omics technologies including but not limited to genomics, proteomics and transcriptomics. We are used to dealing with these data in a research environment, and the breadth of utility is being demonstrated, for example, by large national biobanks. However, what utility these technologies, in particular genomics, will have in routine clinical care remains unexplored and untested. Embedding some of these practices into routine care may begin to yield answers to this. Moreover, how these complex data are handled within routine clinical care environments, and how they are stored and accessed repeatedly through an individual’s life course to inform health, remains unknown. AboutMe is an institutional initiative at University College London and University College London Hospital Foundation Trust which is beginning to answer these questions.

S47


Genetic privacy: friend or foe?

Yaniv Erlich MyHeritage, Israel

We generate genetic information for research, clinical care and personal curiosity at exponential rates. Sharing these genetic datasets is vital for accelerating the pace of biomedical discoveries and for fully realizing the promises of the genetic revolution. However, one of the key issues in the broad dissemination of genetic data is finding an adequate balance that ensures data privacy. I will present several strategies to breach genetic privacy using open internet tools, including a systematic analysis of the strategy that implicated the Golden State Killer. Our analyses show that anyone in the world can use these strategies to identify major parts of the US population from their allegedly anonymous genetic information. I will conclude my talk with practical suggestions to reconcile genetic privacy with the need to share genetic information.

S49

Profiting from sharing personal genomic data: A review of ethical concerns.

Eman Ahmed, Mahsa Shabani

Center for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium.

In recent years, some Direct-to-Consumer (DTC) genetic testing companies have developed partnerships with third parties, such as pharmaceutical and biotech companies, that are interested in accessing genomic data for medical research and drug development purposes. Although the customers are mainly supporting the research activities of the DTC companies, the for-profit nature of such data sharing raises questions regarding the rights of the data subjects and fairness in sharing benefits. In response, a new generation of sequencing and data sharing companies, such as Nebula Genomics, is emerging which aims to leave ownership and control of the data in the hands of each individual customer. In particular, such a business model allows individuals to receive various types of monetary incentives to sequence their genome and share it with interested commercial parties. Offering direct incentives to individuals for genomic data sharing may seem beneficial; however, this needs to be in line with the overarching principles of biomedical research and personal data protection. The pressing question here is how far existing guidelines and policies regarding incentives in biomedical research should apply to such data sharing by individuals for research purposes in exchange for free sequencing or tokens. Also, the implications for withdrawal of consent and the privacy rights of individuals after remuneration remain to be investigated. Moreover, the impact of such data sharing on conventional ways of collecting and sharing genomic data in biomedical research should be scrutinized. In this paper, we offer a critical review of the ethical concerns that may arise from for-profit genomic data sharing by individuals and provide some points-to-consider for future policy developments.

P1

Identification of a splice site mutation in the DNAI1 gene in a multiplex Kuwaiti family with severe chronic respiratory symptoms of PCD with situs solitus.

Al-Mutairi DA [1], Alsabah BH [2], Alkhaledi B [3], Pennekamp P [4], Omran H [4]

1 Department of Pathology, Faculty of Medicine, Health Sciences Center, Kuwait University, Safat, Kuwait. 2 Zain Hospital for Ear, Nose and Throat, Shuwaikh, Kuwait City, Kuwait. 3 Pediatric Pulmonary Unit, Al-Sabah Hospital, Kuwait. 4 Hospital Muenster, Muenster, Germany

Introduction: Primary ciliary dyskinesia (PCD) is one of the congenital thoracic disorders caused by dysfunction of motile cilia resulting in insufficient mucociliary clearance of the lungs. Approximately 50% of all PCD patients have Kartagener syndrome, a triad of bronchiectasis, sinusitis and situs inversus totalis. The overall aim of this study is to identify causative mutated genes for PCD and CHD in the Kuwaiti population. Methods: A cohort of multiple consanguineous PCD families was ascertained from Kuwaiti patients, and genomic DNA from the family members was isolated using standard procedures. The DNA samples from all affected individuals were analyzed using whole exome sequencing and Sanger sequencing. Transmission electron microscopy (TEM) and immunofluorescence staining (IF) were performed on patient samples obtained by nasal brushing in order to identify structural abnormalities within ciliated cells. Here we present one multiplex family from our cohort that has a splice site mutation in the DNAI1 gene. Results: Whole exome sequencing showed a homozygous splice site mutation in the DNAI1 gene (c.1311+2T>A) in intron 13 that is shared by the two affected siblings. Sanger sequencing was performed for the patients and the parents, and the results confirmed that the patients carry a homozygous mutation and that both parents are carriers of the same mutation. In addition, TEM for the patients showed a lack of outer dynein arms (ODAs). IF staining showed a complete absence of DNAI1 protein. The expression of other ciliary proteins (GAS8, DNAH11 and RSPH9) was also tested by IF and found to be normal in this family. Conclusions: A splice site mutation in the DNAI1 gene can cause severe symptoms of PCD without affecting left/right body asymmetry, as the patients have normal positions of the internal organs, known as situs solitus. 
This study helped the PCD families to obtain a confirmed diagnosis of PCD, firstly by determining the defects in the cilia ultrastructure (using IF and TEM) and then by mapping the disease mutations. Genetic screening confirms the type of ciliary defect for each family under study.

P2

A Clonal Expression Biomarker Improves Prognostic Accuracy: TRACERx Lung

Dhruva Biswas*, Nicolai J Birkbak*#, Rachel Rosenthal, Crispin T. Hiley, Emilia L. Lim, Krisztian Papp, Marcin Krzystanek, Dijana Djureinovic, Yin Wu, David A. Moore, Marcin Skrypski, Christopher Abbosh, Maise Al Bakir, Thomas BK Watkins, Selvaraju Veeriah, Gareth A. Wilson, Mariam Jamal-Hanjani, Arul M. Chinnaiyan, Patrick Micke, Jiri Bartek, Istvan Csabai, Zoltan Szallasi, Javier Herrero, Nicholas McGranahan#, and Charles Swanton#, on behalf of the TRACERx consortium.

At the point of cancer diagnosis, molecular biomarkers aim to stratify patients into precise disease subtypes predictive of outcome independent of standard clinical parameters such as tumour stage. Although prognostic gene expression signatures have been derived for many cancer types, seldom have they been shown to improve therapeutic decision making, limiting their clinical use. While intra-tumour transcriptomic heterogeneity (RNA-ITH) has been shown to bias existing biomarkers, efforts to control for this biological parameter have not been considered in biomarker development. Here, we analyse multi-region RNA-seq and whole-exome data for 156 tumour regions from 48 TRACERx patients to explore RNA-ITH in NSCLC. We show that chromosomal instability is a major driver of RNA-ITH, through the generation of heterogeneous copy number events within tumours, and that existing prognostic gene expression signatures are vulnerable to sampling bias. To address this issue, we develop the Outcome Risk Associated Clonal Lung Expression (ORACLE) assay, comprised of genes expressed homogeneously within individual tumours but heterogeneously between patients. These genes are enriched in modules associated with cell proliferation, such as mitosis and nucleosome assembly, that are often selected for through copy number gain events occurring early in tumour evolution. 
Our approach to identify “clonal” transcriptomic biomarkers in NSCLC overcomes tumour sampling bias, improves survival risk forecasting over current clinicopathological risk factors, and may be generalised to other cancer types, whilst revealing the early evolutionary selection of high risk DNA copy number events driving poor clinical outcome.

P3

A highly admixed Kazakh personal genome of a complex history in Central Asia and the need for a set of representative ethnic and national genomes.

Madina Seidualy1, Asta Blazyte1,2, Sungwon Jeon1,2, Youngjune Bhak1,2, Yeonsu Jeon1,2, Jungeun Kim3, Anders Eriksson4, Semin Lee1,2, Jong Bhak1,2,3,5

1) Korean Genomics Industrialization and Commercialization Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea 2) Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea 3) Personal Genomics Institute, Genome Research Foundation, Cheongju 28160, Republic of Korea 4) Department of Medical and Molecular Genetics, King’s College London, London SE1 9RT, United Kingdom 5) Clinomics LTD, UNIST, Ulsan 44919, Republic of Korea

As part of the Pan Asia Population Genomics Initiative (PAPGI), many personal genomes have been sequenced and compared. To date, PAPGI has analyzed and published multiple ethnic genomes, including Chinese, Japanese, Korean, Indian (Gujarati), Pakistani (Pathan), Egyptian, and Malaysian genomes. Central Asian genomes such as the Kazakh can provide highly admixed genome data for associating genetic variants with phenotypic traits and diseases. We have sequenced a Kazakh genome which has a clear intertribal admixture history. Analyses of this personal genome reconstructed the genomic composition of a highly heterozygous individual, and the reconstruction was supported by historical events. We further compared this present-day Kazakh genome with various modern and ancient genomes to evaluate the impact of ancient and recent admixtures. As a result, we confirmed the expected heterozygosity, which proved to be high and to consist of variants attributed to different continental groups. Heterozygosity was also observed in the variants determining phenotypic traits, disease and the pharmacogenomic profile. We identified over 4 million SNPs, including 102,240 novel and 627 common functionally damaging variants. Phylogenetic analysis revealed the surrounding Central Asian populations, such as Kalmyk and Kyrgyz, as genetically closest; however, a considerable similarity to East Asians (Xibe, Korean, and Japanese) suggested a complex admixture within the continent of Asia. Overall, the biggest proportions of shared variants point towards fairly recent admixtures traceable to the 16th-20th centuries. As a discussion point for personal genome projects across the world, researchers must consider how accurately they can map the origins or ancestors of admixed samples, which is very difficult. 
To overcome this problem, the construction of numerous ethnically and nationally representative genomes, used as anchors, will enable us to dissect admixed personal genetic heritage efficiently.

P4

Elucidation of the phenotypic spectrum and genetic landscape in primary and secondary microcephaly

P Boonsawat, Paranchai Boonsawat1, Pascal Joset1, Katharina Steindl1, Beatrice Oneda1, Laura Gogoll1, Silvia Azzarello-Burri1, Frenny Sheth2, Chaitanya Datar3, Ishwar C. Verma4, Ratna Dua Puri4, Marcella Zollino5, Ruxandra Bachmann-Gagescu1, Dunja Niedrist1, Michael Papik1, Joana Figueiro-Silva1, Rahim Masood1, Markus Zweier1, Dennis Kraemer1, Sharyn Lincoln6, Lance Rodan6,7, Undiagnosed Diseases Network, Sandrine Passemard8,9, Séverine Drunat9, Alain Verloes9, Anselm H.C. Horn10, Heinrich Sticht10, Robert Steinfeld11, Barbara Plecko11, 12, Bea Latal13, Oskar Jenni13, Reza Asadollahi1, Anita Rauch1,14,15

1Institute of Medical Genetics, University of Zurich, Schlieren-Zurich, Switzerland 2FRIGE's Institute of Human Genetics, FRIGE House, Satellite, Ahmedabad, India 3Sahyadri Medical Genetics and Tissue Engineering Facility, Kothrud, Pune and Bharati Hospital and Research Center Dhankawadi, Pune, India 4Institute of Medical Genetics & Genomics, Sir Ganga Ram Hospital, Rajinder Nagar, New Delhi, India 5Institute of Genomic Medicine, Catholic University, Gemelli Hospital Foundation, Rome, Italy 6Division of Genetics and Genomics, Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts, USA 7Department of Neurology, Boston Children’s Hospital, Boston, Massachusetts, USA 8Service de Neuropédiatrie, Hôpital Universitaire Robert Debré, APHP, Paris, France 9Département de Génétique, Hôpital Universitaire Robert Debré, APHP, Paris, France 10Division of Bioinformatics, Institute of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany 11Division of Pediatric Neurology, University Children’s Hospital Zurich, Zurich, Switzerland 12Department of Pediatrics and Adolescent Medicine, Division of General Pediatrics, Medical University of Graz, Austria 13Child Development Center, University Children’s Hospital Zurich, Zurich, Switzerland 14Neuroscience Center Zurich, University of Zurich, Zurich, Switzerland 15Zurich Center of Integrative Human Physiology, University of Zurich, Zurich, Switzerland

Introduction: Microcephaly is a sign of many genetic conditions but has rarely been systematically evaluated. We therefore comprehensively studied the clinical and genetic landscape of an unselected cohort of patients with microcephaly. Materials and Methods: We performed clinical assessment, high-resolution chromosomal microarray analysis, exome sequencing and functional studies in 62 patients (58% with primary microcephaly (PM), 27% with secondary microcephaly (SM), and 15% of unknown onset). Results: We found that the severity of developmental delay/intellectual disability correlated with the severity of microcephaly in PM, but not in SM. We detected causative variants in 48.4% of patients and found divergent inheritance and variant patterns for PM (mainly recessive and likely gene-disrupting (LGD)) versus SM (all dominant de novo and evenly LGD or missense). While centrosome-related pathways were solely identified in PM, transcriptional regulation was the most frequently affected pathway in both SM and PM. Unexpectedly, we found causative variants in different mitochondria-related genes accounting for ~5% of patients, which emphasizes their role even in syndromic PM. Additionally, we delineated novel candidate genes involved in centrosome-related pathways (SPAG5, TEDC1), Wnt signaling (VPS26A, ZNRF3) and RNA trafficking (DDX1). Conclusions: Our findings enable improved evaluation and genetic counseling of PM and SM patients and further elucidate microcephaly pathways.

P5

Ethical dilemmas and data sharing in genetic genealogy

John Cleary, Associate Professor, School of Social Sciences, Heriot-Watt University, Edinburgh

A number of events in 2018-19 have put the data-sharing methods of genetic genealogy into the public spotlight, forcing open largely undiscussed issues about privacy and the ethics of publishing the personal DNA data of oneself and one's relatives. There is an expectation of reciprocal sharing and shared genealogical purposes attached to using these company databases. Revelations that these databases have been used by law enforcement agencies, chiefly in the USA, to identify unknown murder victims and to apprehend suspects of violent crimes have led to a growth of anxiety among the customer base. This has been explored through a series of interviews with lead players in this debate on their perceptions of the ethical hazards involved and what remedies may be found. The analysis covers: (1) whether privacy is actually compromised if the genomic data of individuals is not revealed; (2) whether social benefits might justify the actions permitted by certain testing companies; (3) how interviewees perceive the risk of ‘mission creep’, in which tolerance of such usage for crimes of violence may see it extend to other forms of criminal behaviour, and the effect that may have on public support. As a result, we recommend that the testing companies all take approaches that strongly foreground the principle of informed consent in order to avoid potential long-term harm to their business models.

P6

The logical philosophy of Investigating the Challenges of Genetic Bioethics in the use of Pig Derivatives in Medical Manufacturing and its possible impact on personal human inheritance of genes

Gihan E-H Gawish, MSc, PhD, PostDoc-UBC Fellow, Female Section of Saudi Scientific Society for Juristic Medical Studies, IMSIU, Riyadh-SA

Ass Prof of Medical Biochemistry, Molecular Genetics and Cancer Genetics; Member of the Board Directors & Supervisor of Female Section of Saudi Scientific Society for Juristic Medical Studies, IMSIU, Medical Biochemistry Department, College of Medicine and Medical Services, Al-Imam Muhammad Ibn Saud Islamic University, Riyadh-SA; Member of the Board Directors of SSCC-SCFHS; Founding Member of the Genome Research Chair (Former), KSU; Medical Laboratories Specialties, mohp-eg & scfhs-sa https://imamu.academia.edu/DrGihanGawish +996553101340

World pork manufacturing revenue is $500 billion a year, while Muslims and Jews represent more than a quarter of the population of the earth. It is therefore inevitable that the current status of food and medicine be examined in light of the prohibition in the heavenly books, to investigate whether these derivatives could affect personal human genetic inheritance. Residual pig genes, in the presence of an appropriate environment of pig viruses resulting from manufacturing, may have an impact on the genetic content of the human being. The objective of this philosophical examination is not to oppose the existing economy but to find industrial solutions that preserve the moral legacy of human genetic material, in the event that eating a food or medicine containing prohibited residues of pig nucleotide sequences, together with remnants of pig viruses, may be a tool that destroys or replaces parts of the human genetic heritage over the decades, or at least enhances genotoxicity and encourages cellular deviation towards carcinogenesis. This is especially relevant because these industries are modern, having begun in the sixties, and no research has been carried out to ascertain their long-term adverse consequences or how to avoid them. In 2014, the Department of Islamic Development in Malaysia investigated samples of Cadbury products and found parts of the pig's genetic material in chocolates. This prompted me to compare the physical characterization of pig DNA with the impact on it of industrial extraction, pressure, heat, cracking and purification, and of the processes associated with the containment of biological residues such as viruses or vital compounds. Some of these residues may not be affected by industrial pressure and heat. Their impact on the future of natural genetic replication in human cells should be examined in order to safeguard physical, chemical and ethical inheritance. 
My study focuses on how we could update the industrial process to protect the personal human genome from any transformation related to industrial development.

P7

Identifying persons unknown using genetic genealogy - a review of the methodology

Maurice Gleeson, Education Ambassador, International Society of Genetic Genealogy

The arrest of the alleged Golden State Killer in California in April 2018 created a media storm that has never fully abated. Since then over 25 “cold case” criminal suspects have been arrested as a result of the application of genetic genealogy techniques. The power of these techniques to help solve “cold cases” has caught the attention of forensic scientists and law enforcement agencies across the world, highlighting the power of these genetic genealogy techniques to solve cases where standard forensic techniques have failed. The genetic genealogy techniques used to identify these criminal suspects are rooted in adoptee research that the genetic genealogy community has been engaged in for the past 12 years. There are many people who don't know the identity of one or both of their parents, among them adoptees, foundlings, and donor-conceived children. For these people there is frequently a desire to track down their biological relatives, learn about their roots, and forge relationships with new family members. Another frequent objective is to acquire vital information about medical histories that may impact on the individual's own health or that of their children. Following the availability of commercial autosomal DNA tests in 2007 (specifically autosomal SNP microarray genotyping), many people realised that they could potentially help adoptees trace their birth families. This was particularly important in those US states where adoption files were not open to the public. As a result a whole new industry in adoptee research was created. This presentation summarizes the genetic genealogy methodology used to identify unknown persons, whether they be adoptees, foundlings, unidentified murder or accident victims, unidentified human remains, rapists or killers. The technique can be broken down into the following steps:

1) identify close genetic matches to the person of interest

2) cluster them into groups of Shared Matches

3) identify or build family trees for the members of each cluster

4) use these family trees to triangulate back to a common ancestor for each cluster

5) trace forward from the ancestral couples until one cluster’s descendants intersect with another

6) use profiling to narrow down the potential candidates for the target person (or their parents)

7) perform further targeted DNA testing to confirm the relationship
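Steps 1 and 2 above lend themselves to a small illustration. The following is a minimal Python sketch, not the tooling used by genetic genealogists: the 90 cM cut-off for a "close" match, the match names and the shared-match pairs are all hypothetical values chosen for the example, and clusters are simply taken to be connected groups of matches who also match each other.

```python
from collections import defaultdict


def close_matches(matches, min_cm=90.0):
    """Step 1: keep matches sharing at least min_cm centimorgans with the target.

    The 90 cM default is an illustrative assumption, not a recommended value.
    """
    return {name for name, cm in matches.items() if cm >= min_cm}


def cluster_shared_matches(names, shared_pairs):
    """Step 2: group matches into clusters of Shared Matches.

    A cluster here is a connected component of the graph whose edges are
    pairs of matches that also match each other.
    """
    graph = defaultdict(set)
    for a, b in shared_pairs:
        if a in names and b in names:
            graph[a].add(b)
            graph[b].add(a)

    clusters, seen = [], set()
    for start in sorted(names):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical match list: name -> shared centimorgans with the person of interest.
matches = {"A": 210.0, "B": 150.0, "C": 95.0, "D": 40.0, "E": 120.0}
# Hypothetical pairs of matches who also match each other.
shared_pairs = [("A", "B"), ("C", "E"), ("B", "D")]

close = close_matches(matches)
clusters = cluster_shared_matches(close, shared_pairs)
print(clusters)
```

Each resulting cluster would then be the starting point for the tree building and triangulation described in steps 3 to 5.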

P8

ELIXIR: Providing a coordinated European Infrastructure for managing Human Genomics Translational Data and Services

Jen Harrow, On behalf of ELIXIR

ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK

ELIXIR unites Europe's leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its member states and enables users in academia and industry to access services that are vital for their research. There are currently 22 countries involved in ELIXIR, bringing together more than 200 institutes and 600 scientists.

ELIXIR's activities are coordinated across five areas called 'Platforms', which have made significant progress over the past few years. For instance, the Data Platform has developed a process to identify data resources that are of fundamental importance to research and committed to long-term preservation of data, known as core data resources. The Tools Platform has services to help users find appropriate software tools, workflows and benchmarking, as well as a BioContainers registry, to enable software to be run on any operating system. The Compute Platform has services to store, share and analyse large data sets and has developed the Authentication and Authorisation Infrastructure (AAI) single sign-on service across ELIXIR. The Interoperability Platform develops and encourages adoption of standards such as FAIRsharing, and the Training Platform helps scientists and developers find the training they need via the Training e-Support System (TeSS).

The Beacon Project is an open sharing platform that allows any genomic data centre in the world to make its data discoverable. The project is a first-of-its-kind effort to make the massive amounts of life sciences data being collected in healthcare and research settings around the globe accessible and is being supported and funded by ELIXIR. To date, 70 beacons have been "lit," including seven in the UK and another nine across Europe, allowing users unprecedented discovery of genomic variants in national and international cohorts. The Authentication and Authorisation Infrastructure (AAI) provides a centralised user identity and access management service (ELIXIR AAI). ELIXIR AAI will be used to access the European Genome-Phenome Archive (EGA) resources and ELIXIR is working with the GA4GH to have ELIXIR AAI approved as a standard. The focus now for ELIXIR Human Genomics and Translational Data is to establish a federated suite of EGA services across Europe, coordinating the national roadmaps and large EU projects to enable population scale genomic, phenotypic, and biomolecular data to be accessible across international borders.
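The discovery idea behind a beacon can be illustrated with a toy model: a data centre answers only "yes" or "no" to the question of whether it holds a given allele, without releasing any records. The Python sketch below is an in-memory illustration under that assumption; `Variant`, `ToyBeacon` and `query_beacons` are hypothetical names, the data are invented, and nothing here reproduces the actual Beacon network API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant described by chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


class ToyBeacon:
    """Hypothetical stand-in for one data centre's beacon."""

    def __init__(self, name, variants):
        self.name = name
        self._variants = frozenset(variants)

    def has_variant(self, query):
        # Only presence or absence is revealed, never the underlying records.
        return query in self._variants


def query_beacons(beacons, query):
    """Ask every beacon the same yes/no question and collect the answers."""
    return {beacon.name: beacon.has_variant(query) for beacon in beacons}


# Invented example data for two hypothetical centres.
centre_a = ToyBeacon("centre_a", [Variant("1", 1000, "A", "G")])
centre_b = ToyBeacon("centre_b", [Variant("2", 5000, "C", "T")])

answers = query_beacons([centre_a, centre_b], Variant("1", 1000, "A", "G"))
print(answers)
```

A researcher who receives a "yes" would then apply to the relevant centre for access through its normal governance route.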

P9

Mutational dynamics in the mouse mitochondrial genome

Maribel Hernández-Rosales

Conacyt-Institute of Mathematics, UNAM, Juriquilla; Alfredo Varela-Echavarria, Institute of Neurobiology, UNAM Juriquilla.

A cell contains from hundreds to thousands of mitochondria. Mutant mitochondrial genomes can coexist with wild-type genomes. Mutations in the mitochondrial genome have been associated with aging and with several diseases, such as Alzheimer’s disease, Parkinson’s disease, some forms of cancer, infertility and neuromuscular disorders. In this work, we address the following questions: What is the mutation load in the mitochondrial genome? Does the mutation load in the mouse brain change at different stages of life? Does the frequency of individual mutations change at different stages of life? How are mutations distributed in the mitochondrial genome? I will show preliminary results of this study in the mouse mitochondrial genome that will give us insights into the mutational dynamics of the human mitochondrial genome.

P10

Issues and experience in incorporating Personal Genome Project principles into Asian genome projects: the need for standardized protocols using future technologies

Sungwon Jeon1, Asta Blazyte1, Sungwoong Jho2, Dan Bolser3, and Jong Bhak1,2,4*

1) KOGIC, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea 2) Personal Genomics Institute, Genome Research Foundation, Cheongju 28160, Republic of Korea 3) Solidi, Cambridge Judge Business School, Trumpington Street, Cambridge CB2 1AG 4) Clinomics LTD, UNIST, Ulsan 44919, Republic of Korea

Various Personal Genome Project concepts have been around for decades, and the best-known project is the PGP run by George Church's group at Harvard. Implementing detailed PGP principles in other nations and cultures raises serious issues. Here, we introduce several Asian genome projects that adopt PGP principles and share the problems encountered, with a critical assessment of implementing the broadly democratic principles of PGP. The first, the Personal Welfare Genome project in Korea, recruited 1,000 healthy Koreans over three years. We provided a free health check-up resulting in 111 phenotypical assays and answers from 160 health-related questionnaires through a private hospital providing genetic counselling. It was funded by the government, and participants reported a high level of satisfaction. However, it was impossible to implement consent for sharing the genome information openly and freely outside the project. The second project is a 10,000 Korean human genome project, which collected 2,400 genomes over two years. In this case, the genomic data could be shared only if the traditional human sample access procedure was followed. In the end, the personal genome data will be shared only in a secured cloud environment, provided a de-identification rule is implemented. The last project associated with PGP principles is PAPGI, the Pan Asian Population Genomics Initiative, which aims to gather a wide variety of Asian personal genomes. The main problem here is that each nation's research and regulatory environment is different; therefore, it is impossible to implement any standardized data deposition and sharing. Major issues: privacy, ethics, and legal regulation issues need to be overcome robustly. We need to dissociate scientific data from medical data because, currently, any sequencing-based data are automatically regarded as medical and diagnostic and are mostly controlled by medical authorities. The human right to know, share or delete one's personal genomic data should be recognized. 
Each individual must take responsibility for acquiring, storing, and sharing these data at their own risk by becoming the center of their own genomics. Society must recognize each individual's right to free and unlimited usage of scientific and biological genome data.

P11

KPGP: the Korean Personal Genome Project towards Personal Reference Genomes Era

Sungwoong Jho1, Jungeun Kim1, Sungwon Jeon2, and Jong Bhak1,2

1Personal Genomics Institute, Genome Research Foundation, Cheongju, 28190, Republic of Korea 2 KOGIC, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea

KPGP, or PGP-Korea, is the first and longest-lasting Korean genome project [1]. It was initiated by the Korean Bioinformation Center (KOBIC) in 2006 to characterize the ethnicity-relevant variome of Koreans. It has three major goals. The first is to provide personal genome data to the public by democratizing genomic information in Korea. The second is to build the Korean reference genome (KOREF), which is not only one person's but also the public's. KOREF is unique because it has both a single-genome assembly (KOREF_S) and a population consensus assembly (KOREF_C) [2]. The third goal is to develop the Korean population variome, KoVariome [3]. KPGP aims to provide as much genomic information as openly and freely as possible. Since KPGP published the first Korean genome data in 2009 [4], the number of openly and freely accessible complete genomes in the KPGP database had reached 111 personal genomes as of 2018. This database was used to construct the first consensus Korean reference genome standard (KOREF_C) and KoVariome. These genome data were the first of their kind to be generated under a standard reference construction protocol, as a joint project with the National Center for Standard Reference Data of Korea. KoVariome contained 12.7 M SNPs and 1.7 M small indels from 50 unrelated healthy Korean individuals in the KPGP cohorts in 2018, and the number of samples has now reached 80. The KoVariome in 2019 will contain 300 Korean samples.

P12

Structural variant calling by assembly in whole human genomes: applications in hypoplastic left heart syndrome

Matthew Kendzior, Sparsh Agarwal, Dave Istanto, Xiyu Ge, Xiaoman Xie, Zach Stephens, Jacob Heldenbrand, Timothy Olson, Jeanne Theis, Jared Evans, Eric Wieben, Liudmila Mainzer, Matthew Hudson

National Center for Supercomputing Applications: Matthew Kendzior, Sparsh Agarwal, Jacob Heldenbrand, Liudmila Mainzer, Matthew Hudson University of Illinois at Urbana-Champaign: Dave Istanto, Xiyu Ge, Xiaoman Xie, Zach Stephens Mayo Clinic: Timothy Olson, Jeanne Theis, Jared Evans, Eric Wieben

Alignment-based variant calling leaves many variants undetected, particularly structural variants. Reads originating from large insertions or highly repetitive sequences may not map, or map incorrectly. Large deletions (hundreds to thousands of nucleotides) can be left unidentified. This problem can be remedied through variant calling by assembly. We used Cortex-Var, a program that creates de Bruijn graphs for input samples and looks for divergence among them within a population, or relative to a reference. This is a complex multistep process that is difficult to deploy, and requires high performance, large memory compute nodes. To aid in this process, we have developed a fully-automated solution using Nextflow, an advanced workflow management system. We applied this workflow to high-coverage whole genome sequencing data from 24 family trios, each containing a proband affected with hypoplastic left heart syndrome (HLHS), a critical congenital heart defect with poorly understood genetic underpinnings.
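The general idea of finding divergence between de Bruijn graphs can be illustrated at toy scale. The Python sketch below is not Cortex-Var's algorithm and ignores sequencing errors, coverage and reverse complements: it builds a graph of (k-1)-mers for a reference and for a sample, then reports the reference nodes at which the sample takes a path the reference does not. The sequences and the choice k = 4 are invented for the example.

```python
from collections import defaultdict


def de_bruijn(sequences, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def divergence_points(reference, sample_reads, k):
    """Return reference nodes where the sample has an edge absent from the reference."""
    ref_graph = de_bruijn([reference], k)
    sample_graph = de_bruijn(sample_reads, k)
    return sorted(
        node
        for node, successors in sample_graph.items()
        if node in ref_graph and successors - ref_graph[node]
    )


# Invented toy data: the sample differs from the reference by one base (G -> T).
reference = "AACCGGTT"
sample_reads = ["AACCTGTT"]

branch_nodes = divergence_points(reference, sample_reads, k=4)
print(branch_nodes)
```

On real data each such branch would have to be followed until the two paths rejoin before a variant could be called, which is part of what makes the full process computationally demanding.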

Current research into the etiology of HLHS has identified mutations in candidate genes functioning in embryonic heart development. However, most of these are single nucleotide variants (SNVs) that are present in few individuals. Current opinion based on statistical genetic studies is that HLHS is unlikely to be caused by a small number of large-effect variants, but rather by a combination of alleles in affected individuals that results in the HLHS phenotype. Many of these alleles may reside in non-coding regulatory regions, where they would also go undetected by targeted exome sequencing alone. Using our Cortex-Var workflow, we have identified a number of large structural variants in individuals affected with HLHS.

We annotated variants based on location relative to genes and regulatory elements related to congenital heart disease and embryonic heart development, and found many whose annotation suggests a potential role in HLHS. These include structural variants removing transcription factor binding sites within introns of NOTCH1, a gene previously implicated in HLHS; variants affecting exons of genes important to embryonic cardiac development; and variants in fetal cardiac enhancer regions identified through ChIP-Seq. We compared the frequency of the variants in the probands versus the parents to estimate the likelihood that a variant is de novo or inherited. Successful application of our workflow will enable faster and cheaper detection of variants contributing not only to HLHS but also to other complex diseases.
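The proband-versus-parent comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the `TrioCall` record, the classification labels and the toy calls are hypothetical, and a real analysis would also account for genotype quality and coverage.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrioCall:
    """Presence or absence of one variant in one family trio (hypothetical record)."""
    proband: bool
    mother: bool
    father: bool

def classify(call):
    """Label a variant in a single trio by its inheritance pattern."""
    if not call.proband:
        return "not_in_proband"
    if call.mother or call.father:
        return "inherited"
    return "candidate_de_novo"

def carrier_frequencies(calls):
    """Return (proband frequency, parent frequency) for one variant across trios."""
    n = len(calls)
    proband_freq = sum(c.proband for c in calls) / n
    parent_freq = sum(c.mother + c.father for c in calls) / (2 * n)
    return proband_freq, parent_freq

# Toy data for one variant in four trios (invented for illustration)
calls = [
    TrioCall(proband=True, mother=False, father=False),
    TrioCall(proband=True, mother=True, father=False),
    TrioCall(proband=False, mother=False, father=False),
    TrioCall(proband=True, mother=False, father=True),
]
labels = [classify(c) for c in calls]
proband_freq, parent_freq = carrier_frequencies(calls)
```

A variant seen in a proband but in neither parent is only a candidate de novo call; with assembly-based calling, a missed parental call is an alternative explanation that would need to be ruled out.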

P13

Complex, Challenging Variants are a Significant Fraction of the Pathogenic Variants in Patients: Implications for Clinical WGS

Stephen Lincoln [1], Andrew Fellowes [2], Shazia Mahamdallie [3], Shimul Chowdhury [4], Eric Klee [5], Justin Zook [6], Rebecca Truty [1], Marc Salit [7], Nazneen Rahman [3], Stephen Kingsmore [4], Robert Nussbaum [1], Matthew Ferber [5], Brian Shirts [8]

1. Invitae, San Francisco, USA, 2. Peter MacCallum Cancer Centre, Melbourne, Australia, 3. Institute of Cancer Research, London, UK, 4. Rady Children's Institute for Genomic Medicine, San Diego, USA, 5. Mayo Clinic, Rochester, USA, 6. National Institute of Standards and Technology, Gaithersburg, USA, 7. Stanford University, Palo Alto, USA, 8. University of Washington, Seattle, USA

Next-generation sequencing (NGS) is a capable technique for detecting single nucleotide variants and small indels in relatively accessible parts of a patient's genome. However, conventional NGS methods have important limitations. An analysis of over 200,000 patients, tested using sensitive methods, showed that variants of other, technically challenging types comprise between 9 and 19% of the reportable pathogenic findings, depending on clinical indication. Approximately 50% of these variants were of challenging types (large indels, single exon CNVs, etc.), 20% were in challenging genomic regions (homopolymers, non-unique sequences, etc.), and 15% were in poorly covered regions. A further 15% presented multiple challenges. These data have been deposited into ClinVar and prevalence data are being made available. It can be difficult to evaluate the sensitivity of DNA sequencing methods for such challenging variants. The most recent AMP/CAP guidelines for clinical NGS bioinformatics [Roy et al., JMD 2018] recommend that validation studies include samples containing enough variants of each type to achieve statistical significance, a goal that is difficult to achieve for complex variants given the relative scarcity of positive controls. As proof-of-concept for one potential method to address this issue, we developed a synthetic specimen containing 22 challenging variants of diverse types in commonly tested genes from the ACMG 59 list. This specimen was sequenced using 10 validated NGS tests by an international group of collaborating laboratories: only 10 of the 22 challenging variants were detected by all tests, and just 3 tests detected all 22. Some of these limitations were not known to the respective laboratory directors, demonstrating the utility of this approach. Most but not all of the limitations appeared to be bioinformatic in nature.
We believe that both our prevalence data and control specimens such as ours may be valuable assets for improving the performance of genome sequencing in medical practice.
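To give a sense of why positive-control scarcity matters, here is a short worked sketch. It assumes the exact binomial (Clopper-Pearson) one-sided bound for the special case in which every positive control is detected; the abstract does not say which statistical method the guidelines or the authors use, and the function names are illustrative.

```python
import math

def sensitivity_lower_bound(n_detected_all, alpha=0.05):
    """One-sided lower confidence bound on sensitivity when all n variants are detected.

    With n of n detected, the exact binomial bound reduces to alpha ** (1 / n).
    """
    return alpha ** (1.0 / n_detected_all)

def required_positives(target_sensitivity, alpha=0.05):
    """Smallest n such that detecting n of n supports the target lower bound."""
    return math.ceil(math.log(alpha) / math.log(target_sensitivity))

bound_22 = sensitivity_lower_bound(22)   # a test that detects all 22 variants
needed_95 = required_positives(0.95)     # positives needed to support >= 95%
needed_99 = required_positives(0.99)     # positives needed to support >= 99%
```

Under these assumptions, even a test that detects all 22 variants in the specimen supports a 95% lower bound on sensitivity of only about 0.87, and supporting a 0.95 bound would take 59 positive controls, all detected.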

P14

The Personal Genome Project UK – An Update

Ismail Moghul1 & José Afonso Guerra-Assunção2 on behalf of the PGP-UK Consortium

1Medical Genomics, UCL Cancer Institute, University College London, UK; 2BLIC, UCL Cancer Institute, University College London, UK

The Personal Genome Project UK (PGP-UK) is dedicated to making genome, health, and trait data publicly available under an ethically approved, open-access and open-consent model. Participant enrolment for the PGP-UK is an extensive process in which participants must demonstrate a thorough understanding of the risks involved in taking part in a project of this nature by completing an online, multiple-choice exam. To date, over 150 datasets (600+ data files) have been released from the project, including a multi-omic pilot study of ten participants and a further 100 whole genome sequencing datasets. The PGP-UK currently includes data produced from whole genome sequencing (n=101), whole exome sequencing (n=2), whole transcriptome sequencing (n=20), whole genome bisulphite sequencing (n=10) and genome-wide methylation arrays (n=23).

The entire PGP-UK dataset is freely available for download from public repositories (ENA, EVA and ArrayExpress) with no access restrictions. Links to all datasets are provided on the PGP-UK website (www.personalgenomes.org.uk). Basic phenotype data, which include self-reported age, sex, smoking status, etc., can be found on the project's data web page (www.personalgenomes.org.uk/data), alongside genome and methylome reports generated by the PGP-UK. Furthermore, all of the data and associated metadata are available through the PGP-UK API. The API is compliant with the OpenAPI Specification 3.0 and is documented at www.personalgenomes.org.uk/api. Data from the pilot study are available on the Seven Bridges Cancer Genomics Cloud, which offers various tools and workflows for genomic and epigenomic data analysis. The entire PGP-UK dataset is also available on Lifebit's Open Data Project (opendata.lifebit.ai/table/pgp), where data can be exported to Lifebit's cloud-computing platform Deploit (deploit.lifebit.ai) in order to run custom pipelines.

In addition to generating open-access multi-omics data, we have developed an open-source iPad app called 'GenoME'. This app allows users to explore the personal genomes and epigenomes of four PGP-UK participants. As well as acting as a valuable educational tool, the app explores novel methods of returning epigenomic data to participants.

P15

Robust governance for sustainable sharing of genomic data

Guro Meldre Pedersen, Vibeke Binz Vallevik, Sharmini Alagaratnam

DNV GL, N-1363 Høvik, Norway

The successful clinical implementation of precision medicine allows patient information from a wide range of sources along the course of the patient journey to be combined for medical decision support. More precise diagnosis and intervention requires comparing patient data with a backdrop of population data, which in turn requires access to and sharing of aggregated data, ideally across entities and borders.

Sharing of genomic data is technically challenging and requires interoperability, data standardization and harmonization, as well as focus on data quality and security. Resolving the technical bottlenecks for sharing of genomic data must be accompanied by adequate governance of data, integrating regulatory, organizational and individual needs related to data capture, aggregation, storage, access and sharing. Ideally, data governance will recognise the individual's ownership of health data and balance privacy needs as guided by the GDPR and national regulations with individual and societal benefits of data sharing.

Through the Norwegian Research Council funded project BigMed, and working with leading Nordic clinical genetic labs in the Nordic Alliance for Clinical Genomics, we have developed the Trusted Variant eXchange (TVX). The TVX is a concept for facilitation of safe sharing of quality assured variant classifications between trusted partners of choice. The project has allowed us to test and demonstrate technical solutions while exploring legal challenges in precision medicine specific to our case with Norway's top legal experts. In this talk, we will share our experiences on governance needs piloted through the TVX concept and discuss needs for sharing of more complex genomic data in increasingly complex digital patient pathways.

Driven by our mission to safeguard life, property and the environment and building on more than 150 years of experience in combining technical domain knowledge, risk management and quality assurance, DNV GL is working with stakeholders to understand needs related to governance and sharing of genomic data. Being fully owned by an independent foundation, DNV GL is a disinterested party to the data itself who aims to bring together producers and users of data to establish robust and sustainable models for data sharing.

P16

MyEyeSite: a feasibility study and prototype for a patient-owned repository of rare-disease clinical and genetic data using inherited retinal disease as a paradigm

Nikolas Pontikos1, Rose Gilbert1, Gavin Arno1, Rodrigo Young1, Nick Nettleton3 and Andrew R. Webster1

1UCL Institute of Ophthalmology, 11-43 Bath Street, London EC1V 9EL, United Kingdom; 2Moorfields Eye Hospital, 162 City Road, London EC1V 2PD, United Kingdom; 3Loft Digital, 19-21 Christopher Street, London EC2A 2BS, United Kingdom

Rare diseases affect approximately 7% of the population. For these, it is harder to pool data for research purposes because, unlike for common disorders, the pertinent data are highly specialised, embedded and inaccessible within hospital networks (images, radiographs, electrophysiology, genetic diagnosis). How then do we collate person-specific clinical information from multiple locations and over time for the purposes of patient care and research? A standard strategy might be to link data within the NHS Data Spine, and then access the data en masse for research. This is technically challenging and ethically difficult without explicit patient consent.

MyEyeSite will explore a different approach – give the job to the patient. Our unique insight is to start with highly motivated patients and their medical community, within a specific disease group, and support them with new, accessible technology. Here we apply to undertake a feasibility appraisal and prototype of a suite of applications that will:

● facilitate subject-access requests from patients to hospitals for disease-appropriate data
● provide a framework for hospitals to respond efficiently to such requests
● allow patients to access their own data in an informative way, robust to sight-impairment
● provide pooled data on consented patients for research purposes.

As part of our approach we will be educating patients about their data and how it can be used for research and improvement of their clinical care.

P17

Validating the Key Implications of Data Sharing (KIDS) framework for the pediatric infrastructure sciences in Canada: a policy Delphi study

Vasiliki Rahimzadeh1,2, Gillian Bartlett2, Bartha Maria Knoppers1

1. Centre of Genomics and Policy, McGill University 2. Department of Family Medicine, McGill University

BACKGROUND: The informational feedback loops driving clinical progress in genomics-enabled learning health systems rely on the production, use and exchange of data, including from children. The policies and practices guiding proportionate governance of such production, access and exchange are, however, markedly lacking in the pediatric genomics space. Despite the research-care nexus that genomics-enabled learning health systems afford, the respective ethical-legal traditions circumscribing appropriate oversight of data sharing in clinical research and care remain separate and distinct in Canada. The need for policy-practice coherence in genomic data sharing can be accentuated when involving populations such as children, for whom such data may require special protections. Absent an understanding of the ethical-legal bases upon which responsible pediatric data sharing rests, present and future children may not reap the benefits of a healthcare system that continuously 'learns' from the production, use and exchange of genomic and associated clinical data.

METHODS: A systematic review of reasons was combined with a policy Delphi to develop the Key Implications of Data Sharing (KIDS) framework for pediatric genomics. The results of the latter will be discussed in depth in this presentation. Thematic content and descriptive statistical analyses were used to understand how 12 Canadian pediatricians, genomic researchers, ethicists and bioethics scholars prioritize the ethical-legal, social and scientific policy positions outlined in the KIDS framework.

RESULTS: The panel reached consensus on 9 of 12 policy positions. Discrepant views related to informational risks, data access and oversight of anonymized versus coded genomic data were the primary sources of dissent.

CONCLUSION: This policy Delphi makes two contributions to the theory and practice of responsible data sharing involving children in Canada. It suggests that skepticism of data anonymization drives support for more stringent access controls and oversight when data involve children. Greater emphasis on data accountability, coupled with data security, could serve as a more effective policy lever to preserve patient trust in data sharing in light of rapid computational advances, and ensure children remain at the forefront of genomic innovation.

P18

The Heart Hive - a scalable solution for 21st century cardiovascular research

Angharad M Roberts, Rachel J Buchan, Sarah Chopping, Nichola Whiffin, Paul J R Barton, Stuart A Cook, James S Ware

Imperial College London, UK; Royal Brompton & Harefield NHS Trust, UK

Our vision: All patients should have the opportunity to participate in research into their condition, to advance knowledge and treatment. We are growing an online community to connect willing research participants with active researchers and projects.

The challenge and opportunity: Inherited cardiovascular disease affects over 1 in 200 people, and is progressive. Tremendous advances have been made in understanding the molecular basis and clinical manifestations. However, little is known about why disease expression and clinical outcomes are so variable. Adequately powered, systematic study is needed to characterise the contributions of both common and rare genetic variation to disease risk, and as modifiers of disease expression. Patients living with cardiovascular disease want to participate in research. They also tell us that this is difficult as research opportunities are clustered around certain centres. Researchers also face major challenges in recruiting eligible patients and maintaining patient engagement. Contemporary genetics requires large cohorts for well-powered studies; these are beyond the reach of single centres and even stretch traditional collaborative networks. At the other end of the spectrum, stratified approaches might demand recruitment of individuals with a very specific set of phenotypic characteristics. Access to a large pool of individuals allows for identification of a rare subset.

The solution: Give everyone the opportunity to participate in research. We are reaching research participants through patient groups, social media and an engaging online presence. Patients enrol in our study through an ethically approved, fully online and self-directed consent interface. DNA for genetic analysis is collected remotely using saliva kits distributed by post. Participants have full control of their own data, and of which researchers can use it, through a dynamic and interactive online consent process. Any researcher can offer an ethically and scientifically approved study to the Heart Hive community through this unbiased platform. Subsequent studies will contribute to a growing and sustainable cumulative resource of data and experience that will transform the landscape.

The Heart Hive represents a scalable and effective solution for 21st century medical research. It is a strategy applicable not only to these specific cardiac conditions, but across a much broader range of medical research. By empowering patients to participate from home and to control their own data, and by sharing this online resource with the scientific community, we can bring together the large cohorts needed for modern genetic research, and generate a cumulative collaborative resource with contributions from multiple researchers.

P19

Data Use Ontology: Classifying data access conditions for genomic data

Dr Dylan Spalding, The European Genome-phenome Archive The GA4GH DURI workstream

European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain and Universitat Pompeu Fabra (UPF), Barcelona, Spain; Global Alliance for Genomics and Health Data Use and Researcher ID workstream, https://ga4gh-duri.github.io

Accessing personal genomic data while ensuring the individuals' consent is respected requires the proposed research use to be compared to the consented use. This is usually done manually and in free text, which generates inconsistency across different data stewards. To expedite applications and facilitate data access, the Global Alliance for Genomics and Health (GA4GH) Data Use Ontology (DUO) has been developed by the Data Use and Researcher Identities (DURI) workstream to standardise the way these data access conditions are categorised. DUO helps to ensure consistent understanding of data use conditions amongst data stewards, so that the conditions can be applied congruently. Additionally, by using a standardised terminology, data can be discovered based on possible research use, improving data screening. To ensure the correct definition of a DUO term is applied, DUO is versioned and available as a machine-readable file in the W3C-standard Web Ontology Language (OWL). DUO is updated centrally and released using a PURL-based URI: users can use the latest version of the ontology, which is browsable at http://purl.obolibrary.org/obo/DUO_0000001 or downloadable from http://purl.obolibrary.org/obo/duo.owl. Due to the structured nature of an ontology, algorithms such as DUOS have been developed to determine access decisions without human intervention, speeding up the application process. As well as DUO, work on Researcher Identities is ongoing with the aim of allowing fully automated application and data access, based on the researcher's identity and proposed data use, providing faster data access and resulting in more efficient research outcomes. As a GA4GH driver project, the European Genome-phenome Archive (EGA) now supports DUO to tag datasets with their data use conditions, and is working with the DURI workstream and ELIXIR to implement Researcher Identities. 
EGA has been working with the Wellcome Trust Sanger Institute to apply these codes to existing and new datasets, and the EGA now recommends new datasets are submitted using DUO while working with submitters to enhance uptake.
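A minimal sketch of how coded data use conditions might be compared with a proposed research use is given below. It is not the DUOS algorithm and does not read the DUO OWL file; it assumes a simplified reading of four DUO permission categories, referred to here by their shorthand (NRES, GRU, HMB, DS), and the `DataUse` and `Request` classes and the purpose labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DataUse:
    """Consented use attached to a dataset (hypothetical record)."""
    code: str                      # "NRES", "GRU", "HMB" or "DS"
    disease: Optional[str] = None  # only meaningful for "DS"

@dataclass(frozen=True)
class Request:
    """Proposed research use (hypothetical record)."""
    purpose: str                   # "general", "biomedical" or "disease"
    disease: Optional[str] = None

def is_permitted(consent, request):
    """Simplified check of a proposed use against a coded consent."""
    if consent.code in ("NRES", "GRU"):
        # No restriction, or general research use: any research purpose passes.
        return True
    if consent.code == "HMB":
        # Health/medical/biomedical research, which includes disease research.
        return request.purpose in ("biomedical", "disease")
    if consent.code == "DS":
        # Disease-specific: only research on the consented disease passes.
        return request.purpose == "disease" and request.disease == consent.disease
    return False  # unknown code: leave the decision to manual review

cardiac_only = DataUse(code="DS", disease="cardiomyopathy")
ok = is_permitted(cardiac_only, Request("disease", "cardiomyopathy"))
refused = is_permitted(cardiac_only, Request("disease", "glaucoma"))
```

Real DUO annotations also carry modifiers (for example, conditions on who may use the data or on publication), so a production matcher would need to evaluate those as well.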

P20

Storing and sharing personal genome variant and phenotype data in LOVD3

Peter EM Taschner1, Stephen Pieterman1, Ivo FAC Fokkema2, Marjolein Kriek3

1Leiden Centre for Applied Bioscience, University of Applied Sciences Leiden, 2Department of Human Genetics and 3Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands

Personal exome and genome sequencing is already provided by commercial companies. Storing and sharing variant information from personal genomes can be challenging. The free, open-source, platform-independent Leiden Open-source Variation Database software (LOVD, http://www.LOVD.nl) has been developed to build standardized databases for curating and sharing gene variants [1]. The latest version, LOVD3, is compatible with the Gen2Phen data model, implemented with additional tables for phenotype, screening and transcript information. Genome-wide sequence variant data can be stored in a single LOVD installation using chromosomal nucleotide positions as reference. Web services retrieve gene and transcript information on the fly. Data from exomes or genomes from one or more individuals can be stored and displayed in several ways: variant-by-variant or all connected to one or more individuals in the database. To promote data sharing, both phenotypes and variants can be stored (and identified) individually. Both can be made public or non-public, with the option to query. Other features include: display of disease-specific phenotype information, storage of temporal phenotype information, and queries in and across data columns.
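The storage model described above can be pictured with a small sketch. This is not LOVD3's schema or API; the class names, fields and query function are hypothetical and only illustrate variants keyed by chromosomal position, linked to individuals, each with its own public or non-public flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Variant:
    chromosome: str
    position: int        # chromosomal nucleotide position used as reference
    change: str          # e.g. "A>G"
    public: bool = True

@dataclass
class Individual:
    individual_id: str
    phenotypes: List[str] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)
    public: bool = True

def public_variants_in_region(
    individuals: List[Individual], chromosome: str, start: int, end: int
) -> List[Tuple[Optional[str], Variant]]:
    """Return public variants in a region, hiding the owner if the individual is non-public."""
    hits = []
    for person in individuals:
        for variant in person.variants:
            in_region = variant.chromosome == chromosome and start <= variant.position <= end
            if variant.public and in_region:
                owner = person.individual_id if person.public else None
                hits.append((owner, variant))
    return hits

# Toy records (invented for illustration)
people = [
    Individual("IND1", ["myopia"], [Variant("1", 1500, "A>G"), Variant("1", 9000, "C>T")]),
    Individual("IND2", [], [Variant("1", 1600, "G>A", public=False)], public=False),
    Individual("IND3", [], [Variant("1", 1700, "T>C")], public=False),
]
hits = public_variants_in_region(people, "1", 1000, 2000)
```

Keeping separate visibility flags on variants and individuals is one way to let a variant be shared and queried without exposing whose genome it came from.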

An example of personal genome variant information stored in LOVD3 can be found at http://databases.generade.nl/personal_genomes

The Human Variome Project has granted LOVD the recommended system status for variant collection.

[1] Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011). LOVD v.2.0: the next generation in gene variant databases. Hum Mutat. 32:557-63.

P21


Delegate List

Anuradha Acharya, Mapmygenome
Eman Ahmed, KU Leuven
Dalal AlMutairi, Kuwait University
Mohamed Osman Arab, Oslo University Hospital
Kat Arney, First Create the Media
Naveed Aziz, CGEn
Mark Bale, Genomics England
Sivia Barnoy, Tel-Aviv University
Stephan Beck, UCL
Christopher Bell, University of Southampton
Dhruva Biswas, UCL Cancer Institute
Martina Bittoova, GENNET s.r.o.
Alexandra Blakemore, xxx
Asta Blazyte, Ulsan National Institute of Science and Technology
Paranchai Boonsawat, University of Zurich
Robert Borkowski, 23andMe
Pascal Borry, KU Leuven
Nick Brain, Thermo Fisher Scientific
Darren Burgess, Nature Reviews Genetics
Toby Call, Chronomics Ltd
Dorothée Caminiti, ETH Zurich
Jose L Campos, IGMM, University of Edinburgh
George Church, Harvard Medical School
John Cleary, Heriot-Watt University
Hayley Clissold, Wellcome Sanger Institute
Manuel Corpas, Cambridge Precision Medicine
James Dai, FHCRC
Johan den Dunnen, Leiden University Medical Center (LUMC)
Priya Dewan, Royal London Hospital
Huw Dorkins, University of Oxford
Simone Ecker, UCL Cancer Institute
Frances Elmslie, St George's, University of London
Yaniv Erlich, MyHeritage/Columbia University
Bert Eussen, ErasmusMC
Eva Fisher, Robert Koch-Institute
Natalie Fitzpatrick, UCL
Isabelle Foote, Queen Mary University of London
Gihan Gawish, College of Medicine, AlImam University (IMSIU)
Maurice Gleeson, Genetic Genealogist
Gustavo Glusman, Institute for Systems Biology
Jaap Goudsmit, Harvard School of Public Health
Becki Green, King's College London
Bastian Greshake Tzovaras, openSNP
Jose Afonso Guerra Assuncao, UCL Cancer Institute
Christi Guerrini, Baylor College of Medicine
Joanne Hackett, Genomics England
Lorenza Haddad Talancon, Codigo 46
Thomas Haizel, Nkaarco Diagnostics Limited
Mihail Halachev, University of Edinburgh
Jennifer Harrow, ELIXIR
John Hatwell, Genomics England
Charlotte J Haug, New England Journal of Medicine
Maribel Hernández Rosales, Institute of Mathematics UNAM
Dorota Hoffman-Zacharska, Institute of Mother and Child
Camilla Ip, University of Oxford
Irma Jarvela, University of Helsinki
Sirkka Jarvenpaa, University of Texas at Austin
Sungwon Jeon, Ulsan National Institute of Science and Technology
Kathleen Job, Cardiff University
Lennart Karssen, PolyOmica
Stephen Kearney, Griffith College Dublin
Matthew Kendzior, University of Illinois
Monika Koudova, GENNET s.r.o.
David Kovalic, Webster University
Peter Lederer, FAU Erlangen
Heidi Ledford, Nature magazine
Edmund Lehmann, Cambridge Precision Medicine
Cathryn Lewis, King's College London
Stephen Lincoln, Invitae
Marina Lipkin Vasquez, INCa
Jodie Lord
Heidi Marjonen, National Institute for Health and Welfare
Argyri Iris Mathioudaki, Uppsala University
Karyn Megy, University of Cambridge
Andres Metspalu
Ismail Moghul, University College of London
Tiffany Morris, Illumina
Miranda Mourby, University of Oxford
Monica Munoz-Torres, Oregon State University
Sergey Nechaev, Illumina
Fiona Nielsen, Repositive
Sandosh Padmanabhan, University of Glasgow
Priit Palta, FIMM, University of Helsinki
Guro Meldre Pedersen, DNV GL
Minja Pehrsson, Helsinki Biobank
Hartmut Peters, Charite - Universitaetsmedizin
Vincent Plagnol, Genomics Plc
Nikolas Pontikos, UCL Institute of Ophthalmology
Mad Price Ball, Open Humans Foundation
Vasiliki Rahimzadeh, McGill University
Jake Reeves, University of Surrey
Michael Rhodes, NanoString Technologies
Angharad Roberts, Imperial College
Barjinder Sahota, Sahota Solicitors
Saskia Sanderson, UCL
Rupa Sarkar, The Lancet Digital Health
Gary Saunders, ELIXIR Europe
Cathleen Schulte, Office for Life Sciences
Mahsa Shabani, University of Leuven
Sevasti Skeva, KU Leuven
Colin Smith, University of Brighton
Reecha Sofat, University College London
Dylan Spalding, EMBL-EBI
Ciara Staunton, Middlesex University
David Stejskal, GENNET s.r.o.
Peter Taschner, University of Applied Sciences
Nicki Taverner, Cardiff University and All Wales Medical Genetics Service
Ben Te Aika, Genomics Aotearoa
Philip Twiss, Addenbrookes Hospital
Kees van den Berg, GenomeScan
Adam Vaughan, New Scientist
Annemieke Verkerk, Erasmus Medical Center
Natalia Volfovsky, Simons Foundation
Cyndi Williams, Quin
Howard Wu, Open Humans
Yanxiang Zhou, Illumina Ventures

Index

Acharya, A S3
Ahmed, E P1
AlMutairi, D P2
Aziz, N S9

Biswas, D P3
Blazyte, A P4
Boonsawat, P P5
Borry, P S39

Church, G S1
Cleary, J P6

Elmslie, F S45
Erlich, Y S49

Gawish, G P7
Gleeson, M S27, P8
Glusman, G S13
Greshake Tzovaras, B S21
Guerrini, C S37

Hackett, J S31
Haddad Talancon, L S5
Harrow, J P9
Hernandez Rosales, M P10

Jeon, S S11, P11, P12

Kendzior, M P13

Lewis, C S17
Lincoln, S P14

Marjonen, H S43
Metspalu, A S7
Moghul, I P15
Munoz-Torres, M S35

Pedersen, G P16
Plagnol, V S19
Pontikos, N P17

Rahimzadeh, V P18
Roberts, A P19

Sanderson, S S29
Saunders, G S41
Smith, C S23
Sofat, R S47
Spalding, D P20
Staunton, C S33

Taschner, P P21
Taverner, N S15
Te Aika, B S25