Linked Electronic Health Records for Research on a Nationwide Cohort Including Over 54 Million People in England
Total Page:16
File Type:pdf, Size:1020Kb
medRxiv preprint doi: https://doi.org/10.1101/2021.02.22.21252185; this version posted February 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Linked electronic health records for research on a nationwide cohort including over 54 million people in England CVD-COVID-UK consortium* 3 tables, 3 figures 6 Supplementary tables Key Words: Electronic Health Records, England, administrative health datasets, nationwide, population-scale, trusted research environment, NHS Digital, primary care, hospital admissions, Covid-19 Word count = 4342 *Contributions to this manuscript: Manuscript drafting and revising: Angela Wood (writing committee chair) 4,5,8,13,19, Rachel Den- holm2,10,14, Sam Hollings16, Jennifer Cooper2,10,14, Samantha Ip4, Venexia Walker7,12, Spiros De- naxas3,11,15,19, Ashley Akbari18, Jonathan Sterne†2,10,14, Cathie Sudlow†9,21 †equal contributions Data wrangling, QA and analysis (including generating phenotype definitions): Angela Wood (chair), Rachel Denholm, Sam Hollings, Jennifer Cooper, Samantha Ip, Venexia Walker, Spiros De- naxas, Amitava Banerjee1,11,20, William Whiteley6,17 Figures/graphics: Alvina Lai11 Consortium coordination (BHF Data Science Centre core team): Rouven Priedon9, Cathie Sudlow9, Lynn Morrice9, Debbie Ringham9 NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.1 medRxiv preprint doi: https://doi.org/10.1101/2021.02.22.21252185; this version posted February 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . Consortium membership: https://www.hdruk.ac.uk/wp-content/uploads/2021/01/210128-CVD- COVID-UK-Consortium-Members.pdf and see Annexe 1 – Consortium members, who contributed to discussions leading up to this manuscript and provided very helpful insights and comments Public and patient advisory panel: Suzannah Power, Lynn Laidlaw, Michael Molete, John Walsh NHS Digital (coordination, data management/provision, data access request and information gov- ernance support, Trusted Research Environment support: Garry Coleman16, Cath Day16, Elizabeth Gaffney16, Tim Gentry16, Lisa Gray16, Sam Hollings16, Richard Irvine16, Brian Roberts16, Estelle Spence16, Janet Waterhouse16 Correspondence: [email protected] (Jonathan Sterne, Cathie Sudlow, Angela Wood) Affiliations: 1Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK; 2Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK; 3British Heart Foundation Research Accelerator, University College London, UK; 4British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK; 5British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK; 6Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK; 7Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA 8Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK; 9BHF Data Science Centre, Health Data Research UK, Gibbs Building, London, UK; 10Health Data Research UK, South West Better Care Partnership, Bristol, UK; 11Institute of Health Informatics, University College London, 222 Euston Road, London, UK; 12MRC University of Bristol Integrative Epidemiology Unit, Bristol, UK; Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK; 13National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, UK; 14National Institute for Health Research Bristol Biomedical Research Centre, University of Bristol, UK; 15National Institute for Health Research University College London Hospitals Biomedical Research Centre, Uni- versity College London, UK; 16NHS Digital, 1 Trevelyan Square, Leeds, UK 17Nuffield Department of Population Health, University of Oxford, Oxford, UK; 18Population Data Science and Health Data Research UK, Swansea University, UK; 19The Alan Turing Institute, London, UK; 20University College London Hospitals NHS Trust, 235 Euston Road, London, UK; 21Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, UK 2 medRxiv preprint doi: https://doi.org/10.1101/2021.02.22.21252185; this version posted February 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . ABSTRACT [Word count = 350] Objectives – Describe a new England-wide electronic health record (EHR) resource enabling whole population research on Covid-19 and cardiovascular disease whilst ensuring data security and priva- cy and maintaining public trust. Design – Cohort comprising linked person-level records from national healthcare settings for the English population accessible within NHS Digital’s new Trusted Research Environment. Setting – EHRs from primary care, hospital episodes, death registry, Covid-19 laboratory test results and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular and Covid-19 vaccination data. Participants - 54.4 million people alive on 1st January 2020 and registered with an NHS general prac- titioner in England. Main measures of interest – Confirmed and suspected Covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack (TIA) and incident myocardial infarction (MI)) and all-cause mortality between 1st January and 31st October 2020. Results – The linked cohort includes over 96% of the English population. By combining person-level data across national healthcare settings, data on age, sex and ethnicity are complete for over 95% of the population. Among 53.2M people with no prior diagnosis of stroke/TIA, 98,721 had an incident stroke/TIA, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.1M people with no prior history of MI, 62,966 had an incident MI, of which 8% were rec- orded only in primary care and 12% only in death records. A total of 959,067 people had a confirmed or suspected Covid-19 diagnosis (714,162 in primary care data, 126,349 in hospital admission rec- ords, 776,503 in Covid-19 laboratory test data and 48,433 participants in death registry records). While 58% of these were recorded in both primary care and Covid-19 laboratory test data, 15% and 18% respectively were recorded in only one. Conclusions – This population-wide resource demonstrates the importance of linking person-level data across health settings to maximize completeness of key characteristics and to ascertain cardio- 3 medRxiv preprint doi: https://doi.org/10.1101/2021.02.22.21252185; this version posted February 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . vascular events and Covid-19 diagnoses. Although established initially to support research on Covid- 19 and cardiovascular disease to benefit clinical care and public health and to inform health care policy, it can broaden further to enable a very wide range of research. 4 medRxiv preprint doi: https://doi.org/10.1101/2021.02.22.21252185; this version posted February 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license . INTRODUCTION The Covid-19 pandemic has increased awareness of the importance of population-wide person-level electronic health record (EHR) data from a range of sources for examining, modelling and reporting disease trends to inform healthcare and public health policy1. Key benefits of research using such data on nationwide cohorts include: (i) generalisability of findings across all age groups, ethnicities, geographical locations and socioeconomic, health and personal characteristics and (ii) inclusion of very large numbers of people and events, enhancing the precision of findings and enabling a wide spectrum of novel research studies (e.g., characterising shapes of relationships between risk factors and disease or studying minority sub-populations and rare disease sub-types) . Whilst EHRs for whole country cohorts for Wales, Scotland, Denmark and Sweden (populations approximately 3 to 10 million) have been used for research for several years,2,3,4,5,6 at the start of the COVID-19 pan- demic, there was no access for bona fide researchers to national linked healthcare data across the population of England to enable critical research to support healthcare decisions and public health policy. There were two main reasons for this: there was no national collection of comprehensive, linkable primary care data; and there was no secure, privacy-protecting mechanism for researchers