The National COVID Cohort Collaborative (N3C): Let’s Get Involved!
Warren A. Kibbe, PhD, FACMI
June 15, 2021
Purdue Big Data in Cancer Workshop
@data2health covid.cd2h.org @wakibbe @ncats_nih_gov ncats.nih.gov/n3c
A program of NIH’s National Center for Advancing Translational Sciences

Speaker
Warren Kibbe
Duke Biostatistics & Bioinformatics
CTSA Informatics
Duke Cancer Institute
Member N3C

Objectives
● Real World Data
● Open Science
● Overview of N3C
● N3C Data Enclave statistics
● How common data models and variables are harmonized
● The scope of answerable questions
● Data access and security
● Oncology research in N3C

Special thanks to:
● Chris Chute, N3C, Johns Hopkins
● Melissa Haendel, N3C, Colorado University
● Umit Topaloglu, N3C, Wake Forest
● Frank Rockhold, Duke
● Noha Sharafeldin, N3C, UAB

Take homes
• N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes
• Largest COVID-19 and cancer cohort within the US
• Consistent with previous literature, older age, male gender, increasing comorbidities, and hematological malignancies were associated with higher mortality in patients with cancer and COVID-19
• The N3C dataset confirmed that cancer patients with COVID-19 who received recent immuno- or targeted therapies were not at higher risk of overall mortality

What is Real World Data?
Collected in the context of patient care.
Real World Data was called out as part of the 21st Century Cures Act
21st Century Cures Act: https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
Graphic from HealthCatalyst: https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development

Current sources of data
molecular, genome, pathology, imaging, labs, notes, sensors
Our ability to generate biomedical data continues to grow in terms of variety and volume
icons by the Noun Project

AI is changing our ability to go both deep and broad
Trustworthy AI, Reusable, Provenance, Reproducible

Having a health equity lens
● Digital Health, precision medicine, and real world data all have the power to transform healthcare. However, we must pay attention to structural racism and implicit bias if we want to achieve equity.

21st Century Cures Act
Last year I discussed the NCI Cancer Moonshot and Precision Medicine activities funded under the 21st Century Cures Act.
FDA was directed by Congress to focus on the use of RWD and RWE in drug design, development and outcomes assessment.
https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act

Is it just about Real World Data?
What about Open Science? Data transparency? Data Access?

The importance of Open Science
Calls for greater transparency and ‘open data access’ in clinical research continue actively.
● “Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society”*
● Open Science Project**: “If we want open science to flourish, we should raise our expectations to: Work. Finish. Publish. Release.”
● FAIR Principles: Findability, Accessibility, Interoperability, and Reusability***
● TRUST Principles: Transparency, Responsibility, User focus, Sustainability and Technology****
* https://www.fosteropenscience.eu/resources
** http://openscience.org/
*** https://www.nature.com/articles/sdata201618
**** https://www.nature.com/articles/s41597-020-0486-7

Open Science and Patient Data Access
Some of the challenges are:
● Patient privacy
● Academic credit
● Commercial sensitivity and intellectual property
● Data standards
● Resources (money and people)
There should be room for researchers and patients alike to gain from this effort. Informatics experts and data scientists are essential elements of this discussion.

One problem with Clinical Trials Data Sharing
● “The tendency for researchers to ‘sit’ on their data for an unduly long period of time is neither desirable from a scientific point of view nor acceptable from an ethical perspective.”
● “After all, the data belong to the patients who agreed to participate in the research, not to the investigators who coordinated it, as the new European General Data Protection Regulation emphasizes.”*
* Rockhold, F, et al. Open science: The open clinical trials data journey, Clinical Trials, Vol 16 (5) 1-8, 2019

Access to patient-level data is important for research
There are certainly challenges, but the question is not whether data should be shared, but rather how and when access should be granted.
Responsible open access enables secondary analyses that:
● Enhance reproducibility of clinical research
● Honor the contributions of trial participants
● Improve the design of future trials
● Generate new research findings
This journey of making patient data available is part of an evolution in transparency and not a sudden awakening.

What about N3C?
It is an open science, controlled access environment

Clinical and Translational Science Awards (CTSA) Program

The pandemic highlights urgent needs
● Algorithms (diagnosis, triage, predictive, etc.)
● Drug discovery & pharmacogenetics
● Multimodal analytics (EHR, imaging, genomics)
● Interventions that reduce disease severity
● Best practices for resource allocation
● Coordinated research efforts to maximize efficiency and reproducibility
These all require the creation of a comprehensive clinical data set

What Kinds of Questions Can N3C Address?
The scope and scale of the information in the platform will support probing questions such as:
● What social determinants of health are risk factors for mortality?
● Do some therapies work better than others? By region? By demographics?
● Can we compare local rare clinical observations with national occurrences?
● Can we predict who might have severe outcomes if they have COVID-19?
● What factors will predict the effectiveness of vaccines?
● Can we predict acute kidney injury in COVID-19 patients?
● Who might need a ventilator because of lung failure?

Cohort characterization objectives
To clinically characterize the N3C cohort
● Largest U.S. COVID-19 cohort to date (+ representative controls)
● Racially, ethnically, and geographically diverse
To develop and share validated, versioned OMOP representations of common variables (labs, vital signs, medications, treatments)
To generate hypotheses to be tested within N3C and elsewhere
● Clinical phenotypes and trajectories
● Treatment patterns and response
● … and many others

Benefits for Participation
● Access to large scale COVID-19 data from across the nation
● Pilot data for grant proposals
● Opportunities for KL2 and TL1 and other scholars
● Team science opportunities for new questions and access to Teams, statistics, machine learning (ML), informatics expertise
● Learn ML analytics, NLP methods & access to tools, software, additional datasets

Who is in the N3C?
Step 4. Federated Analytics with HPC

The N3C Computable Phenotype
● At a high level, our phenotype looks for patients:
  ○ With a positive COVID-19 test (PCR or antibody) OR
  ○ With an ICD-10-CM code of U07.1 OR
  ○ Two or more COVID-like diagnosis codes (ARDS, pneumonia, etc.) during the same encounter, but only on or prior to 5/1/2020
● Each one of these patients is then demographically matched to two patients with negative or equivocal COVID-19 tests.

Matching algorithm (example from the slide)
            Patient          Matched patient 1    Matched patient 2
Age         47               49                   46
Gender      M                M                    M
Race        Black            Black                Black
Ethnicity   Unknown          Hispanic/Latino      Not Hispanic
COVID       Positive         Negative             Negative

● Each site securely sends this set of patients, along with their longitudinal EHR data from 1/1/2018 to the present, to the N3C on a regular basis.

N3C Timeline

N3C Dashboard
covid.cd2h.org/dashboard
55 sites with data released (purple) and 37 sites with data pending (open circle). OCHIN is a national network of 131 sites (diamond).
covid.cd2h.org/teams
31 Domain teams!
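
The computable phenotype and two-to-one demographic matching described on the N3C Computable Phenotype slide above could be sketched roughly as follows. This is only an illustration, not the N3C implementation: the `Patient` and `Encounter` records, the field names, the 5-year age window, and the choice to match exactly on gender and race are all assumptions made for the example. The slides do not specify the actual matching rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

COVID_DX_CODE = "U07.1"                   # ICD-10-CM code named on the slide
COVID_LIKE_CUTOFF = date(2020, 5, 1)      # COVID-like codes count only on or before 5/1/2020


@dataclass
class Encounter:
    when: date
    covid_like_codes: set[str] = field(default_factory=set)  # e.g. ARDS, pneumonia codes


@dataclass
class Patient:                            # hypothetical record layout, not the N3C schema
    pid: str
    age: int
    gender: str
    race: str
    test_results: list[str] = field(default_factory=list)    # "positive" / "negative" / "equivocal"
    dx_codes: set[str] = field(default_factory=set)
    encounters: list[Encounter] = field(default_factory=list)


def is_case(p: Patient) -> bool:
    """Apply the three phenotype criteria from the slide, joined by OR."""
    if "positive" in p.test_results:
        return True
    if COVID_DX_CODE in p.dx_codes:
        return True
    return any(
        len(e.covid_like_codes) >= 2 and e.when <= COVID_LIKE_CUTOFF
        for e in p.encounters
    )


def is_control_candidate(p: Patient) -> bool:
    """Controls have a negative or equivocal test and do not meet the phenotype."""
    return not is_case(p) and any(r in ("negative", "equivocal") for r in p.test_results)


def match_controls(case: Patient, pool: list[Patient], n: int = 2,
                   age_window: int = 5) -> list[Patient]:
    """Pick up to n controls: same gender and race, closest age (assumed rule)."""
    eligible = [
        c for c in pool
        if is_control_candidate(c)
        and c.gender == case.gender
        and c.race == case.race
        and abs(c.age - case.age) <= age_window
    ]
    eligible.sort(key=lambda c: abs(c.age - case.age))
    return eligible[:n]
```

With the slide's example (a 47-year-old case and candidates aged 49 and 46 with the same gender and race), `match_controls` would return both candidates, nearest age first. Ethnicity is left out of the match here because the slide's example pairs patients whose ethnicity values differ.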
As of June 14, 2021

Data Transfer Agreement Signatories 6/14/2021
88 DTA Signatories
Northwestern University at Chicago ᛫ Tufts Medical Center ᛫ Advocate Health Care Network ᛫ University of Alabama at Birmingham ᛫ Oregon Health & Science University ᛫ University of Washington ᛫ Stanford University ᛫ The University of Michigan at Ann Arbor ᛫ Children's Hospital Colorado ᛫ Duke University ᛫ Medical College of Wisconsin ᛫ The Ohio State University ᛫ University of Nebraska Medical Center ᛫ University of Arkansas for Medical Sciences ᛫ George Washington University ᛫ Johns Hopkins University ᛫ West Virginia University ᛫ Medical University of South Carolina ᛫ University of North Carolina at Chapel Hill ᛫ University of Virginia ᛫ The University of Texas Medical Branch at Galveston ᛫ University of Minnesota ᛫ University of Cincinnati ᛫ Columbia University Irving Medical Center ᛫ Cincinnati Children's Hospital Medical Center ᛫ Rush University Medical Center ᛫ Nemours ᛫ University of Wisconsin-Madison ᛫ The State University of New York at Buffalo ᛫ Washington University in St. Louis ᛫ University of Rochester ᛫ The University of Chicago ᛫ University of