Polyethnic-1000
Advancing cancer by studying ethnically diverse patient populations in New York

Fieke Froeling1,2, Nicolas Robine1, Benjamin Hubert1, Michael Zody1, Dayna Oschwald1, Harold Varmus1,3, Charles Sawyers1,4, David Tuveson1,2, on behalf of the NYGC P1000 Consortium
1 New York Genome Center, 2 Cold Spring Harbor Laboratory, 3 , 4 Memorial Sloan Kettering Cancer Center

Background
• Known ethnic differences in cancer incidence and mortality.
• In most cases, the causes of these variations are multifactorial; however, little is known about the molecular attributes.
• Advances in sequencing technologies have revolutionized approaches to the prevention, risk assessment, early detection, diagnosis, and treatment of cancers.
• However, many ethnic groups, especially non-European populations, continue to be significantly under-represented in cancer research, including clinical trials, and have not received equal benefits in clinical practice.

Our current knowledge about cancer risk, biological behavior and response to treatment has primarily been derived from patients of European ancestry.

Aim
Polyethnic-1000 aims to study the genomic landscapes of cancer in the ethnically diverse population of the greater New York area and thereby:
1. Deepen our understanding of the contributions ethnicities make to the incidence and behavior of cancers.
2. Bring genomic innovation to patient populations generally under-represented in research and hence deprived of the benefits of scientific progress.

Methods
• Coordinated by the New York Genome Center (NYGC), leaders of the New York City cancer research community joined together to create a dynamic research platform involving the NYGC, academic centers and partnering hospitals in the New York City region (Figure 1, 2).
• The project is designed to take place in 3 stages (Figure 3): a retrospective feasibility study (stage I) to enable the start of a prospective pilot (stage II) and expansion study (stage III), which will include selected research projects within the Polyethnic-1000 umbrella structure.

Stage I: retrospective, feasibility study
• Central IRB approval (BRANY) with waiver of informed consent
• Sample requirements:
- Self-identified non-white cancer patient > 18 years old
- Tumor sample no older than 2016
- Estimated tumor percentage at least 50%
- Size of the tumor needs to be at least 0.5 cm x 0.5 cm
- All samples will be de-identified before submission to NYGC
• Central pathology review at NYGC by P1000 pathologist:
- 30 samples reviewed, ~70 in pipeline for shipment to NYGC from a total of 14 different sites
- Diverse range of tumor types and ethnicities
• Tumor-only WES and RNAseq (Illumina NovaSeq)
- Genomic ancestry via “Ancestry Informed Markers”
- Variant calling, allele-specific expression, alternative splicing, fusion discovery
- HLA typing
• Data sharing limited for stage I:
- Data accessible to the participating members in NYGC Commons
- Processed data (somatic variants) presented in cBioPortal
→ Results will inform pipeline for stage II (Figure 4)

Stage II and III: prospective generation of the Polyethnic-1000 database
• The study will enroll any patient with cancer and a self-identified race/ethnicity of non-white (stage II). However, specific research proposals will address questions in particular tumor types with known worse disease outcomes in certain ethnic groups (stage III).
• Data generated will be integrated with similar national and international initiatives (AACR Project GENIE, ICGC ARGO, and others).

Figure 2. Participating hospitals in the ethnically diverse area of NYC

Figure 4 (patient entry): Any self-identified non-white patient with suspected or known cancer
- Specific research proposals may focus on particular tumor type (stage III)
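The Stage I sample requirements (self-identified non-white patient older than 18, tumor sample no older than 2016, estimated tumor percentage of at least 50%, tumor size of at least 0.5 cm x 0.5 cm, de-identified before submission) could be screened with a simple check like the one below. This is only an illustrative sketch, not the consortium's actual intake tooling: the `SampleRecord` class, its field names, and the `stage1_failures` function are hypothetical. It also assumes that "> 18" is strict, that "no older than 2016" means a collection year of 2016 or later, and that both tumor dimensions must reach 0.5 cm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SampleRecord:
    """Hypothetical sample record; field names are illustrative assumptions."""
    self_identified_non_white: bool
    patient_age_years: int
    collection_year: int
    tumor_percentage: float  # estimated tumor content, 0-100
    width_cm: float
    height_cm: float
    de_identified: bool


def stage1_failures(s: SampleRecord) -> list[str]:
    """Return the Stage I requirements that the sample does not meet."""
    failures = []
    if not s.self_identified_non_white:
        failures.append("not self-identified non-white")
    if s.patient_age_years <= 18:  # assumption: "> 18" is a strict bound
        failures.append("patient not older than 18")
    if s.collection_year < 2016:  # assumption: collected in 2016 or later
        failures.append("sample older than 2016")
    if s.tumor_percentage < 50:
        failures.append("tumor percentage below 50%")
    if min(s.width_cm, s.height_cm) < 0.5:  # assumption: both dimensions count
        failures.append("tumor smaller than 0.5 cm x 0.5 cm")
    if not s.de_identified:
        failures.append("not de-identified")
    return failures


# Usage: an empty list means the sample passes every listed requirement.
candidate = SampleRecord(True, 45, 2018, 60.0, 0.6, 0.5, True)
print(stage1_failures(candidate))
```

Returning the list of unmet requirements, rather than a single pass/fail flag, makes it easier for a submitting site to see why a sample was excluded.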

Figure 1. Polyethnic-1000 Governance Structure (GCCG, Genome Center Cancer Group)
• External Advisory Board and GCCG
• Scientific Review Committee
• Polyethnic-1000 Steering Committee: PIs, GCCG chairs, Project Manager and key members from different sites
• Working Groups:
- Sample acquisition and pathology WG
- Data sharing WG
- Sequencing WG
- Population genetics and statistics WG
- Community hospital outreach WG
- Ethics & Governance WG
• Partner Hospitals and Research Institutions:
- Albert Einstein College of Medicine
- Bronx VA
- Brooklyn Methodist Hospital
- Cold Spring Harbor Laboratory
- Medical Center
- Hudson Valley Hospital
- Lawrence Hospital
- Memorial Sloan Kettering Cancer Center
- Mount Sinai Hospital
- NYU Langone Health
- New York Presbyterian Queens
- Ralph Lauren Cancer Center
- SUNY Downstate
- Weill Cornell Medicine
- Others

Figure 3. Study design

Stage I: feasibility, tissue coordination study
1. Develop infrastructure
2. Document pipelines
• Retrospective collection of ~100 de-identified samples from under-represented minorities (any non-white)
• Tumor WES, RNAseq, any tumor type

Stage II, prospective pilot
1. Test established infrastructure
2. Develop polyethnic database
• ~500-1000 samples from URMs (any non-white patient)
• Any tumor type
• T/N WES/WGS and RNAseq
• Review and selection of research proposals

Stage III, Polyethnic-1000 database expansion
1. Generate genomic and transcriptomic data that can inform future research studies and clinical trials
2. Study specific research questions as identified in selected proposals
3. Integrate Polyethnic-1000 with other projects

Figure 4. Stage II and III, operational flowchart
• Patient-facing steps: patient advocacy; telegenetics counseling; consent (tumor +/- germline); diagnostic & research biopsy (FF, FFPE); EDTA blood sample, saliva, others
• Flow between participating hospital and NYGC: standard tx or clinical trial; local PI; central pathology confirmation; sequencing (1); analysis; data sharing; research data; clinical report; clinical data review; clinical data annotation; germline implications
1 Research WES/WGS and RNAseq with clinical sequencing of subset of samples that have potentially actionable genomic alterations

Conclusion
Polyethnic-1000 provides a collaborative network and dynamic research platform that will increase our understanding of the role of ethnicities and genetic ancestry in cancer biology.