Meta-Analysis of a Multi-Ethnic, Breast Cancer Case-Control Targeted Sequencing Study
Total Page:16
File Type:pdf, Size:1020Kb
Meta-Analysis of a Multi-Ethnic, Breast Cancer Case-Control Targeted Sequencing Study The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Ablorh, Akweley. 2015. Meta-Analysis of a Multi-Ethnic, Breast Cancer Case-Control Targeted Sequencing Study. Doctoral dissertation, Harvard T.H. Chan School of Public Health. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:16121143 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA META-ANALYSIS OF A MULTI-ETHNIC, BREAST CANCER CASE-CONTROL TARGETED SEQUENCING STUDY AKWELEY ABLORH A Dissertation Submitted to the Faculty of The Harvard T.H. Chan School of Public Health in Partial Fulfillment of the Requirements for the Degree of Doctor of Science Harvard University Boston, Massachusetts. May, 2015 Dissertation Advisor: Dr. Peter Kraft Akweley Ablorh Meta-analysis of a multi-ethnic, breast cancer case-control targeted sequencing study ABSTRACT Breast cancer, the most commonly diagnosed cancer in American women, is a heritable disease with nearly one hundred known genetic risk factors. Using next generation sequencing, we explored the contribution of genetics at 12 GWAS-identified loci to breast cancer susceptibility in a multi-ethnic breast cancer case-control study. Methods: The study population consists of 4,611 breast cancer cases and controls (2,316 cases and 2,295 controls) from four mutually exclusive ethnicities: African, Latina, Japanese, or European American.We conducted rare variant association testing between sequenced genotypes and simulated phenotypes to compare the performance of several approaches for assessing rare variant associations across multiple ethnicities and the statistical performance of different ethnic sampling fractions, including single-ethnicity studies and studies that sample up to four ethnicities. Findings from simulation of causal rare variant penetrance models were then applied to a non- synonymous protein-coding rare variant association study of breast cancer. Next, we applied variance partitioning methods to determine what proportion of breast cancer heritability is explained by rare and common, coding and non-coding, and the complete set of sequenced genetic variants. Results: Variance component-based tests were better powered in several scenarios. Multi-ethnic studies were well powered, with inclusion of African Americans providing the largest gains in statistical power. Rare variation in several genes was nominally associated (alpha=0.05) with breast cancer risk. Common variants explained a significant amount of breast cancer heritability (5%; SE=2%). Total breast cancer heritability from all sequenced SNVs from all 12 loci was approximately 11% (S.E.=4%), a substantial portion of breast cancer heritability which ranges from 27% to 32% in European familial studies. Conclusion: Our findings suggest that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups. We demonstrate practices with the potential to uncover and localize gene-based associations using gene-based rare variant association testing at 12 GWAS-identified breast cancer susceptibility loci. We also present strong evidence that just this subset of previously-identified loci explains a substantial portion of heritability which suggests that all GWAS-identified loci may explain more breast cancer heritability than the 17% previously reported. ii Table of Contents List of Figures iv List of Tables v Acknowledgments vi Dedication ix Chapter 1 1 Introduction Chapter 2 Meta-analysis of rare variant association tests in multi-ethnic 6 populations - Introduction 9 - Methods 11 - Results 19 - Discussion 32 - References 36 Chapter 3 Breast cancer and rare protein-coding variation at 12 43 susceptibility loci - Introduction 45 - Methods 48 - Results 57 - Discussion 64 - References 70 Chapter 4 Components of breast cancer heritability in a multi-ethnic 77 targeted sequencing study - Introduction 79 - Methods 81 - Results 90 - Discussion 98 - References 105 iii List of Figures Title Page Figure 2.1: Estimated singleton, doubleton and tripleton count 22 Figure 2.2: Type I error (at nominal 5% alpha) for individual ethnic results and multi- 23 ethnic designs Figure 2.3: Lambda GC when altering sampling fraction between ethnicities 24 Figure 2.4: Burden test and MetaSKAT power under two penetrance models by 26 study population and carrier proportion (relative risk = 2.5) Figure 2.5: Meta-analysis statistic power for all genes in fourteen multi-ethnic 27 populations and four monoethnic populations under two penetrance models Figure 2.6: Non-Synonymous Variants by Sample Size 34 Figure 4.1: Theoretical whole genome SE and observed SE for observed-scale 101 heritability estimates iv List of Tables Title Page Table 2.1: 12 Sequenced Regions, Coverage Depth, and Length 12 Table 2.2: Study Subject Count by Cohort and Ethnicity 13 Table 2.3: Variant, Median Singleton, and Carrier Proportion for Genes with Highest and 20 Lowest Carrier Proportions Table 2.4: Gene Counts by Carrier Proportion and Self-Reported Ethnicity 21 Table 2.5: Gene-specific power for alternate proportion of causal variants – Single Ethnicity 28 Table 2.6: Gene-specific power for alternate proportion of causal variants – Multi-Ethnic 29 Table 2.7: Power by Penetrance Model for two multi-ethnic and four mono-ethnic study 31 populations Table 3.1: Study Participants 49 Table 3.2: 12 Sequenced Regions, Coverage Depth, and Length 51 Table 3.3: Count and Percent of Predictions and Polyphen-2 Scores for Non-Synonymous SNPs 53 Table 3.4: Significant gene meta-analyses for all breast cancer, and by ER status (alpha=0.05) 58 Table 3.5: Significant Gene Meta-Analyses after Correction for Number of Statistical Tests 59 (alphaB =0.025) Table 3.6: Gene-Based Associations by Ethnicity (P-Value < 0.02 in at least one ethnicity) 62 Table 3.7: Correlation between summary effect estimate and ethnicity specific effect 66 estimates by case ER status Table 3.8: Genetic Susceptibility Loci Identified in GWAS by receptor status 67 Table 3.9: Nominal gene-based rare variant breast cancer associations 69 Table 4.1: Study Population and Sequenced Variants by Ethnicity - Counts (Percent) 83 Table 4.2: Targeted Regions and Liability-Scale Heritability (and SE) from sequenced SNPs 85 Table 4.3: Counts of SNVs (M) in sequencing data by minimum MAF and ethnicity 86 Table 4.4: Conditional Analysis of all Sequenced SNPs 92 2 Table 4.5: Rare and Common Variant Breast Cancer Heritability (h g) 94 Table 4.6: Heritability in Coding and DHS Regions 95 Table 4.7: Heritability by Functional Annotation 97 Table 4.8: ER-Positive Breast Cancer heritability 98 Table 4.9: Heritability Estimation in Two Independent Samples 102 2 Table 4.10: Crude, Adjusted, and LD-pruned estimates of Rare, Common and Total hg in 103 Single variance-component analysis Table 4.11: Breast cancer heritability (Percent) in European populations 105 v Acknowledgements This, to paraphrase the most inspiring scientist I have been so fortunate to meet, was the work of many people; many more people than the few I acknowledge here. First and foremost, thank you to the members of my research committee, Dr. Heather Eliassen, Dr. Alkes Price, and Dr. Peter Kraft, for your feedback, time, and commitment to the success of this venture. Thank you to Dr. Peter Kraft and our collaborators at other institutions Dr. Chris Haiman, Dr. Brian Henderson, Dr. Loic Le Marchand, Dr. Seunngeun (Shawn) Lee and Dr. Dan Stram for graciously including me and sharing study resources and feedback as this project developed. My sincere gratitude to the current and former postdoctoral members of the Program in Genetic epidemiology and Statistical Genetics (PGSG) and the Program in Molecular epidemiology And Genetic Epidemiology (PMAGE) who include Dr. Sara Lindstrom and Dr. Sasha Gusev for their openness, flexibility, and insights. I thank my colleagues and former classmates, Dr. Jason Wong, Dr. Shasha Meng, Dr. Zhonghua Liu, Dr. Mitch Machiela, Dr. Mengmeng (Margaret) Du, Maxine Chen, Dr. Chia-Yen Chen, Tristan Hayek, Hilary Finucane, and Dr. Gaurav Bhatia for their collegial nature and unflagging dedication to high standards. I am grateful to the staff for meeting and exceeding expectations including Laura Ruggiero, Connie Chen, Elizabeth Solomon, Donald Halstead and Jill McDonald. I would like to thank members of the Harvard faculty for an unmatched learning experience and especially thank Dr. Ichiro Kawachi, Dr. David Williams, Dr. John Quackenbush, Dr. Lisa Signorello, Dr. Betty Johnson, Dr. Robert Blendon, Dr. David Hunter and Dr. Xihong Lin who generously shared their support or guidance with me along the way. vi Finally, I thank Dr. Alkes Price, truly you inspire me, and Dr. Michelle Williams, Chair of the department of Epidemiology, for leading by example and for ensuring my progress through the Doctor of Science degree was fully supported in every sense that an aspiring investigator could dare to hope for. vii Support for this work was provided by: Grant Source Grant Title Grant Number Funding Dates NIH – National Institute Initiative to Maximize Student Diversity GM055353-13 09/01/2010- of General Medical 08/31/2012 Sciences Department of Health Joint Interdisciplinary Training Program T32 - 09/01/2012 - and Human Services in Biostatistics & Computational GM074897 08/31/2014 Public Health Service Biology NIH – National Institute Statistical Methods for Studies of Rare R01 MH101244 09/01/2014 - of Mental Health Variants 07/01/2015 viii For my friends, family, and loved ones and for the many survivors and their friends and family whose lives have been touched by breast cancer. ix Chapter 1 Introduction Introduction Breast cancer genetic association studies and NGS genotypes Like so many transformative accomplishments, sequencing the first human genome was both the end and the beginning of a journey.