BROWN UNIVERSITY

Dietary Lipids, Relevant Networks, and Biological Interactions

Underlying Obesity and Cardiovascular Disease

A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Epidemiology at Brown University

Providence, Rhode Island

by

Qing Liu

May 2019

© Copyright 2019 by Qing Liu

This dissertation by Qing Liu is accepted in its present form

by the Department of Epidemiology as satisfying the

dissertation requirement for the degree of Doctor of Philosophy.

Date______

Simin Liu, Advisor

Recommended to the Graduate Council

Date______

Stephen T. McGarvey, Reader

Date______

Wen-Chih Wu, Reader

Date______

Xi Luo, Reader

Approved by the Graduate Council

Date______

Andrew G. Campbell, Dean of the Graduate School

Curriculum Vitae

QING LIU School of Public Health Brown University 121 South Main Street Providence, RI 02903

Qing Liu graduated with a B.S./M.D. degree in preventive medicine from Peking

University, Beijing, China. She then obtained a Sc.M. degree in Epidemiology from

Brown University School of Public Health, and later joined the Ph.D. program in

Epidemiology in Fall 2015, with Dr. Simin Liu as her primary advisor. At Brown

University, Qing has been interested in genetic and nutritional epidemiology in the field of cardio-metabolic health outcomes. She has presented some of her thesis work at the American Heart Association’s annual conferences, and her research has contributed to publications on dietary lipids in relation to cardiovascular disease risk.

During her Ph.D. program, she was a graduate teaching assistant for three courses in public health and interned at a healthcare consulting company.

EDUCATION

2015 - present Brown University, School of Public Health (GPA: 4.0) Ph.D. candidate in Epidemiology Major Field of Study: nutritional epidemiology, molecular and genetic epidemiology, cardio-metabolic health outcomes

2013 - 2015 Brown University, School of Public Health (GPA: 4.0) M.S. in Epidemiology Thesis: Change in risks of cardiovascular disease in postmenopausal women associated with substituting different types of margarine for butter. (Women’s Health Initiative Study)

2008 - 2013 Peking University, Health Science Center (GPA: 3.86) B.S./M.D. Concentration: Preventive Medicine

AWARDS

2016 Runner-up of Doctoral Student or Post-doctoral Trainee Posters, Public Health Research Day 2015 2nd Place for Epi Department Conference Abstract Competition

RESEARCH EXPERIENCE

01/2018 - present Providence VA Medical Center Research Assistant Develop research proposals and manuscripts for genetics and heart failure using data from the Women’s Health Initiative study, the Jackson Heart Study, and the Framingham Heart Study.

09/2013 - present Center for Global Cardiometabolic Health, Brown University Research Assistant Develop research proposals and manuscripts for nutrition, genetics, and cardio-metabolic health outcomes using data from the Women’s Health Initiative study, the Jackson Heart Study, and the Framingham Heart Study.

Assisted with grant writing for dietary intervention trials among pregnant women and their offspring.

06/2017 - 08/2017 Center for Gerontology and Healthcare Research, Brown University Research Assistant Assisted with data analysis for an R01 project: Anticholinergic/sedentary medication and fracture among Nursing Home Residents

02/2014 - 08/2015 Center for Primary Care and Prevention, Brown University Research Assistant Developed research proposals and manuscripts for dietary/endogenous lipids and cardiovascular disease using data from the Women’s Health Initiative study.

05/2014 - 08/2014 The China Health and Retirement Longitudinal Study (CHARLS) Research Assistant Assisted with data analysis and manuscript writing related to chronic kidney disease using data from the China Health and Retirement Longitudinal Study

TEACHING EXPERIENCE

09/2016 - 12/2016 Brown University Teaching Assistant PHP 1700 Current Topics in Environmental Health Facilitated student discussions; wrote and graded exams.

09/2017 – 12/2017 Brown University Teaching Assistant and Guest Lecturer PHP 0320 Introduction to Public Health Facilitated student discussions; graded homework; guest lectured “Epidemiology: Study Design and Measures”.

09/2018 – 12/2018 Brown University Teaching Assistant PHP 2018 Epidemiology of Cardio-Metabolic Health Wrote and graded exams; prepared lab materials.

PROFESSIONAL WORK

06/2018 - 08/2018 Analysis Group, Inc. Summer Associate Led one project showing seasonal trend and regional distribution of clinical asthma exacerbation using insurance claim data.

Developed study concepts, protocols, and assisted data analysis for multiple projects involving cancer therapies.

PUBLICATIONS

Liu Q, Lichtenstein AH, Matthan NR, et al. Higher Lipophilic Index Indicates Higher Risk of Coronary Heart Disease in Postmenopausal Women. Lipids, 2017; 52(8):687-702. doi: 10.1007/s11745-017-4276-8. PMID: 28689316

Liu Q, Rossouw JE, Roberts MB, et al. Theoretical Effects of Substituting Butter with Margarine on Risk of Cardiovascular Disease. Epidemiology, 2017; 28: 145–156. doi: 10.1097/EDE.0000000000000557. PMID: 27648593

Liu S, Liu Q. Personalized magnesium intervention to improve vitamin D metabolism: applying a systems approach for precision nutrition in large randomized trials of diverse populations. The American journal of clinical nutrition, 2018; 108 (6): 1159-1161. doi: https://doi.org/10.1093/ajcn/nqy294.

Jankowich M, Elston B, Liu Q, et al. Restrictive Spirometry Pattern, Cardiac Structure and Function, and Incident Heart Failure in African Americans. The Jackson Heart Study. Ann Am Thorac Soc. 2018;15(10):1186-1196. doi: 10.1513/AnnalsATS.201803-184OC. PMID: 30011374

Zhang T, Zullo AR, Shireman TI, Lee Y, Mor V, Liu Q, et al. Epidemiology of hip fracture in nursing home residents with multiple sclerosis. Disabil Health J. 2018;11(4):591-597. doi: 10.1016/j.dhjo.2018.03.002. PMID: 29598927

Wang S, Chen R, Liu Q, et al. Prevalence, awareness and treatment of chronic kidney disease among mid-aged and elderly: data from the China Health and Retirement Longitudinal Study. Nephrology, 2015 Jul; 20(7):474-84. doi: 10.1111/nep.12449. PMID: 25773384

Shengfeng Wang, Qing Liu, Qingmin Liu, et al. Assessment on the impact of smoking cessation counseling service and related influencing factors under comprehensive community-based intervention programs about the community medical staff in Hangzhou city. Chinese Journal of Epidemiology, 2014; 35(9):1002-6. PMID: 25492140

WORK UNDER REVIEW / IN PREPARATION

“Discovery of Biological Pathways and Gene Networks for Heart Failure with Preserved and Reduced Ejection Fraction in Women Across Ethnicities”

“Pathways and Gene Networks for Obesity and Potential Interactions with Dietary Lipid in Women Across Ethnicities”

“Plasma Phospholipid Fatty Acid and Coronary Heart Disease Risk: A Nested Case-Control Study within the Women’s Health Initiative Observational Study.”

“Magnesium Intake, Cognitive Impairment, and Dementia in Postmenopausal Women: Results from Women’s Health Initiative Memory Study”

“Prospective associations of waist-to-height ratio with cardiovascular events of diabetic and non-diabetic postmenopausal women: Results from Women’s Health Initiative”

“Prospective Associations of Added Sugar Intake with risk of Atherosclerotic Cardiovascular Disease: Results from Women’s Health Initiative”

“A meta-analysis on subjective sleep quality and metabolic syndrome Running title: Sleep quality and metabolic syndrome”

“Gene polymorphism, vitamin B and the risk of stroke or coronary heart disease: Results from the Women’s Health Initiative”

ABSTRACTS AND PRESENTATIONS

Liu Q, Chan KH, Luo X, Reiner AP, McGarvey ST, Liu S. Abstract 15406: Gene-Dietary Fat Interactions on Obesity Among Postmenopausal Women Participated in the Women's Health Initiative Study. Circulation. 136 (Suppl 1), A15406-A15406. Accepted for the American Heart Association 2017 Scientific Session. November 11-15th, 2017, Anaheim, CA, USA.

Liu Q, Lichtenstein AH, Matthan NR, Liu S, Allison MA, Howard BV, Martin LW, Valdiviezo C, Manson JE, Eaton CB. Abstract MP38: Plasma Lipophilic Index and Risk of Coronary Heart Disease in Postmenopausal Women. Circulation. 133 (Suppl 1), AMP38-AMP38. Accepted for the American Heart Association EPI/Lifestyle 2016 Scientific Session. March 1- 4th, 2016, Phoenix, AZ, USA.

Liu Q, Lichtenstein AH, Tinker LF, Neuhouser ML, Van Horn LV, Howard BV, Manson JE, Rossouw JE, Matthan NR, Allison MA, Martin LW, Li W, Snetselaar LG, Wang L, Eaton CB. Abstract MP39: Plasma Phospholipid Fatty Acid and Coronary Heart Disease Risk. Circulation. 133 (Suppl 1), AMP39-AMP39. Accepted for the American Heart Association EPI/Lifestyle 2016 Scientific Session. March 1-4th, 2016, Phoenix, AZ, USA.

JOURNAL REVIEW Reviewer – Medicine Reviewer – American Journal of Clinical Nutrition

LANGUAGES English – Full professional proficiency Chinese - Native language

SERVICES AND MEMBERSHIPS Member – The American Heart Association

Preface

This dissertation includes three publishable manuscripts (Chapters 2-4) reformatted to comply with the requirements of the Graduate School at Brown University.

Chapter 1 is an overview of background, significance, and rationale in support of the specific aims of the research reported in the following chapters.

Chapter 2 is titled “Higher Lipophilic Index Indicates Higher Risk of Coronary Heart Disease in Postmenopausal Women”, which was published in the journal Lipids.

Chapter 3 is titled “Discovery of Biological Pathways and Gene Networks for Heart Failure with Preserved and Reduced Ejection Fraction in Women Across Ethnicities”, which has been completed and is currently under peer review for publication.

Chapter 4 is titled “Pathways and Gene Networks for Obesity and Potential

Interactions with Dietary Lipid in Women Across Ethnicities”, which is currently in preparation for submission. An abstract including part of the analysis was accepted by and presented at the American Heart Association (AHA)’s Scientific Sessions in

2017.

Acknowledgements

This dissertation concludes my doctoral research in the Department of Epidemiology, School of Public Health, at Brown University. I would like to express my gratitude to those who have helped and supported me during my Ph.D. years, from research to personal life. Without you, I could not have accomplished this or become the person I am now.

First, I would like to thank my primary advisor, Dr. Simin Liu, for his guidance, insight, and encouragement throughout my doctoral training. I am grateful for all the great opportunities, resources, criticisms, and space for independent thinking and self-exploration he has provided me with. His guidance and support were not limited to academic matters but extended to many other aspects of life; with them, I was able to overcome the countless challenges I have faced during the past four years or so and become a public health professional and independent researcher.

I would also like to thank Dr. Wen-Chih (Hank) Wu, my RAship advisor, for his continuous guidance and support. It has been a real pleasure working with him. He cares not only about the small issues arising in my research, but also about my work-life balance and emotional ups and downs. Thank you for making me feel so warm and secure.

I also want to express my gratitude to my committee members, Drs. Stephen (Steve) T. McGarvey and Xi (Rossi) Luo, for their tremendous input, comments, and feedback on my dissertation work. I sincerely appreciate their guidance and encouragement throughout my dissertation research. Furthermore, I would like to thank my master’s advisor, Dr. Charles (Chuck) B. Eaton. Without his guidance and support during my master’s years at Brown, my dissertation would not have had such a great start.

I would like to thank all the faculty and staff in the Epidemiology Department, and all other faculty and staff of the School of Public Health, who have taught me or offered me help during my years here since 2013: Eric Loucks, Gregory Wellenius,

Brandon Marshall, David Savitz, Mark Lurie, Karl Kelsey, Chanelle Howe, Joseph

Braun, Alison Field, Steve Buka, Ilana Gareen, Yen-Tsung Huang, Tongzhang

Zheng, Dominique Michaud, Joseph Hogan, Roee Gutman, Christopher Schmid, Jean

Wu, Cici Bauer, Abigail Harrison, Caroline Kuo, Don Operario, Brittany Leclerc,

Vickie Beaulieu, Currie Touloumtzis, Barbara Dailey, Joel Hernandez, Rosenny

Taveras, and Kathryn Petterson. I also want to acknowledge the contributions of the participants, investigators, and staff of the Women’s Health Initiative Study (WHI), the Jackson Heart Study (JHS) and the Framingham Heart Study (FHS).

I would like to acknowledge all members of my research group: Xiaochen Lin,

Mengna Huang, Stephannie Shih, Jie Li, Katie Chan, Isabel Zhang, Yidan Zhang,

Kenneth Lo, Priyanka Joshi, Aimin Yang, and Linlin Li. It has been fantastic to work with you all. It is a pleasure to share the office with my friendly peers: Heather Lee,

Annie Wentz, Geetika Kalloo, Rachel Denlinger, and Joella Adams. Special thanks to

Heather, my best friend at Brown. My life has been enriched because of you and I feel so lucky to have you here. Also thanks to all my other colleagues at Brown: Shuchen

Zheng, Bo Wei, Jun Ke, Qi Zhang, Guijin Zou, Ashleigh Lovette, Kimberly Glazer,

Samantha Kingsley, Su Chu, Marcia Pescador Jimenez, Sina Noshadjamal, Courtney

Choy, Jason Gantenberg, Hongseok Kim, Hannah Ziobrowski, Qianhui Lu, Keven

Wu, Yi Zhao, Chengyang Gu, Yi Zhang, Xuefei Cao, Yizhen Xu, Xin Zhan, Wentao

Guan, Ruoshui Zhai, Chun Chen, and Zhimeng Ouyang; my colleagues from Peking

University: Xiaowei Wu, Xingchao Peng, Mingxin Zheng, Jingyu Peng, Mingxing

Liu, Han Bao, Renze Wang, Qiong Ye, Yue Huang, Zhaoheng Gong, Jingxuan Wu,

Zhen Zhang, Hong Yue, Guangyan Du, Yuan Ren, and Ming Shan; and my Zumba class instructor, Diego.

I also want to thank my best friend, Peiyao Luo. I am grateful to have had you in my life over the past sixteen years, from middle school and high school to college and graduate school. Regardless of the physical distance, whether Xi’an to Beijing or America to the Netherlands, our hearts have always been together. We have shared each other’s happiness and sorrow hand in hand and witnessed every bit of progress each of us has made. Thank you for always standing by my side and being a crucial part of my life!

In addition, I want to say thank you to my boyfriend, Lingfeng Zhou, for his love and support. The distance between Beijing and Providence is not just 7,000 miles or a time difference of 12 hours. Thank you for your endeavor to come to me. The past seven years have been hard for both of us, but more importantly, unforgettable!

Finally, I would like to give my special thanks to my parents, Xuanping Liu and Xiuchan Jin. Thank you for being good examples I really look up to, providing me with unconditional love, endless support, and the freedom to try and learn new things, as well as being my close friends and great listeners. Because of you, I have broadened my horizons, become the person I am now, and found the courage and confidence to face the future! And to my late grandparents: I feel grief and pain for not having spent your last days with you. I will devote even greater effort to my future work, as you would have expected.

Qing Liu

Providence, December 2018

Table of Contents

Title Page ...... i

Copyright Page ...... ii

Signature Page ...... iii

Curriculum Vitae ...... v

Preface ...... xii

Acknowledgements ...... xiii

Table of Contents ...... xvi

List of Tables ...... xx

List of Illustrations ...... xxiii

Chapter 1: Introduction ...... 1

1.1 Background and Significance ...... 2

1.2 Specific Aims ...... 5

1.3 References ...... 6

Chapter 2: ...... 11

Higher Lipophilic Index Indicates Higher Risk of Coronary Heart Disease in

Postmenopausal Women ...... 11

2.1 Introduction ...... 12

2.2 Objectives ...... 13

2.3 Methods ...... 13

2.3.1 Study Population ...... 13

2.3.2 Lipophilic Index ...... 15

2.3.3 Covariates ...... 17

2.3.4 Statistical Analysis ...... 17

2.4 Results ...... 20

2.4.1 Associations Between LI and CHD ...... 21

2.4.2 Sensitivity Analyses ...... 22

2.5 Discussion ...... 22

2.6 Conclusion ...... 25

2.7 Tables and Figures ...... 27

2.8 Supplemental Material ...... 36

2.9 References ...... 44

Chapter 3: ...... 49

Discovery of Biological Pathways and Gene Networks for Heart Failure with

Preserved and Reduced Ejection Fraction in Women Across Ethnicities ...... 49

3.1 Introduction ...... 50

3.2 Objectives ...... 51

3.3 Methods ...... 51

3.3.1 Study Population ...... 51

3.3.2 Definition of Heart Failure ...... 53

3.3.3 Genotype Data ...... 53

3.3.4 Statistical Analysis ...... 54

3.4 Results ...... 56

3.4.1 Identification of Significant Genetic Loci Using Standard GWAS Analysis ...... 56

3.4.2 Identification of Biological Pathways Using Integrative Pathway Analysis ...... 57

3.4.3 Identification of Key Drivers for HFpEF and HFrEF ...... 58

3.5 Discussion ...... 59

3.6 Conclusion ...... 66

3.7 Tables and Figures ...... 67

3.8 Supplemental Material ...... 75

3.9 References ...... 91

Chapter 4: ...... 102

Pathways and Gene Networks for Obesity and Potential Interactions with Dietary

Lipid in Women Across Ethnicities ...... 102

4.1 Introduction ...... 103

4.2 Objectives ...... 104

4.3 Methods ...... 105

4.3.1 Study Population ...... 105

4.3.2 Genetic Data ...... 106

4.3.3 Measurement of Dietary Lipid ...... 106

4.3.4 Measurement of Outcome ...... 107

4.3.5 Statistical Analysis ...... 107

4.4 Results ...... 111

4.4.1 Identification of Significant Genetic Loci Using Standard GWAS Analysis ...... 112

4.4.2 Identification of Biological Pathways Using Integrative Pathway Analysis ...... 112

4.4.3 Identification of Key Drivers for Obesity ...... 113

4.4.4 Gene-dietary Lipid Interactions on Obesity ...... 113

4.4.5 Genetic Predisposition on Cardio-metabolic Disorders Mediated Through

Obesity ...... 114

4.5 Discussion ...... 114

4.6 Conclusion ...... 121

4.7 Tables and Figures ...... 122

4.8 Supplemental Material ...... 130

4.9 References ...... 136

List of Tables

Table 2.1 Baseline Characteristics by Quartiles of Dietary Lipophilic Index among

Participants in the Cohort Study of Women’s Health Initiative (1994-2014)

(N=85,563) ...... 27

Table 2.2 Baseline Characteristics among Participants in the Matched Case-control

Study of Women’s Health Initiative (1994-2005) (N=2428) ...... 30

Table 2.3 Hazard Ratios or Odds Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary/Plasma PL Lipophilic Index in the Cohort Study

(1994-2014, N=85,563) and the Matched Case-Control Study (1994-2005, N=2428) of Women’s Health Initiative ...... 33

Supplemental Table 2.1 Melting Points and Median (IQR) Levels for Individual Fatty

Acids, and Spearman’s Correlation Coefficients (r) for Correlation Between

Individual Fatty Acids and the Lipophilic Index Measured in Diets from the Cohort

Study (1994-2014) and in Plasma PL from the Case-Control Study (1994-2005) of the

Women’s Health Initiative ...... 36

Supplemental Table 2.2 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index/Load among Participants without

Missing Covariates in the Cohort Study (1994-2014, N=60,079) of Women’s Health

Initiative ...... 38

Supplemental Table 2.3 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipids in the Cohort Study (1994-2014, N=85,563) of Women’s Health Initiative ...... 39

Supplemental Table 2.4 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index and Lipids among Participants with Blood Lipids in the Cohort Study (1994-2014, N=6,188) of Women’s Health Initiative ...... 41

Supplemental Table 2.5 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index/Load Calculated Using Different

Methods among Participants in the Cohort Study (1994-2014, N=85,563) of Women’s

Health Initiative ...... 42

Table 3.1 Baseline Characteristics of African and Caucasian American Women in

Study Populations a ...... 67

Table 3.2 Validated and Newly Discovered Loci for Heart Failure among African

Americans and Caucasian Americans in the Women’s Health Initiative Study ...... 69

Table 3.3 Biological Pathways Enriched for HFrEF and HFpEF among African and

Caucasian American Women across Ethnicities a ...... 71

Supplemental Table 3.1 Validation and Allocation of Previously Identified GWAS

Loci for Heart Failure in the Women’s Health Initiative Study a ...... 75

Supplemental Table 3.2 Novel Genome-Wide Significant SNPs for Heart Failure among African and Caucasian American Women ...... 77

Supplemental Table 3.3 Common Name for Close to Validated and Newly

Discovered Heart Failure Loci ...... 78

Supplemental Table 3.4 Pathways Identified Specifically for HFrEF by Multiple

Pathway Methodologies a ...... 80

Supplemental Table 3.5 Pathways Identified Specifically for HFpEF by Multiple

Pathway Methodologies a ...... 81

Supplemental Table 3.6 Characteristics of Identified Significant Pathways for Heart

Failure by Ethnicities ...... 82

Table 4.1 Baseline Characteristics of Participants in the Women’s Health Initiative

Study ...... 122

Table 4.2 Top Genome-wide SNPs for Obesity among African Americans and

Caucasian Americans ...... 123

Table 4.3 Validation of Previously Identified GWAS Obesity Loci in the Women’s

Health Initiative Study a ...... 124

Table 4.4 Pathways Identified for Obesity by Multiple Pathway Methodologies a ... 126

Table 4.5 Interactions between Key Driver Genes and Dietary Lipophilic Index on

BMI in the Women’s Health Initiative Study a ...... 127

Supplemental Table 4.1 Baseline Characteristics of Female Participants in the Jackson

Heart Study and Framingham Heart Study ...... 130

Supplemental Table 4.2 Characteristics of Identified Significant Pathways for Obesity by Ethnicities ...... 131

Supplemental Table 4.3 Effect of Genes Involved in Axon Guidance Pathway on

Cardio-metabolic Disorders Mediated Through BMI among Participants in the

Women’s Health Initiative Study ...... 133

List of Illustrations

Figure 2.1 Population Selection Process. CHD: Coronary Heart Disease ...... 35

Supplemental Figure 2.1 Causal Diagrams Hypothesized for the Two Study Designs

...... 43

Figure 3.1 Venn Diagram for Biological Pathways Enriched for HFrEF and HFpEF among African and Caucasian American Women across Ethnicities ...... 72

Figure 3.2 Network of 5 Pathways Enriched for HFrEF and HFpEF with Top 10 Key

Driver Genes among African and Caucasian American Women ...... 73

Figure 3.3 Network of 6 Pathways Enriched for HFpEF with Top 10 Key Driver

Genes among African and Caucasian American Women ...... 74

Supplemental Figure 3.1 Manhattan plots of heart failure among African Americans

(n=8,298) and Caucasian Americans (n=4,257) in the Women’s Health Initiative

Study ...... 87

Supplemental Figure 3.2 Network key driver genes of the pathways enriched for

HFrEF and HFpEF among African and Caucasian American Women ...... 90

Figure 4.1 Network of 6 Pathways Enriched for Obesity with Top 10 Key Driver

Genes among African and Caucasian American Women ...... 129

Supplemental Figure 4.1 Manhattan Plots of Obesity among African Americans (A. n=7,678) and Caucasian Americans (B. n=4,079) in the Women’s Health Initiative

Study ...... 134

Supplemental Figure 4.2 Network Key Driver Genes of the Pathways Enriched for

Obesity among African Americans and Caucasian Americans ...... 135

Chapter 1: Introduction

1.1 Background and Significance

Over the past several decades, the epidemics of cardio-metabolic disorders, including obesity, coronary heart disease (CHD), and heart failure (HF), have continued unabated, becoming one of the biggest disease burdens on societies, families, and individuals worldwide 1,2. Based on comparable risk assessment methods, unhealthy behaviors, including unhealthy diet patterns, inappropriate energy balance, physical inactivity, and smoking, are the foremost risk determinants for cardio-metabolic disorders globally. Among the modifiable behavioral risk factors, suboptimal diet is the leading cause of death and disability related to cardio-metabolic health outcomes 3. In 2011-2012, the prevalence of an ideal diet (Healthy diet score >80) was only 1.5% in U.S. adults, and 41% of the population had poor diet quality 3.

Dietary lipid consumption, especially of saturated and polyunsaturated fats, is a major component of diet quality. Current dietary guidelines from leading professional authorities have made several healthier diet recommendations related to lipid intake for the improvement of cardio-metabolic health. In particular, increasing consumption of fish and nuts while limiting intakes of saturated fats and processed red meats has been a focus of public health campaigns in the United States 4-6. A large number of randomized controlled trials (RCTs) and prospective cohort studies have examined the roles of dietary lipids in the development of cardio-metabolic disorders, although findings from these studies have not been consistent. Most dietary intervention trials have demonstrated that diets high in saturated fatty acids (SFA), compared with polyunsaturated fatty acids (PUFA), tend to elevate short-term total cholesterol and low-density lipoprotein cholesterol (LDL-C), with potentially harmful effects on cardiovascular disease (CVD) risk 7-10. However, evidence from RCTs, especially long-term dietary interventions in healthy populations, remains very sparse.

A recent meta-analysis of prospective observational studies and randomized trials concluded that there was no significant association of dietary SFA, monounsaturated fatty acids (MUFA), and n-6

PUFA with CHD risk, but a positive association of trans fatty acids (TFA), and an inverse association of long-chain n-3 PUFA with CHD risk 11; a meta-analysis of 53

RCTs in apparently healthy populations showed that low-fat diets did not perform better than other dietary interventions for long-term weight loss among adults 12; and a meta-analysis of 6 trials found insufficient evidence concerning the potential effect of a Mediterranean diet on decreased blood pressure 13.

One possible explanation for the inconsistent results from these intervention trials of dietary fats is the lack of consideration of the genetic background of participants enrolled in these trials. It has been well recognized that not all people exposed to “unhealthy” dietary lipids become obese or experience more severe cardio-metabolic disorders, indicating the existence of underlying genetic mechanisms affecting individual responses to a particular type of dietary lipid.

Indeed, a large body of genome-wide association studies (GWAS) have recently identified many genetic variants playing a critically important role in regulating energy homeostasis, inflammation, insulin resistance, adipogenesis, and additional

CVD risk factors. To date, GWAS have identified over 300 loci associated with BMI and other traits linked with obesity 14, about 50 loci associated with myocardial infarction and CHD 15,16, and about 20 loci associated with HF 17. Although heritability estimated from family studies, twin studies, and unrelated individuals is 30-70% for BMI 18,19 and 40-60% for CHD 20,21, these identified genetic loci collectively account for only 10% of heritability for BMI 22 and 15-20% for CHD 23, with the proportion for HF unclear, leading to the term “missing heritability” to describe this clear gap between the expected and the observed heritability for cardio-metabolic disorders. The main hypothesis is that the missing heritability is made up of much larger numbers of common variants with small effect sizes not yet found by GWAS, gene-environment interactions, gene-gene interactions, unidentified rare variants, and structural variants 24.

Efforts have been made to identify more common genetic variants related to cardio-metabolic health by performing genome-wide meta-analysis over multiple cohorts, thus increasing the power to detect significant genetic signals 16,25,26.

Integrative approaches, such as pathway/network analysis, which summarize multiple genetic variants over certain biological functions, provide additional evidence for genetic determinants of cardio-metabolic health and yield new insights for biological mechanisms underlying cardio-metabolic disorders 27-29.

Accumulating evidence suggests that genetic variants interact with diet, contributing to the missing heritability, and therefore that gene-diet interactions (G × E) are also of fundamental importance in the development of cardio-metabolic health outcomes 30-32. Unfortunately, most of the aforementioned studies that examined genetic components of cardio-metabolic disorders by performing meta-analysis or utilizing integrative approaches failed to consider gene-diet interactions.

Therefore, there is an urgent need for a better understanding of the gene-dietary lipid interactions in the development of cardio-metabolic health outcomes.

1.2 Specific Aims

The primary goal of this proposed dissertation is to investigate the genetic determinants of, and potential gene-dietary lipid interactions on, cardio-metabolic health outcomes, using an integrative systems epidemiologic approach to analyze data from multiple high-quality prospective cohort studies of women in the

United States. In the following chapters, we will discuss the proposed analysis plan and findings with regard to several specific aims listed herein:

1) To evaluate the association between a novel dietary lipid quality index, the lipophilic index, and CHD risk, particularly assessing potential differences between different sources of lipids in relation to CHD, using data from the Women’s Health Initiative study (WHI).

2) To conduct a genetic pathway and network analysis for heart failure based

on a genome-wide association analysis to identify novel genetic markers

and assess underlying biological mechanisms of heart failure, using the

1000 Genomes imputed data from the Women’s Health Initiative study

(WHI), the Jackson Heart Study (JHS), and the Framingham Heart Study

(FHS).

3) To conduct a genetic pathway and network analysis of obesity based on a

genome-wide association analysis to identify novel genetic markers of

obesity and evaluate the integrated genetic effects on obesity, and

additionally explore potential gene-dietary lipid interactions on obesity

and the genetic predisposition mediated through obesity to other cardio-metabolic

disorders, using the 1000 Genomes imputed data from the

Women’s Health Initiative study (WHI), the Jackson Heart Study (JHS),

and the Framingham Heart Study (FHS).

1.3 References

1. Flegal KM, Carroll MD, Kit BK, Ogden CL. Prevalence of obesity and trends

in the distribution of body mass index among US adults, 1999-2010. JAMA

2012;307:491-7.

2. Mozaffarian D, Benjamin EJ, Go AS, et al. Heart disease and stroke statistics-2015 update: a report from the American Heart Association. Circulation 2015;131:e29-322.

3. Mozaffarian D, Benjamin EJ, Go AS, et al. Heart Disease and Stroke

Statistics-2016 Update: A Report From the American Heart Association.

Circulation 2016;133:e38-360.

4. U.S. Department of Health and Human Services. 2015–2020 dietary guidelines for Americans.

Washington (DC): USDA 2015.

5. Eckel RH, Jakicic JM, Ard JD, et al. 2013 AHA/ACC guideline on lifestyle

management to reduce cardiovascular risk: a report of the American College

of Cardiology/American Heart Association Task Force on Practice Guidelines.

Circulation 2014;129:S76-99.

6. World Health Organization. Food based dietary guidelines in the WHO European

Region. Copenhagen, Denmark: WHO 2003.

7. Mensink RP, Katan MB. Effect of a diet enriched with monounsaturated or

polyunsaturated fatty acids on levels of low-density and high-density

lipoprotein cholesterol in healthy women and men. The New England journal

of medicine 1989;321:436-41.

8. Appel LJ, Sacks FM, Carey VJ, et al. Effects of protein, monounsaturated fat,

and carbohydrate intake on blood pressure and serum lipids: results of the

OmniHeart randomized trial. Jama 2005;294:2455-64.

9. Berglund L, Lefevre M, Ginsberg HN, et al. Comparison of monounsaturated

fat with carbohydrates as a replacement for saturated fat in subjects with a

high metabolic risk profile: studies in the fasting and postprandial states. The

American journal of clinical nutrition 2007;86:1611-20.

10. Liu Q, Rossouw JE, Roberts MB, et al. Theoretical Effects of Substituting

Butter with Margarine on Risk of Cardiovascular Disease. Epidemiology

(Cambridge, Mass) 2017;28:145-56.

11. Chowdhury R, Warnakula S, Kunutsor S, et al. Association of dietary,

circulating, and supplement fatty acids with coronary risk: a systematic review

and meta-analysis. Annals of internal medicine 2014;160:398-406.

12. Tobias DK, Chen M, Manson JE, Ludwig DS, Willett W, Hu FB. Effect of

low-fat diet interventions versus other diet interventions on long-term weight

change in adults: a systematic review and meta-analysis. The lancet Diabetes

& endocrinology 2015;3:968-79.

13. Nissensohn M, Roman-Vinas B, Sanchez-Villegas A, Piscopo S, Serra-Majem

L. The Effect of the Mediterranean Diet on Hypertension: A Systematic

Review and Meta-Analysis. Journal of nutrition education and behavior

2016;48:42-53.e1.

14. Pigeyre M, Yazdi FT, Kaur Y, Meyre D. Recent progress in genetics,

epigenetics and metagenomics unveils the pathophysiology of human obesity.

Clinical science (London, England : 1979) 2016;130:943-86.

15. O'Donnell CJ, Nabel EG. Genomics of cardiovascular disease. The New

England journal of medicine 2011;365:2098-109.

16. Nikpay M, Goel A, Won HH, et al. A comprehensive 1,000 Genomes-based

genome-wide association meta-analysis of coronary artery disease. Nature

genetics 2015;47:1121-30.

17. Lopes LR, Elliott PM. Genetics of heart failure. Biochimica et biophysica acta

2013;1832:2451-61.

18. Yang J, Bakshi A, Zhu Z, et al. Genetic variance estimation with imputed

variants finds negligible missing heritability for human height and body mass

index. Nature genetics 2015;47:1114-20.

19. Bray MS, Loos RJ, McCaffery JM, et al. NIH working group report-using

genomic information to guide weight management: From universal to

precision treatment. Obesity (Silver Spring, Md) 2016;24:14-22.

20. Roberts R. Genetics of coronary artery disease. Circulation research

2014;114:1890-903.

21. Vinkhuyzen AA, Wray NR, Yang J, Goddard ME, Visscher PM. Estimation

and partition of heritability in human populations using whole-genome

analysis methods. Annual review of genetics 2013;47:75-95.

22. Xia Q, Grant SF. The genetics of human obesity. Annals of the New York

Academy of Sciences 2013;1281:178-90.

23. Deloukas P, Kanoni S, Willenborg C, et al. Large-scale association analysis

identifies new risk loci for coronary artery disease. Nature genetics

2013;45:25-33.

24. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of

complex diseases. Nature 2009;461:747-53.

25. Bradfield JP, Taal HR, Timpson NJ, et al. A genome-wide association meta-

analysis identifies new childhood obesity loci. Nature genetics 2012;44:526-

31.

26. Hagg S, Ganna A, Van Der Laan SW, et al. Gene-based meta-analysis of

genome-wide association studies implicates new loci involved in obesity.

Human molecular genetics 2015;24:6849-60.

27. Locke AE, Kahali B, Berndt SI, et al. Genetic studies of body mass index

yield new insights for obesity biology. Nature 2015;518:197-206.

28. Chan KH, Huang YT, Meng Q, et al. Shared molecular pathways and gene

networks for cardiovascular disease and type 2 diabetes mellitus in women

across diverse ethnicities. Circulation Cardiovascular genetics 2014;7:911-9.

29. Kathiresan S, Voight BF, Purcell S, et al. Genome-wide association of early-

onset myocardial infarction with single nucleotide polymorphisms and copy

number variants. Nature genetics 2009;41:334-41.

30. Velez Edwards DR, Naj AC, Monda K, et al. Gene-environment interactions

and obesity traits among postmenopausal African-American and Hispanic

women in the Women's Health Initiative SHARe Study. Human genetics

2013;132:323-36.

31. Xiang L, Wu H, Pan A, et al. FTO genotype and weight loss in diet and

lifestyle interventions: a systematic review and meta-analysis. The American

journal of clinical nutrition 2016;103:1162-70.

32. Nettleton JA, Follis JL, Ngwa JS, et al. Gene x dietary pattern interactions in

obesity: analysis of up to 68 317 adults of European ancestry. Human

molecular genetics 2015;24:4728-38.


Chapter 2:

Higher Lipophilic Index Indicates Higher Risk of Coronary Heart Disease in

Postmenopausal Women

(This work has been published in the journal Lipids)

2.1 Introduction

Cardiovascular disease (CVD) is the leading cause of death in the U.S., accounting for 31.3% of total deaths 1. Coronary heart disease (CHD) is the most common type of CVD and accounts for over 30% of prevalent CVD and 47.7% of

CVD death 2. Fatty acids (FA) play important roles in cardiovascular pathophysiology. Different FAs have different properties such as binding affinity, viscosity, and lipophilicity, which determine the orientation of membrane-bound , further influencing lipoprotein metabolism and the activity of membrane bound enzymes, receptors and other proteins that can affect CVD risk 3,4. A unique characteristic of FA is the melting point, which is determined by the length and degree of unsaturation of FA chains, and has been shown to reflect the lipophilicity of

FA. The lipophilic index (LI) was first developed by Ding EL in 2008; it summarizes individual FA levels weighted by their melting points and was applied to predict CHD risk using erythrocyte and plasma FA 5. The LI provides a novel way to capture overall FA lipophilicity, with a lower value indicating lower lipophilicity and higher membrane fluidity.

Two studies have examined the association between LI and CHD risk. In a nested case-control study of the Health Professionals Follow-up Study which included U.S. men aged 40-70 years, Wu et al 6 examined the association of plasma and erythrocyte

LI with CHD risk. They found that higher plasma LI was significantly associated with an increased risk of CHD (risk ratio (RR) =1.61, 95% confidence interval (CI): 1.03-

2.53, comparing extreme quintiles), while erythrocyte LI was not. In another matched case-control study of Hispanic Americans living in the Central Valley of Costa Rica by Toledo et al 7, the LI of diet, plasma, red blood cells, and adipose tissue were used to evaluate the association with myocardial infarction. Higher LI derived from diet and adipose tissue were associated with an elevated risk of myocardial infarction

(RR=1.57, 95% CI: 1.22-2.02, and 1.30, 95% CI: 1.00-1.69, respectively, comparing extreme quintiles). Currently, evidence regarding the relationship between LI and

CHD is limited, especially from large, representative populations of U.S. women. Given these findings, we examined the association of dietary LI with CHD risk among postmenopausal women who participated in the Women’s Health

Initiative (WHI) observational cohort study, and performed a separate analysis of plasma phospholipid (PL) LI in a case-control study nested in the observational cohort study as plasma PL FAs were not measured for all WHI cohort participants.

We hypothesized that LI would be positively associated with CHD risk.

2.2 Objectives

The objectives of this chapter are:

1. To evaluate the association between lipophilic index (LI) derived from

dietary fatty acids with the risk of coronary heart disease (CHD) among

postmenopausal women in the Women’s Health Initiative (WHI) study.

2. To evaluate the association between LI derived from plasma phospholipid

fatty acids with the risk of CHD among postmenopausal women in the

WHI.

3. To examine potential differences regarding different sources of fatty acids

in prediction of CHD risk.

2.3 Methods

2.3.1 Study Population

The WHI observational cohort study enrolled 93,676 postmenopausal women

(age 50-79) at 40 clinical centers in the U.S. from 1994 to 1998. A detailed description of the WHI observational cohort study design has been published

elsewhere 8,9. All incident CHD cases, which were defined as hospitalized myocardial infarction, definite silent myocardial infarction, and deaths due to definite CHD or possible CHD, were confirmed based on review of medical records and death certificates by trained physician adjudicators 10. For the cohort analysis that evaluated the association of diet-derived LI with CHD risk, women were excluded from the

93,676 WHI participants based on the following criteria: (1) lack of completion of baseline food frequency questionnaires; (2) baseline self-reported myocardial infarction, coronary artery bypass graft, percutaneous transluminal coronary angioplasty or stroke; and (3) implausible baseline dietary total energy intake (<600 kcal/day or >5000 kcal/day). The final sample size in the current cohort analysis was

85,563. (Figure 2.1)

As plasma PL FA profiles were not available for all WHI participants, we additionally performed a matched case-control study of 2448 participants nested in the WHI observational cohort study to evaluate the association of plasma PL LI with

CHD risk. Specifically, for the matched case-control study, all adjudicated incident

CHD cases from the WHI observational database in September 2005 were selected for sampling 10. A total of 2468 potential cases were initially eligible. Potential cases were excluded according to the following criteria: (1) lack of available baseline plasma sample, (2) lack of completion of baseline food frequency questionnaires, and

(3) CVD reported at baseline, where CVD was defined as myocardial infarction, angina, coronary artery bypass graft surgery/percutaneous transluminal coronary angioplasty, carotid artery disease, congestive heart failure, stroke or peripheral vascular disease. Potential controls were excluded for all these criteria as well as developing CVD during follow-up (a mean of 4.5 years). Among the 1549 cases meeting the eligibility criteria, 1288 had a previously matched eligible control.

Matching was done on the basis of age at screening, date of enrollment, race/ethnicity (White, Black, Hispanic, Other), and hysterectomy status at baseline. This matching process resulted in 2448 matched cases and controls. Additionally, we excluded

11 participants who lacked plasma PL FA profile results and their matched pairs

(N=9). Therefore, the final sample size in the case-control analysis was 2428. (Figure

2.1)

All participants signed an informed consent, which was approved by the institutional review boards at the Clinical Coordinating Center at the Fred Hutchinson

Cancer Research Center and the 40 clinical centers. A separate approval to use de-identified samples and data for the plasma PL FA analysis in this study was obtained from the Tufts University/Tufts Medical Center Institutional Review Board 11.

2.3.2 Lipophilic Index

The primary exposure of interest was LI, which was calculated from dietary FA and plasma PL FA. Dietary FAs were measured from food frequency questionnaires at baseline and Year 3, including saturated fatty acids (SFA) (4:0, 6:0, 8:0, 10:0, 12:0,

14:0, 16:0, 17:0, 18:0, 20:0, and 22:0), monounsaturated fatty acids (MUFA) (14:1,

16:1, 18:1, 20:1, and 22:1), n-3 polyunsaturated fatty acids (PUFA) (18:3, 20:5, 22:5, and 22:6), n-6 PUFA (18:2, 18:4, and 20:4), and trans fatty acids (TFA) (16:1 T, 18:1

T, and 18:2 T). Plasma PL FAs, including SFA (12:0, 14:0, 15:0, 16:0, 18:0, 20:0,

22:0, and 24:0), MUFA (14:1, 16:1N-7, 16:1N-9, 18:1N-7, 18:1N-9, 20:1N-9, and

24:1N-9), n-3 PUFA (18:3N-3, 20:5N-3, 22:5N-3, and 22:6N-3), n-6 PUFA (18:2N-6,

18:3N-6, 20:2N-6, 20:3N-6, 20:4N-6, 22:4N-6, and 22:5N-6) and TFA (all 18:1 T and

18:2 T), were measured at baseline using an established gas chromatography method and expressed as molar percentage (mol %), proportions of FA relative to the internal standard 12. Details about internal and external quality controls can be found elsewhere 11.

The LI for dietary and plasma PL FAs was calculated as the sum of the products of FA levels and their specific melting points (℃), divided by the sum of FA levels, using the following equations:

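$$
\text{Dietary LI} = \frac{\sum_{i=1}^{k} \left[ \text{Fatty acid (g)}_i \times \text{Melting point } (^{\circ}\text{C})_i \right]}{\sum_{i=1}^{k} \text{Fatty acid (g)}_i}
$$

$$
\text{Plasma LI} = \frac{\sum_{i=1}^{k} \left[ \text{Level (mol \%)}_i \times \text{Molecular weight}_i \times \text{Melting point } (^{\circ}\text{C})_i \right]}{\sum_{i=1}^{k} \left[ \text{Level (mol \%)}_i \times \text{Molecular weight}_i \right]}
$$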

where i indexes the individual FAs and k denotes the number of FAs used to calculate the LI. A higher LI reflects a higher average FA melting point and thus higher lipophilicity.
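As an illustration of the calculation described above, the following minimal Python sketch computes an LI as a weighted mean melting point. The function name `lipophilic_index`, the three-FA profile, and the melting points and molecular weights are illustrative assumptions (approximate values, not the LipidBank values used in this study); the optional molecular-weight argument reflects one plausible way of taking molecular weight into account for levels expressed in mol %.

```python
def lipophilic_index(levels, melting_points, molecular_weights=None):
    """Weighted mean melting point (deg C) of a fatty acid profile.

    levels: dict of FA name -> amount (g/day for diet, mol % for plasma PL)
    melting_points: dict of FA name -> melting point in deg C
    molecular_weights: optional dict of FA name -> g/mol; when given, molar
        levels are converted to mass weights (level * molecular weight).
    """
    numerator = 0.0
    denominator = 0.0
    for fa, level in levels.items():
        weight = level
        if molecular_weights is not None:
            weight = level * molecular_weights[fa]
        numerator += weight * melting_points[fa]
        denominator += weight
    if denominator == 0:
        raise ValueError("fatty acid levels sum to zero")
    return numerator / denominator


# Illustrative inputs only: approximate melting points and made-up levels.
melting_points = {"16:0": 63.0, "18:1": 13.0, "18:2": -5.0}
molecular_weights = {"16:0": 256.4, "18:1": 282.5, "18:2": 280.4}

diet_g_per_day = {"16:0": 10.0, "18:1": 20.0, "18:2": 10.0}
dietary_li = lipophilic_index(diet_g_per_day, melting_points)  # 21.0

plasma_mol_pct = {"16:0": 50.0, "18:1": 25.0, "18:2": 25.0}
plasma_li = lipophilic_index(plasma_mol_pct, melting_points, molecular_weights)
```

Because the LI is a weighted average, it always lies between the lowest and highest melting points of the FAs included, which is a convenient sanity check on any implementation.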

Dietary LI in the cohort study was computed for baseline and Year 3 separately.

We used the cumulative average diet method to evaluate the association between dietary LI and CHD risk 13,14. In this method, we used dietary LI derived from baseline food frequency questionnaires to capture the exposure within the first 3 years, and dietary LI averaged over the baseline and Year 3 questionnaires to capture the exposure beyond the first 3 years. For those who failed to complete food frequency questionnaires at Year 3 (N=11,051), we used multiple imputation by chained equations to impute missing dietary measurements (more details in Statistical

Analysis) 15. Dietary LI in the case-control study was derived based on baseline food frequency questionnaires. Since plasma PL FAs in the WHI case-control study were expressed as molar percentage, molecular weight for each FA was taken into account.

Information about melting point and molecular weight was acquired from the

LipidBank Database 16. For FAs whose melting points were reported as ranges, the midpoints were used; for FAs with isomers, weighted averages were used, with weights based on previously published papers 7.
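The cumulative average exposure described in this section can be sketched as follows. This is a minimal illustration, assuming follow-up time is measured in years from baseline; the function name and the interval representation are hypothetical and are not taken from the study's analysis code.

```python
def cumulative_average_exposure(baseline_li, year3_li):
    """Return the time-varying dietary LI as (start_year, end_year, LI) intervals.

    Follow-up during years 0-3 uses the baseline LI; follow-up after Year 3
    uses the mean of the baseline and Year 3 LI. An end_year of None means
    "until the end of follow-up".
    """
    averaged = (baseline_li + year3_li) / 2.0
    return [(0.0, 3.0, baseline_li), (3.0, None, averaged)]


# Made-up values for one participant.
intervals = cumulative_average_exposure(27.0, 29.0)
# [(0.0, 3.0, 27.0), (3.0, None, 28.0)]
```

Each interval would correspond to one row per participant in a counting-process style data set for a time-varying Cox model.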

2.3.3 Covariates

Socio-demographic variables, lifestyle factors, CHD risk factors, and dietary factors were assessed by interview, self-report, or physical measurement at baseline using standardized questionnaires or during follow-up using the same protocol as baseline assessments 8. Socio-demographic variables included age, region, race/ethnicity, education, and income. Lifestyle factors included body mass index

(BMI), physical activity, and smoking. CHD risk factors included family history of myocardial infarction/ diabetes/ stroke, medication use (including anticoagulants, diabetes medications, and lipid-lowering medications), postmenopausal hormone use, and self-reported baseline hypertension/ diabetes/ cancer/ hypercholesterolemia/ hysterectomy status. Dietary factors included alcohol intake, percent calories from protein and carbohydrates, and total energy intake.

For analytic purposes, education was categorized as ≤high school, some college, and post-graduate. Income was categorized as <$20,000, $20,000 to $74,999, and ≥

$75,000 per year. Physical activity was measured by total physical activity score (MET-h/wk) 17. BMI was treated as a continuous variable. Smoking was categorized as never, past, or current. Postmenopausal hormone use was categorized as current estrogen + progesterone, current estrogen alone, past use, and never used.

2.3.4 Statistical Analysis

We initially examined the distribution of baseline socio-demographic characteristics, lifestyle factors, CHD risk factors, and dietary factors by quartiles of dietary LI in the cohort study, and by case status in the matched case-control study.

Descriptive statistics such as medians, means, standard errors, frequencies and proportions were used to summarize the aforementioned variables. Chi-square and

ANOVA tests were used for categorical and continuous variables, respectively, in the cohort study by quartiles of dietary LI. Cases and controls were compared using paired t-tests, Wilcoxon signed-rank tests, or McNemar tests, depending on the distribution of the data.

To examine the association between quartiles of LI and CHD risk, we used unadjusted and adjusted Cox proportional hazard regression models in the cohort study, and adjusted conditional logistic regression models in the case-control study.

LI was categorized into quartiles based on the distribution among participants without

CHD. Tests for trend were conducted by assigning the median value of each quartile to all participants in that quartile and then including this variable in Cox and conditional logistic regression models as a linear term. Other covariates were modeled using indicators when categorical or linear terms when continuous. The covariates adjusted for in the aforementioned models were based on the causal diagram of each study (Supplemental Figure 2.1) and selected to control for confounding bias as well as selection bias due to study exclusions and censoring because of loss to follow-up 18-

20. Patients were considered to be lost to follow-up 1 year after the last time they were seen at a clinic visit during the study period. Patients who were last seen within 1 year of August 29, 2014, were administratively censored at August 29, 2014. The fully adjusted model in the cohort study included age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/ diabetes/ stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/ diabetes/ cancer/ hypercholesterolemia/ hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy intake. The final adjusted model examining plasma PL LI on CHD risk in the case-control study

included matching factors (age, race/ethnicity, enrollment date, and hysterectomy status), BMI, physical activity, smoking, family history of myocardial infarction/ diabetes/ stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/ diabetes/ cancer/ hypercholesterolemia, dietary alcohol, percent calories from protein and carbohydrates, and total energy intake. No evidence of violation of the proportional hazards assumption was found on the basis of the Schoenfeld residuals or the Wald test for a product term between the exposure of interest and follow-up time (both linear and on log scale).
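The quartile categorization and trend scoring described above can be illustrated with a short Python sketch. It is not the SAS code used for the analysis. The function name `trend_scores`, the tie-handling convention, and the choice to compute each quartile's median among all participants in that quartile are assumptions made for this example, and the LI values are made up.

```python
import statistics
from bisect import bisect_left


def trend_scores(li_values, has_chd):
    """Return (quartile index 0-3, quartile-median score) for each participant."""
    # Cut points come from participants without CHD, as described in the text.
    non_cases = [li for li, chd in zip(li_values, has_chd) if not chd]
    cutpoints = statistics.quantiles(non_cases, n=4)  # three cut points

    # Values equal to a cut point fall in the lower quartile (a convention
    # chosen for this sketch).
    quartiles = [bisect_left(cutpoints, li) for li in li_values]

    # Median LI among all participants in each quartile (an assumption).
    medians = {
        q: statistics.median(
            li for li, q_i in zip(li_values, quartiles) if q_i == q
        )
        for q in set(quartiles)
    }
    scores = [medians[q] for q in quartiles]
    return quartiles, scores


# Made-up LI values; the last two participants are CHD cases.
li_values = [24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0]
has_chd = [False, False, False, False, False, False, True, True]
quartiles, scores = trend_scores(li_values, has_chd)
```

The `scores` variable would then enter the regression model as a single continuous term, and the P value for its coefficient serves as the P for trend.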

Multiple imputation (5 times) by chained equations 15 was used to impute missing values among cohort participants (N=85,563) on dietary measurements at Year 3

(dietary LI, alcohol intake, percent calories from protein and carbohydrates, and total energy, N=11,051) and the following covariates: race/ethnicity (N=231), education

(N=680), income (N=3714), BMI (N=980), physical activity (N=958), smoking

(N=1181), family history of myocardial infarction (N=4428)/ diabetes (N=4115)/ stroke (N=4736), self-reported baseline hypertension (N=1502)/ diabetes (N=86)/ cancer (N=654)/ hypercholesterolemia (N=1887), and hysterectomy status (N=80).
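The text does not state how estimates from the five imputed data sets were combined. A common approach is Rubin's rules, sketched below under that assumption. The function name `pool_rubin` and the log hazard ratios and variances are hypothetical values for illustration only, not results from this study.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool point estimates and their variances across m imputed data sets.

    Returns (pooled estimate, pooled standard error) using Rubin's rules:
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # mean within-imputation variance
    between = statistics.variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and variances from m = 5 imputations.
log_hrs = [0.16, 0.17, 0.15, 0.18, 0.16]
variances = [0.0025, 0.0025, 0.0025, 0.0025, 0.0025]
pooled_log_hr, pooled_se = pool_rubin(log_hrs, variances)
pooled_hr = math.exp(pooled_log_hr)
```

The between-imputation term inflates the standard error to reflect uncertainty about the imputed values, so the pooled standard error is never smaller than the square root of the average within-imputation variance.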

Sensitivity analyses were conducted to assess the robustness of findings by 1) calculating dietary Lipophilic Load (LL), which is an integrated measure of both lipophilic quantity and quality 21, for the comparison with dietary LI; 2) calculating alternative dietary LI by excluding TFA or PUFA; 3) using data on participants with complete information (N=60,079); 4) examining the associations of dietary

PUFA:SFA and long-chain n-3 PUFA with CHD risk; 5) examining the associations of blood lipids (low-density lipoprotein cholesterol (LDL-C):high-density lipoprotein cholesterol (HDL-C) ratio and triglycerides) with CHD among a subgroup of participants with blood lipids; 6) comparing dietary LI/LL calculated based on

different methods (cumulative average versus baseline only); and 7) validating the association between LI and CHD by additionally adjusting for the PUFA:SFA ratio.

All data were analyzed using SAS (version 9.4; SAS Institute, Inc., Cary, NC).

2.4 Results

Among 85,563 women in the cohort study, we identified 4195 incident CHD events; the average (SD) follow-up person time was 13.2 (4.3) years. The median

(IQR) baseline dietary LI was 27.6 (3.5), and the mean age (SD) was 63.4 (7.3).

Participants in the highest dietary LI quartile were more likely than those in lower quartiles to be non-Hispanic White, live in the Midwest, and have lower income (Table 2.1). Less physical activity and higher BMI were associated with higher dietary LI. We observed a moderate correlation between baseline and Year 3 LI (Pearson’s correlation coefficient = 0.52, P<0.01). Among all cohort participants, 39% stayed in the same dietary LI quartile, 35% switched to lower LI quartiles, and 25% switched to higher LI quartiles at the second dietary measurement at Year 3.

Table 2.2 shows the characteristics by CHD case status. The median (IQR) of plasma PL LI was 22.3 (2.5) overall, 22.1 (2.5) among controls, and 22.5 (2.4) among cases. The median (IQR) of dietary LI was 27.7 (3.5) overall, 27.6 (3.5) among controls, and 27.8 (3.4) among cases. The Spearman’s correlation coefficient between plasma PL LI and dietary LI was 0.12 (P < 0.01). The overall mean (SD) age was 67.8

(6.8) years, and there was no difference between cases and controls. Cases had significantly lower levels of education, income, and physical activity. Cases also had higher BMI and total energy intake than controls. There was a higher proportion of smokers, family history of myocardial infarction and a lower proportion with a family

history of diabetes among cases. Cases also reported a lower proportion of current hormone usage (estrogen + progesterone or estrogen alone) than controls.

The melting point, median level of each FA in diet and plasma PL, and the correlations between individual FA and LI are shown in Supplemental Table 2.1.

The major plasma PL FAs were 16:0 (30.5 mol %), 18:0 (13.3 mol %), 18:1 n-9 (8.4 mol %), 18:2 n-6 (20.7 mol %), 20:4 n-6 (10.9 mol %), and 22:6 n-3 (3.1 mol %), while the major dietary FAs were 16:0 (8.4 g/d), 18:0 (4.1 g/d), 18:1 n-9 (16.3 g/d),

18:2 n-6 (8.5 g/d), 18:3 n-3 (1.0 g/d), and trans 18:1 (2.6 g/d).

2.4.1 Associations Between LI and CHD

Table 2.3 shows the relationship between LI and CHD risk for the two studies based on hazard ratios (HRs) with 95% CI for CHD associated with quartiles of LI in the cohort study, and odds ratios (ORs) with 95% CI for CHD in the matched case-control study. Among the cohort study participants, higher dietary LI was significantly associated with an increased risk of CHD in both the unadjusted and adjusted models. Specifically, in model 2, participants in the highest quartile of dietary LI showed an 18% higher risk of CHD compared with participants in the lowest quartile after adjusting for confounders. Among those included in the case-control study, the OR in an adjusted model that controlled for matching factors was

1.87 (95% CI: 1.45-2.40, P for trend<0.01) comparing the highest to the lowest quartile of plasma PL LI. In the model that adjusted for all proposed confounders, the

OR comparing extreme quartiles was attenuated to 1.76 (95% CI: 1.33-2.33, P for trend<0.01). The associations of dietary LI and plasma PL LI with CHD persisted after adjustment for PUFA:SFA (model 3 and model 6).

2.4.2 Sensitivity Analyses

In the sensitivity analysis restricted to cohort participants with complete information

(n=60,079), we found a stronger association between dietary LI and the risk of CHD

(HR=1.23, 95% CI: 1.08-1.33) than in the analysis that used multiple imputation by chained equations to impute missing values (Supplemental Table 2.2). In the analysis comparing different dietary lipid measures (LL, PUFA:SFA, and n-3 PUFA) with LI, PUFA:SFA and n-3

PUFA were negatively associated with CHD. However, we did not find a significant association between dietary LL and CHD. The associations between alternative dietary LIs (without TFA or PUFA) and CHD remained statistically significant

(Supplemental Table 2.3). When limiting the analysis to cohort participants with blood lipids and C-reactive protein (CRP) (N=6188), we found that blood triglycerides and LDL:HDL ratio were positively associated with CHD risk, and the association between dietary LI and CHD risk did not change after adjusting for

LDL:HDL (Supplemental Table 2.4). In our analysis, CRP did not modify the association of dietary

LI/LL with CHD risk. Supplemental Table 2.5 shows the results comparing cumulative average dietary LI/LL with baseline LI/LL. We observed significant associations only between cumulative average dietary LI and CHD risk.

2.5 Discussion

In the cohort analysis of 85,563 women in the WHI observational study (1994-

2014), higher dietary LI was associated with an increased risk of CHD after adjusting for potential sources of confounding and selection bias. In the adjusted case-control analysis of 2428 women selected from the WHI 2005 database, we found a significant positive association between plasma PL LI and the risk of CHD. Our results are consistent with two previous matched case-control studies 6,7. In these studies, higher plasma LI and higher dietary

LI were associated with increased risks of CHD and myocardial infarction

(RR=1.61 and 1.57, respectively, when comparing extreme quintiles) 6,7.

FAs regulate cellular membrane fluidity and physiological function, and further influence cardio-metabolic risks. The LI was designed to summarize the quality of

FA, either consumed in diet or present in biological samples, into an index. The quality of FA with regard to lipophilicity was defined by melting points, which reflect two main molecular characteristics of FAs: hydrocarbon chain length and unsaturation (number of double bonds) 22. A higher melting point is related to a longer hydrocarbon chain, a greater degree of saturation, and the presence of a double bond in the trans configuration. Therefore, higher LI is associated with higher FA lipophilicity and may indicate lower membrane fluidity.

The association between LI and CHD risk can be mainly explained by cell membrane fluidity and the PL fluidity of lipoproteins, which involve multiple mechanisms. Cell membrane fluidity affects membrane permeability, transport systems, receptor functions, and enzyme activities, and therefore plays an important role in the pathogenesis of CVD 23. Lower LI can increase membrane fluidity, which further improves the activity of proteins involved in ion transport, signal transduction, cell Ca2+ handling, and intracellular pH regulation 3. In addition, lower membrane fluidity, indicated by higher LI, is associated with endothelial dysfunction through increased oxidative stress 24, impaired oxygen permeability in cell membranes 25, and impaired vascular endothelial wound closure under shear stress 26, and is also associated with decreased insulin sensitivity, potentially through the effect of resistin

27. The lipoprotein fluidity, which is influenced by PL fatty acyl composition of lipoproteins 28, can influence the structure of lipoproteins, affect the rate at which the particle or its constituent lipids are deposited in or can be removed from developing atherosclerotic plaques, and further change the risk of CHD 29. For example, lower LI

is associated with higher fluidity of HDL particles, which increases the activity of lecithin:cholesterol acyltransferase 30 and the capacity of HDL to promote cholesterol efflux 31, thus lowering the risk of developing CHD.

Alternative explanations for the association between LI and CHD may be the differential effects of FA. In our study, dietary and plasma PL LI were positively associated with SFA and negatively associated with PUFA. Lower LI, indicating a higher proportion of PUFA, is beneficial for CHD since PUFA has pleiotropic beneficial effects in the cardiovascular system, including anti-thrombotic, anti-atherosclerotic, anti-arrhythmic, anti-inflammatory, and anti-fibrotic properties 32-34.

A higher level of PUFA is also associated with diminished liver triglyceride production, increased rate of cholesterol clearance, increased FA oxidation by suppressing the expression of lipogenesis genes 22,35, and improved insulin sensitivity 36.

The LI was designed to summarize the overall quality of FA; however, it has several limitations. First, the estimated LI depends on the source of FA, and its biological function may also be source-dependent. For example, as plasma PL FAs reflect both dietary fat consumption and FA metabolism, the composition of plasma

PL FA is different from that of dietary FA. Therefore, LIs derived from different sources need to be interpreted cautiously. Second, LI derived from any source may not be a direct measurement of membrane fluidity of various cells and plasma lipoproteins. Third, melting points and molecular weights may not necessarily capture the membrane fluidity. For example, the melting points of TFA are lower than those of SFA with the same number of carbon atoms. However, TFA is a strong predictor of CHD 37,38.

With regard to the study limitations, LIs derived from both dietary FA and plasma PL FA were prone to measurement error, especially dietary FA computed from food frequency questionnaires. Not all types of FAs with available melting points from the LipidBank database were measured in our study. However, these other

FAs are present in only small amounts; hence, our measurements captured the predominant FAs. In addition, we cannot exclude the possibility of the following situations as explanations of our results: 1) residual or unmeasured selection bias due to excluding participants; 2) residual or unmeasured confounding bias due to unmeasured blood lipids which can influence the measurement of plasma PL FA; and

3) bias due to measurement error associated with FA measurements and imputed values at Year 3. Furthermore, we have a limited number of FA measurements.

Specifically, dietary FAs were measured twice within the first three years, a period much shorter than the average follow-up time, and plasma PL FAs were measured only once at baseline. Since plasma PL FAs only reflect medium-term dietary fat intake, one measurement may not represent long-term average FA levels, and thus is less likely to predict long-term CHD risk. Finally, the results of our study were restricted to postmenopausal women.

Our study does have some strengths compared with previous studies. Our study represents a large, multiethnic and geographically diverse population with a long period of follow-up. In addition, we used the cumulative average diet method to calculate dietary LI based on two dietary measurements and treated dietary LI as a time-varying exposure, thus reducing measurement error in food frequency questionnaires and making dietary LI more representative of long-term dietary fats.

2.6 Conclusion

The LIs derived from diet and plasma PL, which summarize overall FA lipophilicity, were positively associated with CHD risk. Accordingly, the LI may aid in predicting CHD risk beyond individual FA and plasma lipids that are established

CHD risk factors. Future studies with multiple measurements of FA from different sources, including diet, plasma, erythrocyte, and adipose tissue are necessary to

strengthen the observed evidence regarding the association between LI and CHD risk.

Additional research related to potential pathways between LI and CHD is also likely warranted.

2.7 Tables and Figures

Table 2.1 Baseline Characteristics by Quartiles of Dietary Lipophilic Index among Participants in the Cohort Study of Women’s Health Initiative (1994-2014) (N=85,563)

| Characteristics | Overall | Q1 (N=21375) | Q2 (N=21352) | Q3 (N=21335) | Q4 (N=21501) |
|---|---|---|---|---|---|
| Dietary LI a | 27.6 (3.5) | 24.4 (2.1) | 26.8 (0.9) | 28.4 (0.8) | 30.5 (1.6) |
| Socio-demographics | | | | | |
| Age, year b | 63.4 (7.3) | 63.7 (7.3) | 63.5 (7.3) | 63.3 (7.4) | 63.3 (7.4) |
| Race/ethnicity, n (%) | | | | | |
| Black/African American | 6213 (7) | 1900 (9) | 1777 (8) | 1452 (7) | 1084 (5) |
| Hispanic/Latino | 3063 (4) | 864 (4) | 751 (4) | 726 (3) | 722 (3) |
| White, non-Hispanic | 72622 (85) | 16887 (79) | 17908 (84) | 18580 (87) | 19247 (90) |
| Other | 3665 (4) | 1724 (8) | 916 (4) | 577 (3) | 428 (2) |
| Region, n (%) | | | | | |
| Northeast | 19592 (23) | 5153 (24) | 4744 (22) | 4732 (22) | 4963 (23) |
| South | 21951 (26) | 5883 (28) | 5820 (27) | 5529 (26) | 4719 (22) |
| Midwest | 18952 (22) | 3257 (15) | 4481 (21) | 5274 (25) | 5940 (28) |
| West | 25068 (29) | 7082 (33) | 6307 (30) | 5800 (27) | 5879 (27) |
| Education, n (%) | | | | | |
| <=High school | 25688 (30) | 5743 (27) | 6424 (30) | 6743 (32) | 6778 (32) |
| Some college and college graduate | 32911 (38) | 8188 (38) | 8169 (38) | 8224 (39) | 8330 (39) |
| Post-graduate | 26964 (32) | 7444 (35) | 6759 (32) | 6368 (30) | 6393 (30) |
| Income, n (%) | | | | | |
| <$20,000 | 12267 (14) | 2828 (13) | 2999 (14) | 3102 (15) | 3338 (16) |
| $20,000-$74,999 | 53105 (62) | 12757 (60) | 13301 (62) | 13498 (63) | 13549 (63) |
| >=$75,000 | 20101 (23) | 5790 (27) | 5052 (24) | 4735 (22) | 4524 (21) |
| Lifestyle factors | | | | | |
| Physical activity, MET-h/wk a | 10.0 (16.7) | 12.5 (18.7) | 10.5 (16.8) | 9.3 (16.1) | 8.3 (15.6) |
| BMI, kg/m2 a | 26.0 (6.8) | 25.1 (6.2) | 26.0 (6.5) | 26.4 (6.9) | 26.7 (7.3) |
| Smoking, n (%) | | | | | |
| Never-smoker | 43659 (51) | 10621 (50) | 10927 (51) | 11151 (52) | 10960 (51) |
| Past smoker | 36741 (43) | 9757 (46) | 9204 (43) | 8840 (41) | 8940 (42) |
| Current smoker | 5163 (6) | 997 (5) | 1221 (6) | 1344 (6) | 1601 (7) |
| CHD risk factors | | | | | |
| Family history, n (% yes) | | | | | |
| Myocardial infarction | 44531 (52) | 11091 (52) | 11196 (52) | 11204 (52) | 11040 (51) |
| Diabetes | 27987 (33) | 7025 (33) | 7144 (33) | 7049 (33) | 6769 (32) |
| Stroke | 32689 (38) | 8069 (38) | 8156 (38) | 8184 (38) | 8280 (39) |
| Medication use, n (% yes) c | 3314 (4) | 1008 (5) | 844 (4) | 827 (4) | 635 (3) |
| Hormone usage, n (%) | | | | | |
| Never Used | 34218 (40) | 8450 (40) | 8359 (39) | 8485 (40) | 8924 (42) |
| Past Users | 12629 (15) | 3190 (15) | 3143 (15) | 3116 (15) | 3180 (15) |
| Current Estrogen alone | 21346 (25) | 5272 (25) | 5427 (25) | 5467 (26) | 5180 (24) |
| Current Estrogen + Progesterone | 17370 (20) | 4463 (21) | 4423 (21) | 4267 (20) | 4217 (20) |
| Hypertension, n (%) | | | | | |
| Never hypertensive | 58343 (68) | 14709 (69) | 14350 (67) | 14455 (68) | 14829 (69) |
| Untreated hypertensive | 6669 (8) | 1676 (8) | 1566 (7) | 1713 (8) | 1714 (8) |
| Treated hypertensive | 20551 (24) | 4990 (23) | 5436 (25) | 5167 (24) | 4958 (23) |
| Diabetes, n (% yes) | 4190 (5) | 1110 (5) | 1129 (5) | 1036 (5) | 915 (4) |
| Cancer, n (% yes) | 11026 (13) | 2674 (13) | 2724 (13) | 2802 (13) | 2826 (13) |
| Hypercholesterolemia, n (% yes) | 11521 (13) | 3436 (16) | 3058 (14) | 2759 (13) | 2268 (11) |
| Hysterectomy, n (% yes) | 35207 (41) | 8602 (40) | 8844 (41) | 8992 (42) | 8769 (41) |
| Dietary factor | | | | | |
| Alcohol, g/day a | 1.0 (6.5) | 1.0 (6.5) | 1.0 (7.1) | 1.0 (6.5) | 1.0 (6.5) |
| Percent calories from carbohydrates a | 52.7 (13.0) | 55.8 (13.7) | 53.0 (12.8) | 51.7 (12.4) | 50.4 (12.2) |
| Percent calories from protein a | 16.8 (4.2) | 16.5 (4.4) | 16.8 (4.1) | 17.0 (4.0) | 16.8 (4.2) |
| Total energy a | 1482.3 (733.0) | 1350.7 (650.5) | 1476.4 (714.0) | 1536.3 (747.6) | 1586.0 (785.7) |

a The median (IQR) of continuous variables
b The mean (SD) of continuous variables
c Medications included anticoagulants, diabetes medications, and lipid lowering medications
Abbreviations: BMI, body mass index; CHD, coronary heart disease; LI, lipophilic index; MET, metabolic equivalent of task; IQR, interquartile range; SD, standard deviation.

Table 2.2 Baseline Characteristics among Participants in the Matched Case-control Study of Women’s Health Initiative (1994-2005) (N=2428)

| Variables | Overall | Controls (N=1214) | Cases (N=1214) |
|---|---|---|---|
| Plasma PL LI a | 22.3 (2.5) | 22.1 (2.5) | 22.5 (2.4) |
| Dietary LI | 27.7 (3.5) | 27.6 (3.5) | 27.8 (3.4) |
| Socio-demographics | | | |
| Age, year b | 67.8 (6.8) | 67.8 (6.8) | 67.8 (6.8) |
| Race/ethnicity, n (%) | | | |
| Black/African American | 136 (6) | 68 (6) | 68 (6) |
| Hispanic/Latino | 32 (1) | 16 (1) | 16 (1) |
| White, non-Hispanic | 2172 (90) | 1086 (90) | 1086 (90) |
| Other | 88 (4) | 44 (4) | 44 (4) |
| Region, n (%) | | | |
| Northeast | 610 (25) | 303 (25) | 307 (25) |
| South | 579 (24) | 299 (25) | 280 (23) |
| Midwest | 547 (23) | 264 (22) | 283 (23) |
| West | 692 (29) | 348 (29) | 344 (28) |
| Education, n (%) | | | |
| <=High school | 844 (35) | 379 (31) | 465 (38) |
| Some college and college graduate | 907 (37) | 472 (39) | 435 (36) |
| Post-graduate | 677 (28) | 363 (30) | 314 (26) |
| Income, n (%) | | | |
| <$20,000 | 448 (18) | 193 (16) | 255 (21) |
| $20,000-$74,999 | 1542 (64) | 778 (64) | 764 (63) |
| >=$75,000 | 438 (18) | 243 (20) | 195 (16) |
| Lifestyle factors | | | |
| Physical activity, MET-h/wk a | 9.5 (16.3) | 10.8 (17.3) | 8.3 (15.4) |
| BMI, kg/m2 a | 26.4 (6.8) | 25.9 (6.4) | 26.9 (7.3) |
| Smoking, n (%) | | | |
| Never-smoker | 1231 (51) | 649 (53) | 582 (48) |
| Past smoker | 1034 (43) | 501 (41) | 533 (44) |
| Current smoker | 163 (7) | 64 (5) | 99 (8) |
| CHD risk factors | | | |
| Family history, n (% yes) | | | |
| Myocardial infarction | 1381 (57) | 647 (53) | 734 (60) |
| Diabetes | 831 (34) | 441 (36) | 390 (32) |
| Stroke | 986 (41) | 475 (39) | 511 (42) |
| Medication use, n (%) c | 131 (5) | 44 (4) | 87 (7) |
| Hormone usage, n (%) | | | |
| Never Used | 1143 (47) | 539 (44) | 604 (50) |
| Past Users | 375 (15) | 175 (14) | 200 (17) |
| Current Estrogen alone | 555 (23) | 295 (24) | 260 (21) |
| Current Estrogen + Progesterone | 355 (15) | 205 (17) | 150 (12) |
| Hypertension, n (%) | | | |
| Never hypertensive | 1410 (58) | 823 (68) | 587 (48) |
| Untreated hypertensive | 237 (10) | 97 (8) | 140 (12) |
| Treated hypertensive | 781 (32) | 294 (24) | 487 (40) |
| Diabetes, n (% yes) | 197 (8) | 46 (4) | 151 (12) |
| Cancer, n (% yes) | 367 (15) | 172 (14) | 195 (16) |
| Hypercholesterolemia, n (% yes) | 381 (16) | 176 (15) | 205 (17) |
| Hysterectomy, n (% yes) | 1000 (41) | 500 (40) | 500 (40) |
| Dietary factor | | | |
| Alcohol, g/day a | 0.9 (6.5) | 1.0 (7.0) | 0.6 (6.2) |
| Percent calories from carbohydrates b | 52.2 (13.4) | 53.0 (12.8) | 51.2 (13.7) |
| Percent calories from protein a | 16.8 (4.2) | 16.9 (4.2) | 16.8 (4.2) |
| Total energy, Kcal/day a | 1506.4 (728.3) | 1531.3 (702.4) | 1482.2 (752.6) |

a The median (IQR) of continuous variables
b The mean (SD) of continuous variables
c Medications included anticoagulants, diabetes medications, and lipid lowering medications
Abbreviations: BMI, body mass index; CHD, coronary heart disease; LI, lipophilic index; MET, metabolic equivalent of task; IQR, interquartile range; SD, standard deviation; PL, phospholipid.

Table 2.3 Hazard Ratios or Odds Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary/Plasma PL Lipophilic Index in the Cohort Study (1994-2014, N=85,563) and the Matched Case-Control Study (1994-2005, N=2428) of Women’s Health Initiative

| Lipophilic Index | Q1 | Q2 | Q3 | Q4 | P for trend g |
|---|---|---|---|---|---|
| Dietary LI in Cohort Study | | | | | |
| No. of CHD events/patients | 1034/21,375 | 1009/21,352 | 994/21,335 | 1158/21,501 | |
| Median LI (IQR) | 24.4 (2.1) | 26.8 (0.9) | 28.4 (0.8) | 30.5 (1.6) | |
| Model 1 a, HRs (95% CI) | ref | 1.08 (0.98, 1.19) | 1.11 (1.01, 1.22) | 1.23 (1.11, 1.35) | <0.01 |
| Model 2 b, HRs (95% CI) | ref | 1.04 (0.94, 1.15) | 1.07 (0.97, 1.18) | 1.18 (1.07, 1.31) | <0.01 |
| Model 3 c, HRs (95% CI) | ref | 1.06 (0.93, 1.20) | 1.09 (0.93, 1.28) | 1.22 (1.00, 1.49) | <0.01 |
| Plasma PL LI in Case-control Study | | | | | |
| No. of cases/participants | 226/529 | 280/584 | 324/627 | 384/688 | |
| Median LI (IQR) | 20.0 (1.2) | 21.6 (0.6) | 22.7 (0.6) | 24.3 (1.2) | |
| Model 4 d, ORs (95% CI) | ref | 1.29 (1.01, 1.64) | 1.53 (1.12, 1.94) | 1.87 (1.45, 2.40) | <0.01 |
| Model 5 e, ORs (95% CI) | ref | 1.22 (0.94, 1.58) | 1.46 (1.11, 1.91) | 1.76 (1.33, 2.33) | <0.01 |
| Model 6 f, ORs (95% CI) | ref | 1.16 (0.89, 1.52) | 1.35 (1.02, 1.79) | 1.56 (1.14, 2.14) | <0.01 |

a Model 1: unadjusted model.
b Model 2: adjusted for age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy.
c Model 3: Model 2 plus dietary PUFA:SFA ratio.
d Model 4: adjusted for matching factors (age, ethnicity, enrollment date, and hysterectomy status).
e Model 5: Model 4 plus lifestyle factors (physical activity, BMI, smoking), CHD risk factors (family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, and self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia), dietary alcohol, percent calories from protein and carbohydrates, and total energy.
f Model 6: Model 5 plus plasma PL PUFA:SFA ratio.
g Test for trends was conducted by treating the median value for each quartile of LI as a continuous variable.
Abbreviations: BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; HR, hazard ratio; IQR, interquartile range; LI, lipophilic index; MET, metabolic equivalent of task; OR, odds ratio; PL, phospholipid; PUFA, polyunsaturated fatty acid; SFA, saturated fatty acid.
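Footnote g describes the trend test: each participant is assigned the median LI of her quartile, and that score is then modeled as a continuous variable. The scoring step can be sketched in Python as follows. The function name and the toy values are hypothetical, the quartile cut points rely on the default method of Python's `statistics.quantiles` (which may differ from the software used in the study), and fitting the Cox or conditional logistic model is not shown.

```python
import statistics
from typing import List


def quartile_median_scores(values: List[float]) -> List[float]:
    """Replace each value with the median of the quartile it falls into."""
    # Three cut points that split the data into four groups.
    q1, q2, q3 = statistics.quantiles(values, n=4)

    def quartile(v: float) -> int:
        if v <= q1:
            return 0
        if v <= q2:
            return 1
        if v <= q3:
            return 2
        return 3

    groups: List[List[float]] = [[], [], [], []]
    for v in values:
        groups[quartile(v)].append(v)
    # Median of each non-empty quartile; empty quartiles are never looked up.
    medians = [statistics.median(g) if g else float("nan") for g in groups]
    return [medians[quartile(v)] for v in values]


# Toy LI values (hypothetical), two per quartile.
toy_li = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(quartile_median_scores(toy_li))
```

The resulting scores would enter the regression model as a single continuous term, and the P value for that term serves as the P for trend.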

Figure 2.1 Population Selection Process. CHD: Coronary Heart Disease

2.8 Supplemental Material

Supplemental Table 2.1 Melting Points and Median (IQR) Levels for Individual Fatty Acids, and Spearman’s Correlation Coefficients (r) for Correlation Between Individual Fatty Acids and the Lipophilic Index Measured in Diets from the Cohort Study (1994-2014) and in Plasma PL from the Case-Control Study (1994-2005) of the Women’s Health Initiative

| Fatty acids | Melting Point (℃) | Molecular Weight | Diet: Individual FA intake (g/day) | Diet: r | Plasma: Individual FA level (mol %) | Plasma: r |
|---|---|---|---|---|---|---|
| SFA | | | | | | |
| 4:0, Butyric Acid | -7.9 | 88.105 | 0.31 (0.4) | 0.60* | NA a | NA |
| 6:0, Caproic Acid | -3.4 | 116.158 | 0.15 (0.2) | 0.59* | NA a | NA |
| 8:0, Caprylic Acid | 16.7 | 144.211 | 0.12 (0.1) | 0.59* | NA a | NA |
| 10:0, Capric Acid | 31.6 | 172.265 | 0.23 (0.3) | 0.60* | NA a | NA |
| 12:0, Lauric Acid | 44.2 | 200.318 | 0.30 (0.4) | 0.54* | 0.06 (0.0) | 0.13* |
| 14:0, Myristic Acid | 53.9 | 228.371 | 1.22 (1.2) | 0.59* | 0.67 (0.3) | 0.22* |
| 15:0, Pentadecanoic Acid | 52.3 | 242.398 | NA a | NA | 0.22 (0.1) | 0.01 |
| 16:0, Palmitic Acid | 63.1 | 256.424 | 8.37 (6.2) | 0.39* | 30.46 (2.9) | 0.19* |
| 17:0, Margaric Acid | 61.3 | 270.451 | 0.04 (0.1) | 0.46* | NA a | NA |
| 18:0, Stearic Acid | 69.6 | 284.477 | 4.05 (3.2) | 0.43* | 13.32 (2.0) | 0.04 |
| 20:0, Arachidic Acid | 76.75 | 312.53 | 0.06 (0.1) | -0.06* | 0.23 (0.1) | 0.08* |
| 22:0, Behenic Acid | 81.5 | 340.584 | 0.04 (0.1) | -0.19* | 0.6 (0.3) | 0.12* |
| 24:0, Tetracosanoic Acid | 87.75 | 368.637 | NA a | NA | 0.42 (0.2) | 0.06* |
| MUFA | | | | | | |
| 14:1, Myristoleic Acid | -4 | 226.355 | 0.04 (0.0) | 0.44* | 0.1 (0.1) | 0.06* |
| 16:1 n-7, Palmitoleic Acid | 0 | 254.408 | 0.72 (0.5) | 0.41* | 0.78 (0.4) | 0.27* |
| 16:1 n-9, 7-Hexadecenoic Acid | 35 | 254.408 | NA a | NA | 0.11 (0.1) | 0.11* |
| 18:1 n-7, Vaccenic Acid | 15 | 282.461 | NA a | NA | 1.35 (0.4) | -0.22* |
| 18:1 n-9, Oleic Acid | 16 | 282.461 | 16.33 (12.1) | 0.18* | 8.35 (1.7) | 0.39* |
| 20:1 n-9, Eicosenoic Acid | 23.25 | 310.515 | 0.13 (0.1) | -0.09* | 0.07 (0) | -0.21* |
| 22:1 n-11, Cetoleic Acid | 33.35 | 338.568 | 0.01 (0.0) | -0.28* | NA a | NA |
| 24:1 n-9, Tetracosenoic Acid | 42.25 | 366.621 | NA a | NA | 0.72 (0.4) | -0.05* |
| PUFA | | | | | | |
| PUFA n-3 | | | | | | |
| 18:3 n-3, Linolenic Acid | -11.15 | 278.43 | 1.01 (0.7) | -0.16* | 0.2 (0.1) | 0.19* |
| 20:5 n-3, Eicosapentaenoic Acid (EPA) | -54.1 | 302.451 | 0.03 (0.0) | -0.15* | 0.69 (0.4) | -0.28* |
| 22:5 n-3, Docosapentaenoic Acid (DPA) | -78 | 330.504 | 0.01 (0.0) | -0.14* | 0.82 (0.2) | -0.29* |
| 22:6 n-3, Docosahexaenoic Acid (DHA) | -44.15 | 328.488 | 0.06 (0.1) | -0.17* | 3.09 (1.3) | -0.53* |
| PUFA n-6 | | | | | | |
| 18:2 n-6, Linoleic Acid | -5 | 280.445 | 8.48 (6.2) | -0.11* | 20.74 (4) | 0.27* |
| 18:3 n-6, Gamma Linolenic Acid | -11.15 | 278.43 | NA a | NA | 0.09 (0.1) | 0.19* |
| 18:4 n-6, Octadecatetraenoic Acid | -57 | 276.414 | 0.00 (0.0) | -0.11* | NA a | NA |
| 20:2 n-6, Eicosadienoic Acid | NA | 308.499 | NA a | NA | 0.42 (0.3) | NA |
| 20:3 n-6, Eicosatrienoic Acid | NA | 306.483 | NA a | NA | 3.23 (1.0) | NA |
| 20:4 n-6, Arachidonic Acid | -49.5 | 304.467 | 0.08 (0.1) | 0.09* | 10.94 (2.7) | -0.70* |
| 22:4 n-6, Docosatetraenoic Acid | NA | 332.52 | NA a | NA | 0.41 (0.1) | NA |
| 22:5 n-6, Docosapentaenoic Acid | NA | 330.504 | NA a | NA | 0.33 (0.2) | NA |
| TFA | | | | | | |
| 16:1T | 31 | 254.408 | 0.02 (0.0) | 0.44* | NA a | NA |
| 18:1T | 48.7 | 282.461 | 2.56 (2.5) | 0.26* | 0.49 (0.4) | 0.10* |
| 18:2T | 5.7 | 280.445 | 0.34 (0.3) | 0.33* | 0.13 (0.1) | 0.23* |

a Fatty acids that were not specifically measured in plasma PL or not calculated from food frequency questionnaires.
Abbreviations: IQR, interquartile range; MUFA, monounsaturated fatty acid; PUFA, polyunsaturated fatty acid; PL, phospholipid; SFA, saturated fatty acid; TFA, trans fatty acid.
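The melting points in Supplemental Table 2.1 serve as the weights for the LI. As an illustration, the Python sketch below assumes the commonly published definitions: the LI as the amount-weighted mean melting point of the measured fatty acids, and the lipophilic load (LL) as the sum of amount times melting point. The function names are hypothetical, only four fatty acids are included for brevity, and the example intake simply reuses the dietary medians from the table; none of this is taken from the study's analysis code.

```python
from typing import Dict

# Melting points (degrees C) copied from Supplemental Table 2.1 for four fatty acids.
MELTING_POINT_C: Dict[str, float] = {
    "16:0": 63.1,      # palmitic acid
    "18:0": 69.6,      # stearic acid
    "18:1 n-9": 16.0,  # oleic acid
    "18:2 n-6": -5.0,  # linoleic acid
}


def lipophilic_load(amounts: Dict[str, float],
                    melting_points: Dict[str, float] = MELTING_POINT_C) -> float:
    """Sum of (fatty acid amount x melting point); assumed definition of LL."""
    return sum(amount * melting_points[fa] for fa, amount in amounts.items())


def lipophilic_index(amounts: Dict[str, float],
                     melting_points: Dict[str, float] = MELTING_POINT_C) -> float:
    """Amount-weighted mean melting point; assumed definition of LI."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total fatty acid amount must be positive")
    return lipophilic_load(amounts, melting_points) / total


# Illustrative intake (g/day) using the dietary medians from the table.
example_intake = {"16:0": 8.37, "18:0": 4.05, "18:1 n-9": 16.33, "18:2 n-6": 8.48}
print(round(lipophilic_index(example_intake), 2))
print(round(lipophilic_load(example_intake), 1))
```

Because the LI divides by the total amount, it describes the composition of the fat consumed, whereas the LL also grows with the quantity consumed.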

Supplemental Table 2.2 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index/Load among Participants without Missing Covariates in the Cohort Study (1994-2014, N=60,079) of Women’s Health Initiative

| Lipophilic Index/Load | Q1 | Q2 | Q3 | Q4 | P for trend c |
|---|---|---|---|---|---|
| Dietary LI | | | | | |
| No. of CHD events/patients | 652/14,869 | 669/15,030 | 649/15,108 | 759/15,072 | |
| Median LI (IQR) | 24.4 (2.1) | 26.8 (0.9) | 28.4 (0.8) | 30.4 (1.6) | |
| Unadjusted model, HRs (95% CI) | ref | 1.18 (1.05, 1.33) | 1.16 (1.04, 1.31) | 1.30 (1.16, 1.47) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.13 (1.00, 1.27) | 1.12 (0.99, 1.26) | 1.23 (1.08, 1.40) | 0.02 |
| Adjusted model 2 b, HRs (95% CI) | ref | 1.13 (0.97, 1.32) | 1.12 (0.92, 1.36) | 1.23 (0.96, 1.58) | 0.33 |
| Dietary LL | | | | | |
| No. of CHD events/patients | 617/14,450 | 658/15,229 | 685/15,342 | 769/15,058 | |
| Median LL (IQR) | 656.4 (236.2) | 1055.4 (200.0) | 1511.2 (281.2) | 2373.15 (850.9) | |
| Unadjusted model, HRs (95% CI) | ref | 1.04 (0.93, 1.18) | 1.03 (0.91, 1.16) | 1.24 (1.10, 1.40) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.02 (0.89, 1.17) | 0.98 (0.83, 1.16) | 1.14 (0.89, 1.46) | 0.23 |
| Adjusted model 2 b, HRs (95% CI) | ref | 0.98 (0.85, 1.12) | 0.91 (0.77, 1.09) | 1.02 (0.78, 1.32) | 0.25 |

a Adjusted for age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy.
b Model 1 plus dietary PUFA:SFA ratio.
c Test for trends was conducted by treating the median value for each quartile of LI/LL as a continuous variable.
Abbreviations: BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; IQR, interquartile range; LI, lipophilic index; LL, lipophilic load; HR, hazard ratio; PUFA, polyunsaturated fatty acid; SFA, saturated fatty acid.

Supplemental Table 2.3 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipids in the Cohort Study (1994-2014, N=85,563) of Women’s Health Initiative

| Dietary Lipids | Q1 | Q2 | Q3 | Q4 | P for trend c |
|---|---|---|---|---|---|
| Dietary LI | | | | | |
| No. of CHD events/patients | 1034/21,375 | 1009/21,352 | 994/21,335 | 1158/21,501 | |
| Median LI (IQR) | 24.4 (2.1) | 26.8 (0.9) | 28.4 (0.8) | 30.5 (1.6) | |
| Unadjusted model, HRs (95% CI) | ref | 1.08 (0.98, 1.19) | 1.11 (1.01, 1.22) | 1.23 (1.11, 1.35) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.04 (0.94, 1.15) | 1.07 (0.97, 1.18) | 1.18 (1.07, 1.31) | <0.01 |
| Adjusted model 2 b, HRs (95% CI) | ref | 1.06 (0.93, 1.20) | 1.09 (0.93, 1.28) | 1.22 (1.00, 1.49) | <0.01 |
| Dietary LI without TFA | | | | | |
| No. of CHD events/patients | 1057/21,398 | 989/21,331 | 994/21,337 | 1155/21,497 | |
| Median LI (IQR) | 23.0 (2.0) | 25.5 (0.9) | 27.2 (0.9) | 29.5 (1.8) | |
| Unadjusted model, HRs (95% CI) | ref | 1.03 (0.93, 1.13) | 1.09 (1.00, 1.20) | 1.14 (1.04, 1.26) | 0.02 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.01 (0.92, 1.11) | 1.09 (0.99, 1.20) | 1.14 (1.03, 1.27) | 0.02 |
| Dietary LI without PUFA | | | | | |
| No. of CHD events/patients | 1047/21,389 | 1027/21,368 | 1022/21,364 | 1099/21,442 | |
| Median LI (IQR) | 35.6 (1.2) | 36.8 (0.4) | 37.6 (0.4) | 38.6 (0.7) | |
| Unadjusted model, HRs (95% CI) | ref | 1.03 (0.94, 1.13) | 1.06 (0.97, 1.17) | 1.15 (1.04, 1.26) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.00 (0.91, 1.09) | 1.03 (0.94, 1.14) | 1.15 (1.04, 1.27) | 0.01 |
| Dietary LL | | | | | |
| No. of CHD events/patients | 983/21,325 | 997/21,339 | 1035/21,377 | 1180/21,522 | |
| Median LL (IQR) | 648.9 (241.4) | 1054.6 (199.3) | 1511.1 (281.6) | 2384.9 (873.8) | |
| Unadjusted model, HRs (95% CI) | ref | 1.07 (0.97, 1.19) | 1.07 (0.97, 1.18) | 1.23 (1.12, 1.36) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.04 (0.93, 1.16) | 1.00 (0.87, 1.15) | 1.10 (0.90, 1.35) | 0.35 |
| Adjusted model 2 b, HRs (95% CI) | ref | 1.01 (0.90, 1.13) | 0.95 (0.82, 1.09) | 1.01 (0.82, 1.25) | 0.45 |
| Dietary PUFA:SFA ratio | | | | | |
| No. of CHD events/patients | 1147/21,488 | 966/21,309 | 1027/21,369 | 1055/21,397 | |
| Median (IQR) | 0.4 (0.1) | 0.6 (0.1) | 0.7 (0.1) | 0.9 (0.2) | |
| Unadjusted model, HRs (95% CI) | ref | 0.93 (0.85, 1.02) | 0.88 (0.80, 0.96) | 0.92 (0.84, 1.01) | 0.04 |
| Adjusted model 1 a, HRs (95% CI) | ref | 0.92 (0.84, 1.01) | 0.86 (0.78, 0.94) | 0.89 (0.81, 0.99) | 0.01 |
| Dietary n-3 PUFA | | | | | |
| No. of CHD events/patients | 1071/21,413 | 988/21,330 | 1011/21,353 | 1125/21,467 | |
| Mean (SD), g/day | 0.6 (0.1) | 1.0 (0.1) | 1.4 (0.1) | 2.3 (0.7) | |
| Unadjusted model, HRs (95% CI) | ref | 0.89 (0.81, 0.98) | 0.87 (0.79, 0.95) | 0.99 (0.90, 1.09) | <0.01 |
| Adjusted model 1 a, HRs (95% CI) | ref | 0.88 (0.80, 0.98) | 0.84 (0.76, 0.94) | 0.90 (0.78, 1.03) | <0.01 |

a Adjusted for age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy.
b Model 1 plus dietary PUFA:SFA ratio.
c Test for trends was conducted by treating the median value for each quartile of LI/LL as a continuous variable.
Abbreviations: BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; IQR, interquartile range; LI, lipophilic index; LL, lipophilic load; HR, hazard ratio; PUFA, polyunsaturated fatty acid; SD, standard deviation; SFA, saturated fatty acid; TFA, trans fatty acid.

Supplemental Table 2.4 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index and Lipids among Participants with Blood Lipids in the Cohort Study (1994-2014, N=6,188) of Women’s Health Initiative

| Quartiles of Exposures | Q1 | Q2 | Q3 | Q4 | P for trend c |
|---|---|---|---|---|---|
| Dietary LI | | | | | |
| No. of CHD events/patients | 198/1864 | 231/1666 | 201/1425 | 215/1233 | |
| Median LI (IQR) | 24.2 (2.3) | 26.8 (0.9) | 28.4 (0.8) | 30.3 (1.5) | |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.18 (0.96, 1.45) | 0.94 (0.76, 1.17) | 1.35 (1.08, 1.68) | <0.01 |
| Adjusted model 2 b, HRs (95% CI) | ref | 1.19 (0.97, 1.46) | 0.95 (0.77, 1.18) | 1.35 (1.08, 1.68) | <0.01 |
| Triglycerides | | | | | |
| No. of CHD events/patients | 110/1477 | 159/1473 | 255/1589 | 321/1649 | |
| Median (IQR), mg/dL | 62 (17) | 91 (14) | 123 (22) | 191 (70) | |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.07 (0.83, 1.37) | 1.29 (1.02, 1.63) | 1.40 (1.11, 1.77) | <0.01 |
| LDL:HDL ratio | | | | | |
| No. of CHD events/patients | 167/1497 | 184/1529 | 227/1558 | 267/1604 | |
| Median (IQR) | 1.4 (0.4) | 2.0 (0.3) | 1.6 (0.4) | 3.6 (0.9) | |
| Adjusted model 1 a, HRs (95% CI) | ref | 1.23 (0.93, 1.62) | 0.85 (0.63, 1.15) | 1.35 (1.00, 1.82) | <0.01 |

a Adjusted for age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy.
b Model 1 plus LDL:HDL ratio.
c Test for trends was conducted by treating the median value for each quartile of LI/LL as a continuous variable.
Abbreviations: BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; HDL, high-density lipoprotein; IQR, interquartile range; LDL, low-density lipoprotein; LI, lipophilic index; HR, hazard ratio.

Supplemental Table 2.5 Hazard Ratios (95% Confidence Intervals) for CHD according to Quartiles of Dietary Lipophilic Index/Load Calculated Using Different Methods among Participants in the Cohort Study (1994-2014, N=85,563) of Women’s Health Initiative

| Lipophilic Index/Load | Q1 | Q2 | Q3 | Q4 | P for trend b |
|---|---|---|---|---|---|
| Cumulative Dietary LI | | | | | |
| Adjusted model a, HRs (95% CI) | ref | 1.04 (0.94, 1.15) | 1.07 (0.97, 1.18) | 1.18 (1.07, 1.31) | <0.01 |
| Baseline Dietary LI | | | | | |
| Adjusted model a, HRs (95% CI) | ref | 0.92 (0.84, 1.00) | 0.91 (0.84, 1.00) | 1.05 (0.96, 1.15) | 0.29 |
| Cumulative Dietary LL | | | | | |
| Adjusted model a, HRs (95% CI) | ref | 1.04 (0.93, 1.16) | 1.00 (0.87, 1.15) | 1.10 (0.90, 1.35) | 0.35 |
| Baseline Dietary LL | | | | | |
| Adjusted model a, HRs (95% CI) | ref | 0.94 (0.85, 1.03) | 0.92 (0.81, 1.03) | 0.96 (0.81, 1.15) | 0.98 |

a Adjusted for age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy.
b Test for trends was conducted by treating the median value for each quartile of LI/LL as a continuous variable.
Abbreviations: BMI, body mass index; CHD, coronary heart disease; CI, confidence interval; IQR, interquartile range; LI, lipophilic index; LL, lipophilic load; HR, hazard ratio.

Supplemental Figure 2.1 Causal Diagrams Hypothesized for the Two Study Designs

I. Dietary LI in cohort study. II. Plasma PL LI in matched case-control study. III. Dietary LI in matched case-control study. C1-3 represent variables related to confounding bias and selection bias. C1 includes age, region, race/ethnicity, education, income, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/diabetes/cancer/hypercholesterolemia/hysterectomy status, dietary alcohol, percent calories from protein and carbohydrates, and total energy intake. C2 includes age, region, race/ethnicity, education, income, self-reported baseline hypertension/ diabetes/ cancer/ hypercholesterolemia, and hysterectomy status. C3 includes age, race/ethnicity, hysterectomy status, BMI, physical activity, smoking, family history of myocardial infarction/diabetes/stroke, medication use, postmenopausal hormone use, self-reported baseline hypertension/ diabetes/ cancer/ hypercholesterolemia, dietary alcohol, percent calories from protein and carbohydrates, and total energy intake. A box around a node represents conditioning on that node.

2.9 References

1. Mozaffarian D, Benjamin EJ, Go AS, et al. Heart disease and stroke statistics--2015 update: a report from the American Heart Association. Circulation 2015;131:e29-322.

2. Kaiser KA, Shikany JM, Keating KD, Allison DB. Will reducing sugar-sweetened beverage consumption reduce obesity? Evidence supporting conjecture is strong, but evidence when testing effect is weak. Obesity reviews : an official journal of the International Association for the Study of Obesity 2013;14:620-33.

3. Cooper RA. Abnormalities of cell-membrane fluidity in the pathogenesis of disease. The New England journal of medicine 1977;297:371-7.

4. Hodson L, Skeaff CM, Fielding BA. Fatty acid composition of adipose tissue and blood in humans and its use as a biomarker of dietary intake. Progress in lipid research 2008;47:348-80.

5. Ding EL, Sun Q, Campos H, Hu FB. Lipophilic index of fatty acid fluidity in erythrocyte and plasma and risk of coronary heart disease. Circulation 2008;118:S_1089.

6. Wu H, Ding EL, Toledo ET, et al. A novel fatty acid lipophilic index and risk of CHD in US men: the health professionals follow-up study. The British journal of nutrition 2013;110:466-74.

7. Toledo E, Campos H, Ding EL, et al. A novel fatty acid profile index--the lipophilic index--and risk of myocardial infarction. American journal of epidemiology 2013;178:392-400.

8. Design of the Women's Health Initiative clinical trial and observational study. The Women's Health Initiative Study Group. Controlled clinical trials 1998;19:61-109.

9. Hays J, Hunt JR, Hubbell FA, et al. The Women's Health Initiative recruitment methods and results. Annals of epidemiology 2003;13:S18-77.

10. Curb JD, McTiernan A, Heckbert SR, et al. Outcomes ascertainment and adjudication methods in the Women's Health Initiative. Annals of epidemiology 2003;13:S122-8.

11. Matthan NR, Ooi EM, Van Horn L, Neuhouser ML, Woodman R, Lichtenstein AH. Plasma Phospholipid Fatty Acid Biomarkers of Dietary Fat Quality and Endogenous Metabolism Predict Coronary Heart Disease Risk: A Nested Case-Control Study Within the Women's Health Initiative Observational Study. J Am Heart Assoc 2014;3.

12. Lichtenstein AH, Matthan NR, Jalbert SM, Resteghini NA, Schaefer EJ, Ausman LM. Novel soybean oils with different fatty acid profiles alter cardiovascular disease risk factors in moderately hyperlipidemic subjects. Am J Clin Nutr 2006;84:497-504.

13. Hu FB, Stampfer MJ, Rimm E, et al. Dietary fat and coronary heart disease: a comparison of approaches for adjusting for total energy intake and modeling repeated dietary measurements. American journal of epidemiology 1999;149:531-40.

14. Bertoia ML, Triche EW, Michaud DS, et al. Long-term alcohol and caffeine intake and risk of sudden cardiac death in women. The American journal of clinical nutrition 2013;97:1356-63.

15. Resche-Rigon M, White IR. Multiple imputation by chained equations for systematically and sporadically missing multilevel data. Statistical methods in medical research 2016.

16. Schwingshackl L, Hoffmann G, Kalle-Uhlmann T, Arregui M, Buijsse B, Boeing H. Fruit and Vegetable Consumption and Changes in Anthropometric Variables in Adult Populations: A Systematic Review and Meta-Analysis of Prospective Cohort Studies. PloS one 2015;10:e0140846.

17. McTiernan A, Kooperberg C, White E, et al. Recreational physical activity and the risk of breast cancer in postmenopausal women: the Women's Health Initiative Cohort Study. Jama 2003;290:1331-6.

18. Hernán MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-25.

19. Howe CJ, Cole SR, Lau B, Napravnik S, Eron JJ, Jr. Selection Bias Due to Loss to Follow Up in Cohort Studies. Epidemiology 2016;27:91-7.

20. Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. American journal of epidemiology 2002;155:176-84.

21. Ding EL, De Vito KM, Wu H, et al. Dietary Lipophilic Load and Dietary Lipophilic Index with Risk of Coronary Heart Disease in Middle-Aged Women: Beyond Conventional Fat Classifications. Circulation 2015;131:A19-A.

22. van Meer G, Voelker DR, Feigenson GW. Membrane lipids: where they are and how they behave. Nature reviews Molecular cell biology 2008;9:112-24.

23. Zicha J, Kunes J, Devynck MA. Abnormalities of membrane function and lipid metabolism in hypertension: a review. American journal of hypertension 1999;12:315-31.

24. Singh U, Jialal I. Oxidative stress and atherosclerosis. Pathophysiology : the official journal of the International Society for Pathophysiology / ISP 2006;13:129-42.

25. Dumas D, Latger V, Viriot ML, Blondel W, Stoltz JF. Membrane fluidity and oxygen diffusion in cholesterol-enriched endothelial cells. Clinical hemorheology and microcirculation 1999;21:255-61.

26. Gojova A, Barakat AI. Vascular endothelial wound closure under shear stress: role of membrane fluidity and flow-sensitive ion channels. Journal of applied physiology (Bethesda, Md : 1985) 2005;98:2355-62.

27. Tsuda K. Association of resistin with impaired membrane fluidity of red blood cells in hypertensive and normotensive men: an electron paramagnetic resonance study. Heart and vessels 2015.

28. Sola R, Baudet MF, Motta C, Maille M, Boisnier C, Jacotot B. Effects of dietary fats on the fluidity of human high-density lipoprotein: influence of the overall composition and phospholipid fatty acids. Biochimica et biophysica acta 1990;1043:43-51.

29. Soutar A. Does dietary fat influence plasma lipoprotein structure? Nature 1978;273:11-2.

30. Parks JS, Huggins KW, Gebre AK, Burleson ER. Phosphatidylcholine fluidity and structure affect lecithin:cholesterol acyltransferase activity. Journal of lipid research 2000;41:546-53.

31. Berrougui H, Isabelle M, Cloutier M, Grenier G, Khalil A. Age-related impairment of HDL-mediated cholesterol efflux. Journal of lipid research 2007;48:328-36.

32. Harper CR, Jacobson TA. The fats of life: the role of omega-3 fatty acids in the prevention of coronary heart disease. Arch Intern Med 2001;161:2185-92.

33. Demaison L, Moreau D. Dietary n-3 polyunsaturated fatty acids and coronary heart disease-related mortality: a possible mechanism of action. Cellular and molecular life sciences : CMLS 2002;59:463-77.

34. Mozaffarian D, Wu JH. Omega-3 fatty acids and cardiovascular disease: effects on risk factors, molecular pathways, and clinical events. Journal of the American College of Cardiology 2011;58:2047-67.

35. Clarke SD. The multi-dimensional regulation of gene expression by fatty acids: polyunsaturated fats as nutrient sensors. Current opinion in lipidology 2004;15:13-8.

36. Manco M, Calvani M, Mingrone G. Effects of dietary fatty acids on insulin sensitivity and secretion. Diabetes, obesity & metabolism 2004;6:402-13.

37. Mozaffarian D, Katan MB, Ascherio A, Stampfer MJ, Willett WC. Trans fatty acids and cardiovascular disease. The New England journal of medicine 2006;354:1601-13.

38. Sun Q, Ma J, Campos H, et al. A prospective study of trans fatty acids in erythrocytes and risk of coronary heart disease. Circulation 2007;115:1858-65.

Chapter 3:

Discovery of Biological Pathways and Gene Networks for Heart Failure with

Preserved and Reduced Ejection Fraction in Women across Ethnicities

3.1 Introduction

According to the American Heart Association, approximately 6.5 million U.S. adults had heart failure (HF) in 2018 1, making HF a major cause of morbidity and mortality in the United States. HF is phenotypically and genetically heterogeneous, though much remains unknown about the etiology of its different subtypes (i.e., HF with preserved ejection fraction (HFpEF), which accounts for 40% to 71% of HF cases, and HF with reduced ejection fraction (HFrEF)) 2. Moreover, HF is understudied, especially in women, who experience a higher mortality rate than men 1. Additionally, African Americans (AA) have the highest incidence of HF, followed by Hispanic and Caucasian Americans (CA), with morbidity and mortality rates about twice as high for AA as for CA 3-5. Epidemiological studies have shown that while patients with HFpEF and HFrEF often display similar clinical symptoms, they have very different risk factors, pathophysiological processes, and responses to therapies 6. Many therapies with unequivocal benefit in HFrEF have failed to show efficacy for HFpEF 7,8. Therefore, research providing evidence on biological pathways and gene networks for HFpEF and HFrEF is warranted to shed light on the pathophysiology of the disease processes and to serve as a foundation for novel treatment modalities, especially for women, who are disproportionately affected by HF 1.

Recent candidate gene studies and genome-wide association studies (GWAS) have identified several genetic loci (such as the ADRB1, USP3, ITPK1, and BAG3 genes 9,10) associated with HF risk. However, these studies either focused mainly on genes likely to be associated with inherited HF (mostly HFrEF), were limited by low reproducibility and small effect sizes, or were conducted primarily among CA women 9. Moreover, few studies to date have directly investigated the differential genetic mechanisms underlying HF subtypes, specifically HFpEF and HFrEF 11,12, and none have been conducted in an ethnically diverse population of women. Because genes tend to act jointly on the HF process, and analyzing a cluster of genes with related biological functions improves the power to identify significant genetic variants, an integrative pathway and network approach 13,14 would enhance the understanding of biological mechanisms underlying different HF subtypes (HFpEF versus HFrEF) as well as ethnic disparities in women with HF. Using a systems biology approach, we investigated genetic risk factors and biological pathways predisposing to HF and its subtypes in both AA and CA women who participated in the Women’s Health Initiative (WHI) study.

3.2 Objectives

HF events were adjudicated in a sub-cohort of women with measured left ventricular ejection fraction (LVEF), thus allowing for the differentiation of HFrEF and HFpEF. Genotypic data were also available for participants in the WHI SNP Health Association Resource (WHI-SHARe) and the Genomics and Randomized Trials Network (WHI-GARNET) studies. We therefore sought to perform, first, GWAS analysis to identify genetic variants associated with HFpEF and HFrEF; second, pathway/network analysis to examine biological mechanisms underlying HF (HFpEF and HFrEF) in different ethnicities; and third, identification of key driver genes regulating relevant gene networks for HF. Further, we conducted replication analysis using GWAS data from the Jackson Heart Study (JHS) and Framingham Heart Study (FHS).

3.3 Methods

3.3.1 Study Population

Discovery Population

The WHI study enrolled 161,808 postmenopausal women aged 50 to 79 years from 1993 to 1998. The original WHI study has two major components: a partial factorial randomized clinical trial (CT) including 68,132 participants and an observational study (OS) of 93,676 participants. The detailed study design has been reported elsewhere 15. Briefly, medical records from enrollment through September 2014 for 44,174 WHI participants, including all women randomized to the hormone trial component (n = 27,347) and all AA participants (n = 11,880) and Hispanic participants (n = 4,947) from the CT and the OS, were sent to the University of North Carolina (UNC) for HF adjudication.

Of the participants enrolled in the WHI-OS, 8,515 self-identified AA women had consented to and were eligible for the WHI-SNP Health Association Resource (SHARe), and of the participants enrolled in the WHI hormone trial, 4,909 CA women were included in the WHI-Genomics and Randomized Trials Network (GARNET). After quality control, the standard GWAS and pathway analyses were conducted among 8,298 AA and 4,257 CA participants of the WHI.

Population for validation and replication

The current study included two populations for validation and replication: the Jackson Heart Study (JHS) and the Framingham Heart Study (FHS). Because the WHI enrolled only postmenopausal women, we replicated the proposed analyses only among female participants in the JHS and FHS. The main FHS enrolled three generations: the original generation (started in 1948), the offspring generation (started in 1971), and generation three (started in 2005). Because of the poor measurement of left ventricular ejection fraction (LVEF) among the original generation and the relatively young age of generation three (baseline age < 40 years), we included only the offspring generation in the analysis. In total, we conducted the proposed analyses among 1,871 AA women in the JHS and 1,764 CA women in the FHS.

3.3.2 Definition of Heart Failure

In the WHI, HF was adjudicated at UNC based on the Atherosclerosis Risk in Communities (ARIC) classification guidelines 16, in which HF was defined as acute decompensated HF (ADHF) or chronic stable HF. Details of the adjudication process have been published elsewhere 17. Participants with adjudicated HF were further classified as HFrEF or HFpEF according to their LVEF. For patients with ADHF, those with LVEF < 45% were considered as HFrEF and those with LVEF ≥ 45% were considered as HFpEF. For patients with chronic stable HF, baseline LVEF or the lowest estimated LVEF in medical records was used to classify HF subtypes. Similar criteria were applied to the replication cohorts, the JHS and FHS. Participants without LVEF were excluded from the analysis (n=440).
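The LVEF-based subtype rule described above can be sketched in a few lines of Python. This is only an illustration of the stated 45% cutoff; the function name `classify_hf_subtype` and the use of `None` for a missing LVEF are assumptions made for this example and are not taken from the study's own code.

```python
from typing import Optional

# Cutoff (in percent) stated in the text: LVEF < 45% is HFrEF, otherwise HFpEF.
LVEF_CUTOFF = 45.0


def classify_hf_subtype(lvef_percent: Optional[float]) -> Optional[str]:
    """Return 'HFrEF' or 'HFpEF' for an adjudicated HF case.

    Returns None when LVEF is unavailable, mirroring the exclusion of
    participants without an LVEF measurement (hypothetical convention).
    """
    if lvef_percent is None:
        return None
    if not 0.0 <= lvef_percent <= 100.0:
        raise ValueError("LVEF must be a percentage between 0 and 100")
    return "HFrEF" if lvef_percent < LVEF_CUTOFF else "HFpEF"
```

A value exactly at the cutoff (45%) is assigned to HFpEF, which matches the "≥ 45%" wording of the definition.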

3.3.3 Genotype Data

Genome-wide genotyping of the WHI-SHARe and JHS participants was performed using the Affymetrix 6.0 array (Affymetrix, Inc, Santa Clara, CA), and WHI-GARNET and FHS participants were genotyped using the Illumina HumanOmni1-Quad SNP platform (Illumina, Inc, San Diego, CA). As the gene chips used for genotyping are designed to capture common genetic variants, genetic variants with frequency ≥ 0.05 were genotyped. Reference panels from the 1000 Genomes (1000G) Project Consortium (Version 3, March 2012 release), which provide near complete coverage of common genetic variation with minor allele frequency ≥ 0.5%, were used for genotype imputation.
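As a rough illustration of the frequency thresholds mentioned above, the following Python sketch computes a minor allele frequency (MAF) from per-participant allele dosages and compares it with a cutoff. The function names and the toy dosages are hypothetical; only the 0.05 threshold comes from the text.

```python
from typing import Sequence


def minor_allele_frequency(dosages: Sequence[float]) -> float:
    """MAF from allele dosages coded 0-2 (copies of the counted allele)."""
    if not dosages:
        raise ValueError("at least one dosage is required")
    # Each participant carries two alleles, so divide by 2 * n.
    allele_freq = sum(dosages) / (2 * len(dosages))
    return min(allele_freq, 1.0 - allele_freq)


def is_common_variant(dosages: Sequence[float], threshold: float = 0.05) -> bool:
    """True when the variant meets the common-variant frequency threshold."""
    return minor_allele_frequency(dosages) >= threshold
```

Imputed dosages are fractional, which is why the inputs are typed as floats rather than integer genotype counts.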

3.3.4 Statistical Analysis

Genome-wide association analyses

We performed standard GWAS analysis for HFrEF and HFpEF among AA and CA women, using multivariable logistic regression. The regression models used allelic dosage at each single-nucleotide polymorphism (SNP) as the independent variable, with covariate adjustment for: 1) age, age², region, and the first four principal components (PCs) for global ancestry in the WHI-SHARe; 2) age, age², region, randomized hormone treatment group, baseline hysterectomy status, and the first four PCs in the WHI-GARNET; 3) age, age², and the first four PCs in the JHS and FHS. Since the associations between germline genetic variants and HF are not confounded by demographic and lifestyle factors, no other confounders were adjusted for in the GWAS analysis. The general form of the GWAS model is specified as follows:

$$\operatorname{logit}\left(E[Y \mid G, V]\right) = \beta_0 + \beta_1 G + \beta_2 V,$$

where Y denotes HF subtype, G denotes SNPs, and V denotes adjusted covariates.

Common genetic variants reaching suggestive significance (P < 5 × 10⁻⁶) were identified as potential GWAS hits and were validated in the JHS and FHS.

Pathway analysis

We obtained knowledge-driven metabolic and signaling pathways from three databases: the Kyoto Encyclopedia of Genes and Genomes (KEGG) 18, Reactome 19, and BioCarta 20. Each pathway was tested for enrichment of genetic signals for HFrEF and HFpEF by ethnic group. To avoid bias due to the use of a particular method, we applied two different well-established methods based on known biological pathways: 1) GSA-SNP 13, and 2) Mergeomics 14. Biological pathways were defined as significant if they met the following criteria: 1) identified by both methods in the WHI study with an FDR-adjusted q value < 0.2; and 2) validated by GSA-SNP or Mergeomics with a significant P value after Bonferroni correction in the JHS (as replication of WHI-AA) and FHS (as replication of WHI-CA). We then performed cross-phenotype and cross-ethnicity analyses within WHI participants to examine shared pathways between HFrEF and HFpEF, as well as phenotype-specific pathways, across ethnicities.
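The two-step significance rule above can be sketched as follows. This is a simplified illustration, not the GSA-SNP or Mergeomics implementation: it assumes the FDR adjustment is Benjamini-Hochberg, that each method supplies one P value per pathway, and that the Bonferroni denominator is the number of pathways carried forward from discovery. All function and argument names are hypothetical.

```python
from typing import Dict, List


def bh_qvalues(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted q values keyed by pathway name."""
    names = sorted(pvalues, key=pvalues.get)  # ascending P value
    m = len(names)
    q: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        name = names[rank - 1]
        running_min = min(running_min, pvalues[name] * m / rank)
        q[name] = running_min
    return q


def significant_pathways(p_method_a: Dict[str, float],
                         p_method_b: Dict[str, float],
                         p_replication: Dict[str, float],
                         q_cut: float = 0.2,
                         alpha: float = 0.05) -> List[str]:
    """Pathways with q < q_cut in both methods and Bonferroni-significant replication."""
    q_a, q_b = bh_qvalues(p_method_a), bh_qvalues(p_method_b)
    discovered = [n for n in q_a
                  if n in q_b and q_a[n] < q_cut and q_b[n] < q_cut]
    if not discovered:
        return []
    # Assumption: Bonferroni correction over the pathways tested in replication.
    threshold = alpha / len(discovered)
    return sorted(n for n in discovered if p_replication.get(n, 1.0) < threshold)
```

Requiring agreement between both methods at a lenient q value and then applying a strict replication threshold trades some sensitivity for protection against method-specific false positives.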

Key Driver Analysis for Identification of Key Regulatory Genes for HF-related Pathways

As hundreds of genes are involved in the biological pathways, we sought to further prioritize key driver (KD) genes, defined as genes that play a central role in the disease process and that, once perturbed, should have a major impact on many other genes. We integrated all genes involved in significant pathways with seven Bayesian networks and one protein-protein interaction network using KD analysis methods 14,21,22. As multiple KD genes were identified from multiple networks, we designed a normalized rank score (NRS) to summarize the consistency and strength of identified KD genes across multiple networks 23:

$$NRS = \frac{N_{KD}}{N} \times \sum_{i=1}^{N_{KD}} R_{KD_i},$$

where $N_{KD}$ is the count of networks from which a KD was identified; $N_{KD}$ is then normalized by the total number of networks $N$ to represent the consistency of a KD across all networks tested (seven Bayesian networks from seven tissues, including adipose, blood, brain, islet, liver, kidney, and muscle, and one protein-protein interaction network). The KD strength is represented by the summation of the normalized statistical rank in each network $i$ ($R_{KD_i}$) across all networks from which the KD was identified:

$$R_{KD_i} = \frac{Rank_{KD_i}}{N_{KD_i}},$$

which was calculated by dividing the rank of a KD based on the P values of the Fisher exact test in descending order ($Rank_{KD_i}$) by the total number of KDs identified from network $i$ ($N_{KD_i}$).
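A minimal Python sketch of the normalized rank score described above, assuming that for each network in which a gene is a KD we know its rank (with P values ordered descending, so the most significant KD has the highest rank) and the total number of KDs in that network. The function name, the input layout, and the example numbers are hypothetical.

```python
from typing import Dict, Tuple


def normalized_rank_score(ranks: Dict[str, Tuple[int, int]],
                          n_networks: int) -> float:
    """NRS = (N_KD / N) * sum over networks of (rank / number of KDs).

    `ranks` maps a network name to (rank of the KD in that network,
    total number of KDs identified in that network) and contains only
    the networks in which the gene was identified as a KD.
    """
    if n_networks <= 0 or len(ranks) > n_networks:
        raise ValueError("n_networks must be positive and cover all entries")
    for rank, total in ranks.values():
        if not 1 <= rank <= total:
            raise ValueError("rank must lie between 1 and the number of KDs")
    consistency = len(ranks) / n_networks            # N_KD / N
    strength = sum(rank / total for rank, total in ranks.values())
    return consistency * strength


# Hypothetical gene: top-ranked of 10 KDs in adipose, rank 5 of 10 in liver,
# out of the 8 networks used in this chapter.
score = normalized_rank_score({"adipose": (10, 10), "liver": (5, 10)}, n_networks=8)
```

For this hypothetical gene the consistency term is 2/8 and the strength term is 1.0 + 0.5, giving a score of 0.375. A gene found in more networks, or ranked nearer the top within each, receives a higher score.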

3.4 Results

Among WHI participants, 860 (10.4%) AA and 601 (14.1%) CA women were initially identified as having HF. After excluding those without LVEF measurement (n=316 and 124 for WHI-AA and WHI-CA, respectively), we performed primary analyses among 7,982 AA and 4,133 CA women in the WHI, and replication analyses among 1,853 AA women in the JHS and 1,755 CA women in the FHS. The descriptive statistics on demographic and lifestyle factors of each study population are shown in Table 3.1. Compared to WHI-CA women, WHI-AA women were younger and less physically active, had higher BMI and lower intakes of alcohol and total calories, and had a higher prevalence of cardiovascular disease and diabetes.
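The WHI sample counts reported in this section follow from simple arithmetic on the post-quality-control totals given in Section 3.3.1, which the short Python check below reproduces. The helper name and dictionary keys are hypothetical; the numbers are those stated in the text.

```python
from typing import Dict, Union


def analytic_sample(n_after_qc: int, n_hf: int,
                    n_missing_lvef: int) -> Dict[str, Union[int, float]]:
    """Derive the analyzed sample after excluding HF cases without LVEF."""
    return {
        "initial_hf_percent": round(100 * n_hf / n_after_qc, 1),
        "n_analyzed": n_after_qc - n_missing_lvef,
        "hf_cases_analyzed": n_hf - n_missing_lvef,
    }


whi_aa = analytic_sample(n_after_qc=8298, n_hf=860, n_missing_lvef=316)
whi_ca = analytic_sample(n_after_qc=4257, n_hf=601, n_missing_lvef=124)
```

The resulting analyzed totals (7,982 and 4,133) and HF case counts (544 and 477) agree with Table 3.1.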

3.4.1 Identification of Significant Genetic Loci Using Standard GWAS Analysis

In the validation analysis of 93 previously reported suggestive variants for HF from the GWAS catalog 24, we validated 20 SNPs from 18 loci, which were further allocated to HFrEF and/or HFpEF in the WHI population with P < 0.05 (Table 3.2). Variants in MAML3, BNC2, PLCG1, and PRTFDC1, and close to TEAD4, CLMP, FAR2, and TBC1D4 were associated with HFrEF, while variants in GPM6A, ABCA12, ZFHX3, SLC30A3, and GCKR, and close to LRRIQ3, DUSP10, HDAC9, and TRIB1 were associated with HFpEF. In particular, one SNP, rs4420638 on chromosome 19, close to the APOE and APOC genes, was found to be associated with both HFrEF and HFpEF among CA, and with HFpEF among AA. Detailed information regarding the validated and allocated loci can be found in Supplemental Tables 3.1 and 3.3.

The standard GWAS results for HFrEF and HFpEF within WHI-AA (n=7,982) and WHI-CA (n=4,133) are shown in the Manhattan plots (Supplemental Figure 3.1). Among AA, this discovery analysis revealed one significant (P < 5 × 10⁻⁸) and 57 suggestive (P < 5 × 10⁻⁶) SNPs related to HFrEF, and 3 significant (P < 5 × 10⁻⁸) and 94 suggestive (P < 5 × 10⁻⁶) SNPs related to HFpEF. Among CA, we did not identify any significant SNPs, but found 50 and 47 suggestive (P < 5 × 10⁻⁶) SNPs related to HFrEF and HFpEF, respectively.
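The two P value tiers used in the discovery analysis can be expressed as a small Python helper. The thresholds are the ones stated in the text; the function name and tier labels are assumptions for illustration.

```python
GENOME_WIDE_THRESHOLD = 5e-8  # "significant" tier in the text
SUGGESTIVE_THRESHOLD = 5e-6   # "suggestive" tier in the text


def classify_gwas_pvalue(p_value: float) -> str:
    """Assign a SNP to the significant, suggestive, or neither tier."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < GENOME_WIDE_THRESHOLD:
        return "significant"
    if p_value < SUGGESTIVE_THRESHOLD:
        return "suggestive"
    return "not significant"
```

For instance, the P value of 5.2 × 10⁻⁷ reported for rs114553497 in Table 3.2 falls in the suggestive tier under these cutoffs.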

In the replication analysis for AA women among JHS (n=1,853) participants, 8 SNPs from 4 loci (lead SNPs: rs12067046, rs114553497, rs10229703, and rs149663839) out of 94 SNPs for HFpEF reached the threshold of P < 0.05. In the replication analysis for CA women among FHS (n=1,755) participants, 1 SNP (rs12719020) reached the P < 0.05 threshold among the 50 suggestive SNPs for HFrEF, and 19 SNPs concentrated on chromosome 16 (lead SNP: rs12599260) among the 47 suggestive SNPs for HFpEF reached the threshold of P < 0.05. The effect of all lead SNPs on HF was in the same direction in the discovery population and the replication population (Table 3.2 and Supplemental Table 3.2). Detailed information regarding the newly discovered loci can be found in Supplemental Table 3.3.

3.4.2 Identification of Biological Pathways Using Integrative Pathway Analysis

We first identified 21 pathways for HFrEF (9 for CA and 12 for AA) and 42 pathways for HFpEF (31 for CA and 17 for AA) among WHI participants, of which 11 and 15 pathways were validated for HFrEF and HFpEF, respectively, among the JHS and FHS women. The results of the cross-phenotype and cross-ethnicity analyses are presented in Table 3.3 and Supplemental Tables 3.4 and 3.5. Based on the functions of the pathways, we identified three major themes: 1) angiogenesis and vascular patterning; 2) inflammation; and 3) non-specific cell signaling and metabolism. Five pathways, emerging from angiogenesis and vascular patterning, were shared between HFrEF and HFpEF across AA and CA women, namely, extracellular matrix (ECM)-receptor interaction, cell adhesion molecules (CAMs), axon guidance, netrin-1 signaling, and developmental biology (Figure 3.1). The five shared pathways were also found to be highly interconnected, as demonstrated by a common set of 256 genes shared among them (Figure 3.2).

In addition, we found 6 pathways specifically enriched for HFpEF, namely, vascular smooth muscle contraction, adherens junction, endocytosis, heparan sulfate/heparin (HS)-glycosaminoglycan (GAG) biosynthesis and degradation, and phosphatidylinositol signal system, all of which corresponded to the theme of angiogenesis and vascular patterning (Figures 3.1 and 3.3).

3.4.3 Identification of Key Drivers for HFpEF and HFrEF

In the KD analysis to identify potential genes that play a central role in the significant pathways for HF, we used 8 different regulatory or interaction networks that capture gene-gene or protein-protein interactions. The top 10 KD genes for the 5 pathways shared between HFrEF and HFpEF across the two ethnicities (developmental biology, axon guidance, netrin-1 signaling, ECM-receptor interaction, and CAMs) are COL1A1, COL1A2, COL3A1, COL4A2, COL5A1, and COL6A3 from the ECM and axon guidance pathways, and HLA-DQA1, HLA-DQB1, HLA-DRB1, and HLA-DMB from the CAMs pathway (Figure 3.2 and Supplemental Figure 3.2). For the 6 pathways specific to HFpEF, the top 10 KD genes are MYH11, MYLK, and PRKACB from vascular smooth muscle contraction; PRKCG and PIK3R1 from phosphatidylinositol signal system; HGS, EGF, and SH3KBP1 from endocytosis; and CTNNB1 and RAC1 from adherens junction (Figure 3.3 and Supplemental Figure 3.2).

3.5 Discussion

In this GWAS analysis of 7,982 AA and 4,133 CA women from the WHI, we validated 18 previously reported genetic loci and allocated them to HFrEF and HFpEF, and additionally discovered 1 HFrEF and 5 HFpEF potential genetic loci that had not been reported previously. In the integrative pathway analysis, 5 biological pathways were consistently found to be shared by HFrEF and HFpEF across AA and CA women, and 6 pathways were specific to HFpEF. Our results suggest the presence of core mechanisms across HF subtypes (HFrEF and HFpEF), such as angiogenesis and vascular patterning and inflammation, as well as phenotype-specific mechanisms, such as non-specific cell signaling and metabolism.

The genetic architecture underlying HF remains challenging to delineate. Although a group of mechanisms, the genetic cardiomyopathies (including dilated, hypertrophic, arrhythmogenic, and restrictive cardiomyopathies and left ventricular non-compaction), has been revealed, and several genetic loci strongly related to these mechanisms have been identified from Mendelian families 9, these forms of inherited HF account for only a small number of cases 25. The majority of HF arises from variants in many genes with minor individual effects and a complex interaction between genetic and environmental factors 26.

For genes allocated to HFrEF, MAML3 might be relevant to imperfect cardiovascular development through the Notch signaling pathway 27; TEAD4 has been reported to be required for vascular endothelial growth factor (VEGF)-induced angiogenesis 28; BNC2 is associated with blood pressure 29 and with one metabolic profile, pyroglutamine 30, which might be a predictor of HF 31; CLMP is a key regulator of adhesion and inflammation 32 and is implicated in the development of obesity 33 and cardiac conduction 34; PLCG1 encodes a protein that is required downstream of VEGF during arterial development 35; PRTFDC1 is related to magnesium ion binding 36; FAR2 is involved in lipid metabolism 37; and TBC1D4 is a diabetes-related gene that plays important roles in glucose homeostasis and contributes to insulin resistance 38.

For genes allocated to HFpEF, GPM6A is related to peripheral nerve regeneration and angiogenesis 39; DUSP10 negatively regulates members of the MAPK superfamily and has been found to regulate vascular patterning and inflammation 40; HDAC9 promotes angiogenesis via Notch signaling and is associated with ischemic heart injury 41; TWIST1, a cancer-related gene, has been found to upregulate VEGF and play functional roles in angiogenesis 42; TRIB1 is a lipid-associated gene and has been linked to plasma triglyceride levels and coronary heart disease 43; ABCA12 is a lipid transporter that also contributes to the development of atherosclerosis 44; ZFHX3 has been linked to atrial fibrillation through an unknown mechanism 45 and to obesity through the control of energy intake and expenditure 46; SLC30A3 plays important roles in glucose and zinc transport; and GCKR is a diabetes risk gene that regulates glucose metabolism 47 and is also associated with hypertriglyceridemia 48. Genes in the apolipoprotein family (APOE, APOC1, APOC2, etc.), allocated to both HFrEF and HFpEF, encode lipid transport proteins that regulate cholesterol metabolism and have long been found to be associated with obesity and cardiovascular disease 49,50.

In addition, we discovered 1 and 5 loci in intergenic regions that may be associated with HFrEF and HFpEF, respectively, and the functions of genes close to the identified loci coincide with the aforementioned three major themes. Variant rs12719020, associated with HFrEF, is located upstream (< 20 kb) of COBL, a gene that is related to vasculitis and has been linked to type 1 diabetes 51. For variants that have potential effects on HFpEF, rs12067046 is located 500 kb downstream of PLXNA2, which encodes a protein that belongs to the plexin family and is related to the development of blood vessels 52 and inflammation-induced immune disorders 53; the linkage disequilibrium (LD) block around rs12599260 is upstream (5 kb) of HEATR3, which encodes a protein involved in NOD2-mediated NF-κB signaling and regulates the inflammatory immune response 54; rs149663839 is located upstream (50 kb) of CAT, which encodes a key antioxidant enzyme in the body’s defense against oxidative stress, which is hypothesized to play a role in the development of many chronic or late-onset diseases such as HF 55; the LD block around rs114553497 is 60 kb upstream of ACTA1, a gene fundamental for skeletal muscle contraction and found to cause skeletal muscle myopathy, probably through calcium signaling 56; and rs10229703 is located 5 kb downstream of CALD1, which plays an essential role in the regulation of smooth muscle contraction and might be related to calcium channel activity and magnesium-mediated vascular relaxation 57.

Genetic pathway and network analysis, a novel approach to integrating genetic signals that complements current GWAS analysis, has been conducted and has yielded new insight into the biology of coronary heart disease 58, type 2 diabetes 23, obesity 43, and LV function 59, but not HF. Our pathway-based analysis revealed 5 pathways consistent between HFrEF and HFpEF across the two ethnicities (Table 3.3). All 5 pathways were linked to angiogenesis and vascular patterning, among which 3 pathways (ECM-receptor interaction, CAMs, and netrin-1 signaling) were also linked to inflammation (Figure 3.1). Of the 5 pathways shared by HFrEF and HFpEF, three (axon guidance, ECM-receptor interaction, and CAMs) have been implicated previously in cardiovascular disease, type 2 diabetes, and LV function 23,59. Although the main function of the axon guidance pathway is related to localization and neuronal extension, molecules within the pathway have been connected to angiogenesis and vascular patterning. For instance, axon guidance molecules and receptors, including semaphorins, plexins and neuropilins, slits and roundabouts, and netrins and netrin receptors, play functional roles in vascular development 60. These molecules can either be directly linked to the sprouting and guidance of capillary tip cells for angiogenesis, or be involved in the vasculature, especially through modulation of the activity of the VEGF signaling pathway, thus affecting guided vascular patterning by rendering vessels more or less responsive to VEGF 60. In particular, SNP rs12067046 near PLXNA2 was identified in this GWAS analysis as associated with HFpEF, which further supports the functional roles of axon guidance molecules in HF. The ECM is known to contribute to angiogenesis and vascular patterning in multiple ways. The ECM interacts with VEGF-A, modulating its availability, gradient organization, and signaling properties. The ECM can indirectly signal through the Notch pathway by engaging a specific array of integrins. Moreover, the ECM influences cellular tension and cytoskeleton organization, which are key aspects during sprouting 61. In addition, the cardiac ECM (primarily collagen I) has long been shown to provide a platform for cardiomyocytes to maintain structure and function, and the change in ECM properties following an insult strongly drives the progression toward HF via multiple mechanisms, including myocardial fibrosis and changed ECM protein orientation 62,63. The CAMs, which are glycoproteins expressed on the cell surface, are critical players in the angiogenic cascade. For example, CAMs interact with endothelial cells by modulating the processes of cell adhesion, migration, and proliferation, and with ECM molecules in the response to VEGF 64. In addition, the CAMs play functional roles in a wide range of other biological processes, including homeostasis, immune response, and inflammation 23, and have been connected with metabolic syndrome 65 and coronary heart disease 66 in multiethnic populations. The other two pathways, developmental biology and netrin-1 signaling, are closely interconnected with axon guidance. In particular, netrin-1 signaling, part of axon guidance, points to the fundamental role of netrin-1 and its receptor UNC5B in angiogenesis, atherosclerosis, and vascular endothelial function, thus further supporting this axon guidance cue as an important therapeutic target for HF and other cardiovascular diseases 67. Details regarding the main functions of the identified pathways can be found in Supplemental Table 3.6.

Therefore, these pathways highlight the importance of angiogenesis and vascular patterning, as well as inflammation, in HF, and appear to link HFrEF and HFpEF to other cardio-metabolic health outcomes via multiple mechanisms. The fact that these pathways were consistently identified across multiple ethnicities further highlights their central role in the joint mechanisms between these related cardio-metabolic health outcomes.

In addition to the shared mechanisms, we also identified six pathways specific to HFpEF across ethnicities. All 6 pathways were linked to angiogenesis and vascular patterning, of which 2 pathways (HS-GAG biosynthesis and degradation) were additionally linked to inflammation, 2 pathways (vascular smooth muscle contraction and endocytosis) to the theme of non-specific cell signaling and metabolism, and 1 pathway (phosphatidylinositol signal system) to all three themes (Figure 3.1). None of the 6 HFpEF pathways has been implicated previously in pathway-based studies. Vascular smooth muscle contraction, triggered by angiotensin-induced signaling, is the key regulator of blood pressure, and its dysfunction is directly related to hypertension, a well-understood risk factor for HF 68. Meanwhile, many other mechanisms, especially inflammation and subsequent reduction-oxidation reactions, are also involved in regulating vascular smooth muscle function. It has been demonstrated that impaired endothelial and vascular smooth muscle functions are associated with a poor prognosis in HF 69. Adherens junctions are ubiquitous along the vascular tree. Adherens junctions in endothelial cells regulate angiogenesis and vessel maintenance. For example, vascular endothelial cadherin and its intracellular partners mediate adhesion via multiple mechanisms, including the direct activation of phosphatidylinositol-3 kinase (PI3K) and Rac, and the formation of complexes with VEGF receptors 70. The fascia adherens junction is one category of intercalated disks, which maintain the structural integrity and synchronized contraction of cardiac tissue. One component of this adherens junction, β-catenin, has been linked to hypertrophic cardiomyopathy and end-stage HF, suggesting that some adherens junction proteins may have unique functions in different types, or at least in different stages, of HF 71. HS and heparin are anticoagulant GAGs and are covalently attached to core proteins to generate HS proteoglycans (HSPGs) 72. HSPGs have been observed to be involved in angiogenesis by mediating the angiogenic activity of VEGF receptor 2 73. HSPGs are also found to interact with lipoproteins, cytokines, or other proteins in the progression of atherosclerosis and diabetes 74. An animal study also found that HS-GAG functions might be associated with aging, thus partly supporting HFpEF as an age-related disorder 75. The phosphatidylinositol signaling system regulates a broad range of molecular functions, which involve vascular patterning, inflammation, and cell signaling. For example, PI3K signaling mediates angiogenesis and the expression of VEGF in endothelial cells 76, and it has been shown to regulate cardiomyocyte size, survival, and inflammation in cardiac hypertrophy and HF, probably by interacting with calcium signaling 77,78. Endocytosis is involved in a variety of cell functions that can be linked to angiogenesis and cardiac dysfunction. For example, endocytosis regulates VEGF signaling during angiogenesis through spatial control of VEGF receptor endocytosis 79, and is also involved in the desensitization and internalization of β-adrenergic receptors (βARs), which are critical regulators of cardiovascular physiology 80. Importantly, PI3K, as previously described, has also been found to regulate βAR endocytosis 81. In addition, a group of endocytic regulatory proteins, the C-terminal Eps15 homology domain-containing proteins (EHDs), has been linked to Na⁺ and Ca²⁺ homeostasis, which is fundamental for cardiac arrhythmia 82. Details regarding the main functions of the HFpEF-specific pathways can be found in Supplemental Table 3.6. Detailed knowledge of these relationships at the molecular level will allow researchers to understand the distinct mechanisms underlying HFpEF and enable the development of effective therapeutic strategies.

Key driver analysis has been used in the past to prioritize KD genes for coronary heart disease 83, type 2 diabetes 23, and obesity 84, but not for HF. In our key driver analysis based on the pathways shared between HFrEF and HFpEF, we found that the KD genes were from the collagen gene family, shared between axon guidance and ECM-receptor interaction, and HLA genes from the CAMs pathway. The HLA gene family plays a central role in the immune system and has long been found to contribute to type 1 diabetes susceptibility 85. The collagen gene family, as previously described, encodes proteins that regulate vascular patterning and maintain the structure and function of cardiomyocytes. These KD genes further highlight the effect of angiogenesis/vascular patterning and inflammation in HF. Importantly, COL1A1 and COL3A1 were also found to be KD genes for cardiovascular disease and type 2 diabetes 23, thus showing potential shared biological mechanisms underlying these related diseases as well as the pleiotropic effects of these KD genes. The KD genes for HFpEF are mainly from vascular smooth muscle contraction, phosphatidylinositol signal system, and endocytosis, and therefore additionally highlight the functional roles of non-specific cell signaling and metabolism in HFpEF.

Our study has several unique strengths. First, this is the first study to systematically and integratively examine genetic predisposition to HF subtypes, using pathway and network approaches to complement current GWAS methodologies. Therefore, we have the potential to reveal the similarities and differences between HFrEF and HFpEF with higher statistical power. Second, our study employed large and multiethnic populations, which allows the detection of HF mechanisms shared across ethnicities. In addition, we used JHS and FHS women as replication populations for the WHI AA and CA women, which supports the robustness of our findings. One major limitation is that our analysis was performed among women only, so we could not examine the effect of gender on HF. Although similar mechanisms can be expected for men, evidence is still needed to show why women are generally more vulnerable to HF, especially HFpEF. In addition, all the results were based upon germline mutations. Therefore, it remains unclear whether mutations in the identified genomic regions and pathways affect downstream expression levels, and whether the identified genes and pathways are up- or down-regulated before and after HF events. Future studies are warranted to provide evidence linking HF-related germline mutations to the corresponding gene expression.

3.6 Conclusion

From this integrative genetic pathway analysis of heart failure, we validated and identified 24 genetic loci for HF, 5 pathways shared by HFrEF and HFpEF, and 6 HFpEF-specific pathways, which followed the three themes of angiogenesis and vascular patterning, inflammation, and non-specific cell signaling and metabolism. These results improve our understanding of the differential and similar mechanisms underlying subtypes of HF, thus providing potential targets for more effective therapies.

3.7 Tables and Figures

Table 3.1 Baseline Characteristics of African and Caucasian American Women in Study Populations a

| Characteristic | Discovery: WHI-AA (N=7,982) | Discovery: WHI-CA (N=4,133) | Replication b: JHS-AA (N=1,853) | Replication b: FHS-CA (N=1,755) |
|---|---|---|---|---|
| No. of participants with HF, n (%) c | 544 (6.8) | 477 (11.5) | 166 (9) | 108 (6.2) |
| HFpEF, n (%) | 345 (63.4) | 304 (63.7) | 140 (84.3) | 103 (95.4) |
| HFrEF, n (%) | 199 (36.6) | 173 (36.3) | 26 (15.7) | 5 (4.6) |
| Years of follow-up, year (SD) | 10.0 (5.2) | 8.9 (5.3) | 6.4 (2.3) | 32.5 (7.0) |
| Age, year (SD) | 61.5 (7.0) | 65.6 (6.9) | 55.5 (12.7) | 34.8 (9.8) |
| Region, n (%) d | | | | |
| Northeast | 1410 (17.7) | 1039 (25.1) | -- | -- |
| South | 3902 (48.9) | 879 (21.3) | -- | -- |
| Midwest | 1854 (23.2) | 1185 (28.7) | -- | -- |
| West | 816 (10.2) | 1030 (24.9) | -- | -- |
| Current smoking, n (%) | 901 (11.3) | 449 (10.9) | 202 (10.9) | 723 (41.2) |
| Cardiovascular disease, n (%) e | 1355 (17) | 647 (15.7) | 191 (10.3) | 0 (0) |
| Diabetes, n (%) | 1054 (13.2) | 285 (6.9) | 436 (23.5) | 8 (0.5) |
| BMI, kg/m² (SD) | 31 (6.5) | 29.6 (6.1) | 33.3 (7.8) | 24 (4.4) |
| Physical activity, MET-h/week (SD) d | 9.7 (12.7) | 10.2 (12.8) | -- | -- |
| Alcohol drinking, serving/week (SD) | 1.1 (3.9) | 2.3 (5.1) | 1.5 (6.2) | 3.5 (5.8) |
| Total energy intake, kcal/day (SD) | 1614.1 (759.5) | 1683.3 (663.1) | 1826.2 (751.8) | 1757 (579.6) |

Abbreviations: AA (African American), ADHF (acute decompensated heart failure), BMI (body mass index), CA (Caucasian American), FHS (Framingham Heart Study), HF (heart failure), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), JHS (Jackson Heart Study), LVEF (left ventricular ejection fraction), SD (standard deviation), WHI (Women’s Health Initiative Study).

a Continuous variables are presented as mean (SD).

b 62.0% of the JHS participants and 52.5% of the FHS participants were women.

c HF was defined as having ADHF or chronic stable HF, and further classified as HFrEF (LVEF < 45%) or HFpEF (LVEF ≥ 45%). For WHI patients with ADHF, the LVEF closest to the diagnosis date of ADHF was used, and for patients with chronic stable HF, baseline LVEF or the lowest estimated LVEF in medical records was used to classify HF subtypes. In the JHS, considering that over 50% of ADHF patients were missing LVEF, baseline LVEF was used to determine HFrEF and HFpEF for those without coronary heart disease before HF, and for chronic stable HF patients, baseline LVEF was used. In the FHS, HF was defined as incident or prevalent congestive HF. Since LVEF was not measured at baseline, the LVEF closest to the diagnosis date of HF was used to define HFrEF and HFpEF. Participants without LVEF measurement were excluded from the analysis.

d Results in the replication populations are not presented for variables that were not applicable (region) or were measured on different scales (physical activity).

e Cardiovascular disease was defined as self-reported coronary heart disease, heart failure, stroke, and peripheral artery disease at baseline.

Table 3.2 Validated and Newly Discovered Loci for Heart Failure among African Americans and Caucasian Americans in the Women’s Health Initiative Study

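| Lead SNP | Candidate Gene a | HFrEF CA: β (SE) | HFrEF CA: P value | HFrEF AA: β (SE) | HFrEF AA: P value | HFpEF CA: β (SE) | HFpEF CA: P value | HFpEF AA: β (SE) | HFpEF AA: P value |
|---|---|---|---|---|---|---|---|---|---|
| Validation and Allocation of Previous Known Loci | | | | | | | | | |
| rs1869717 | MAML3 | -- b | -- | -0.23 (0.1) | 0.03 | -- | -- | -- | -- |
| rs12310617 | Near TEAD4, TSPAN9 | -0.69 (0.2) | <0.01 | -- | -- | -- | -- | -- | -- |
| rs760762 | PLCG1 | -- | -- | 0.26 (0.1) | 0.04 | -- | -- | -- | -- |
| rs10124550 | BNC2 | 0.53 (0.3) | 0.04 | -- | -- | -- | -- | -- | -- |
| rs12420422 | Near BSX, CLMP | -- | -- | 0.65 (0.3) | 0.04 | -- | -- | -- | -- |
| rs11014306 | PRTFDC1 | -- | -- | -0.24 (0.1) | 0.02 | -- | -- | -- | -- |
| rs1606355 | Near CCDC91, FAR2 | -- | -- | -0.25 (0.1) | 0.04 | -- | -- | -- | -- |
| rs548097 | Near TBC1D4 | -- | -- | 0.30 (0.1) | 0.03 | -- | -- | -- | -- |
| rs7687921 | GPM6A | -- | -- | -- | -- | 0.45 (0.2) | 0.04 | -- | -- |
| rs2341263 | Near LRRIQ3 | -- | -- | -- | -- | 0.22 (0.1) | 0.03 | -- | -- |
| rs12733856 | Near DUSP10 | -- | -- | -- | -- | -- | -- | 0.16 (0.1) | 0.04 |
| rs17140821 | Near HDAC9, TWIST1 | -- | -- | -- | -- | -- | -- | 0.21 (0.1) | 0.04 |
| rs4006531 | Near TRIB1 | -- | -- | -- | -- | -- | -- | 0.21 (0.1) | 0.04 |
| rs940274 | ABCA12 | -- | -- | -- | -- | -- | -- | 0.29 (0.1) | <0.01 |
| rs7190256 | ZFHX3 | -- | -- | -- | -- | 0.98 (0.5) | 0.04 | -- | -- |
| rs6759518 | SLC30A3 | -- | -- | -- | -- | -- | -- | 0.36 (0.1) | <0.01 |
| rs780094 | GCKR | -- | -- | -- | -- | -- | -- | 0.23 (0.1) | 0.03 |
| rs4420638 | Near APOE, APOC | 0.64 (0.2) | <0.01 | -- | -- | 0.33 (0.1) | 0.02 | 0.38 (0.1) | <0.01 |
| Discovery of Novel Loci | | | | | | | | | |
| rs12719020 | Near COBL | -1.08 (0.2) | 3.1 × 10⁻⁶ | -- | -- | -- | -- | -- | -- |
| rs12067046 | Near PLXNA2 | -- | -- | -- | -- | -- | -- | 0.47 (0.1) | 4.2 × 10⁻⁶ |
| rs12599260 | Near CNEP1R1, HEATR3 | -- | -- | -- | -- | 0.42 (0.1) | 4.9 × 10⁻⁶ | -- | -- |
| rs149663839 | Near ABTB2, CAT | -- | -- | -- | -- | -- | -- | 2.61 (0.9) | 1.4 × 10⁻⁶ |
| rs114553497 | Near CCSAP, ACTA1 | -- | -- | -- | -- | -- | -- | 0.92 (0.2) | 5.2 × 10⁻⁷ |
| rs10229703 | Near CALD1, AGBL3 | -- | -- | -- | -- | -- | -- | 0.49 (0.1) | 3.6 × 10⁻⁶ |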

Abbreviations: AA (African Americans), CA (Caucasian Americans), CHR (chromosome), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), MAF (minor allele frequency), SE (standard error), SNP (single-nucleotide polymorphism).
a Closest genes within ± 300 kb of the lead SNPs.
b Variants that failed to show significant P values.

Table 3.3 Biological Pathways Enriched for HFrEF and HFpEF among African and Caucasian American Women across Ethnicities a

Pathway (columns in order: HFrEF CA, HFrEF AA, HFpEF CA, HFpEF AA)

Angiogenesis and Vascular Patterning
Extracellular matrix-receptor interaction X X X X
Cell adhesion molecules X X X X
Axon guidance X X X X
Netrin-1 signaling X X X X
Developmental biology X X X X
Vascular smooth muscle contraction X X X
Adherens junction X X X
Endocytosis X X X
HS-GAG biosynthesis X X X
HS-GAG degradation X X
Phosphatidylinositol signal system X X
Mucin type O-glycan biosynthesis X X
Pre-Notch expression and processing X
Signaling by BMP X
Cell-cell junction organization X
Intrinsic pathway X

Inflammation
Extracellular matrix-receptor interaction X X X X
Cell adhesion molecules X X X X
Netrin-1 signaling X X X X
HS-GAG biosynthesis X X X
HS-GAG degradation X X
Phosphatidylinositol signal system X X
Rac1 pathway X X
NRAGE signals death through JNK X X
Signaling by Rho GTPases X X
Cell-cell junction organization X
Intrinsic pathway X

Non-specific Cell Signaling and Metabolism
Vascular smooth muscle contraction X X X
Endocytosis X X X
Phosphatidylinositol signal system X X
Ion transport by P-type ATPases X
Ion channel transport X

Abbreviations: AA (African Americans), BMP (Bone morphogenetic proteins), CA (Caucasian Americans), HS (Heparan sulfate), GAG (glycosaminoglycan), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), JNK (JUN Kinase). a Biological pathways presented in the table were identified from AA and CA women in the Women’s Health Initiative Study, and validated in the Jackson Heart Study and Framingham Heart Study. A pathway was marked with “X” when nominal P value from GSA-SNP or Mergeomics < 0.05 in the Women’s Health Initiative Study.
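The marking rule in footnote a can be written as a small predicate. The sketch below reads the footnote as "marked when either method gives a nominal P value below 0.05"; that reading, the function name `is_marked`, and the convention of passing `None` when a method returned no result are assumptions for illustration.

```python
ALPHA = 0.05  # nominal threshold stated in footnote a


def is_marked(p_gsa_snp, p_mergeomics, alpha=ALPHA):
    """Return True when GSA-SNP or Mergeomics gives a nominal P value < alpha.

    A method without a result is passed as None and ignored.
    """
    p_values = [p for p in (p_gsa_snp, p_mergeomics) if p is not None]
    return any(p < alpha for p in p_values)
```

Under this reading a pathway supported by only one of the two tools still receives an "X", which is consistent with the single-method entries (G or M alone) shown in the supplemental pathway tables.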

[Figure 3.1: Venn diagram titled "Pathways by Three Themes" (angiogenesis/vascular patterning; inflammation; non-specific cell signaling/metabolism), with groupings of pathways for HFrEF and pathways for HFpEF. Pathways labeled in the diagram: vascular smooth muscle contraction, endocytosis, extracellular matrix-receptor interaction, cell adhesion molecules, adherens junction, netrin-1 signaling, axon guidance, HS-GAG biosynthesis, developmental biology, HS-GAG degradation, phosphatidylinositol signal system.]

Figure 3.1 Venn Diagram for Biological Pathways Enriched for HFrEF and HFpEF among African and Caucasian American Women across Ethnicities

Abbreviations: GAG (glycosaminoglycan), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), HS (Heparan sulfate).

[Figure 3.2: network diagram. Pathway nodes: Developmental Biology, Axon Guidance, Netrin-1 Signaling, Extracellular Matrix-Receptor Interaction, Cell Adhesion Molecules. Legend: Key Driver Genes, Shared Genes.]

Figure 3.2 Network of 5 Pathways Enriched for HFrEF and HFpEF with Top 10 Key Driver Genes among African and Caucasian American Women

The diamond nodes represent pathways and the ellipse nodes represent genes; each edge shows an interaction, that is, the association between a gene and a pathway. Node colors: red, top 10 key driver genes; light green, genes involved in ≥ 2 pathways; others are pathway-specific genes. The figure was created using Cytoscape 86. Abbreviations: AXON (axon guidance), CAMS (cell adhesion molecules), DB (developmental biology), ECM (extracellular matrix-receptor interaction), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), NS (netrin-1 signaling).
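The "genes involved in ≥ 2 pathways" category in the legend amounts to counting pathway memberships per gene in the gene-pathway network. A minimal sketch follows; the gene names `GENE_A` to `GENE_C` and the membership sets are hypothetical placeholders, and only the pathway abbreviations come from the caption.

```python
from collections import defaultdict


def shared_genes(pathway_genes, min_pathways=2):
    """Map each gene found in at least `min_pathways` pathways to the
    sorted list of pathways that contain it."""
    membership = defaultdict(set)
    for pathway, genes in pathway_genes.items():
        for gene in genes:
            membership[gene].add(pathway)
    return {
        gene: sorted(pathways)
        for gene, pathways in membership.items()
        if len(pathways) >= min_pathways
    }


# Hypothetical memberships, for illustration only.
toy_network = {
    "AXON": {"GENE_A", "GENE_B"},
    "NS": {"GENE_B", "GENE_C"},
    "ECM": {"GENE_B", "GENE_C"},
}
shared = shared_genes(toy_network)
```

Genes returned by `shared_genes` would be drawn as shared nodes, and the remaining genes as pathway-specific nodes; key driver status is a separate annotation that this sketch does not compute.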

[Figure 3.3: network diagram. Pathway nodes: Adherens Junction, Heparan Sulfate (HS)-Glycosaminoglycan (GAG) Biosynthesis, HS-GAG Degradation, Vascular Smooth Muscle Contraction, Endocytosis, Phosphatidylinositol Signal System. Legend: Key Driver Genes, Shared Genes.]

Figure 3.3 Network of 6 Pathways Enriched for HFpEF with Top 10 Key Driver Genes among African and Caucasian American Women

The diamond nodes represent pathways and the ellipse nodes represent genes; each edge shows an interaction, that is, the association between a gene and a pathway. Node colors: red, top 10 key driver genes; light green, genes involved in ≥ 2 pathways; others are pathway-specific genes. The figure was created using Cytoscape 86. Abbreviations: AJ (adherens junction), ECT (endocytosis), HGB (heparan sulfate-glycosaminoglycan biosynthesis), HGD (heparan sulfate-glycosaminoglycan degradation), HFpEF (heart failure with preserved ejection fraction), PSS (phosphatidylinositol signal system), VSMC (vascular smooth muscle contraction).

3.8 Supplemental Material

Supplemental Table 3.1 Validation and Allocation of Previously Identified GWAS Loci for Heart Failure in the Women’s Health Initiative Study a

| CHR | Lead SNP | Position (hg 19) | Candidate Gene b | Minor/Major Allele c | MAF | Beta (SE) | P value |
|---|---|---|---|---|---|---|---|
| Caucasian American - HFrEF | | | | | | | |
| 9p22 | rs10124550 | 9:16512748 | BNC2 | A/G | 0.07 | 0.53 (0.3) | 0.04 |
| 12p13 | rs12310617 | 12:3169493 | Near TEAD4, TSPAN9 | C/T | 0.05 | -0.69 (0.2) | <0.01 |
| 19q13 | rs4420638 | 19:45422946 | Near APOE, APOC1, APOC1P1, APOC2 | A/G | 0.17 | 0.64 (0.2) | <0.01 |
| African American - HFrEF | | | | | | | |
| 4q31 | rs1869717 | 4:140751121 | MAML3 | G/C | 0.28 | -0.23 (0.1) | 0.03 |
| 8q21 | rs6473383 | 8:83669120 | | G/A | 0.24 | 0.31 (0.1) | <0.01 |
| 10p12 | rs11014306 | 10:25224462 | PRTFDC1 | T/C | 0.33 | -0.24 (0.1) | 0.02 |
| 11q24 | rs12420422 | 11:122880281 | Near BSX, CLMP | G/A | 0.04 | 0.65 (0.3) | 0.04 |
| 12p11 | rs1606355 | 12:28815792 | Near CCDC91, FAR2 | A/G | 0.21 | -0.25 (0.1) | 0.04 |
| 13q22 | rs548097 | 13:75776268 | Near TBC1D4 | T/G | 0.15 | 0.30 (0.1) | 0.03 |
| 20q12 | rs760762 | 20:39776046 | PLCG1 | T/C | 0.30 | 0.26 (0.1) | 0.04 |
| Caucasian American - HFpEF | | | | | | | |
| 1p31 | rs2341263 | 1:74083035 | Near LRRIQ3 | C/T | 0.26 | 0.22 (0.1) | 0.03 |
| 4q34 | rs7687921 | 4:176859026 | GPM6A | A/T | 0.07 | 0.45 (0.2) | 0.04 |
| 8q21 | rs6473383 | 8:83669120 | | G/A | 0.15 | -0.28 (0.1) | 0.02 |
| 16q22 | rs7190256 | 16:72996983 | ZFHX3 | T/C | 0.02 | 0.98 (0.5) | 0.04 |
| 19q13 | rs4420638 | 19:45422946 | Near APOE, APOC1, APOC1P1, APOC2 | A/G | 0.17 | 0.33 (0.1) | 0.02 |
| African American - HFpEF | | | | | | | |
| 1q41 | rs12733856 | 1:221551539 | Near DUSP10 | C/A | 0.37 | -0.16 (0.1) | 0.04 |
| 2p23 | rs6759518 | 2:27486595 | SLC30A3 | G/C | 0.07 | -0.36 (0.1) | <0.01 |
| 2p23 | rs780094 | 2:27741237 | GCKR | C/T | 0.24 | 0.23 (0.1) | 0.03 |
| 2q35 | rs940274 | 2:215951630 | ABCA12 | C/T | 0.27 | 0.29 (0.1) | <0.01 |
| 7p21 | rs17140821 | 7:19217204 | Near HDAC9, TWIST1 | G/A | 0.14 | 0.21 (0.1) | 0.04 |
| 8q24 | rs4006531 | 8:126867505 | Near TRIB1 | A/G | 0.46 | -0.21 (0.1) | 0.04 |
| 19q13 | rs4420638 | 19:45422946 | Near APOE, APOC1, APOC1P1, APOC2 | A/G | 0.17 | 0.38 (0.1) | <0.01 |

Abbreviations: CHR (chromosome), GWAS (genome-wide association study), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), MAF (minor allele frequency), SE (standard error), SNP (single-nucleotide polymorphism).
a African Americans (N=7,982) and Caucasian Americans (N=4,133) within the Women’s Health Initiative Study.
b Closest genes within ± 100 kb of the lead SNPs.
c Alleles with elevated risk for heart failure were underlined.

Supplemental Table 3.2 Novel Genome-Wide Significant SNPs for Heart Failure among African and Caucasian American Women

| CHR | Lead SNP | Position (hg 19) | Candidate Gene a | Minor/Major Allele b | MAF | Discovery c Beta (SE) | Discovery c P value | Replication d Beta (SE) | Replication d P value |
|---|---|---|---|---|---|---|---|---|---|
| Caucasian American - HFrEF | | | | | | | | | |
| 7p12 | rs12719020 | 7:51066547 | Near COBL | C/A | 0.04 | -1.08 (0.2) | 3.1×10⁻⁶ | -2.59 (0.9) | 0.02 |
| Caucasian American - HFpEF | | | | | | | | | |
| 16q12 | rs12599260 | 16:50093238 | Near CNEP1R1, HEATR3 | A/G | 0.31 | -0.42 (0.1) | 4.9×10⁻⁶ | -0.43 (0.2) | <0.01 |
| African American - HFpEF | | | | | | | | | |
| 1q32 | rs12067046 | 1:208913216 | Near PLXNA2 | G/A | 0.16 | -0.47 (0.1) | 4.2×10⁻⁶ | -0.44 (0.2) | <0.01 |
| 1q42 | rs114553497 | 1:229507916 | Near CCSAP, ACTA1 | C/T | 0.03 | -0.92 (0.2) | 5.2×10⁻⁷ | -0.67 (0.3) | 0.03 |
| 7q33 | rs10229703 | 7:134661230 | Near CALD1, AGBL3 | A/G | 0.28 | 0.49 (0.1) | 3.6×10⁻⁶ | 0.41 (0.2) | 0.02 |
| 11p13 | rs149663839 | 11:34408645 | Near ABTB2, CAT | A/G | 0.02 | 2.61 (0.9) | 1.4×10⁻⁶ | 1.69 (0.8) | <0.01 |

Abbreviations: CHR (chromosome), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), MAF (minor allele frequency), SE (standard error), SNP (single-nucleotide polymorphism).
a Closest genes within ± 300 kb of the lead SNPs.
b Alleles with elevated risk for heart failure were underlined.
c The discovery populations were African Americans (N=7,982) and Caucasian Americans (N=4,133) within the Women’s Health Initiative Study.
d The replication populations were African American women (N=1,853) in the Jackson Heart Study and Caucasian American women (N=1,755) in the Framingham Heart Study.
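Footnote a assigns candidate genes by proximity to the lead SNP. One way such a ± 300 kb lookup could be sketched is shown below. The gene records (`GENE_A` to `GENE_C`) and their coordinates are hypothetical, and measuring distance to the nearest gene boundary is an assumption; the text does not state how distance was measured. Only the SNP position 7:51066547 is taken from the table.

```python
WINDOW_BP = 300_000  # +/- 300 kb window from footnote a


def genes_near_snp(snp_chrom, snp_pos, genes, window_bp=WINDOW_BP):
    """Return (gene, distance_bp) pairs within the window, nearest first.

    `genes` is an iterable of (name, chrom, start, end) tuples. Distance is
    0 when the SNP lies inside the gene, otherwise the distance to the
    nearer gene boundary (an assumed definition).
    """
    hits = []
    for name, chrom, start, end in genes:
        if chrom != snp_chrom:
            continue
        if start <= snp_pos <= end:
            distance = 0
        else:
            distance = min(abs(snp_pos - start), abs(snp_pos - end))
        if distance <= window_bp:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical gene coordinates, for illustration only.
toy_genes = [
    ("GENE_A", "7", 51_100_000, 51_200_000),
    ("GENE_B", "7", 52_000_000, 52_100_000),
    ("GENE_C", "1", 51_000_000, 51_100_000),
]
nearby = genes_near_snp("7", 51_066_547, toy_genes)
```

A SNP that falls inside a gene gets distance 0, which would correspond to the table entries listed without the "Near" prefix.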

Supplemental Table 3.3 Common Name for Genes Close to Validated and Newly Discovered Heart Failure Loci

| Lead SNP | References | Close Gene a | Common Gene Name |
|---|---|---|---|
| Validated loci | | | |
| rs1869717 | He, et al 10 | MAML3 | Mastermind Like Transcriptional Coactivator 3 |
| rs12310617 | He, et al 10 | Near TEAD4 | TEA Domain Transcription Factor 4 |
| | | Near TSPAN9 | Tetraspanin 9 |
| rs760762 | He, et al 10 | PLCG1 | Phospholipase C Gamma 1 |
| rs10124550 | Yu, et al 30 | BNC2 | Basonuclin 2 |
| rs12420422 | He, et al 10 | Near BSX | Brain Specific Homeobox |
| | | Near CLMP | Coxsackie- And Adenovirus Receptor-Like Membrane Protein |
| rs11014306 | Yu, et al 30 | PRTFDC1 | Phosphoribosyl Transferase Domain Containing 1 |
| rs1606355 | Yu, et al 30 | Near CCDC91 | Coiled-Coil Domain Containing 91 |
| | | Near FAR2 | Fatty Acyl-CoA Reductase 2 |
| rs548097 | Smith, et al 87 | Near TBC1D4 | TBC1 Domain Family Member 4 |
| rs7687921 | Morrison, et al 88 | GPM6A | Glycoprotein M6A |
| rs2341263 | Yu, et al 30 | Near LRRIQ3 | Leucine Rich Repeats And IQ Motif Containing 3 |
| rs12733856 | Morrison, et al 88 | Near DUSP10 | Dual Specificity Phosphatase 10 |
| rs17140821 | He, et al 10 | Near HDAC9 | Histone Deacetylase 9 |
| | | Near TWIST1 | Twist Family BHLH Transcription Factor 1 |
| rs4006531 | Yu, et al 30 | Near TRIB1 | Tribbles Pseudokinase 1 |
| rs940274 | He, et al 10 | ABCA12 | ATP Binding Cassette Subfamily A Member 12 |
| rs7190256 | He, et al 10 | ZFHX3 | Zinc Finger Homeobox 3 |
| rs6759518 | He, et al 10 | SLC30A3 | Solute Carrier Family 30 Member 3 |
| rs780094 | He, et al 10 | GCKR | Glucokinase Regulator |
| rs4420638 | He, et al 10 | Near APOE | Apolipoprotein E |
| | | Near APOC | Apolipoprotein C |
| Discovered loci | | | |
| rs12719020 | | Near COBL | Cordon-Bleu WH2 Repeat Protein |
| rs12067046 | | Near PLXNA2 | Plexin A2 |
| rs12599260 | | Near CNEP1R1 | Nuclear Envelope Phosphatase 1-Regulatory Subunit 1 |
| | | Near HEATR3 | HEAT Repeat Containing 3 |
| rs149663839 | | Near ABTB2 | Ankyrin Repeat And BTB Domain Containing 2 |
| | | Near CAT | Catalase |
| rs114553497 | | Near CCSAP | Centriole, Cilia And Spindle Associated Protein |
| | | Near ACTA1 | Actin, Alpha 1, Skeletal Muscle |
| rs10229703 | | Near CALD1 | Caldesmon 1 |
| | | Near AGBL3 | ATP/GTP Binding Protein Like 3 |

a Closest genes within ± 300 kb of the lead SNPs.

Supplemental Table 3.4 Pathways Identified Specifically for HFrEF by Multiple Pathway Methodologies a

Pathways; Discovery Findings b (HFrEF - CA, HFrEF - AA); Validation Findings c (HFrEF - CA, HFrEF - AA, HFpEF - CA, HFpEF - AA)

Vascular smooth muscle contraction G, M -- d G, M G
Extracellular matrix-receptor interaction G, M -- G G, M G
Cell adhesion molecules G, M -- G G G, M
Intrinsic pathway G, M --
Rac1 pathway G, M -- M
Mucin type O-glycan biosynthesis G, M -- G
NRAGE signals death through JNK G, M -- G, M
Signaling by Rho GTPases G, M -- M
Developmental biology G, M G -- G G
Axon guidance G, M G -- G, M G
Netrin-1 signaling G, M G -- G G

Abbreviations: AA (African American), CA (Caucasian American), JNK (JUN Kinase), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction).
a G represents GSA-SNP, M represents Mergeomics.
b The discovery findings were significant pathways (FDR-adjusted q value < 0.2) for HFrEF identified from the Women’s Health Initiative Study (N=7,982 for AA and N=4,133 for CA) and replicated from the Jackson Heart Study (N=1,853) and Framingham Heart Study (N=1,755).
c The validation findings were presented as pathways with nominal P value < 0.05 from the Women’s Health Initiative Study (N=7,982 for AA and N=4,133 for CA).
d Pathways suggested by discovery findings did not need to be validated in the same ethnic group.
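Footnote b selects pathways by an FDR-adjusted q value below 0.2 but does not name the adjustment procedure. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up adjustment; the function names and the example P values in the usage note are hypothetical and are not results from the study.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted q values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def fdr_significant(p_values, q_threshold=0.2):
    """Flag each test whose BH q value falls below the threshold."""
    return [q < q_threshold for q in bh_q_values(p_values)]
```

For example, `fdr_significant([0.01, 0.04, 0.03, 0.5])` flags the first three hypothetical pathways and not the fourth. The nominal P < 0.05 rule in footnote c involves no such adjustment.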

Supplemental Table 3.5 Pathways Identified Specifically for HFpEF by Multiple Pathway Methodologies a

Pathways; Discovery Findings b (HFpEF - CA, HFpEF - AA); Validation Findings c (HFrEF - CA, HFrEF - AA, HFpEF - CA, HFpEF - AA)

Axon guidance G, M G G, M -- d G
Phosphatidylinositol signal system G, M -- G, M
Adherens junction G, M M -- G
Pre-NOTCH expression and processing G, M --
HS-GAG degradation G, M G, M -- --
HS-GAG biosynthesis G, M G -- G, M
Ion transport by P-type ATPases G, M G, M --
Ion channel transport G, M --
Cell adhesion molecules G, M G, M G G --
Long-term depression G, M G --
Endocytosis G, M M G --
Signaling by BMP G, M --
The role of Nef in HIV-1 replication and disease pathogenesis G, M M --
Cell-cell junction organization G, M --

Abbreviations: AA (African American), BMP (bone morphogenetic protein), CA (Caucasian American), GAG (glycosaminoglycan), HFpEF (heart failure with preserved ejection fraction), HS (heparan sulfate).
a G represents GSA-SNP, M represents Mergeomics.
b The discovery findings were significant pathways (FDR-adjusted q value < 0.2) for HFpEF identified from the Women’s Health Initiative Study (N=7,982 for AA and N=4,133 for CA) and replicated from the Jackson Heart Study (N=1,853) and Framingham Heart Study (N=1,755). Pathways shared with HFrEF were not included.
c The validation findings were presented as pathways with nominal P value < 0.05 from the Women’s Health Initiative Study (N=7,982 for AA and N=4,133 for CA).
d Pathways suggested by discovery findings did not need to be validated in the same ethnic group.

Supplemental Table 3.6 Characteristics of Identified Significant Pathways for Heart Failure by Ethnicities

| Pathways | No. of genes | Min. P value | Main functions |
|---|---|---|---|
| Caucasian Americans – HFrEF (n=5) | | | |
| Vascular smooth muscle contraction | 115 | 4.29×10⁻⁴ | The vascular smooth muscle cell (VSMC) is a highly specialized cell whose principal function is contraction. On contraction, VSMCs shorten, thereby decreasing the diameter of a blood vessel to regulate the blood flow and pressure. 18 |
| Extracellular matrix-receptor interaction | 84 | 3.79×10⁻⁵ | The extracellular matrix consists of a complex mixture of structural and functional macromolecules and serves an important role in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function. 18 |
| Cell adhesion molecules | 134 | 3.32×10⁻⁶ | Cell adhesion molecules are glycoproteins expressed on the cell surface and play a critical role in a wide array of biologic processes, including hemostasis, immune response, inflammation, etc. 18 |
| Intrinsic prothrombin activation pathway | 23 | 2.85×10⁻⁵ | The second phase of blood coagulation or clotting – the activation of prothrombin. 20 |
| Rac-1 pathway | 23 | 2.68×10⁻³⁸ | Rac-1 is a small G-protein in the Rho family that regulates cell motility in response to extracellular signals. 20 |
| African Americans – HFrEF (n=6) | | | |
| Mucin type O-glycan biosynthesis | 30 | 6.42×10⁻⁵ | O-glycans are a class of glycans that modify serine or threonine residues of proteins. Mucins are highly O-glycosylated glycoproteins ubiquitous in mucous secretions on cell surfaces and in body fluids. 18 |
| NRAGE signals death through JUN Kinase (JNK) | 43 | 2.28×10⁻⁵ | Once bound by either NGF or proNGF, p75NTR interacts with NRAGE, thus leading to phosphorylation and activation of JNK. JNK controls apoptosis in two ways: it induces transcription of pro-apoptotic genes, and directly activates the cell death machinery. 19 |
| Signaling by Rho GTPases | 113 | 1.04×10⁻⁷ | Rho GTPases belong to the Rho family, typically binary switches controlling a variety of biological processes, including dynamic rearrangements of the plasma membrane-associated actin cytoskeleton, regulating actomyosin contractility and microtubule dynamics, cell growth control, cytokinesis, cell motility, cell-cell and cell-extracellular matrix adhesion, cell transformation and invasion, and development. 19 |
| Developmental biology | 396 | 6.18×10⁻⁵ | This pathway includes the processes by which a fertilized egg gives rise to the diverse tissues of the body. 19 |
| Axon guidance | 251 | 7.64×10⁻⁸ | This pathway is the process by which neurons send out axons to reach the correct targets. 19 |
| Netrin-1 signaling | 41 | 2.38×10⁻⁵ | Netrins are proteins that play a crucial role in neuronal migration and in axon guidance. Netrin-1 is the most studied member of the family and has been shown to play a crucial role in neuronal navigation during nervous system development mainly through its interaction with its receptors DCC and UNC5. 19 |
| Caucasian Americans – HFpEF (n=8) | | | |
| Phosphatidylinositol signal system | 76 | 1.63×10⁻⁵ | Phosphatidylinositol is a small lipid molecule which can be phosphorylated by a host of lipid kinases to produce a variety of phosphatidylinositol monophosphates (PI3P, PI4P, and PI5P), diphosphates, and a triphosphate that are collectively known as phosphoinositides. Phosphoinositides are universal signaling entities that regulate cell activities through direct interaction with membrane proteins or through membrane recruitment of cytosolic proteins containing domains that directly bind phosphoinositides. 18 |
| Adherens junction | 75 | 1.19×10⁻⁹ | The cell-cell adherens junction is the most common type of intercellular adhesion; it is important for maintaining tissue architecture and cell polarity and can limit cell movement and proliferation. 18 |
| Pre-NOTCH expression and processing | 44 | 2.45×10⁻⁶ | This pathway is the expression and processing of nascent forms of NOTCH precursors, which undergo extensive posttranslational modifications in the endoplasmic reticulum and Golgi apparatus to become functional. 19 |
| Heparan sulfate (HS)-glycosaminoglycan (GAG) degradation | 20 | 8.81×10⁻⁵ | This pathway is the degradation of heparan sulfate/heparin. HS-GAG, once combined with the core proteins, can interact with growth factors, lipoproteins, cytokines or other proteins in the progression of various disorders, including atherosclerosis and diabetes. 19 |
| HS-GAG biosynthesis | 31 | 3.73×10⁻⁴ | This pathway is the biosynthesis of heparan sulfate/heparin. 19 |
| Axon guidance | 251 | 1.44×10⁻⁹ | Also found for HFrEF among AA |
| Ion transport by P-type ATPases | 34 | 2.92×10⁻⁶ | The P-type ATPases are a large group of ion and lipid pumps, and serve as a basis for nerve impulses, relaxation of muscles, secretion and absorption in the kidney, absorption of nutrients in the intestine and other physiological processes. 19 |
| Ion channel transport | 55 | 8.45×10⁻⁶ | This is a group of ion channels that mediate the flow of ions across the plasma membrane of cells. 19 |
| African Americans – HFpEF (n=7) | | | |
| Cell adhesion molecules | 134 | 6.88×10⁻⁶ | Also found for HFrEF among CA |
| Long term depression | 70 | 1.14×10⁻⁵ | It is a process involving a decrease in the synaptic strength between parallel fiber and Purkinje cells, and it is a molecular and cellular basis for cerebellar learning. 18 |
| Endocytosis | 183 | 2.76×10⁻⁴ | Endocytosis is a mechanism for cells to remove ligands, nutrients, plasma membrane proteins, and lipids from the cell surface, bringing them into the cell interior. 18 |
| Signaling by Bone morphogenetic proteins (BMP) | 23 | 2.93×10⁻²⁸ | BMPs are members of the Transforming growth factor-Beta (TGFB) family. BMP signaling is linked to a wide variety of clinical disorders, including vascular diseases, skeletal diseases and cancer. 19 |
| The role of Nef in HIV-1 replication and disease pathogenesis | 28 | 1.24×10⁻⁹ | This pathway plays an important role in several steps of HIV replication. 19 |
| HS-GAG degradation | 20 | 9.34×10⁻⁷ | Also found for HFpEF among CA |
| Cell-cell junction organization | 56 | 7.77×10⁻⁸ | Epithelial cell-cell contacts consist of three major adhesion systems: adherens junctions (AJs), tight junctions (TJs), and desmosomes. These adhesion systems have different functions and compositions. Adherens junctions play a critical role in initiating cell-cell contacts and promoting the maturation and maintenance of the contacts. Tight junctions form physical barriers in various tissues and regulate paracellular transport of water, ions, and small water-soluble molecules. Desmosomes mediate strong cell adhesion linking the intermediate filament cytoskeletons between cells and playing roles in wound repair, tissue morphogenesis, and cell signaling. 19 |

Abbreviations: AA (African American), CA (Caucasian American), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction).

Supplemental Figure 3.1 Manhattan plots of heart failure among African Americans (n=7,982) and Caucasian Americans (n=4,133) in the Women’s Health Initiative Study

(A: HFrEF among WHI-CA; B: HFrEF among WHI-AA; C: HFpEF among WHI-CA; D: HFpEF among WHI-AA) Abbreviations: AA (African Americans), CA (Caucasian Americans), HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction), WHI (Women’s Health Initiative).

Supplemental Figure 3.2 Network key driver genes of the pathways enriched for HFrEF and HFpEF among African and Caucasian American Women

(A: HFrEF and HFpEF shared; B: HFpEF specific) Top 10 ranked multi-tissue key driver genes of the heart failure pathways in the Bayesian network and the protein-protein interaction network. The nodes represent genes, and the edges show interactions. Node colors: light red, top 10 ranked key driver genes; mint, genes that interact with the key drivers. The figure was created using Cytoscape 86. Abbreviations: HFpEF (heart failure with preserved ejection fraction), HFrEF (heart failure with reduced ejection fraction).

3.9 References

1. Benjamin EJ, Virani SS, Callaway CW, et al. Heart Disease and Stroke

Statistics-2018 Update: A Report From the American Heart Association.

Circulation 2018;137:e67-e492.

2. Eaton CB, Pettinger M, Rossouw J, et al. Risk Factors for Incident

Hospitalized Heart Failure With Preserved Versus Reduced Ejection Fraction

in a Multiracial Cohort of Postmenopausal Women. Circulation Heart failure

2016;9.

3. Dries DL, Exner DV, Gersh BJ, Cooper HA, Carson PE, Domanski MJ.

Racial differences in the outcome of left ventricular dysfunction. N Engl J

Med 1999;340:609-16.

4. Bahrami H, Kronmal R, Bluemke DA, et al. Differences in the incidence of

congestive heart failure by ethnicity: the multi-ethnic study of atherosclerosis.

Archives of internal medicine 2008;168:2138-45.

5. Desai CS, Ning H, Lloyd-Jones DM. Competing cardiovascular outcomes

associated with electrocardiographic left ventricular hypertrophy: the

Atherosclerosis Risk in Communities Study. Heart (British Cardiac Society)

2012;98:330-4.

6. Yancy CW, Jessup M, Bozkurt B, et al. 2013 ACCF/AHA guideline for the

management of heart failure: executive summary: a report of the American

College of Cardiology Foundation/American Heart Association Task Force on

practice guidelines. Circulation 2013;128:1810-52.

7. Schwartzenberg S, Redfield MM, From AM, Sorajja P, Nishimura RA,

Borlaug BA. Effects of vasodilation in heart failure with preserved or reduced

ejection fraction implications of distinct pathophysiologies on response to

therapy. Journal of the American College of Cardiology 2012;59:442-51.

8. Zile MR, Brutsaert DL. New concepts in diastolic dysfunction and diastolic

heart failure: Part II: causal mechanisms and treatment. Circulation

2002;105:1503-8.

9. Lopes LR, Elliott PM. Genetics of heart failure. Biochimica et biophysica acta

2013;1832:2451-61.

10. He L, Kernogitski Y, Kulminskaya I, et al. Pleiotropic Meta-Analyses of

Longitudinal Studies Discover Novel Genetic Variants Associated with Age-Related

Diseases. Frontiers in genetics 2016;7:179.

11. Franceschini N, Kopp JB, Barac A, et al. Association of APOL1 With Heart

Failure With Preserved Ejection Fraction in Postmenopausal African

American Women. JAMA cardiology 2018;3:712-20.

12. Tayal U, Prasad S, Cook SA. Genetics and genomics of dilated

cardiomyopathy and systolic heart failure. Genome medicine 2017;9:20.

13. Nam D, Kim J, Kim SY, Kim S. GSA-SNP: a general approach for gene set

analysis of polymorphisms. Nucleic acids research 2010;38:W749-54.

14. Arneson D, Bhattacharya A, Shu L, Makinen VP, Yang X. Mergeomics: a

web server for identifying pathological pathways, networks, and key

regulators via multidimensional data integration. BMC genomics 2016;17:722.

15. Design of the Women's Health Initiative clinical trial and observational study.

The Women's Health Initiative Study Group. Controlled clinical trials

1998;19:61-109.

16. Rosamond WD, Chang PP, Baggett C, et al. Classification of heart failure in

the atherosclerosis risk in communities (ARIC) study: a comparison of

diagnostic criteria. Circulation Heart failure 2012;5:152-9.

17. Hall PS, Nah G, Howard BV, et al. Reproductive Factors and Incidence of

Heart Failure Hospitalization in the Women's Health Initiative. Journal of the

American College of Cardiology 2017;69:2517-26.

18. Kyoto Encyclopedia of Genes and Genomes. 2018. (Accessed Oct 17th, 2018,

at https://www.genome.jp/kegg/.)

19. Reactome. 2018. (Accessed Oct 17th, 2018, at https://reactome.org/.)

20. BioCarta. 2018. (Accessed Oct 17th, 2018, at

http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=CP:BIOCARTA.)

21. Zhu J, Zhang B, Smith EN, et al. Integrating large-scale functional genomic

data to dissect the complexity of yeast regulatory networks. Nature genetics

2008;40:854-61.

22. Wang IM, Zhang B, Yang X, et al. Systems analysis of eleven rodent disease

models reveals an inflammatome signature and key drivers. Molecular

systems biology 2012;8:594.

23. Chan KH, Huang YT, Meng Q, et al. Shared molecular pathways and gene

networks for cardiovascular disease and type 2 diabetes mellitus in women

across diverse ethnicities. Circulation Cardiovascular genetics 2014;7:911-9.

24. GWAS Catalog. 2018. (Accessed Oct 18th, 2018, at

https://www.ebi.ac.uk/gwas/.)

25. MacRae CA. The Genetics of Congestive Heart Failure. Heart failure clinics

2010;6:223-30.

26. Skrzynia C, Berg JS, Willis MS, Jensen BC. Genetics and Heart Failure: A

Concise Guide for the Clinician. Current Cardiology Reviews 2015;11:10-7.

27. Kitagawa M. Notch signalling in the nucleus: roles of Mastermind-like

(MAML) transcriptional coactivators. Journal of biochemistry 2016;159:287-94.

28. Wang X, Freire Valls A, Schermann G, et al. YAP/TAZ Orchestrate VEGF

Signaling during Developmental Angiogenesis. Developmental cell

2017;42:462-78.e7.

29. Li C, He J, Chen J, et al. Genome-Wide Gene-Potassium Interaction Analyses

on Blood Pressure: The GenSalt Study (Genetic Epidemiology Network of

Salt Sensitivity). Circulation Cardiovascular genetics 2017;10.

30. Yu B, Zheng Y, Alexander D, et al. Genome-wide association study of a heart

failure related metabolomic profile among African Americans in the

Atherosclerosis Risk in Communities (ARIC) study. Genetic epidemiology

2013;37:840-5.

31. Zheng Y, Yu B, Alexander D, et al. Associations between metabolomic

compounds and incident heart failure among African Americans: the ARIC

Study. American journal of epidemiology 2013;178:534-42.

32. Ortiz-Zapater E, Santis G, Parsons M. CAR: A key regulator of adhesion and

inflammation. The international journal of biochemistry & cell biology

2017;89:1-5.

33. GeneCards. 2018. (Accessed Nov 7th, 2018, at

https://www.genecards.org/cgi-bin/carddisp.pl?gene=CLMP&keywords=CLMP.)

34. Lisewski U, Shi Y, Wrackmeyer U, et al. The tight junction protein CAR

regulates cardiac conduction and cell-cell communication. The Journal of

experimental medicine 2008;205:2369-79.

35. Lawson ND, Mugford JW, Diamond BA, Weinstein BM. phospholipase C

gamma-1 is required downstream of vascular endothelial growth factor during

arterial development. Genes & development 2003;17:1346-51.

36. GeneCards-PRTFDC1. 2018. (Accessed Nov 11th, 2018, at

https://www.genecards.org/cgi-bin/carddisp.pl?gene=PRTFDC1&keywords=PRTFDC1.)

37. GeneCards-FAR2. 2018. (Accessed Nov 11th, 2018, at

https://www.genecards.org/cgi-bin/carddisp.pl?gene=FAR2&keywords=FAR2.)

38. Moltke I, Grarup N, Jorgensen ME, et al. A common Greenlandic TBC1D4

variant confers muscle insulin resistance and type 2 diabetes. Nature

2014;512:190-3.

39. Wang H, Zhu H, Guo Q, et al. Overlapping Mechanisms of Peripheral Nerve

Regeneration and Angiogenesis Following Sciatic Nerve Transection.

Frontiers in cellular neuroscience 2017;11:323.

40. Chatterjee TK, Aronow BJ, Tong WS, et al. Human coronary artery

perivascular adipocytes overexpress genes responsible for regulating vascular

morphology, inflammation, and hemostasis. Physiological genomics

2013;45:697-709.

41. Lehmann LH, Worst BC, Stanmore DA, Backs J. Histone deacetylase

signaling in cardioprotection. Cellular and molecular life sciences : CMLS

2014;71:1673-90.

42. Qin Q, Xu Y, He T, Qin C, Xu J. Normal and disease-related biological

functions of Twist1 and underlying molecular mechanisms. Cell research

2012;22:90-106.

43. Wang L, Jing J, Fu Q, et al. Association study of genetic variants at newly

identified lipid gene TRIB1 with coronary heart disease in Chinese Han

population. Lipids Health Dis 2015;14:46.

44. Fu Y, Mukhamedova N, Ip S, et al. ABCA12 regulates ABCA1-dependent

cholesterol efflux from macrophages and the development of atherosclerosis.

Cell metabolism 2013;18:225-38.

45. Zhai C, Cong H, Liu Y, et al. Rs7193343 polymorphism in zinc finger

homeobox 3 (ZFHX3) gene and atrial fibrillation: an updated meta-analysis of

10 case-control comparisons. BMC cardiovascular disorders 2015;15:58.

46. Turcot V, Lu Y, Highland HM, et al. Protein-altering variants associated with

body mass index implicate pathways that control energy intake and

expenditure in obesity. Nature genetics 2018;50:26-41.

47. Cid-Soto MA, Martinez-Hernandez A, Garcia-Ortiz H, et al. Gene variants in

AKT1, GCKR and SOCS3 are differentially associated with metabolic traits in

Mexican Amerindians and Mestizos. Gene 2018;679:160-71.

48. Perez-Martinez P, Corella D, Shen J, et al. Association between glucokinase

regulatory protein (GCKR) and apolipoprotein A5 (APOA5) gene

polymorphisms and triacylglycerol concentrations in fasting, postprandial, and

fenofibrate-treated states. The American journal of clinical nutrition

2009;89:391-9.

49. Kei AA, Filippatos TD, Tsimihodimos V, Elisaf MS. A review of the role of

apolipoprotein C-II in lipoprotein metabolism and cardiovascular disease.

Metabolism: clinical and experimental 2012;61:906-21.

50. Haan MN, Mayeda ER. Apolipoprotein E Genotype and Cardiovascular

Diseases in the Elderly. Current cardiovascular risk reports 2010;4:361-8.

51. Torn C, Hadley D, Lee HS, et al. Role of Type 1 Diabetes-Associated SNPs

on Risk of Autoantibody Positivity in the TEDDY Study. Diabetes

2015;64:1818-29.

52. Sun Q, Liu S, Liu K, Jiao K. Role of Semaphorin Signaling During

Cardiovascular Development. Journal of the American Heart Association

2018;7.

53. Takamatsu H, Kumanogoh A. Diverse roles for semaphorin-plexin signaling

in the immune system. Trends in immunology 2012;33:127-35.

54. Zhang W, Hui KY, Gusev A, et al. Extended haplotype association study in

Crohn's disease identifies a novel, Ashkenazi Jewish-specific missense

mutation in the NF-kappaB pathway gene, HEATR3. Genes and immunity

2013;14:310-6.

55. Ribeiro-Samora GA, Rabelo LA, Ferreira ACC, et al. Inflammation and

oxidative stress in heart failure: effects of exercise intensity and duration.

Brazilian journal of medical and biological research = Revista brasileira de

pesquisas medicas e biologicas 2017;50:e6393.

56. Ochala J, Ravenscroft G, Laing NG, Nowak KJ. Nemaline myopathy-related

skeletal muscle alpha-actin (ACTA1) mutation, Asp286Gly, prevents proper

strong myosin binding and triggers muscle weakness. PloS one

2012;7:e45923.

57. Tang J, He A, Li N, et al. Magnesium Sulfate-Mediated Vascular Relaxation

and Calcium Channel Activity in Placental Vessels Different From

Nonplacental Vessels. Journal of the American Heart Association 2018;7.

58. de las Fuentes L, Yang W, Davila-Roman VG, Gu C. Pathway-based genome-

wide association analysis of coronary heart disease identifies biologically

important gene sets. European journal of human genetics : EJHG

2012;20:1168-73.

59. Wells QS, Veatch OJ, Fessel JP, et al. Genome-wide association and pathway

analysis of left ventricular function after anthracycline exposure in adults.

Pharmacogenetics and genomics 2017;27:247-54.

60. Adams RH, Eichmann A. Axon guidance molecules in vascular patterning.

Cold Spring Harbor perspectives in biology 2010;2:a001875.

61. Mettouchi A. The role of extracellular matrix in vascular branching

morphogenesis. Cell adhesion & migration 2012;6:528-34.

62. Jourdan-Lesaux C, Zhang J, Lindsey ML. Extracellular matrix roles during

cardiac repair. Life sciences 2010;87:391-400.

63. Bayomy AF, Bauer M, Qiu Y, Liao R. Regeneration in heart disease-Is ECM

the key? Life sciences 2012;91:823-7.

64. Brooks PC. Cell adhesion molecules in angiogenesis. Cancer and Metastasis

Reviews 1996;15:187-94.

65. Miller MA, Cappuccio FP. Cellular adhesion molecules and their relationship

with measures of obesity and metabolic syndrome in a multiethnic population.

International journal of obesity (2005) 2006;30:1176-82.

66. Shai I, Pischon T, Hu FB, Ascherio A, Rifai N, Rimm EB. Soluble

intercellular adhesion molecules, soluble vascular cell adhesion molecules,

and risk of coronary heart disease. Obesity (Silver Spring, Md) 2006;14:2099-

106.

67. Bongo JB, Peng DQ. The neuroimmune guidance cue netrin-1: a new

therapeutic target in cardiovascular disease. Journal of cardiology 2014;63:95-

8.

68. Touyz RM, Alves-Lopes R, Rios FJ, et al. Vascular smooth muscle

contraction in hypertension. Cardiovascular research 2018;114:529-39.

69. Brozovich FV, Nicholson CJ, Degen CV, Gao YZ, Aggarwal M, Morgan KG.

Mechanisms of Vascular Smooth Muscle Contraction and the Basis for

Pharmacologic Treatment of Smooth Muscle Disorders. Pharmacological

reviews 2016;68:476-532.

70. Lampugnani MG, Dejana E. Adherens junctions in endothelial cells regulate

vessel maintenance and angiogenesis. Thrombosis research 2007;120 Suppl

2:S1-6.

71. Sheikh F, Ross RS, Chen J. Cell-cell connection to cardiac disease. Trends in

cardiovascular medicine 2009;19:182-90.

72. Rosenberg RD, Shworak NW, Liu J, Schwartz JJ, Zhang L. Heparan sulfate

proteoglycans of the cardiovascular system. Specific structures emerge but

how is synthesis regulated? The Journal of clinical investigation

1997;100:S67-75.

73. Chiodelli P, Mitola S, Ravelli C, Oreste P, Rusnati M, Presta M. Heparan

Sulfate Proteoglycans Mediate the Angiogenic Activity of the Vascular

Endothelial Growth Factor Receptor-2 Agonist Gremlin. Arteriosclerosis,

thrombosis, and vascular biology 2011;31:e116-e27.

74. Kaznowska-Bystryk I. The heparan sulfate and its diverse biological activities.

Current Issues in Pharmacy and Medical Sciences 2015;27:209-12.

75. Huynh MB, Morin C, Carpentier G, et al. Age-related changes in rat

myocardium involve altered capacities of glycosaminoglycans to potentiate

growth factor functions and heparan sulfate-altered sulfation. The Journal of

biological chemistry 2012;287:11363-73.

76. Jiang BH, Zheng JZ, Aoki M, Vogt PK. Phosphatidylinositol 3-kinase

signaling mediates angiogenesis and expression of vascular endothelial growth

factor in endothelial cells. Proceedings of the National Academy of Sciences

of the United States of America 2000;97:1749-53.

77. Ghigo A, Laffargue M, Li M, Hirsch E. PI3K and Calcium Signaling in

Cardiovascular Disease. Circulation research 2017;121:282-92.

78. Aoyagi T, Matsui T. Phosphoinositide-3 kinase signaling in cardiac

hypertrophy and heart failure. Current pharmaceutical design 2011;17:1818-

24.

79. Gaengel K, Betsholtz C. Endocytosis regulates VEGF signalling during

angiogenesis. Nature cell biology 2013;15:233-5.

80. Madamanchi A. Beta-adrenergic receptor signaling in cardiac function and

heart failure. McGill journal of medicine : MJM : an international forum for

the advancement of medical sciences by students 2007;10:99-104.

81. Naga Prasad SV, Jayatilleke A, Madamanchi A, Rockman HA. Protein kinase

activity of phosphoinositide 3-kinase regulates beta-adrenergic receptor

endocytosis. Nature cell biology 2005;7:785-96.

82. Curran J, Makara MA, Mohler PJ. Endosome-based protein trafficking and

Ca(2+) homeostasis in the heart. Frontiers in physiology 2015;6:34.

83. Makinen VP, Civelek M, Meng Q, et al. Integrative genomics reveals novel

molecular pathways and gene networks for coronary artery disease. PLoS

genetics 2014;10:e1004502.

84. Wang L, Perez J, Heard-Costa N, et al. Integrating genetic, transcriptional, and

biological information provides insights into obesity. International journal of

obesity (2005) 2018.

85. Roark CL, Anderson KM, Simon LJ, Schuyler RP, Aubrey MT, Freed BM.

Multiple HLA epitopes contribute to type 1 diabetes susceptibility. Diabetes

2014;63:323-31.

86. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for

integrated models of biomolecular interaction networks. Genome research

2003;13:2498-504.

87. Smith NL, Felix JF, Morrison AC, et al. Association of genome-wide variation

with the risk of incident heart failure in adults of European and African

ancestry: a prospective meta-analysis from the cohorts for heart and aging

research in genomic epidemiology (CHARGE) consortium. Circulation

Cardiovascular genetics 2010;3:256-66.

88. Morrison AC, Felix JF, Cupples LA, et al. Genomic variation associated with

mortality among adults of European and African ancestry with heart failure:

the cohorts for heart and aging research in genomic epidemiology consortium.

Circulation Cardiovascular genetics 2010;3:248-55.


Chapter 4:

Pathways and Gene Networks for Obesity and Potential Interactions with Dietary Lipid in Women across Ethnicities

4.1 Introduction

Globally, the number of adults with obesity (body mass index (BMI) ≥ 30 kg/m²) has risen to approximately 600 million, according to WHO estimates from 2014. Obesity is a genetically and phenotypically complex condition that causes enormous morbidity. Multiple lines of evidence indicate that a large proportion of obesity risk is attributable to genetic factors, with studies estimating that 30-70% of human variation in BMI is due to genetic risk factors 1,2. Over the past several years, genome-wide association studies (GWAS) have contributed substantially to our understanding of the genetic underpinnings of obesity. For example, over 30 novel genes (e.g. FTO, TMEM18, and MC4R) have been identified by well-powered GWAS as associated with obesity and related phenotypes, and have been replicated in different populations 3.

However, most GWAS performed to date have used a stringent genome-wide significance threshold, P < 5 × 10⁻⁸, to identify single nucleotide polymorphisms (SNPs) and/or copy number variants that are strongly associated with obesity. Given the challenge of multiple testing and the tradeoff between type I and type II error, GWAS have been criticized for their high false-negative rates and for the small effect sizes of the variants they detect. In addition, it is well known that multiple genes and their products work together, interacting in functional pathways, to contribute to disease susceptibility, even though any single gene has only a moderate effect. GWAS alone therefore cannot fully dissect the complex genetic architecture of obesity, and current findings appear to explain only a small proportion of obesity risk, leaving much of that architecture unexplored.

Besides the large number of common variants with small effect sizes that remain unidentified by GWAS, interactions between genetic variants and environmental factors also contribute to the so-called “missing” heritability 4. Dietary lipids have long been thought to be associated with obesity and other cardio-metabolic disorders, based on animal and epidemiologic studies 5,6. Not only the quantity of dietary lipids but also their quality, measured as the Lipophilic Index (LI), which summarizes individual fatty acid levels and their melting points, contributes to the risk of cardio-metabolic diseases 7. While observational studies have examined potential gene-dietary fat interactions on obesity 8,9, it is still unclear whether interactions between genes and the quality of dietary lipids also exist.

4.2 Objectives

Although a large number of studies have successfully identified genetic variants for obesity, our understanding of the genetic architecture underlying obesity is still limited, because many genetic loci with small effect sizes have not yet been identified by GWAS. In addition, potential gene-dietary lipid interactions on obesity have not been well explored. Therefore, we performed a GWAS analysis for obesity, followed by integrative pathway and gene-diet interaction analyses, in the Women’s Health Initiative SNP Health Association Resource (WHI-SHARe) and the Genomics and Randomized Trials Network (WHI-GARNET) studies, and replicated our findings in the Jackson Heart Study (JHS) and Framingham Heart Study (FHS). These cohorts provide unique opportunities to understand the biological bases of obesity susceptibility and to encourage innovative prevention strategies across multiple ethnicities.

4.3 Methods

4.3.1 Study Population

The Women’s Health Initiative Study

The Women’s Health Initiative (WHI) study enrolled 161,808 postmenopausal women aged 50 to 79 years at 40 clinical centers from 1993 to 1998, of whom 93,676 were in an observational study (OS) and 68,132 were included in a partial factorial randomized clinical trial (CT). A detailed description of the WHI study design has been published elsewhere 10. Among all WHI participants, a subset of 8,515 self-identified African Americans (AA) and 4,909 Caucasian Americans (CA) consented to genotyping and formed the WHI SNP Health Association Resource (SHARe) and WHI Genomics and Randomized Trials Network (GARNET) studies. After quality control and exclusion of participants without baseline BMI and dietary measurements, our GWAS analysis was conducted among 7,678 AA and 4,079 CA participants of the WHI.

Cohorts for replication and validation

We used two population cohorts to replicate our findings: the Jackson Heart Study as replication of WHI-AA, and the Framingham Heart Study as replication of WHI-CA. Because the primary analysis was performed among postmenopausal women, only female participants from the JHS and FHS were included for replication. The main FHS enrolled three generations: the original generation (started in 1948), the offspring generation (started in 1971), and generation three (started in 2005). Because diet was measured very late in the original generation cohort (at Visit 20), we included only the offspring generation and generation three in the analysis. In total, we conducted the proposed analyses among 1,714 JHS and 3,188 FHS female participants.


4.3.2 Genetic Data

Genome-wide genotyping of the WHI-SHARe and JHS participants was performed using the Affymetrix 6.0 array (Affymetrix, Inc, Santa Clara, CA), and WHI-GARNET and FHS participants were genotyped using the Illumina HumanOmni1-Quad SNP platform (Illumina, Inc, San Diego, CA). Because these arrays are designed to capture common genetic variants, variants with frequency ≥ 0.05 were genotyped. Reference panels from the 1000 Genomes (1000G) Project Consortium (Version 3, March 2012 release), which provide nearly complete coverage of common and low-frequency genetic variation with minor allele frequency ≥ 0.5%, were used for genotype imputation.

4.3.3 Measurement of Dietary Lipid

In the WHI, diets were assessed at baseline by validated food frequency questionnaires (FFQ) 11, from which dietary fatty acids, carbohydrate, protein, alcohol, and total energy intake were estimated. When examining the gene-dietary lipid interactions, we employed a novel index - LI, to summarize the quality of dietary fatty acids. LI was calculated as a summation of the products of the levels of fatty acids and their specific melting points (℃) using the following equation:

$$\text{Dietary LI} = \frac{\sum_{i=1}^{k} \left[\text{Fatty acid (g)}_i \times \text{Melting point (℃)}_i\right]}{\sum_{i=1}^{k} \text{Fatty acid (g)}_i}$$

where i indicates an individual fatty acid and k indicates the number of fatty acids used to calculate LI. Information about melting point and molecular weight was acquired from the LipidBank Database 12. Details regarding the calculation of dietary LI can be found in Chapter 2 7.
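As a minimal sketch of the LI equation above, the function below computes the intake-weighted mean melting point of a set of fatty acids. The function name, fatty acid names, intakes, and melting points are illustrative placeholders chosen for this example; they are not LipidBank entries or WHI data.

```python
def lipophilic_index(intake_g, melting_point_c):
    """Return the intake-weighted mean melting point (deg C).

    intake_g: dict mapping fatty acid name -> intake in grams
    melting_point_c: dict mapping fatty acid name -> melting point in deg C
    """
    total_intake = sum(intake_g.values())
    if total_intake <= 0:
        raise ValueError("total fatty acid intake must be positive")
    # Numerator of the LI equation: sum of intake x melting point.
    weighted_sum = sum(
        grams * melting_point_c[name] for name, grams in intake_g.items()
    )
    return weighted_sum / total_intake


# Hypothetical intakes (g/day) and rounded melting points, for illustration only.
example_intake = {"palmitic": 10.0, "oleic": 20.0, "linoleic": 10.0}
example_melting = {"palmitic": 63.0, "oleic": 13.0, "linoleic": -5.0}
example_li = lipophilic_index(example_intake, example_melting)
print(round(example_li, 2))
```

Because LI is a weighted mean, shifting intake toward fatty acids with higher melting points raises the index even when total fat intake is unchanged.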

4.3.4 Measurement of Outcome

Among WHI participants, BMI was calculated as weight (kg) divided by the square of measured height (m²). BMI was measured at each visit, with fewer missing measurements at baseline and year 3. Because genetic risk necessarily precedes any BMI measurement taken during follow-up, participants with baseline or year 3 BMI ≥ 30 kg/m² were defined as obese. In the gene-diet interaction analysis and mediation analysis, BMI measured at year 3 was used as the outcome. Among JHS and FHS participants, BMI was also measured at each follow-up visit, so a similar criterion for obesity was applied to the replication cohorts.

4.3.5 Statistical Analysis

Genome-wide association analyses

We performed standard GWAS analysis for obesity within each ethnic population using multivariable logistic regression. The regression models used allelic dosage at each SNP as the independent variable, with covariate adjustment for: 1) age, age², region, and the first four principal components (PCs) for global ancestry in the WHI-SHARe; 2) age, age², region, randomized hormone treatment group, baseline hysterectomy status, and the first four PCs for global ancestry in the WHI-GARNET; 3) age, age², and the first four PCs for global ancestry in the JHS and FHS. Because the associations between germline genetic variants and obesity are not expected to be confounded by demographic and lifestyle factors, no other confounders were adjusted for in the GWAS analysis. The general form of the GWAS model is specified as follows:

$$\operatorname{logit} \Pr[Y \mid G, V] = \beta_0 + \beta_G G + \beta_V V,$$

where Y denotes obese status, G denotes SNPs, and V denotes adjusted covariates.

Common genetic variants reaching suggestive significance (P < 5 × 10⁻⁶) were identified as potential GWAS hits and were further validated in the JHS and FHS.
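To make the role of the per-SNP coefficient concrete, the sketch below evaluates the logistic model for a single SNP and checks that each additional copy of the coded allele multiplies the odds of obesity by the exponential of that coefficient. All coefficient values, the single covariate, and the function names are hypothetical and are not estimates from this study.

```python
import math


def obesity_probability(beta_0, beta_g, dosage, beta_v=(), covariates=()):
    """Inverse-logit of beta_0 + beta_g * G + sum(beta_v * V)."""
    linear_predictor = beta_0 + beta_g * dosage
    linear_predictor += sum(b * v for b, v in zip(beta_v, covariates))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(probability):
    """Convert a probability to odds."""
    return probability / (1.0 - probability)


# Hypothetical values: intercept, per-allele log odds ratio, one covariate.
beta_0, beta_g, beta_v = -0.2, 0.10, (0.01,)
covariates = (5.0,)

p_zero_copies = obesity_probability(beta_0, beta_g, 0.0, beta_v, covariates)
p_one_copy = obesity_probability(beta_0, beta_g, 1.0, beta_v, covariates)
per_allele_odds_ratio = odds(p_one_copy) / odds(p_zero_copies)

SUGGESTIVE_P = 5e-6  # suggestive threshold stated in the text


def is_suggestive(p_value):
    """Flag a SNP as a potential GWAS hit to carry into replication."""
    return p_value < SUGGESTIVE_P
```

The odds ratio does not depend on the covariate values, which is why a single coefficient per SNP summarizes the association under this model.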

Pathway analysis

Functionally related genes involved in metabolic and signaling pathways were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, and BioCarta databases. Each pathway was tested for enrichment of genetic signals for obesity within each ethnic group. To avoid bias from relying on a single method, two well-established methods based on known biological pathways were used to investigate whether functionally related genetic signals are enriched for the risk of obesity: 1) GSA-SNP 13, and 2) Mergeomics 14. Biological pathways were defined as significant if they met the following criteria: 1) identified by both methods in the WHI study with an FDR-adjusted q value < 0.2; and 2) validated by GSA-SNP or Mergeomics with a significant P value after Bonferroni correction in the JHS (as replication of WHI-AA) and FHS (as replication of WHI-CA). When examining pathways shared between the two ethnicities, we performed cross-ethnicity validation analysis using a nominal P value at the 0.05 significance level.
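The two-stage criteria above can be sketched as a simple decision rule. The text does not state the family-wise alpha or the number of tests used for the Bonferroni correction, so the defaults below (alpha = 0.05, divided by the number of pathways carried into replication) are assumptions, as are the function name and the example values.

```python
def is_significant_pathway(q_gsa_snp, q_mergeomics,
                           p_rep_gsa_snp, p_rep_mergeomics,
                           n_replication_tests,
                           q_cutoff=0.2, alpha=0.05):
    """Apply the discovery and replication criteria to one pathway."""
    # Criterion 1: both methods give FDR q < cutoff in the discovery cohort.
    found_by_both = q_gsa_snp < q_cutoff and q_mergeomics < q_cutoff
    # Criterion 2: either method passes Bonferroni in the replication cohort.
    # Assumption: alpha is split across the pathways tested in replication.
    bonferroni_cutoff = alpha / n_replication_tests
    replicated = min(p_rep_gsa_snp, p_rep_mergeomics) < bonferroni_cutoff
    return found_by_both and replicated


# Hypothetical pathway results with 69 pathways carried into replication.
passes = is_significant_pathway(0.05, 0.15, 0.0004, 0.20, 69)
fails_discovery = is_significant_pathway(0.05, 0.25, 0.0004, 0.20, 69)
fails_replication = is_significant_pathway(0.05, 0.15, 0.01, 0.20, 69)
```

Requiring agreement between both methods at discovery trades some sensitivity for robustness to method-specific artifacts, which matches the stated motivation for using two methods.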

Key Driver Analysis for Identification of Key Regulatory Genes for Obesity-related Pathways

As hundreds of genes are involved in the biological pathways, we sought to further prioritize key driver (KD) genes. KD genes are central network genes that can potentially affect a large number of genes involved in the obesity pathways and thus exert a stronger impact than other genes. We integrated all genes involved in significant pathways with seven Bayesian networks and one protein-protein interaction network, using KD analysis methods 14-16. As multiple KD genes were identified from multiple networks, we designed a normalized rank score (NRS) to summarize the consistency and strength of identified KD genes across multiple networks 17:

$$\mathrm{NRS} = \frac{N_{KD}}{N} \times \sum_{i=1}^{N_{KD}} R_{KD_i}, \qquad R_{KD_i} = \frac{\mathrm{Rank}_{KD_i}}{N_{KD_i}},$$

where $N_{KD}$ is the count of networks from which a KD was identified. $N_{KD}$ is normalized by the total number of networks $N$ to represent the consistency of a KD across all networks tested (seven Bayesian networks from seven tissues, namely adipose, blood, brain, islet, liver, kidney, and muscle, and one protein-protein interaction network). The KD strength is represented by the summation of the normalized statistical rank in each network i ($R_{KD_i}$) across all networks from which the KD is identified. $R_{KD_i}$ was calculated by dividing the rank of a KD, based on the P values of the Fisher exact test in descending order ($\mathrm{Rank}_{KD_i}$), by the total number of KDs identified from network i ($N_{KD_i}$). KDs with high NRS were those with high network enrichment for pathways and high consistency across tested networks.
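The NRS definition can be sketched in a few lines. The network names, ranks, and KD counts below are hypothetical inputs chosen only to show the arithmetic; they are not results from the eight networks used in this chapter.

```python
def normalized_rank_score(rank_by_network, n_kds_by_network, n_networks):
    """Compute NRS for one key driver (KD).

    rank_by_network: network -> rank of this KD among that network's KDs,
        ranked by Fisher exact test P value in descending order, so the
        smallest P value receives the largest rank
    n_kds_by_network: network -> total number of KDs found in that network
    n_networks: total number of networks tested (N)
    """
    n_kd = len(rank_by_network)  # networks in which this KD was identified
    strength = sum(
        rank / n_kds_by_network[network]
        for network, rank in rank_by_network.items()
    )
    return (n_kd / n_networks) * strength


# Hypothetical KD identified in 3 of 8 networks.
example_ranks = {"adipose": 48, "liver": 30, "ppi": 9}
example_totals = {"adipose": 50, "liver": 40, "ppi": 10}
example_nrs = normalized_rank_score(example_ranks, example_totals, n_networks=8)
print(round(example_nrs, 5))
```

The first factor rewards a KD for appearing in many networks, and the sum rewards it for ranking near the top within each of them, so a gene must do well on both counts to score highly.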

Gene-Diet Interaction Analysis

We further tested the interactions between identified KD genes, at the SNP level, and dietary LI on BMI using linear regression models. The general form of the model is specified as follows:

$$E[Y \mid G, E, V] = \alpha_0 + \alpha_G G + \alpha_E E + \alpha_{GE}\, G \times E + \alpha_V V,$$

where Y denotes BMI measured at year 3; G denotes the selected SNPs, coded under an additive genetic model; E denotes dietary LI calculated at baseline and treated as a continuous variable; G×E denotes the interaction between each SNP and dietary LI; and V denotes covariates, including age, age², region, the first four PCs for population stratification, income, education, baseline cardiovascular disease (CVD)/diabetes, physical activity, smoking behavior, hormone therapy, hysterectomy status, and dietary factors (alcohol consumption, percent calories from carbohydrate and protein, and total energy intake). The linear regression model was additionally adjusted for randomized hormone treatment group for Caucasian Americans in the WHI-GARNET cohort. Selected SNPs were those that lay within the region of a top 10 KD gene, had minor allele frequency (MAF) ≥ 0.01 and imputation quality score Rsq > 0.3, passed the LD filter (r² < 0.8), and showed a potential marginal effect on obesity in the GWAS analysis (P value < 0.005).
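The SNP selection criteria listed above amount to a conjunction of five filters, sketched below. The `Snp` record, its field names, and the example variants are hypothetical; in particular, `max_r2_with_kept` stands in for whatever LD pruning procedure was actually used, which the text does not describe.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    in_top10_kd_gene: bool   # lies within a top 10 KD gene region
    maf: float               # minor allele frequency
    rsq: float               # imputation quality score
    max_r2_with_kept: float  # assumed: highest LD r^2 with SNPs already kept
    gwas_p: float            # marginal P value from the obesity GWAS


def passes_selection(snp):
    """Return True if the SNP meets all five selection criteria."""
    return (
        snp.in_top10_kd_gene
        and snp.maf >= 0.01
        and snp.rsq > 0.3
        and snp.max_r2_with_kept < 0.8
        and snp.gwas_p < 0.005
    )


candidates = [
    Snp("rs_example_1", True, 0.12, 0.95, 0.10, 0.001),
    Snp("rs_example_2", True, 0.005, 0.90, 0.10, 0.001),  # fails MAF
    Snp("rs_example_3", True, 0.20, 0.25, 0.10, 0.001),   # fails Rsq
    Snp("rs_example_4", True, 0.20, 0.90, 0.85, 0.001),   # fails LD filter
    Snp("rs_example_5", False, 0.20, 0.90, 0.10, 0.001),  # outside KD genes
]
selected = [snp.rsid for snp in candidates if passes_selection(snp)]
```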

Mediation analysis

As BMI may serve as a mediator that links genetic predisposition to more severe cardio-metabolic health outcomes, such as type 2 diabetes (T2D), coronary heart disease (CHD), and heart failure (HF), we additionally performed a mediation analysis to explore the effect mediated through BMI. Participants with myocardial infarction, stroke, coronary revascularization, diabetes, or HF at baseline were additionally excluded. Genetic predisposition to obesity was defined by a genetic risk score (GRS) 18. The GRS for each selected gene was the sum of minor alleles weighted by the signs of their estimated $\beta_G$ from the GWAS analysis, standardized to have mean = 0 and standard deviation = 1. Genes involved in biological pathways that were shared between obesity and other cardio-metabolic health outcomes were selected. Participants were further classified as having high or low GRS, using the population mean GRS as the cutoff. We used the following two models, a logistic regression for the outcome Y and a linear regression for the mediator M (BMI), under the assumption of no exposure-mediator interaction:

$$\operatorname{logit}\{P(Y = 1 \mid G, M, L)\} = \theta_0 + \theta_1 G + \theta_2 M + \theta_3 L \qquad (1)$$

$$E(M \mid G, L) = \beta_0 + \beta_1 G + \beta_3 L \qquad (2)$$

Y indicates the dichotomous outcomes: T2D, CHD, and HF; G denotes the dichotomized genetic risk score; M denotes BMI measured at year 3; and L denotes potential confounders measured at baseline (age, age², region, first four PCs, physical activity, education, income, smoking, hormone usage, hysterectomy status, alcohol consumption, percent calories from protein and carbohydrates, and total energy intake).

If regressions (1) and (2) are correctly specified and the following four identification assumptions hold: i) no unmeasured exposure-mediator-outcome confounders; ii) no mediator-outcome confounder that is affected by the exposure; iii) no exposure-mediator interaction; and iv) a rare outcome, then the natural direct effect (NDE) and natural indirect effect (NIE) of high GRS can be estimated by 19:

$$OR^{NDE} \approx \exp(\theta_1), \qquad OR^{NIE} \approx \exp(\theta_2 \beta_1).$$

The proportion of mediation (PM), which is the proportion of the effect of the exposure mediated by BMI, can be estimated based on the following two equations:

$$PM\ (\text{on the log odds scale}) = \frac{\log(OR^{NIE})}{\log(OR^{NIE}) + \log(OR^{NDE})},$$

$$PM\ (\text{on the risk difference scale}) = \frac{OR^{NDE} \times (OR^{NIE} - 1)}{OR^{NDE} \times OR^{NIE} - 1}.$$
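The effect and proportion-mediated formulas above translate directly into code. In the sketch below, the coefficient names follow the two regression models (the exposure and mediator coefficients from the outcome model, and the exposure coefficient from the mediator model); the numeric values and the function name are hypothetical and are not estimates from this study.

```python
import math


def mediation_summary(theta_1, theta_2, beta_1):
    """Return (OR_NDE, OR_NIE, PM on log odds scale, PM on risk difference scale).

    theta_1: exposure coefficient in the outcome (logistic) model
    theta_2: mediator coefficient in the outcome (logistic) model
    beta_1: exposure coefficient in the mediator (linear) model
    """
    or_nde = math.exp(theta_1)
    or_nie = math.exp(theta_2 * beta_1)
    pm_log_odds = math.log(or_nie) / (math.log(or_nie) + math.log(or_nde))
    pm_risk_diff = or_nde * (or_nie - 1.0) / (or_nde * or_nie - 1.0)
    return or_nde, or_nie, pm_log_odds, pm_risk_diff


# Hypothetical coefficients: a weak direct effect and a larger indirect effect.
or_nde, or_nie, pm_log_odds, pm_risk_diff = mediation_summary(
    theta_1=0.02, theta_2=0.08, beta_1=0.5
)
```

On the log odds scale the proportion mediated reduces to the indirect log odds divided by the sum of the indirect and direct log odds, so it is undefined when the two effects cancel and can fall outside the interval from 0 to 1 when they have opposite signs.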

4.4 Results

The descriptive statistics on demographic and lifestyle factors of each study population are shown in Table 4.1. Among 7,678 AA and 4,079 CA women, 4,407 (57%) and 2,028 (50%) were obese, respectively. Compared to WHI-CA women, WHI-AA women were younger and less physically active, had higher BMI, and had lower consumption of alcohol and total calories. In both ethnic groups, women with obesity were younger, with a lower proportion of current smokers and higher proportions of CVD and T2D. Obese women were also less physically active and had higher total calorie intake than non-obese women. Baseline characteristics for participants in the JHS and FHS can be found in Supplemental Table 4.1.

4.4.1 Identification of Significant Genetic Loci Using Standard GWAS Analysis

The results of the standard GWAS analysis for obesity among WHI-AA and WHI-CA women are shown in the Manhattan plots (Supplemental Figure 4.1). The discovery analysis revealed 78 and 273 suggestive (P < 5 × 10⁻⁶) SNPs for obesity among AA and CA, respectively. One genomic locus (rs2163585, an intergenic variant at 2q22, close to the LRP1B gene) reached genome-wide significance (P < 5 × 10⁻⁸) in the GWAS analysis among WHI-CA women, and was also validated among FHS-CA women. The other top 3 SNPs for AA and CA are summarized in Table 4.2. In addition, among 200 previously established variants with imputed values in our study population, 24 variants from 22 loci for WHI-AA and 29 variants from 23 loci for WHI-CA were validated in our study (Table 4.3).

4.4.2 Identification of Biological Pathways Using Integrative Pathway Analysis

In total, we tested 1,077 pathways from the KEGG, Reactome, and BioCarta databases, among which 69 pathways were found to be associated with obesity among WHI-AA women and 26 among WHI-CA women. After replication analysis among JHS and FHS women, eight unique pathways were found to be associated with obesity among AA (n=6) or CA (n=4) women (Supplemental Table 4.2). In the cross-ethnicity analysis, 6 of the 8 pathways were shared between AA and CA, namely axon guidance, hypertrophic cardiomyopathy (HCM), arrhythmogenic right ventricular cardiomyopathy (ARVC), dilated cardiomyopathy, mitogen-activated protein kinase (MAPK) signaling, and neural cell adhesion molecule (NCAM) signaling for neurite out-growth (Table 4.4). The six pathways involved 559 genes, of which four (BDNF, ADCY9, MAP2K5, and ADCY3) have previously been implicated in obesity 20. The six pathways were also found to be interconnected, sharing 150 genes (Figure 4.1).

4.4.3 Identification of Key Drivers for Obesity

To identify potential KD genes among the significant pathways for obesity shared across ethnicities, we integrated the pathway genes with eight different regulatory or interaction networks that capture gene-gene or protein-protein interactions. The top 10 KD genes for the six biological pathways were COL1A1, COL1A2, COL2A1, COL3A1, COL4A2, COL5A1, COL6A2, COL6A3, JUN, and TPM3 (Figure 4.1 and Supplemental Figure 4.2).

4.4.4 Gene-dietary Lipid Interactions on Obesity

Variants within four KD genes suggested potential interactions with dietary LI. Results for the marginal effects of genes and dietary LI, and their joint effect, are summarized in Table 4.5. For WHI-AA, three LD blocks from COL1A1, COL5A1, and COL6A3 interacted with dietary LI on BMI, and for WHI-CA, one LD block from COL4A2 showed significant interaction with dietary LI. To illustrate the interpretation of the interaction, we take as an example a variant from the COL6A3 gene. Among WHI-AA, compared with individuals with the AA genotype and the average value of dietary LI (27.1), individuals carrying one more copy of G at rs149820492 had a BMI that was higher by 0.57 kg/m²; individuals in the highest quartile of dietary LI (29.8) had a BMI that was higher by 0.06 kg/m², although this difference was not significant; and individuals carrying one copy of G while having high dietary LI had a BMI that was higher by 1.18 kg/m². The result indicates the existence of positive additive interaction: there was a significant synergistic effect between polymorphisms at rs149820492 and dietary LI, so that the effect on BMI of having both high genetic risk and high dietary LI may be more substantial than the sum of the effect of having high genetic risk alone and the effect of having high dietary LI alone, compared to those who have both low genetic risk and low dietary LI.
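The additive interaction in the COL6A3 example can be checked with the three BMI differences reported above. The sketch below computes the excess of the joint effect over the sum of the separate effects; under the linear interaction model in Section 4.3.5, this excess equals the interaction coefficient multiplied by the difference in dietary LI between the two groups. The function name is hypothetical, and the implied coefficient is only a back-of-envelope reading of rounded figures, not a reported estimate.

```python
def additive_interaction_excess(gene_only, diet_only, joint):
    """Joint effect minus the sum of the two separate effects."""
    return joint - (gene_only + diet_only)


# BMI differences for rs149820492 in WHI-AA, as reported in the text.
gene_only = 0.57   # one more copy of G, average dietary LI (27.1)
diet_only = 0.06   # reference genotype, highest LI quartile (29.8)
joint = 1.18       # one copy of G and highest LI quartile

excess = additive_interaction_excess(gene_only, diet_only, joint)

# Assumption: the excess is alpha_GE * (29.8 - 27.1) for one allele copy.
implied_interaction_per_li_unit = excess / (29.8 - 27.1)
```

A positive excess means the joint effect exceeds what the two factors would contribute if they simply added, which is the sense in which the interaction is described as synergistic.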

4.4.5 Genetic Predisposition on Cardio-metabolic Disorders Mediated Through Obesity

Because the axon guidance pathway was enriched for obesity, HF (Chapter 3), T2D, and CVD 17, we calculated a GRS based on 251 genes involved in the axon guidance pathway. BMI significantly mediated the relationship between GRS and T2D, CHD, and HF, with the exception of CHD among CA. No significant direct effect was observed. BMI accounted for over 85% of the total effect of high GRS on T2D among CA, and over 90% on HF among AA (Supplemental Table 4.3).

4.5 Discussion

In this genome-wide assessment of 7,678 AA and 4,079 CA women from the WHI study, with replication in 1,714 AA women from the JHS and 3,188 CA women from the FHS, we identified one genetic variant close to the LRP1B gene that was associated with obesity among CA women, and six pathways that were consistently and significantly associated with obesity across African and Caucasian Americans. We further examined potential novel regulators of these pathways to uncover mechanisms underlying obesity that are shared across ethnicities. Our results suggest the presence of core mechanisms across ethnicities as well as ethnicity-specific pathways of obesity.

Our standard GWAS analysis of two ethnic populations with corresponding validation cohorts identified one biologically plausible disease locus (rs2163585, close to LRP1B) for obesity among CA. In the validation of previously reported obesity loci, variants from 6 genes (SEC16B, ADCY3, SPAG16, TFAP2B, CALCR, and FTO) reached significance in both ethnic groups. Ethnicity-specific loci might also exist. For example, the LD block around rs939583 (40 kb upstream of the TMEM18 gene) was associated with obesity among AA, and the LD block around rs6265 (within the BDNF gene) was associated with obesity among CA.

The LRP1B locus was associated with obesity among CA in our study, and it was previously associated with BMI and eating behaviors among individuals of European ancestry 21,22. The LRP1B gene is a member of the LDL receptor gene family, which plays important roles in lipid metabolism and calcium ion binding, and is associated with other cardiometabolic phenotypes, including insulin, leptin, and TNF-alpha levels, diabetes, and CVD 23,24. In animal studies, LRP1B is highly expressed in the vascular wall and has been found to regulate the migration of smooth muscle cells through modulation of the platelet-derived growth factor β-receptor-mediated pathway, which leads to atherosclerotic lesion progression 24-26. Although we failed to identify genome-wide significant variants in the AA population, we were able to replicate the previously reported genetic variants in the TMEM18 gene that may influence the risk of obesity 27. TMEM18 was first proposed as an important obesity-related locus by the GIANT consortium 28 and has been validated in multiple populations 29-31. The TMEM18 gene encodes a small, poorly characterized transmembrane protein of 140 amino acids, which is located in the nuclear envelope in neural stem cells 32. Although the function of TMEM18 is yet to be elucidated, it has been shown to be expressed in the brain, including the hypothalamus, the region responsible for the control of energy homeostasis, which is in line with its proposed role in obesity 31.

Our findings also add to the current evidence supporting the involvement of multiple regulatory gene networks in the pathogenesis of complex cardio-metabolic disorders, although individual genes may exert subtle effects 17,33-35. Our integrative pathway-based analysis for obesity identified 6 pathways that were consistent between AA and CA. Interestingly, four pathways, specifically axon guidance, HCM, ARVC, and dilated cardiomyopathy, were found to be enriched for both CVD and T2D across different ethnicities in the same WHI population 17, suggesting potential shared biological mechanisms underlying these complex cardio-metabolic health outcomes across ethnicities.

HCM, ARVC, and dilated cardiomyopathy pathways are closely interconnected, and the common mechanisms among them seem to be intracellular calcium homeostasis (regulated by the CACN gene family 36,37, RYR2 38, and SLC8A1 39) and cell-matrix interactions via calcium signaling (the ITG gene family 40 and DMD 41). RYR2 and SLC8A1 are also involved in insulin activity 38,42. Intracellular calcium is fundamental in regulating adipocyte lipid metabolism and energy balance, which are processes related to obesity 43. Intracellular calcium concentration in cardiac myocytes and β-cells is also related to the contraction and relaxation of the heart 44 and to insulin secretion and action 45, thus linking obesity with diabetes and CVD. The HCM pathway also includes genes that regulate inflammation, fatty acid and cholesterol synthesis, insulin signaling and resistance, and glycolysis (PRKAA2, PRKAG3, TNF, IL6, etc. 46-48), and may thus contribute to obesity. Genes within the axon guidance pathway have been connected to obesity and T2D. In particular, the circulating SLIT2 protein was found to activate a thermogenic, PKA-dependent pathway in adipocytes and improve glucose homeostasis in diet-induced obese mice 49. In addition, the SLIT-ROBO family plays a critical role in pancreatic β-cell survival, potentiates glucose-stimulated insulin secretion, and increases the frequency of glucose-induced Ca²⁺ oscillations 45. Genes in the axon guidance pathway were also shared with the vasoactive intestinal peptide (VIP) pathway (PPP3CC, PPP3CA, and PPP3CB), which has been linked to obesity via multiple mechanisms, including mediating feeding behavior, influencing energy metabolism, and stimulating insulin and glucagon secretion 34. Therefore, these pathways appear to link obesity to more severe cardio-metabolic health conditions via multiple mechanisms.

Two pathways, the MAPK and NCAM signaling pathways, had not been reported previously. The MAPK pathway has long been known to play roles in the regulation of energy metabolism 50 and adipocyte differentiation 51, thus contributing to obesity and insulin resistance. NCAM signaling is part of axon guidance, and its enrichment specifically points to a role of NCAMs in obesity. Although NCAM is a member of the immunoglobulin superfamily and is mainly involved in the formation and maintenance of the nervous system, it also activates intracellular signaling cascades through a number of signaling molecules, including fibroblast growth factor receptor (FGFR), the MAPK pathway, intracellular calcium, and extracellular matrix molecules (mainly collagen) 52-54, and has further been shown to promote insulin signaling and adipocyte differentiation 55.

In addition to shared pathways across ethnicities, we also identified some obesity pathways for AA only (Table 4.4): tight junction (TJ) and G alpha (z) signaling events. TJ, the principal determinant of epithelial paracellular permeability, is closely interconnected with the HCM, ARVC, and dilated cardiomyopathy pathways, and it helps to regulate epithelial barrier properties, which are mainly organized by the claudin family of transmembrane proteins 56. Animal studies have shown that high-fat diet-induced obesity can cause profound TJ restructuring by claudin switching 57, which can increase intestinal permeability and alter intestinal immunity, thus leading to insulin resistance and metabolic disease 58. The TJ pathway also includes genes that regulate magnesium homeostasis, such as CLDN16 59. G(z) is a member of the G(i) family of trimeric G proteins whose primary role in cell physiology is still unknown. However, it has been found to interact with the Rap1 GTPase activating protein to attenuate Rap1 signaling 60, which causes dysregulation in adipose function, thus leading to glucose intolerance, insulin resistance, and excess fat accumulation 61. In addition, regulator of G-protein signaling (RGS) proteins can also interact with G proteins, likely in controlling fatty acid and glucose homeostasis 62,63. Detailed knowledge of these relationships at the molecular level will allow researchers to understand the ethnic disparity in obesity and enable the development of more effective interventions to prevent obesity.

We did not identify any biological pathway that involves the LRP1B gene, which we found to be significantly associated with obesity among CA women. This is because the LRP1B gene is not included in the knowledge-driven pathway databases that we used in the primary analysis: KEGG, Reactome, and BioCarta. However, the LRP1B gene does play pivotal roles in biological processes, such as calcium ion binding, as noted in the Gene Ontology database. Therefore, we additionally tested the enrichment of the calcium ion binding pathway in obesity development and found a significant enrichment in all discovery and replication populations (P value = 4×10⁻⁹, 3×10⁻⁸, 4×10⁻⁵, and 0.01 for WHI-AA, JHS female, WHI-CA, and FHS female, respectively).

Because hundreds of genes are involved in the biological pathways we identified for obesity, prioritizing KD genes is important for interpretation and for providing evidence on future treatment targets. The KD genes, once perturbed, should have a major impact on the pathways, activating or inhibiting downstream genes, and hence trigger the eventual cardio-metabolic health outcomes: obesity, CVD, and T2D. The KD genes for obesity were the Collagen gene family (COL1A1, COL1A2, COL2A1, COL3A1, COL4A2, COL5A1, COL6A2, and COL6A3), shared between the axon guidance and NCAM signaling pathways; TPM3, a gene shared between the HCM and dilated cardiomyopathy pathways; and JUN, which belongs to the MAPK pathway. The Collagen gene family, besides being involved in the NCAM pathway as previously described, encodes proteins that maintain the structure and function of adipocytes 64 and cardiomyocytes 65. TPM3 encodes a member of the tropomyosin family of actin-binding proteins. Mutations in this gene have been found to result in autosomal dominant myopathy characterized by generalized muscle weakness due to reduced calcium sensitivity 66. However, the mechanism linking TPM3 to obesity is still unclear.

Interestingly, the KD genes were not identified in standard GWAS analysis, suggesting that important regulatory genes may not harbor common susceptibility polymorphisms because of evolutionary constraints 67. Our results also support the view that obesity is a complex polygenic condition and that currently identified GWAS hits only serve as subtle modifiers of disease predisposition. When comparing obesity KD genes with those of other cardio-metabolic disorders, we found that members of the Collagen gene family are also drivers for CVD, T2D 17, and HF (Chapter 3), supporting shared mechanisms between these related diseases as well as pleiotropic effects of these KD genes.

In the gene-dietary LI interaction analysis on BMI, significant results were identified for Collagen genes, specifically COL1A1, COL4A2, COL5A1, and COL6A3. These genes encode subunits of Collagen I, IV, V, and VI, which are important extracellular matrix proteins in adipose tissue, especially Collagen VI 68. Adipose tissue has been extensively studied for its role in the etiology of obesity and corresponding metabolic disorders 69. Studies have found that Collagen genes, especially COL6A3, were associated with obesity and other metabolic disorders by reducing adipose tissue oxygenation, increasing inflammation, and altering insulin signaling and lipid metabolism 69,70. An animal study also found potential interactions between COL6A genes and high-fat diet on weight gain 70. Collagen IV, a basement membrane protein, is most abundant in blood vessels, and COL4A2 has been linked to CHD by affecting smooth muscle cell survival and atherosclerotic plaque stability 71; however, we still lack evidence on potential mechanisms linking Collagen IV to obesity. As Collagens also play important roles in glucose metabolism and insulin resistance 72, the interactions we identified support the biological plausibility of a synergistic effect between Collagen genes and dietary lipids on obesity and other cardio-metabolic disorders.

Although it has long been established that T2D, CHD, and HF share multiple biological mechanisms, such as vascular dysfunction, inflammation, and metabolism, the proportion that each mechanism contributes to disease progression might differ 73. In an attempt to assess the mediated effect of obesity on more severe cardio-metabolic disorders, we observed significant mediated effects through BMI on T2D, CHD, and HF. Compared with CHD, the mediated effect through BMI was stronger for T2D and HF (ORs > 1.20), suggesting a stronger obesity-predisposing effect of genes involved in the axon guidance pathway on T2D and HF.

There are several strengths of this study. First, we used the WHI-SHARe and GARNET studies as the discovery samples and replicated our results among JHS and FHS female participants, which supports the robustness of our findings. Second, our study employed large and multiethnic populations, allowing the detection of shared and ethnicity-specific mechanisms of obesity. In addition, using a systems biology framework that integrates GWAS, pathways, and gene networks, we were able to shed more light on obesity-related mechanisms and identify potential targets for intervention and therapy. The examination of gene-diet interaction and the mediation analysis linking obesity to other cardio-metabolic disorders additionally contribute to our understanding of the role of lipid metabolism in obesity and of obesity-induced pathophysiological changes in T2D, CHD, and HF. The most important limitation is that we only used knowledge-driven biological pathways from existing pathway databases to identify obesity-related signaling. A more sophisticated approach is to employ data-driven networks, which represent a less biased source for uncovering previously unknown functional pathways. However, considering that the larger number of data-driven networks may reduce study power and that network inconsistencies across study populations are difficult to handle, knowledge-driven pathways can still provide straightforward explanations with well-defined, functionally related genes. Findings from this paper merit further validation in population studies with gene expression data and biomarkers of the potential biological pathways.

4.6 Conclusion

From this integrative genetic pathway analysis of obesity, we identified several significant biological pathways of obesity shared across ethnicities as well as ethnicity-specific pathways. In addition, we found significant interactions between dietary lipids and key driver genes of obesity, specifically the Collagen gene family, on BMI. The results are consistent with our prior expectation that shared genetic and molecular mechanisms underlie the complex cardio-metabolic health outcomes (obesity, type 2 diabetes, coronary heart disease, and heart failure), and that gene-dietary lipid interactions may also play central roles in the disease process.

4.7 Tables and Figures

Table 4.1 Baseline Characteristics of Participants in the Women’s Health Initiative Study

| Characteristic | WHI-SHARe African American: Overall (N=7,678) | WHI-SHARe African American: Obese (N=4,407) | WHI-SHARe African American: Non-obese (N=3,271) | WHI-GARNET Caucasian American: Overall (N=4,079) | WHI-GARNET Caucasian American: Obese (N=2,028) | WHI-GARNET Caucasian American: Non-obese (N=2,051) |
|---|---|---|---|---|---|---|
| Age, year a | 61.6 (7.0) | 61.1 (6.8) | 62.2 (7.3) | 65.6 (6.9) | 64.6 (6.8) | 66.6 (6.9) |
| Region, n (%) | | | | | | |
| Northeast | 1371 (18) | 760 (17) | 611 (19) | 1018 (25) | 485 (24) | 533 (26) |
| South | 3716 (48) | 2175 (49) | 1541 (47) | 861 (21) | 421 (21) | 440 (21) |
| Midwest | 1797 (23) | 1066 (24) | 731 (22) | 1176 (29) | 638 (31) | 538 (26) |
| West | 794 (10) | 406 (9) | 388 (12) | 1024 (25) | 484 (24) | 540 (26) |
| Current smoking, n (%) | 864 (11) | 402 (9) | 462 (14) | 443 (11) | 175 (9) | 268 (13) |
| Cardiovascular disease, n (%) b | 1420 (18) | 875 (20) | 545 (17) | 684 (17) | 363 (18) | 321 (16) |
| Type 2 diabetes, n (%) | 1064 (14) | 737 (17) | 327 (10) | 288 (7) | 198 (10) | 90 (4) |
| BMI, kg/m² a | 31.1 (6.6) | 35.2 (5.6) | 25.6 (2.7) | 29.7 (6.1) | 34.4 (4.9) | 25.1 (2.8) |
| Physical activity, MET-h/wk a | 9.7 (12.7) | 8.4 (11.9) | 11.4 (13.6) | 10.1 (12.6) | 7.8 (10.9) | 12.4 (13.6) |
| Alcohol drinking, servings/wk a | 1.1 (3.8) | 0.9 (3.7) | 1.4 (4.0) | 2.3 (5.2) | 1.5 (4.2) | 3.0 (5.9) |
| % calories from carb a | 49.8 (9.5) | 48.7 (9.2) | 51.1 (9.7) | 48.8 (9.1) | 47.9 (8.8) | 49.8 (9.3) |
| % calories from protein a | 16.0 (3.5) | 16.2 (3.5) | 15.6 (3.4) | 16.9 (3.3) | 17.1 (3.3) | 16.7 (3.2) |
| Dietary LI 1 | 27.1 (2.5) | 27.3 (2.5) | 26.9 (2.7) | 28.1 (2.5) | 28.3 (2.5) | 28.0 (2.5) |
| Total energy intake, kcal/day a | 1618.6 (763.1) | 1695.9 (811.9) | 1514.4 (678.4) | 1687.9 (666.8) | 1771.6 (709.9) | 1605.2 (610.2) |

Abbreviations: BMI (body mass index), Carb (carbohydrate), LI (Lipophilic Index), SD (standard deviation), WHI-GARNET (Women’s Health Initiative-Genomics and Randomized Trials Network), WHI-SHARe (Women’s Health Initiative SNP Health Association Resource), wk (week).
a Continuous variables were presented as mean (SD).
b Cardiovascular disease was defined as self-reported coronary heart disease, heart failure, stroke, and peripheral artery disease at baseline.

Table 4.2 Top Genome-wide SNPs for Obesity among African Americans and Caucasian Americans

| CHR | Lead SNP | Position (hg19) | Candidate Gene a | Minor/Major Allele b | MAF | Discovery Cohorts c: Beta (SE) | Discovery Cohorts c: P value | Replication Cohorts d: P value |
|---|---|---|---|---|---|---|---|---|
| African Americans | | | | | | | | |
| 4p12 | rs10938398 | 4:45186139 | Near GABRG1 | G/A | 0.29 | -0.20 (0.04) | 1.8×10⁻⁷ | 0.25 |
| 11q23 | rs1784692 | 11:113949232 | ZBTB16 | T/C | 0.17 | 0.22 (0.04) | 4.5×10⁻⁷ | 0.06 |
| 11q24 | rs77773026 | 11:121602163 | Near BLID | T/C | 0.08 | -0.28 (0.06) | 4.0×10⁻⁷ | 0.53 |
| Caucasian Americans | | | | | | | | |
| 2q22 | rs2163585 | 2:140208911 | Near LRP1B | T/C | 0.43 | -0.27 (0.05) | 2.3×10⁻⁹ | 0.04 |
| 5p15 | rs183732088 | 5:7511516 | ADCY2 | G/A | 0.01 | -1.38 (0.29) | 2.1×10⁻⁷ | 0.90 |
| 9p23 | rs2841171 | 9:13344504 | Near MPDZ | T/A | 0.03 | 0.83 (0.18) | 8.1×10⁻⁷ | 0.38 |

Abbreviations: CHR (chromosome), MAF (minor allele frequency), SNP (single nucleotide polymorphism), SE (standard error).
a Closest genes within ± 300 kb of the lead SNPs.
b Alleles with elevated risk for obesity were underlined.
c The discovery cohorts were African Americans (N=7,678) and Caucasian Americans (N=4,079) within the Women’s Health Initiative Study.
d The replication cohorts were African American women (N=1,714) in the Jackson Heart Study and Caucasian American women (N=3,188) in the Framingham Heart Study.

Table 4.3 Validation of Previously Identified GWAS Obesity Loci in the Women’s Health Initiative Study a

| CHR | Lead SNP | Position (hg19) | Candidate Gene b | Minor/Major Allele c | MAF | Beta (SE) | P value |
|---|---|---|---|---|---|---|---|
| African Americans | | | | | | | |
| 1p13 | rs17024393 | 1:110154688 | GNAT2 | T/C | 0.06 | -0.19 (0.07) | 0.003 |
| 1q25 | rs543874 | 1:177889480 | Near SEC16B | A/G | 0.23 | -0.08 (0.04) | 0.043 |
| 2p25 | rs939583 | 2:622531 | Near TMEM18 | T/C | 0.12 | 0.13 (0.05) | 0.015 |
| 2p23 | rs10182181 | 2:25150296 | Near ADCY3 and DNAJC27 | G/A | 0.31 | 0.09 (0.04) | 0.043 |
| 2q31 | rs4667682 | 2:172127920 | Near TLK1 and METTL8 | C/T | 0.14 | -0.13 (0.06) | 0.034 |
| 2q34 | rs10167079 | 2:215257671 | SPAG16 | G/A | 0.16 | 0.09 (0.05) | 0.037 |
| 2q36 | rs2176040 | 2:227092802 | | G/A | 0.28 | -0.07 (0.04) | 0.043 |
| 3q27 | rs1516725 | 3:185824004 | ETV5 | C/T | 0.15 | 0.12 (0.05) | 0.007 |
| 4p12 | rs10938397 | 4:45182527 | Near GNPDA2 | A/G | 0.29 | -0.20 (0.04) | 3.8×10⁻⁷ |
| 4q21 | rs17001654 | 4:77129568 | SCARB2 | C/G | 0.19 | -0.11 (0.04) | 0.014 |
| 6p12 | rs17665162 | 6:50275258 | Near DEFB112 | T/C | 0.07 | -0.28 (0.12) | 0.019 |
| 6p12 | rs2207139 | 6:50845490 | Near TFAP2B | A/G | 0.15 | -0.16 (0.06) | 0.005 |
| 7q21 | rs9641123 | 7:93197732 | CALCR | G/C | 0.16 | 0.13 (0.06) | 0.016 |
| 8p12 | rs7844647 | 8:34503776 | | T/C | 0.40 | 0.10 (0.04) | 0.008 |
| 11p15 | rs10540 | 11:494662 | RNH1 | G/A | 0.07 | 0.22 (0.09) | 0.023 |
| 12q24 | rs10773049 | 12:124506631 | ZNF664-RFLNA | T/C | 0.48 | -0.08 (0.03) | 0.017 |
| 13q32 | rs9634489 | 13:97049004 | HS6ST3 | G/A | 0.44 | -0.09 (0.03) | 0.007 |
| 16p11 | rs4787491 | 16:30015337 | INO80E and BOLA2 | A/G | 0.49 | -0.10 (0.03) | 0.002 |
| 16q12 | rs62033400 | 16:53811788 | FTO | A/G | 0.17 | -0.13 (0.05) | 0.011 |
| 18q12 | rs555267 | 18:40992698 | Near SYT4 | G/T | 0.36 | 0.07 (0.03) | 0.044 |
| 18q21 | rs7243357 | 18:56883319 | Near SEC11C and GRP | T/G | 0.16 | 0.10 (0.05) | 0.047 |
| 20p12 | rs8123881 | 20:15819495 | MACROD2 | A/G | 0.28 | -0.10 (0.03) | 0.006 |
| Caucasian Americans | | | | | | | |
| 1p31 | rs2389438 | 1:83692383 | | C/A | 0.16 | 0.15 (0.07) | 0.031 |
| 1p12 | rs9659323 | 1:119504361 | TBX15 | A/G | 0.41 | 0.10 (0.05) | 0.027 |
| 1q25 | rs10913469 | 1:177913519 | SEC16B | T/C | 0.21 | -0.13 (0.06) | 0.022 |
| 2p23 | rs10182181 | 2:25150296 | Near ADCY3 and DNAJC27 | G/A | 0.49 | 0.11 (0.04) | 0.016 |
| 2p21 | rs1979755 | 2:42708405 | KCNG3 | C/G | 0.42 | 0.10 (0.05) | 0.035 |
| 2q34 | rs10167079 | 2:215257671 | SPAG16 | G/A | 0.11 | 0.18 (0.07) | 0.016 |
| 3p14 | rs2365389 | 3:61236462 | FHIT | T/C | 0.44 | -0.09 (0.05) | 0.048 |
| 3q13 | rs1436351 | 3:104617973 | | T/G | 0.26 | 0.11 (0.05) | 0.033 |
| 3q28 | rs1152846 | 3:188420897 | LPP | C/T | 0.27 | 0.13 (0.05) | 0.015 |
| 5q15 | rs7713317 | 5:95716722 | Near ELL2 and PCSK1 | A/G | 0.29 | 0.14 (0.05) | 0.003 |
| 6p12 | rs2207139 | 6:50845490 | Near TFAP2B | A/G | 0.18 | 0.13 (0.06) | 0.025 |
| 7q21 | rs9641123 | 7:93197732 | CALCR | G/C | 0.38 | -0.10 (0.05) | 0.026 |
| 9p21 | rs10968576 | 9:28414339 | LINGO2 | A/G | 0.30 | -0.12 (0.05) | 0.016 |
| 9q34 | rs2270204 | 9:131042734 | SWI5 | T/G | 0.29 | -0.13 (0.05) | 0.011 |
| 11p14 | rs6265 | 11:27679916 | BDNF | C/T | 0.19 | 0.13 (0.06) | 0.025 |
| 12q13 | rs11168854 | 12:49485373 | DHH | C/G | 0.37 | -0.10 (0.05) | 0.048 |
| 12q21 | rs2579103 | 12:90633507 | | A/C | 0.25 | 0.13 (0.05) | 0.011 |
| 12q24 | rs11057405 | 12:122781897 | CLIP1 | G/A | 0.09 | 0.21 (0.08) | 0.005 |
| 16q12 | rs62033400 | 16:53811788 | FTO | A/G | 0.39 | -0.12 (0.05) | 0.012 |
| 16q22 | rs889398 | 16:69556715 | Near CYB5B and NFAT5 | C/T | 0.40 | 0.10 (0.05) | 0.033 |
| 19p13 | rs2304130 | 19:19789528 | ZNF101 | A/G | 0.09 | 0.20 (0.08) | 0.017 |
| 19q13 | rs29941 | 19:34309532 | Near KCTD15 | G/A | 0.32 | 0.10 (0.05) | 0.037 |
| 19q13 | rs2287019 | 19:46202172 | QPCTL | C/T | 0.19 | 0.13 (0.06) | 0.019 |

Abbreviations: CHR (chromosome), GWAS (genome-wide association study), MAF (minor allele frequency), SE (standard error), SNP (single nucleotide polymorphism).
a African Americans (N=7,678) and Caucasian Americans (N=4,079) within the Women’s Health Initiative Study.
b Closest genes within ± 300 kb of the lead SNPs.
c Alleles with elevated risk for obesity were underlined.

Table 4.4 Pathways Identified for Obesity by Multiple Pathway Methodologies a

| Pathways | Discovery Findings b: AA | Discovery Findings b: CA | Validation Findings c: AA | Validation Findings c: CA |
|---|---|---|---|---|
| Axon guidance | G, M | | -- d | G, M |
| Tight junction (TJ) | G, M | | -- | |
| Hypertrophic cardiomyopathy (HCM) | G, M | | -- | G |
| Arrhythmogenic right ventricular cardiomyopathy (ARVC) | G, M | G, M | -- | -- |
| Dilated cardiomyopathy | G, M | G, M | -- | -- |
| G alpha (z) signaling events | G, M | | -- | |
| Mitogen-activated protein kinase (MAPK) signaling | | G, M | G | -- |
| Neural cell adhesion molecule (NCAM) signaling for neurite out-growth | | G, M | G | -- |

Abbreviations: AA (African American), CA (Caucasian American).
a G represents GSA-SNP, M represents Mergeomics.
b The discovery findings were significant pathways (FDR-adjusted q value < 0.2) for obesity identified from the Women’s Health Initiative Study (N=7,678 for AA and N=4,079 for CA) and replicated from the Jackson Heart Study (N=1,714) and Framingham Heart Study (N=3,188).
c The validation findings were presented as pathways with nominal P value < 0.05 from the Women’s Health Initiative Study (N=7,678 for AA and N=4,079 for CA).
d Pathways identified in the discovery findings did not need to be validated in the same ethnic group.
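
The FDR-adjusted q value threshold in footnote b can be illustrated with a short sketch. The source does not state which FDR procedure was used; the example below assumes the Benjamini-Hochberg step-up adjustment, and the pathway names and P values are hypothetical, chosen only for illustration.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical pathway-level P values (not taken from the study).
example_p = {"pathway_A": 0.01, "pathway_B": 0.04, "pathway_C": 0.03, "pathway_D": 0.005}
example_q = dict(zip(example_p, bh_qvalues(list(example_p.values()))))

# Pathways passing the q < 0.2 threshold mentioned in footnote b.
significant = [name for name, q in example_q.items() if q < 0.2]
```

A lenient threshold such as q < 0.2 keeps more candidate pathways at the discovery stage, which is reasonable only when, as here, an independent replication step follows.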

Table 4.5 Interactions between Key Driver Genes and Dietary Lipophilic Index on BMI in the Women’s Health Initiative Study a

| Lead SNP | Position (hg19) | KD Genes | Minor/Major Allele | MAF | BMI Differences (95% CIs) b: G c | BMI Differences (95% CIs) b: E d | BMI Differences (95% CIs) b: G-E joint e | P value f |
|---|---|---|---|---|---|---|---|---|
| African Americans | | | | | | | | |
| rs149820492 | 2:238345636 | COL6A3 | G/A | 0.05 | 0.57 (0.26, 0.97) | 0.06 (-0.08, 0.20) | 1.18 (0.67, 1.70) | 0.03 |
| rs7869149 | 9:137601064 | COL5A1 | A/G | 0.20 | 0.36 (0.18, 0.66) | -0.04 (-0.22, 0.13) | 0.74 (0.42, 1.11) | < 0.01 |
| rs2696288 | 17:48252804 | COL1A1 | C/T | 0.15 | 0.28 (0.05, 0.50) | -0.01 (-0.22, 0.13) | 0.66 (0.36, 0.97) | < 0.01 |
| Caucasian Americans | | | | | | | | |
| rs57442307 | 13:111048768 | COL4A2 | T/C | 0.02 | 0.66 (0.06, 1.66) | 0.15 (0.01, 0.34) | 2.08 (1.00, 3.27) | 0.02 |

Abbreviations: BMI (body mass index), CIs (confidence intervals), KD (key driver), LI (Lipophilic Index), MAF (minor allele frequency), SNP (single nucleotide polymorphism).
a African Americans (N=7,678) and Caucasian Americans (N=4,079) within the Women’s Health Initiative Study.
b Estimates presented are the BMI differences between each high-risk group versus the reference group (individuals without risk allele and having the average dietary LI) adjusting for the following confounders: age, age², region, first four PCs, physical activity, baseline diabetes/cardiovascular disease, education, income, smoking, hormone usage, hysterectomy status, alcohol consumption, percent calories from protein and carbohydrates, and total energy intake. Model additionally adjusted for hormone replacement therapy among Caucasian Americans. 95% CIs were calculated based on bootstrap method.
c BMI was estimated for individuals with one risk allele and having the average dietary LI (LI = 27.1 for African Americans and 28.1 for Caucasian Americans).
d BMI was estimated for individuals without risk allele and having the average value of dietary LI among those in the highest quartile (LI = 29.8 for African Americans, and 31.8 for Caucasian Americans).
e BMI was estimated for individuals with one risk allele and having the average value of dietary LI among those in the highest quartile (LI = 29.8 for African Americans, and 31.8 for Caucasian Americans).
f Nominal P value for the interaction term between SNP and dietary LI.
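
One way to read the G, E, and G-E joint columns is as contrasts from a linear model with a SNP-by-LI product term. The sketch below assumes a model of the form BMI = b0 + b_g·G + b_e·(LI − mean LI) + b_ge·G·(LI − mean LI) + covariates, which the footnotes suggest but do not spell out. The function name and coefficient values are hypothetical; only the two LI values (27.1 and 29.8 for African Americans) come from the footnotes.

```python
def bmi_contrasts(b_g, b_e, b_ge, li_mean, li_high, n_alleles=1):
    """BMI differences versus the reference group (no risk allele, average LI)."""
    d_li = li_high - li_mean
    g_only = b_g * n_alleles                            # risk allele, average LI
    e_only = b_e * d_li                                 # no risk allele, high LI
    joint = g_only + e_only + b_ge * n_alleles * d_li   # risk allele and high LI
    return g_only, e_only, joint


# Hypothetical coefficients for illustration only (not estimates from the study).
g, e, joint = bmi_contrasts(b_g=0.50, b_e=0.02, b_ge=0.20, li_mean=27.1, li_high=29.8)

# Part of the joint difference that is attributable to the product term.
excess = joint - (g + e)
```

Under this assumed model, a joint difference larger than the sum of the G and E differences corresponds to a positive product-term coefficient, which is the pattern described in the text as a synergistic effect.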

Figure 4.1 Network of 6 Pathways Enriched for Obesity with Top 10 Key Driver Genes among African and Caucasian American Women

[Figure: network diagram. Pathway labels: Mitogen-activated protein kinase signaling; Axon guidance; Neural cell adhesion molecule signaling for neurite out-growth; Hypertrophic cardiomyopathy; Arrhythmogenic right ventricular cardiomyopathy; Dilated cardiomyopathy. Legend: Shared Genes; Key Driver.]

The diamond nodes represent pathways and the ellipse nodes represent genes; each edge shows an interaction, that is, the association between a gene and a pathway. Node colors: red, top 10 key driver genes; light green, genes involved in ≥ 2 pathways; others are pathway-specific genes. The figure was created using Cytoscape 74. Abbreviations: AXON (Axon guidance), ARVC (Arrhythmogenic right ventricular cardiomyopathy), DC (Dilated cardiomyopathy), HCM (Hypertrophic cardiomyopathy), MAPK (Mitogen-activated protein kinase signaling), NCAM (Neural cell adhesion molecule signaling for neurite out-growth).

4.8 Supplemental Material

Supplemental Table 4.1 Baseline Characteristics of Female Participants in the Jackson Heart Study and Framingham Heart Study

| Characteristic | JHS-AA Female: Overall (N=1,714) | JHS-AA Female: Obese (N=1,210) | JHS-AA Female: Non-obese (N=504) | FHS-CA Female: Overall (N=3,188) | FHS-CA Female: Obese (N=968) | FHS-CA Female: Non-obese (N=2,218) |
|---|---|---|---|---|---|---|
| Age, years (SD) a | 55.8 (12.5) | 55.1 (12.2) | 57.6 (13) | 46.4 (11.7) | 48.5 (11) | 45.5 (11.9) |
| Current smoking, n (%) | 170 (9.9) | 113 (9.3) | 57 (11.3) | 856 (26.9) | 289 (29.9) | 567 (25.6) |
| Cardiovascular disease b, n (%) | 178 (10.4) | 129 (10.7) | 49 (9.7) | 7 (0.2) | 4 (0.4) | 3 (0.1) |
| Type 2 diabetes, n (%) | 411 (24) | 340 (28.1) | 71 (14.1) | 30 (0.9) | 19 (2) | 11 (0.5) |
| BMI, kg/m² (SD) a | 33.3 (7.8) | 36.2 (6.7) | 26.2 (5) | 26.6 (5.9) | 33.4 (5.7) | 23.6 (2.7) |
| Alcohol drinking, serving/week (SD) a | 1.5 (6.2) | 1.3 (5.7) | 1.9 (7.2) | 3.6 (5.2) | 2.6 (5) | 4 (5.2) |
| Total energy intake, kcal/day (SD) a | 1824.8 (751.3) | 1850.6 (768.8) | 1762.8 (704.4) | 1855 (616.9) | 1857.4 (632.7) | 1853.7 (610.2) |

Abbreviations: AA (African American), BMI (body mass index), CA (Caucasian American), FHS (Framingham Heart Study), JHS (Jackson Heart Study), SD (standard deviation).
a Continuous variables were presented as mean (SD).
b Cardiovascular disease was defined as self-reported coronary heart disease, heart failure, stroke, and peripheral artery disease at baseline.

Supplemental Table 4.2 Characteristics of Identified Significant Pathways for Obesity by Ethnicities

| Pathways | No. of genes | Min. P value | Main functions |
|---|---|---|---|
| African Americans (n=6) | | | |
| Axon guidance | 251 | 2.89×10⁻⁸ | This pathway is the process by which neurons send out axons to reach the correct targets. 75 |
| Tight Junction | 134 | 1.44×10⁻⁷ | Tight junctions form physical barriers in various tissues and regulate paracellular transport of water, ions, and small water-soluble molecules. Desmosomes mediate strong cell adhesion linking the intermediate filament cytoskeletons between cells and playing roles in wound repair, tissue morphogenesis, and cell signaling. 75 |
| Hypertrophic cardiomyopathy | 85 | 1.37×10⁻¹² | Hypertrophic cardiomyopathy is a primary myocardial disorder with an autosomal dominant pattern of inheritance that is characterized by hypertrophy of the left ventricles with histological features of myocyte hypertrophy, myofibrillar disarray, and interstitial fibrosis. These mutations increase the Ca²⁺ sensitivity of cardiac myofilaments 76. |
| Arrhythmogenic right ventricular cardiomyopathy | 76 | 5.87×10⁻⁸ | Arrhythmogenic right ventricular cardiomyopathy is an inherited heart muscle disease that may result in arrhythmia, heart failure, and sudden death. The myocardial injury may be accompanied by inflammation. Defects in the ryanodine receptor could result in an imbalance of calcium homeostasis that might trigger cell death 76. |
| Dilated cardiomyopathy | 92 | 2.57×10⁻⁹ | Dilated cardiomyopathy is a heart muscle disease characterized by dilation and impaired contraction of the left or both ventricles that results in progressive heart failure and sudden cardiac death from ventricular arrhythmia. Current hypotheses regarding causes of idiopathic dilated cardiomyopathy focus on chronic viral myocarditis and/or on autoimmune abnormalities. Viral myocarditis may progress to an autoimmune phase and then to progressive cardiac dilatation. Antibodies to the β1-adrenergic receptor, which are detected in a substantial number of patients with idiopathic dilated cardiomyopathy, may increase the concentration of intracellular cAMP and intracellular calcium, a condition often leading to a transient hyper-performance of the heart followed by depressed heart function and heart failure 76. |
| G alpha (z) signaling events | 44 | 2.58×10⁻⁴ | The heterotrimeric G protein G alpha (z) is a member of the G (i) family. G alpha (z) interacts with the Rap1 GTPase activating protein (Rap1GAP) to attenuate Rap1 signaling 75. |
| Caucasian Americans (n=4) | | | |
| Arrhythmogenic right ventricular cardiomyopathy | 76 | 5.82×10⁻⁶ | Also found among African Americans |
| Dilated cardiomyopathy | 92 | 1.20×10⁻⁴ | Also found among African Americans |
| Mitogen-activated protein kinase (MAPK) signaling | 267 | 4.06×10⁻¹⁵ | The MAPK cascade is a highly conserved module that is involved in various cellular functions, including cell proliferation, differentiation and migration 76. |
| Neural cell adhesion molecule (NCAM) signaling for neurite out-growth | 64 | 6.74×10⁻¹⁰ | The NCAM is a member of the immunoglobulin (Ig) superfamily and is involved in a variety of cellular processes of importance for the formation and maintenance of the nervous system. The role of NCAM in neural differentiation and synaptic plasticity is presumed to depend on the modulation of intracellular signal transduction cascades. NCAM based signaling complexes can initiate downstream intracellular signals by at least two mechanisms: (1) activation of fibroblast growth factor receptor (FGFR) and (2) formation of intracellular signaling complexes by direct interaction with cytoplasmic interaction partners 75. |

Supplemental Table 4.3 Effect of Genes Involved in Axon Guidance Pathway on Cardio-metabolic Disorders Mediated Through BMI among Participants in the Women’s Health Initiative Study

| Outcome | Ethnicity | Number of events | Direct Effect b, OR (95% CIs) a | Indirect Effect c, OR (95% CIs) a | Proportion Mediated (% of total effect): Log odds scale | Proportion Mediated (% of total effect): Risk difference scale |
|---|---|---|---|---|---|---|
| Type 2 Diabetes | AA | 1216 | 0.92 (0.81, 1.04) | 1.25 (1.20, 1.30) | -- d | -- |
| Type 2 Diabetes | CA | 987 | 1.05 (0.91, 1.23) | 1.36 (1.28, 1.44) | 0.86 | 0.88 |
| Coronary Heart Disease | AA | 330 | 0.96 (0.76, 1.18) | 1.11 (1.02, 1.20) | -- | -- |
| Coronary Heart Disease | CA | 523 | 0.93 (0.77, 1.11) | 0.96 (0.90, 1.03) | -- | -- |
| Heart Failure | AA | 383 | 1.02 (0.83, 1.28) | 1.29 (1.20, 1.39) | 0.92 | 0.93 |
| Heart Failure | CA | 395 | 0.96 (0.76, 1.21) | 1.21 (1.13, 1.31) | -- | -- |

Abbreviations: AA (African Americans), BMI (body mass index), CA (Caucasian Americans), OR (odds ratio), CIs (confidence intervals).
a Effects are shown as ORs (95% CIs). CIs were calculated using bootstrapping. Models were adjusted for age, age², region, first four principal components, physical activity, education, income, smoking, hormone usage, hysterectomy status, alcohol consumption, percent calories from protein and carbohydrates, and total energy intake. Model additionally adjusted for hormone replacement therapy among Caucasian Americans.
b OR for type 2 diabetes/coronary heart disease/heart failure in persons who had high genetic risk score vs those who had low genetic risk score if the BMI at year 3 was what it would have been with low genetic risk score.
c OR for type 2 diabetes/coronary heart disease/heart failure for those who had high genetic risk score, comparing the risk if the BMI at year 3 was what it would have been with high vs low genetic risk score.
d Proportion mediated is undefined because the point estimate of the indirect effect is not in the same direction as the direct effect, or both the direct and indirect effects are not statistically significant.
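
The proportion-mediated columns can be approximately reproduced from the direct and indirect ORs. The sketch below assumes two commonly used formulas: on the log odds scale, log(OR_indirect) / (log(OR_direct) + log(OR_indirect)); on the risk difference scale, OR_direct × (OR_indirect − 1) / (OR_direct × OR_indirect − 1). The source does not state which formulas were used, so both are assumptions, and the function name is hypothetical. With the rounded ORs reported for type 2 diabetes among Caucasian Americans, they give values matching the tabulated 0.86 and 0.88.

```python
import math


def proportion_mediated(or_direct, or_indirect):
    """Return (log-odds-scale, risk-difference-scale) proportion mediated."""
    log_d, log_i = math.log(or_direct), math.log(or_indirect)
    pm_log = log_i / (log_d + log_i)
    pm_rd = or_direct * (or_indirect - 1) / (or_direct * or_indirect - 1)
    return pm_log, pm_rd


# Rounded ORs for type 2 diabetes among Caucasian Americans (Supplemental Table 4.3).
pm_log, pm_rd = proportion_mediated(1.05, 1.36)
print(round(pm_log, 2), round(pm_rd, 2))  # prints 0.86 0.88
```

Both formulas become hard to interpret when the direct and indirect ORs fall on opposite sides of 1, which is consistent with footnote d leaving those cells undefined. Small discrepancies for other rows are expected because the inputs here are ORs rounded to two decimals.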


Supplemental Figure 4.1 Manhattan Plots of Obesity among African Americans (A. n=7,678) and Caucasian Americans (B. n=4,079) in the Women’s Health Initiative Study


Supplemental Figure 4.2 Network Key Driver Genes of the Pathways Enriched for Obesity among African Americans and Caucasian Americans

Top 10 ranked multi-tissue key drivers of obesity pathways in the Bayesian network of adipose, blood, brain, islet, liver, kidney, and muscle tissues, and the protein-protein interaction network. The nodes represent genes, and the edges show interactions. Node colors: light red, top 10 ranked key driver genes; mint, genes that interact with the key drivers. The figure was created using Cytoscape 74.

4.9 References

1. Yang J, Bakshi A, Zhu Z, et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nature genetics 2015;47:1114-20.

2. Bray MS, Loos RJ, McCaffery JM, et al. NIH working group report-using genomic information to guide weight management: From universal to precision treatment. Obesity (Silver Spring, Md) 2016;24:14-22.

3. Moleres A, Martinez JA, Marti A. Genetics of Obesity. Current Obesity Reports 2013;2:23-31.

4. Manolio TA, Collins FS, Cox NJ, et al. Finding the missing heritability of complex diseases. Nature 2009;461:747-53.

5. Willett WC. Dietary fats and coronary heart disease. Journal of internal medicine 2012;272:13-24.

6. Bray GA, Paeratakul S, Popkin BM. Dietary fat and obesity: a review of animal, clinical and epidemiological studies. Physiology & behavior 2004;83:549-55.

7. Liu Q, Lichtenstein AH, Matthan NR, et al. Higher Lipophilic Index Indicates Higher Risk of Coronary Heart Disease in Postmenopausal Women. Lipids 2017;52:687-702.

8. Qi Q, Chu AY, Kang JH, et al. Fried food consumption, genetic risk, and body mass index: gene-diet interaction analysis in three US cohort studies. BMJ (Clinical research ed) 2014;348:g1610.

9. Casas-Agustench P, Arnett DK, Smith CE, et al. Saturated fat intake modulates the association between an obesity genetic risk score and body mass index in two US populations. Journal of the Academy of Nutrition and Dietetics 2014;114:1954-66.

10. Design of the Women's Health Initiative clinical trial and observational study. The Women's Health Initiative Study Group. Controlled clinical trials 1998;19:61-109.

11. Anderson GL, Manson J, Wallace R, et al. Implementation of the Women's Health Initiative study design. Annals of epidemiology 2003;13:S5-17.

12. Japanese Conference on the Biochemistry of Lipids. LipidBank [database]. Tokyo, Japan: Japanese Conference on the Biochemistry of Lipids; 2007. (Accessed March 20th, 2017, at http://lipidbank.jp/.)

13. Nam D, Kim J, Kim SY, Kim S. GSA-SNP: a general approach for gene set analysis of polymorphisms. Nucleic acids research 2010;38:W749-54.

14. Arneson D, Bhattacharya A, Shu L, Makinen VP, Yang X. Mergeomics: a web server for identifying pathological pathways, networks, and key regulators via multidimensional data integration. BMC genomics 2016;17:722.

15. Zhu J, Zhang B, Smith EN, et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature genetics 2008;40:854-61.

16. Wang IM, Zhang B, Yang X, et al. Systems analysis of eleven rodent disease models reveals an inflammatome signature and key drivers. Molecular systems biology 2012;8:594.

17. Chan KH, Huang YT, Meng Q, et al. Shared molecular pathways and gene networks for cardiovascular disease and type 2 diabetes mellitus in women across diverse ethnicities. Circulation Cardiovascular genetics 2014;7:911-9.

18. Belsky DW, Moffitt TE, Sugden K, et al. Development and evaluation of a genetic risk score for obesity. Biodemography and social biology 2013;59:85-100.

19. Vanderweele TJ, Vansteelandt S. Odds ratios for mediation analysis for a dichotomous outcome. American journal of epidemiology 2010;172:1339-48.

20. GWAS Catalog. 2018. (Accessed Oct 18th, 2018, at https://www.ebi.ac.uk/gwas/.)

21. Speliotes EK, Willer CJ, Berndt SI, et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nature genetics 2010;42:937-48.

22. Cornelis MC, Rimm EB, Curhan GC, et al. Obesity susceptibility loci and uncontrolled eating, emotional eating and cognitive restraint behaviors in men and women. Obesity (Silver Spring, Md) 2014;22:E135-41.

23. Johnson MP, Brennecke SP, East CE, et al. Genetic dissection of the pre-eclampsia susceptibility locus on chromosome 2q22 reveals shared novel risk factors for cardiovascular disease. Molecular human reproduction 2013;19:423-37.

24. Llorente-Cortes V, Badimon L. LDL receptor-related protein and the vascular wall: implications for atherothrombosis. Arteriosclerosis, thrombosis, and vascular biology 2005;25:497-504.

25. Seki N, Bujo H, Jiang M, et al. LRP1B is a negative modulator of increased migration activity of intimal smooth muscle cells from rabbit aortic plaques. Biochemical and biophysical research communications 2005;331:964-70.

26. Tanaga K, Bujo H, Zhu Y, et al. LRP1B attenuates the migration of smooth muscle cells by reducing membrane localization of urokinase and PDGF receptors. Arteriosclerosis, thrombosis, and vascular biology 2004;24:1422-8.

27. Locke AE, Kahali B, Berndt SI, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 2015;518:197-206.

28. Willer CJ, Speliotes EK, Loos RJ, et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nature genetics 2009;41:25-34.

29. Thorleifsson G, Walters GB, Gudbjartsson DF, et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nature genetics 2009;41:18-24.

30. Rouskas K, Kouvatsi A, Paletas K, et al. Common variants in FTO, MC4R, TMEM18, PRL, AIF1, and PCSK1 show evidence of association with adult obesity in the Greek population. Obesity (Silver Spring, Md) 2012;20:389-95.

31. Hotta K, Nakamura M, Nakamura T, et al. Association between obesity and polymorphisms in SEC16B, TMEM18, GNPDA2, BDNF, FAIM2 and MC4R in a Japanese population. Journal of human genetics 2009;54:727-31.

32. Jurvansuu J, Zhao Y, Leung DS, et al. Transmembrane protein 18 enhances the tropism of neural stem cells for glioma cells. Cancer research 2008;68:4614-22.

33. Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nature reviews Genetics 2011;12:56-68.

34. Liu YJ, Guo YF, Zhang LS, et al. Biological pathway-based genome-wide association analysis identified the vasoactive intestinal peptide (VIP) pathway important for obesity. Obesity (Silver Spring, Md) 2010;18:2339-46.

35. Zhong H, Yang X, Kaplan LM, Molony C, Schadt EE. Integrating pathway analysis and genetics of gene expression for genome-wide association studies. American journal of human genetics 2010;86:581-91.

36. Catterall WA. Voltage-gated calcium channels. Cold Spring Harbor perspectives in biology 2011;3:a003947.

37. Villarroel P, Villalobos E, Reyes M, Cifuentes M. Calcium, obesity, and the

role of the calcium-sensing receptor. Nutrition reviews 2014;72:627-37.

38. Santulli G, Pagano G, Sardu C, et al. Calcium release channel RyR2 regulates

insulin release and glucose homeostasis. The Journal of clinical investigation

2015;125:4316.

39. Bonny O, Bochud M. Genetics of calcium homeostasis in humans: continuum

between monogenic diseases and continuous phenotypes. Nephrology,

dialysis, transplantation : official publication of the European Dialysis and

Transplant Association - European Renal Association 2014;29 Suppl 4:iv55-

62.

40. Sjaastad MD, Nelson WJ. Integrin-mediated calcium signaling and regulation

of cell adhesion by intracellular calcium. BioEssays : news and reviews in

molecular, cellular and developmental biology 1997;19:47-55.

41. Gailly P. New aspects of calcium signaling in skeletal muscle cells:

implications in Duchenne muscular dystrophy. Biochimica et biophysica acta

2002;1600:38-44.

42. Hamming KS, Soliman D, Webster NJ, et al. Inhibition of beta-cell sodium-

calcium exchange enhances glucose-dependent elevations in cytoplasmic

calcium and insulin secretion. Diabetes 2010;59:1686-93.

43. Zemel MB. Calcium modulation of hypertension and obesity: mechanisms and

implications. Journal of the American College of Nutrition 2001;20:428S-35S;

discussion 40S-42S.

44. Barry WH, Bridge JH. Intracellular calcium homeostasis in cardiac myocytes.

Circulation 1993;87:1806-15.

45. Yang YH, Manning Fox JE, Zhang KL, MacDonald PE, Johnson JD. Intraislet

SLIT-ROBO signaling is required for beta-cell survival and potentiates insulin

secretion. Proceedings of the National Academy of Sciences of the United

States of America 2013;110:16480-5.

46. O'Neill HM, Holloway GP, Steinberg GR. AMPK regulation of fatty acid

metabolism and mitochondrial biogenesis: implications for obesity. Molecular

and cellular endocrinology 2013;366:135-51.

47. Shi C, Zhu L, Chen X, et al. IL-6 and TNF-α induced obesity-related

inflammatory response through transcriptional regulation of miR-146b.

Journal of interferon & cytokine research : the official journal of the

International Society for Interferon and Cytokine Research 2014;34:342-8.

48. Rankinen T, Zuberi A, Chagnon YC, et al. The human obesity gene map: the

2005 update. Obesity (Silver Spring, Md) 2006;14:529-644.

49. Svensson KJ, Long JZ, Jedrychowski MP, et al. A Secreted Slit2 Fragment

Regulates Adipose Tissue Thermogenesis and Metabolic Function. Cell

metabolism 2016;23:454-66.

50. Nies VJ, Sancar G, Liu W, et al. Fibroblast Growth Factor Signaling in

Metabolic Regulation. Frontiers in endocrinology 2015;6:193.

51. Bost F, Aouadi M, Caron L, Binetruy B. The role of MAPKs in adipocyte

differentiation and obesity. Biochimie 2005;87:51-6.

52. Povlsen GK, Ditlevsen DK, Berezin V, Bock E. Intracellular signaling by the

neural cell adhesion molecule. Neurochemical research 2003;28:127-41.

53. Ditlevsen DK, Povlsen GK, Berezin V, Bock E. NCAM-induced intracellular

signaling revisited. Journal of neuroscience research 2008;86:727-43.

54. Nielsen J, Kulahin N, Walmod PS. Extracellular protein interactions mediated

by the neural cell adhesion molecule, NCAM: heterophilic interactions

between NCAM and cell adhesion molecules, extracellular matrix proteins,

and viruses. Advances in experimental medicine and biology 2010;663:23-53.

55. Yang HJ, Xia YY, Wang L, et al. A novel role for neural cell adhesion

molecule in modulating insulin signaling and adipocyte differentiation of

mouse mesenchymal stem cells. Journal of cell science 2011;124:2552-60.

56. Tsukita S, Furuse M. The structure and function of claudins, cell adhesion

molecules at tight junctions. Annals of the New York Academy of Sciences

2000;915:129-35.

57. Ahmad R, Rah B, Bastola D, Dhawan P, Singh AB. Obesity-induces Organ

and Tissue Specific Tight Junction Restructuring and Barrier Deregulation by

Claudin Switching. Scientific reports 2017;7:5125.

58. Winer DA, Luck H, Tsai S, Winer S. The Intestinal Immune System in

Obesity and Insulin Resistance. Cell metabolism 2016;23:413-26.

59. Schlingmann KP, Konrad M, Seyberth HW. Genetics of hereditary disorders

of magnesium homeostasis. Pediatric nephrology (Berlin, Germany)

2004;19:13-25.

60. Meng J, Glick JL, Polakis P, Casey PJ. Functional interaction between

Galpha(z) and Rap1GAP suggests a novel form of cellular cross-talk. The

Journal of biological chemistry 1999;274:36663-9.

61. Yeung F, Ramirez CM, Mateos-Gomez PA, et al. Nontelomeric role for Rap1

in regulating metabolism and protecting against obesity. Cell reports

2013;3:1847-56.

62. Iankova I, Chavey C, Clape C, et al. Regulator of G protein signaling-4

controls fatty acid and glucose homeostasis. Endocrinology 2008;149:5706-

12.

63. Soundararajan M, Willard FS, Kimple AJ, et al. Structural diversity in the

RGS domain and its interaction with heterotrimeric G protein alpha-subunits.

Proceedings of the National Academy of Sciences of the United States of

America 2008;105:6457-62.

64. Mariman EC, Wang P. Adipocyte extracellular matrix composition, dynamics

and role in obesity. Cellular and molecular life sciences : CMLS

2010;67:1277-92.

65. Bayomy AF, Bauer M, Qiu Y, Liao R. Regeneration in heart disease-Is ECM

the key? Life sciences 2012;91:823-7.

66. Yuen M, Cooper ST, Marston SB, et al. Muscle weakness in TPM3-myopathy

is due to reduced Ca2+-sensitivity and impaired acto-myosin cross-bridge

cycling in slow fibres. Human molecular genetics 2015;24:6278-92.

67. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human

disease network. Proceedings of the National Academy of Sciences of the

United States of America 2007;104:8685-90.

68. Scherer PE, Bickel PE, Kotler M, Lodish HF. Cloning of cell-specific secreted

and surface proteins by subtractive antibody screening. Nature biotechnology

1998;16:581-6.

69. Pasarica M, Gawronska-Kozak B, Burk D, et al. Adipose tissue collagen VI in

obesity. The Journal of clinical endocrinology and metabolism 2009;94:5155-

62.

70. Khan T, Muise ES, Iyengar P, et al. Metabolic dysregulation and adipose

tissue fibrosis: role of collagen VI. Molecular and cellular biology

2009;29:1575-91.

71. Yang W, Ng FL, Chan K, et al. Coronary-Heart-Disease-Associated Genetic

Variant at the COL4A1/COL4A2 Locus Affects COL4A1/COL4A2

Expression, Vascular Cell Survival, Atherosclerotic Plaque Stability and Risk

of Myocardial Infarction. PLoS genetics 2016;12:e1006127.

72. Williams AS, Kang L, Wasserman DH. The extracellular matrix and insulin

resistance. Trends in endocrinology and metabolism: TEM 2015;26:357-66.

73. Dokken BB. The Pathophysiology of Cardiovascular Disease and Diabetes:

Beyond Blood Pressure and Lipids. Diabetes Spectrum 2008;21:160-5.

74. Shannon P, Markiel A, Ozier O, et al. Cytoscape: a software environment for

integrated models of biomolecular interaction networks. Genome research

2003;13:2498-504.

75. Reactome. 2018. (Accessed Oct 17th, 2018, at https://reactome.org/.)

76. Kyoto Encyclopedia of Genes and Genomes. 2018. (Accessed Oct 17th, 2018,

at https://www.genome.jp/kegg/.)
