Urinary cadmium and lung cancer risk in smokers from the Multiethnic Cohort Study

A DISSERTATION SUBMITTED TO THE FACULTY OF THE UNIVERSITY OF MINNESOTA BY

Shannon Sullivan Cigan

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Irina Stepanov, Ph.D., Advisor

August 2019

© Shannon Sullivan Cigan 2019

Acknowledgements

First, I want to thank my research and academic advisor, Dr. Irina Stepanov. This work could not have happened without your encouragement, support and great mentorship. You guided me to become an independent researcher by providing an environment in which you challenged me, yet supported my work. Thank you to Dr. Bruce H. Alexander, my unofficial co-advisor, for your constant support, guidance and mentorship during my five years with the department. Your dedication to helping students is an inspiration to me and a model I will follow in my future mentorship and career.

Further, I would like to thank my dissertation committee members, Dr. Cavan S. Reilly and Dr. S. Lani Park, for assisting me in all aspects of my research and for always challenging me to put my best foot forward and be the best researcher I can be. I also thank Noel Stanton and his team at the Wisconsin State Laboratory of Hygiene, University of Wisconsin-Madison, for their expertise and support in measuring cadmium in urine, not only for this study but for others.

This work could not have happened without the academic and financial research support provided to me throughout my graduate career. I was fortunate to receive funding from the National Institutes of Health (R01 CA-179246 and P01 CA-217806) and a National Institute for Occupational Safety and Health training grant (5T42OH008434-13).

Last but certainly not least, to my family and friends, my support team: I am eternally grateful. The culmination of this research and degree is not my own, as I could not have reached my goals without your constant love, encouragement and prayers. To my husband, Mathew, thank you for your patience and endurance during my graduate education. To my parents, thank you for your praise and encouragement of your “perpetual student”. A special thank you to my fellow PhD student and best friend, Gabriela Bustamante: you are an amazing woman. I have enjoyed every step of this journey together, and I will forever be grateful for your accountability, self-validation and hand holding when things got scary, whether at work or in our personal lives. I come out of this program with more than I could have ever hoped for: life-long mentors, friends and family whom I can call my support team.


Dedication

To my support team of family and friends.


Abstract

Objective: The overall objective of this research is to investigate the effects of smoking, occupational exposures, demographics (e.g., race/ethnicity and education) and common genetic variants on the levels of urinary cadmium (Cd), a validated biomarker of long-term Cd exposure, in current smokers from the Multiethnic Cohort (MEC) Study, and to test whether this biomarker is associated with lung cancer risk in smokers. This objective was accomplished in three dissertation chapters (Chapters 2, 3 and 4), each written as one of three sequential papers.

Specific Aim 1:

Aim: To investigate the relationship between self-reported occupation and urinary Cd levels in 1,956 MEC Study participants from five race/ethnicity groups who were current smokers at the time of urine collection and had complete covariate data.

Methods: Urinary Cd was measured by inductively coupled plasma mass spectrometry (ICP-MS). Internal smoking dose was estimated by the biomarker urinary total nicotine equivalents (TNE). A censored multiple linear regression (tobit) model was used to estimate the percent difference in urinary Cd levels between occupational exposure categories, adjusted for confounding variables, with 95% confidence intervals (CIs) to characterize precision. Occupational exposure categories were based on participants' combined responses to questions about the industry and occupation in which they had worked the longest.
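A tobit model handles outcome values that are censored, which for urinary Cd would typically mean values below the analytical limit of detection. As a rough illustration only, the sketch below evaluates the log-likelihood that such a model maximizes. It assumes, because the abstract does not state these details, that the outcome is natural-log urinary Cd and that every sample shares a single detection limit; the function names are hypothetical, and an actual analysis would maximize this likelihood with a statistical package rather than by hand.

```python
import math


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, censored, lower):
    """Log-likelihood of a left-censored (tobit) linear model (illustrative sketch).

    beta     -- regression coefficients; X rows must have the same length
    sigma    -- residual standard deviation of the latent ln(Cd)
    X        -- covariate rows (include a 1.0 column for the intercept)
    y        -- observed ln(Cd); ignored for censored rows
    censored -- True where the value fell below the detection limit
    lower    -- ln(detection limit), assumed identical for all samples
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for xi, yi, is_cens in zip(X, y, censored):
        mu = sum(b * x for b, x in zip(beta, xi))  # latent mean x_i' beta
        if is_cens:
            # A censored row contributes P(latent ln(Cd) <= lower).
            total += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # A detected row contributes the normal density of its residual.
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

For example, an intercept-only model with coefficient 0 and sigma = 1, with one detected value at 0 and one value censored at 0, gives ln φ(0) + ln 0.5.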

Results: Participants categorized as ‘Likely exposed’ to Cd based on their occupations had 12.1% (95% CI: 1.2%, 24.3%) higher urinary Cd levels than those ‘Not likely exposed’ to Cd in the workplace, after adjusting for age, sex, race/ethnicity, creatinine (natural log), education, smoking dose (TNE), and smoking duration (years of smoking). Similarly, participants categorized as ‘Possibly exposed’ to Cd in the workplace had 7.7% (95% CI: 0.1%, 16.0%) higher urinary Cd levels than workers ‘Not likely exposed’ to Cd in the workplace.
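Percent differences such as the 12.1% above are usually obtained by back-transforming a coefficient estimated on the log scale. The sketch below assumes that the outcome was natural-log urinary Cd (reporting percent changes is consistent with this but does not confirm it). Under that assumption, a coefficient β corresponds to a 100 × (e^β − 1)% difference, and the same transformation applies to each CI bound. For a tobit model, the coefficient refers to the latent log-scale outcome. The function names are hypothetical.

```python
import math


def percent_difference(beta: float) -> float:
    """Percent difference in urinary Cd implied by a coefficient on ln(Cd)."""
    return (math.exp(beta) - 1.0) * 100.0


def beta_from_percent(pct: float) -> float:
    """Inverse transform: the ln-scale coefficient behind a percent difference."""
    return math.log(1.0 + pct / 100.0)


# Re-express the reported 'Likely exposed' estimate and its 95% CI on the ln scale.
for label, pct in [("estimate", 12.1), ("lower", 1.2), ("upper", 24.3)]:
    print(label, round(beta_from_percent(pct), 3))
```

The round trip is exact, so a coefficient and its CI can be moved between the two scales without loss of information.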

Specific Aim 2:

Aim: To conduct a genome-wide association study (GWAS) to identify common genetic variants that may be associated with urinary Cd in smokers (N=1,977), and to evaluate the association between urinary Cd and single nucleotide polymorphisms (SNPs) that were either previously reported in the literature to be associated with Cd or located within genes with a biologically plausible role in Cd absorption, distribution, metabolism and elimination.

Methods: Linear regression was used to test the association of each genetic variant with urinary Cd levels, adjusted for age at urine collection, sex, self-reported race/ethnicity, creatinine (natural log), smoking dose (urinary TNE), and the top 10 principal components (PCs). Allele dosage was the explanatory variable of primary interest. Genome-wide significance was defined by a Bonferroni-corrected 5% significance threshold of p < 8.4 × 10⁻⁹ (0.05/5,944,091 SNPs analyzed). In addition, a candidate SNP approach was used to test 1,169 SNPs. These comprised 15 SNPs reported in the literature to be associated with Cd biomarkers and SNPs within 29 candidate genes identified a priori as biologically plausible to be involved in Cd transport: the metallothionein (MT) family, metal-regulatory transcription factor 1 (MTF1), and the glutathione S-transferase (GST) gene family.
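The Bonferroni threshold above is simply α divided by the number of tests. The sketch below reproduces it. It also shows, as an assumption for illustration only, what a Bonferroni correction over the 1,169 candidate SNPs would give, because the abstract does not state which threshold was used for the candidate SNP analysis. The smallest reported p-values from both analyses are then compared against these thresholds.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


gwas_threshold = bonferroni_threshold(0.05, 5_944_091)
print(f"GWAS threshold: {gwas_threshold:.2e}")  # 8.41e-09

# Assumption (not stated in the abstract): Bonferroni over the 1,169 candidate SNPs.
candidate_threshold = bonferroni_threshold(0.05, 1_169)
print(f"Candidate-SNP threshold: {candidate_threshold:.2e}")  # 4.28e-05

# Smallest reported p-values versus each threshold.
print(3.47e-7 < gwas_threshold)       # False
print(3.89e-4 < candidate_threshold)  # False
```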

Results: No single SNP was associated with urinary Cd at genome-wide significance. The SNP with the lowest p-value for the association with urinary Cd was rs673456 on chromosome 11, an intron variant close to the TENM4 gene (p = 3.47 × 10⁻⁷). The candidate SNP analyses also did not reach statistical significance (lowest p-value 3.89 × 10⁻⁴). After adjusting for age at urine collection, sex, creatinine (natural log), and total nicotine equivalents, geometric mean levels of urinary Cd differed significantly across the five race/ethnicity groups (p < 0.001). Latinos and Native Hawaiians had the highest geometric mean urinary Cd (0.87 and 0.84 ng/mL, respectively), followed by Japanese Americans (0.81 ng/mL), African Americans (0.80 ng/mL), and Whites (0.74 ng/mL). Geometric means were also higher in females than in males across all race/ethnicity groups.
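A geometric mean is the exponential of the arithmetic mean of log-transformed values, which suits right-skewed biomarkers such as urinary Cd. The sketch below computes unadjusted geometric means by group. The values reported above are covariate-adjusted; such values are typically obtained by back-transforming model-predicted means of ln(Cd), which this sketch does not reproduce. The function names and toy values are hypothetical and are not study data.

```python
import math
from collections import defaultdict


def geometric_mean(values):
    """exp(mean(ln x)); defined only for strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_means_by_group(records):
    """records: iterable of (group_label, urinary_cd_ng_per_ml) pairs."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: geometric_mean(vals) for label, vals in groups.items()}


# Hypothetical toy values, not study data.
toy = [("group A", 0.5), ("group A", 2.0), ("group B", 1.0), ("group B", 4.0)]
print(geometric_means_by_group(toy))
```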

Specific Aim 3:

Aim: To investigate the association of urinary Cd and lung cancer risk in a subset of 1,956 MEC current smokers at the time of urine collection.

Methods: Cd was measured by ICP-MS in urine samples collected between 1997 and 2006. Incident lung cancer cases were identified through linkage with the statewide Surveillance, Epidemiology and End Results (SEER) registries of Hawaii and California through December 31, 2016. Cox proportional hazards regression was used to estimate multivariable-adjusted hazard ratios (HRs) and 95% CIs for lung cancer. Urinary Cd was modeled (1) as a continuous variable (natural log) and (2) in quartiles.
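For the categorical analysis, each participant is assigned to a quartile of urinary Cd, and the lowest quartile serves as the reference. The abstract does not say how the cut points were derived. The sketch below assumes that they come from the full analytic sample, uses the "inclusive" interpolation method of Python's `statistics.quantiles`, and places values that fall exactly on a cut point in the lower quartile. All three choices are assumptions, and the toy values are hypothetical.

```python
import bisect
import statistics


def quartile_cutpoints(values):
    """Three cut points (25th, 50th, 75th percentiles) of the observed values."""
    return statistics.quantiles(values, n=4, method="inclusive")


def assign_quartile(value, cutpoints):
    """Return 1-4 (1 = lowest, the reference); ties go to the lower quartile."""
    return bisect.bisect_left(cutpoints, value) + 1


# Hypothetical toy values (ng/mL), not study data.
toy_cd = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.6, 2.4]
cuts = quartile_cutpoints(toy_cd)
print(cuts, [assign_quartile(v, cuts) for v in toy_cd])
```

The resulting quartile labels would then enter the Cox model as indicator variables for quartiles 2 to 4.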

Results: A total of 89 lung cancer cases were diagnosed among the 1,956 current smokers over a median follow-up of 12.4 years. Higher urinary Cd was associated with increased lung cancer risk after adjustment for age, sex, race/ethnicity, creatinine (natural log), education, smoking dose (urinary total nicotine equivalents), smoking duration (years of smoking), and occupational Cd exposure (HR per one-unit increase in natural-log urinary Cd: 1.69; 95% CI: 1.26, 2.26). In the categorical analysis, the multivariable-adjusted HR for lung cancer increased with increasing quartile of urinary Cd. Relative to the lowest quartile, the adjusted HRs were 1.21 (95% CI: 0.62, 2.36) for the second quartile, 2.02 (95% CI: 1.05, 3.89) for the third quartile, and 3.45 (95% CI: 1.73, 6.89) for the fourth (highest) quartile.

Overall Conclusions: Overall, our results indicate that urinary Cd is an important determinant of lung cancer risk among MEC Study participants who were current smokers at the time of urine collection, and these findings can inform future studies of lung cancer risk. Specific Aim 1 demonstrated that, at similar levels of smoking, smokers in occupations with a potential for Cd exposure had higher levels of urinary Cd. This suggests that workers in these occupations who also smoke may be at increased risk for lung cancer and should be targeted for intervention measures. In Specific Aim 2, no common genetic variants were associated with urinary Cd at genome-wide significance, and the candidate SNP analyses did not show a statistically significant association with urinary Cd levels either. Specific Aim 3 demonstrated that urinary Cd levels were associated with lung cancer incidence in smokers from the MEC Study, and this association persisted after adjustment for occupational Cd exposure, smoking dose (TNE), smoking duration (years of smoking) and other potential risk factors. Replication of these findings in larger samples and/or in studies with well-characterized measures of occupational, dietary and environmental Cd exposure is warranted. The findings of this study are consistent with the literature, support the role of Cd as a risk factor for lung cancer, and show that this relationship can be detected among current smokers at the time of urine collection. Public health efforts to reduce Cd exposure, including tobacco cessation programs, reducing the environmental and industrial impact of Cd, and educating smokers in occupations that pose a high risk for Cd exposure, may contribute to reducing lung cancer in the future.

Table of Contents

Acknowledgements

Dedication

Abstract

Table of Contents

List of Tables

List of Figures

Preface

Chapter 1

Background and significance

Chapter 2

Relationship between urinary cadmium and occupation in smokers from the Multiethnic Cohort Study

Chapter 3

Genome-wide and candidate gene association study of urinary cadmium levels in smokers

Chapter 4

The association between urinary cadmium levels and risk of lung cancer in smokers from a prospective cohort study

Chapter 5

Conclusions and Future Direction

Bibliography

APPENDICES

APPENDIX A

List of Tables

Chapter 1

Table 1. Urinary cadmium prevalence in the U.S. Working Population (aged 18 to 64 years), 1988-1994 reported by Yassin et al. (30)

Chapter 2

Table 2. Categorization of participants based on their combined response to two questions regarding longest history of industry and occupation worked (Figure 1a)

Table 3. Main characteristics and biomarkers of study population by four occupational exposure categories (n=1,956)

Table 4. Multivariable adjusted analysis of the association between urinary cadmium and occupational exposure categories compared to the reference category (‘Not likely exposed’)

Table 5. (Supplemental Table S1) Number of participants categorized in the ‘Likely exposed’ to cadmium in the workplace (N=151)

Table 6. (Supplemental Table S2) Number of participants categorized in the ‘Possibly exposed’ to cadmium in the workplace (N=347)

Table 7. (Supplemental Table S3) Number of participants categorized in the ‘Not likely exposed’ to cadmium in the workplace (N=1,116)

Table 8. (Supplemental Table S4) Number of participants categorized in the ‘Unknown exposure’ to cadmium in the workplace (N=363)

Chapter 3

Table 9. List of 15 candidate SNPs selected a priori for the analysis based on evidence from previous studies showing their relationship with Cd biomarkers in humans

Table 10. List of the 29 candidate genes identified a priori from literature that shows biological plausibility for being involved in the absorption, distribution, metabolism and elimination of urinary Cd

Table 11. Main characteristics and biomarkers of study participants stratified by race/ethnicity and sex (n=1,977)

Table 12. Median and interquartile range (IQR) for urinary cadmium, ng/mL

Table 13. Geometric mean (95% CI) of urinary cadmium stratified by race/ethnicity and sex (n=1,977)

Table 14. Top GWAS results (p < 5 × 10⁻⁵) for the association between urinary cadmium and genotyped and imputed alleles in smokers ordered by chromosome

Table 15. Candidate SNP analysis results for the 15 specific SNPs that were found to be associated with cadmium in previous studies

Table 16. Top 100 candidate SNP analysis results (by p-value) for the 1,169 SNPs identified in genes that were hypothesized to be associated with cadmium

Chapter 4

Table 17. Characteristics of the study population by status at the end of follow-up, December 31, 2016 (n=1,956)

Table 18. Estimated hazard ratio (95% CI) of lung cancer by one-log increase in urinary cadmium among 1,956 participants from the MEC Study (89 cases)

Table 19. Characteristics of the study population overall and by quartiles of urinary cadmium levels (N=1,956)

Table 20. Estimated hazard ratio (95% CI) of lung cancer by quartiles of urinary cadmium levels among a subset of participants from the MEC Study (89 cases)

Table 21. (Supplementary S6) Estimated hazard ratio (95% CI) of lung cancer levels by various characteristics adjusted for log-transformed urinary cadmium and other covariates

Table 22. (Supplementary S7) Estimated hazard ratio (95% CI) of lung cancer levels by various characteristics adjusted for continuous (non-transformed) urinary cadmium and other covariates

List of Figures

Chapter 1

Figure 1. Overall schematic of causal models for (a) Specific Aim 1, (b) Specific Aim 2, and (c) Specific Aim 3

Chapter 2

Figure 2. MEC Study Initial Questionnaire (1993-1996; page 22) (72). Work history questions from the questionnaire administered to participants at cohort entry

Figure 3. Proportion of participants assigned to occupational exposure categories overall and within each race/ethnicity

Figure 4. Racial/ethnic-specific analysis of the association between urinary cadmium and race, adjusted for sex, age at urine collection, creatinine (natural log), maximum education, urinary TNE, average years of smoking and occupational exposure category

Figure 5. (Supplemental Figure S1) Directed acyclic graph for the association between urinary cadmium and self-reported occupation

Figure 6. (Supplemental Figure S2) Median levels of urinary cadmium (natural log of ng/mL) by occupational exposure category

Chapter 3

Figure 7. Quantile-Quantile plot of observed and expected -log10 transformed p-values from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles

Figure 8. Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles

Figure 9. (Supplemental Figure S3) Conceptual model for the association between urinary cadmium and genetic variants

Figure 10. (Supplemental Figure S4) Geometric means of urinary cadmium levels by race/ethnicity

Figure 11. (Supplemental Figure S5) Geometric means of urinary cadmium levels by race/ethnicity and sex

Figure 12. (Supplemental Figure S6) QQ-plot and Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles

Figure 13. (Supplemental Figure S7) QQ-plot and Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles

Chapter 4

Figure 14. (Supplemental Figure S8) Directed acyclic graph for the association between urinary cadmium and lung cancer risk

Figure 15. (Supplemental Figure S9) Median levels of urinary cadmium (natural log of ng/mL) of participants at time of urine collection by status as of December 31, 2016

Preface

This dissertation is the final result of the research conducted during my doctoral studies in the Department of Environmental Health Sciences at the University of Minnesota School of Public Health. Chapter 1 introduces the overall study and provides background information, gaps in knowledge, and the proposed research questions. Chapters 2-4 address the three proposed specific aims and present the findings of each study. These chapters are written in manuscript format for submission to peer-reviewed scientific journals; therefore, some information is repeated across chapters. Chapter 5 provides a discussion and summary of the overall findings, conclusions and the public health impact of the research.

Chapter 1 Background and significance

Lung cancer is a critical public health issue

In the United States (U.S.), lung cancer is the second most commonly diagnosed cancer in both men and women, with an estimated 228,150 new cases expected to be diagnosed in 2019 alone (1). Lung cancer accounts for approximately 24% of all estimated cancer deaths, reflecting its substantial impact on public health (1). Adding to the devastating incidence and mortality is the substantial financial burden of lung cancer to those diagnosed, to their families, and to society as a whole. Medical costs associated with lung cancer were estimated at $14.2 billion in 2018 and are projected to increase due to the aging and growth of the U.S. population (2,3). Thus, a better understanding of the factors that contribute to lung cancer incidence and mortality is critically needed to help identify susceptible individuals who can be targeted for disease prevention measures.

Cigarette smoking is the greatest risk factor for lung cancer. The U.S. Surgeon General estimated that 90% of all lung cancer deaths are attributable to cigarette smoking (4,5). However, not all smokers develop lung cancer: an estimated 11% of female smokers and 24% of male smokers will develop lung cancer over their lifetime, and the factors contributing to inter-individual differences in lung cancer risk among smokers are not well understood (5). Therefore, understanding the factors that contribute to the likelihood of developing lung cancer in smokers is a key public health challenge. Lung cancer rates and trends vary substantially by sex, age, race/ethnicity, and socioeconomic status. For instance, lung cancer occurs mainly in older people, with an average age at diagnosis of 70 years (6). More men are diagnosed with lung cancer each year, but more women live with the disease (7). African American men and women are more likely to be diagnosed with and die from lung cancer than people from any other racial or ethnic group (8,9). Education level, as a proxy for socioeconomic status, is inversely correlated with lung cancer death rates (10,11). Furthermore, while exposure to tobacco-specific lung carcinogens may be somewhat similar among smokers, evidence suggests that the uptake and metabolism of these chemicals vary among smokers and can only partly explain the differing patterns of smoking and lung cancer risk (12). This suggests that, together with the already identified risk factors, exposure to lung carcinogens from multiple sources could amplify the risk of lung cancer development in some smokers.

Exposure to cadmium is a key risk factor for lung cancer

Cadmium (Cd), a constituent of cigarette smoke and a widespread environmental pollutant, is associated with many adverse health outcomes, including lung cancer, asthma, and emphysema (13–15). The association between Cd exposure and lung cancer is well established in humans and rodents, and toxicological and occupational studies confirm that Cd is a respiratory toxicant and lung carcinogen (13,14,16). Cd has been suggested to increase oxidative stress by catalyzing the formation of reactive oxygen species both in vitro and in vivo, to increase lipid peroxidation, and to deplete glutathione and protein-bound sulfhydryl groups (13,17,18). In addition, Cd may modify the activity of transcription factors and indirectly inhibit the protective function of DNA repair mechanisms (13,19). Based on the available evidence from mechanistic, in vitro and in vivo studies, and human epidemiological data, Cd and its compounds were classified by the International Agency for Research on Cancer (IARC) as Group 1 known human carcinogens (20,21).

Cigarette smoking is a major source of human exposure to Cd. A single cigarette contains roughly 1.0–2.0 µg of Cd, of which approximately 2-10% is transferred to the cigarette smoke; when the smoke is inhaled, 10-50% of this Cd is absorbed through the lungs (15,16,22,23). It is estimated that a person smoking 20 cigarettes (one pack) per day absorbs roughly 1 µg of Cd daily (22). Following inhalation, Cd is stored in lung tissue and the kidney and is eliminated from the body through urine. Population-level data from the National Health and Nutrition Examination Survey (NHANES) have consistently demonstrated significantly higher urinary Cd levels in smokers than in nonsmokers, with smokers having an approximately 1.8-fold higher geometric mean urinary Cd (NHANES 2015-2016: 0.275 µg/L versus 0.155 µg/L, respectively) (24–27).
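The per-cigarette figures above can be combined into a rough bound on daily absorbed Cd. The sketch below is an illustration that multiplies the low ends and the high ends of the quoted ranges; it is not the method used by the cited authors. For one pack per day, it gives roughly 0.04 to 2 µg/day, and the cited estimate of about 1 µg falls inside that range. The constant and function names are hypothetical.

```python
# Ranges quoted in the text (per cigarette and fractional transfer).
CD_PER_CIGARETTE_UG = (1.0, 2.0)  # µg of Cd in one cigarette
FRACTION_TO_SMOKE = (0.02, 0.10)  # share transferred to cigarette smoke
FRACTION_ABSORBED = (0.10, 0.50)  # share of inhaled Cd absorbed through the lungs


def absorbed_cd_range(cigarettes_per_day: float):
    """Lower and upper bounds (µg/day) from multiplying the low and high ends."""
    low = cigarettes_per_day * CD_PER_CIGARETTE_UG[0] * FRACTION_TO_SMOKE[0] * FRACTION_ABSORBED[0]
    high = cigarettes_per_day * CD_PER_CIGARETTE_UG[1] * FRACTION_TO_SMOKE[1] * FRACTION_ABSORBED[1]
    return low, high


low, high = absorbed_cd_range(20)  # one pack per day
print(f"{low:.2f} to {high:.2f} µg/day")  # 0.04 to 2.00 µg/day
```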

In addition to smoking, certain occupations may be a source of substantial Cd intake, because Cd is also an environmental and industrial pollutant. Based on evidence of carcinogenicity from occupational and other studies, IARC classified Cd and its compounds as Group 1 human lung carcinogens (13). Although Cd has been recognized as an occupational health hazard for decades and occupational standards and guidelines for permissible Cd exposure limits have been put in place, workers in a wide variety of occupations are still potentially at risk for Cd exposure (16,28). Occupational Safety and Health Administration (OSHA) estimates indicate that more than 512,000 U.S. workers are exposed to Cd and Cd compounds in the workplace (29). Further, a study using NHANES data (NHANES III, 1988-1994) reported urinary Cd levels ranging from 0.10 to 15.57 µg/L among U.S. workers aged 18 to 64 years (30). Although the reported overall geometric mean urinary Cd in this population (0.30 µg/L; 95% CI: 0.28, 0.32) was somewhat similar to levels reported for smokers, the study demonstrated that a noteworthy number of workers had urinary Cd levels 10- to 50-fold higher than this level (Table 1). Occupations known to pose a risk for Cd exposure include Cd and Cd alloy production and refining, nickel–Cd battery manufacturing, Cd pigment manufacturing and formulation, mechanical plating, zinc smelting, brazing with silver–Cd–silver alloy solder, polyvinylchloride compounding, vehicle mechanics, the transportation and repair service industries, fuel combustion, and exposure to Cd-containing phosphate fertilizers (13,30,31). These additional exposures could further elevate lung cancer risk in some smokers.
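The "10- to 50-fold" range follows from dividing the Table 1 thresholds (3 to 15 µg/L) by the reported geometric mean of 0.30 µg/L. The sketch below performs that arithmetic. It also divides each Table 1 count by its prevalence, which recovers the size of the working population to which the estimates were extrapolated: about 130 to 140 million in every row, with the spread likely reflecting rounding of the published percentages. The names are hypothetical; the numbers are those in Table 1.

```python
GEOMETRIC_MEAN_UG_PER_L = 0.30  # overall geometric mean reported by Yassin et al.

# Table 1 rows: (threshold in µg/L, prevalence in %, estimated number of workers)
TABLE_1 = [
    (15, 0.0028, 3_907),
    (10, 0.06, 78_471),
    (5, 0.42, 551_000),
    (3, 1.68, 2_350_000),
]


def fold_over_gm(threshold_ug_per_l: float) -> float:
    """How many times the overall geometric mean a threshold represents."""
    return threshold_ug_per_l / GEOMETRIC_MEAN_UG_PER_L


def implied_population(prevalence_pct: float, count: int) -> float:
    """Workforce size implied by an estimated count and its (rounded) prevalence."""
    return count / (prevalence_pct / 100.0)


for threshold, pct, count in TABLE_1:
    print(f">={threshold} µg/L: {fold_over_gm(threshold):.0f}-fold the GM, "
          f"implied workforce ~{implied_population(pct, count) / 1e6:.0f} million")
```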

Cadmium is widely dispersed into the environment through industrial and agricultural practices and is readily taken up from the soil by various plants, further expanding the potential for exposure to this carcinogen and the related increase in lung cancer risk (22,32). For example, a population-based prospective cohort study in Belgium (N=994), which included participants living in the vicinity of three zinc smelters and participants from a low-exposure area, found an association between Cd exposure (assessed by urinary Cd) and lung cancer incidence. After adjustment for sex, age and smoking, the hazard ratio for lung cancer was 1.70 (95% CI: 1.13, 2.57) per doubling of 24-hour urinary Cd excretion, and 4.17 (95% CI: 1.21, 14.4) for residents of the high-exposure area compared with the low-exposure area (33). Such additional exposures to Cd from occupational or environmental sources could contribute to inter-individual differences in lung cancer risk among smokers. Lastly, diet, the primary source of Cd exposure among nonsmoking, non-occupationally exposed populations, can also contribute to inter-individual differences in Cd exposure, although only a small fraction (1-10%) of ingested Cd is absorbed through the digestive tract (34). While most foods contain low concentrations of Cd, levels may vary depending on the type of food, agricultural and cultivation practices, and the degree of anthropogenic contamination. Cd has been found in seafood, shellfish, animal organ meats (e.g., liver and kidney), potatoes and grains, peanuts, soybeans, and leafy green vegetables (e.g., lettuce and spinach) (34).
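The Belgian estimate is expressed per doubling of urinary Cd, whereas a Cox model with natural-log urinary Cd, as in Specific Aim 3, yields an HR per one-unit increase in ln(Cd) (reported as 1.69 in the Abstract). Under the log-linear assumption of such a model, log HR = β · ln(Cd). A doubling adds ln 2 to ln(Cd), so HR per doubling = (HR per ln-unit)^ln 2, and the same transformation applies to CI bounds. The sketch below converts in both directions. This only puts the two estimates on the same contrast scale; differences in urine collection (24-hour versus spot samples), exposure range and adjustment sets still limit direct comparison. The function names are hypothetical.

```python
import math


def hr_per_ln_unit_from_doubling(hr_per_doubling: float) -> float:
    """If log HR = beta * ln(Cd), a doubling adds ln 2, so HR_doubling = exp(beta * ln 2)."""
    beta = math.log(hr_per_doubling) / math.log(2.0)
    return math.exp(beta)


def hr_per_doubling_from_ln_unit(hr_per_ln_unit: float) -> float:
    """Inverse conversion: HR_doubling = HR_ln_unit ** ln 2."""
    return hr_per_ln_unit ** math.log(2.0)


print(round(hr_per_ln_unit_from_doubling(1.70), 2))  # 2.15 (Belgian cohort, per ln-unit)
print(round(hr_per_doubling_from_ln_unit(1.69), 2))  # 1.44 (MEC estimate, per doubling)
```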

The intake and excretion of Cd can be affected by individual characteristics

Human exposure to Cd is commonly assessed through the analysis of blood or urine. While blood Cd levels mainly reflect recent exposure (half-life 3-4 months), urinary Cd has a much longer half-life (10-30 years) and is therefore a better biomarker for investigations of long-term Cd exposure and health outcomes (14,19,22,23,35,36). Multiple studies indicate that, in addition to smoking and other external exposures, individual characteristics are important contributors to variation in urinary Cd levels in the general U.S. population. For instance, urinary Cd levels increase with age, are higher among women, and are generally reported to be higher in individuals with lower education (24,25,30,31,35–39). Findings by race/ethnicity, however, have been less consistent. Using seven phases of NHANES data from 1988 to 2008, Tellez-Plaza et al. consistently found the highest geometric mean urinary Cd in the ‘Other’ race/ethnicity category, but the group with the next-highest level varied across survey periods (40). In NHANES 1988-1994, after the ‘Other’ category, the highest geometric mean was in African American adults (0.41 µg/g creatinine) and the lowest in White adults (0.36 µg/g creatinine). In 1999-2002 and 2003-2008, however, Mexican American adults had the highest levels (0.29 and 0.26 µg/g creatinine, respectively), with lower levels in White adults (0.26 and 0.24 µg/g creatinine, respectively) and African American adults (0.27 and 0.24 µg/g creatinine, respectively). These results suggest that race/ethnicity may influence urinary Cd levels, although it is unclear whether this effect is due to differences in exposure or in intrinsic factors, such as differences in absorption, metabolism and/or excretion rates. For instance, research by Riederer et al. indicated a potential impact of place of birth (outside versus inside the U.S.) on urinary Cd levels: Mexican American children and adults born outside the U.S. had higher urinary Cd levels than U.S.-born Non-Hispanic Whites (36). The same study also reported significant differences in the relative proportions of Non-Hispanic White, Black and Mexican American workers across categories assumed to be occupationally exposed to Cd. Together, these studies suggest that racial/ethnic differences in urinary Cd levels may be at least partly due to differences in external factors; however, intrinsic factors cannot be excluded.
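The half-lives quoted at the start of this paragraph explain why urinary Cd integrates long-term exposure while blood Cd tracks recent exposure. The sketch below assumes simple single-compartment, first-order elimination, which is a simplification of real Cd kinetics. Under that assumption, it computes the share of a body burden that remains after one year for the quoted half-life ranges. The function name is hypothetical.

```python
def fraction_remaining(years: float, half_life_years: float) -> float:
    """Share of an initial burden left after `years` of first-order elimination."""
    if half_life_years <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (years / half_life_years)


# Blood Cd (half-life ~3-4 months) versus the long-term pool reflected by urinary Cd (10-30 years).
for label, half_life in [("blood, 3 months", 0.25), ("blood, 4 months", 1 / 3),
                         ("urinary, 10 years", 10.0), ("urinary, 30 years", 30.0)]:
    print(f"{label}: {fraction_remaining(1.0, half_life):.3f} remaining after 1 year")
```

After one year, most of a blood-pool burden has been cleared under these assumptions, whereas more than 90% of the long-term pool remains.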

Factors that influence Cd absorption, distribution, storage and elimination are not well understood. Several recent studies illustrate the potential role of genetics in Cd toxicity (41–48). A number of cellular defense mechanisms, specifically the induction of metallothionein (MT) and cellular glutathione (GSH), are proposed to be activated in response to Cd exposure. For example, polymorphisms in the metallothionein IIA gene (MT2A) have previously been associated with differences in Cd concentrations in the human renal cortex, and individuals with lower expression of MT2A have been reported to have higher levels of Cd in their blood (46–48). Variants within glutathione S-transferase (GST) genes, which encode a family of phase II enzymes that detoxify reactive electrophiles and catalyze the intracellular formation of GSH-metal conjugates, have also been associated with Cd toxicity (42,43). Thus, it is plausible that genetic variation could alter Cd body burden, including urinary Cd levels, and susceptibility to Cd toxicity in smokers. To our knowledge, only two published studies have used a genome-wide association study (GWAS) approach to examine the relationship between common genetic variants and levels of Cd in blood, revealing associations within genes with suggestive functions related to the absorption and metabolism of blood Cd (49,50). Ng et al. used serum Cd measurements and found an association with a single nucleotide polymorphism (SNP) within the CD109 gene region (49). Borné et al. used erythrocyte Cd measurements and found associations among non-smokers with SNPs within the DLCAP1 and XKR9 gene regions (50). However, no GWAS has investigated the association between common genetic variants and urinary Cd. An improved understanding of the influence of genetic polymorphisms on urinary Cd levels may help identify smokers who are susceptible to Cd toxicity.

Potential impact of the proposed research on human health recommendations

As with any multifactorial disease, the examination of individual risk factors for lung cancer has been insufficient to account for the entire burden of the disease. Understanding which factors contribute to the likelihood of developing lung cancer in smokers is a key public health challenge. This study specifically focuses on exposure to Cd in current smokers. There is extensive evidence on the association between smoking and Cd and on the association between occupational exposures and Cd; however, very little literature has evaluated Cd levels in relation to the multiple factors known to contribute to or modify them (e.g., cigarette smoking, occupational exposures, and individual characteristics), which may together explain the variation in Cd levels among current smokers (Specific Aim 1). In addition, although exposure to cancer-causing chemicals from tobacco may be somewhat similar among smokers, evidence suggests that metabolism, DNA damage and DNA repair in response to these chemicals vary and only partly explain differing patterns of smoking and lung cancer risk. This highlights the importance of understanding genetic factors that may contribute to differences in exposure to tobacco-related lung carcinogens. However, evidence on the association between genetic polymorphisms and urinary Cd levels is lacking; such variants can be identified using a genome-wide association approach (Specific Aim 2). Lastly, an improved understanding of the interplay of individual characteristics and genetic polymorphisms in Cd exposure may help identify individuals who are at increased risk for lung cancer. While there is no question that exposure to tobacco-specific lung carcinogens is a major cause of smoking-related lung cancer, there is little ability to predict who among the more than one billion smokers in the world is most at risk of developing lung cancer. Therefore, it is necessary to assess aggregate exposure to Cd from multiple factors to help identify the current smokers at greatest risk for lung cancer and to determine whether these risk factors are modifiable (Specific Aim 3). This research has the potential to identify multiple factors that contribute to Cd exposure, which may in part explain differences in susceptibility to lung cancer among smokers.

Objective

The overall objective of this research is to investigate the effects of smoking, occupational exposures, demographics (e.g., race/ethnicity and education), and common genetic variants on the levels of urinary Cd, a validated biomarker of long-term Cd exposure, in current smokers from the Multiethnic Cohort (MEC) Study, and to test whether this biomarker is associated with lung cancer risk in smokers.

Specific Aims

The overall objective of this research will be accomplished in three separate dissertation chapters through the following specific aims:


Chapter 2

Specific Aim 1: To investigate the relationship between self-reported occupation and urinary Cd levels in smokers from five racial/ethnic groups in the MEC Study with complete covariate data.

Hypothesis 1: Smokers employed in occupations known to pose a risk for Cd exposure will have higher levels of urinary Cd, after adjusting for smoking dose and duration.

Chapter 3

Specific Aim 2: Identify common genetic variants that may be associated with urinary Cd in current smokers from the MEC.

Hypothesis 2: Inter-individual differences in the levels of urinary Cd are related to common genetic variants.

Chapter 4

Specific Aim 3: Investigate the relationship between urinary Cd and lung cancer risk in current smokers at time of urine collection.

Hypothesis 3: Urinary Cd will be positively associated with incidence of lung cancer in current smokers at the time of urine collection in the MEC cohort.


Table 1. Urinary cadmium prevalence in the U.S. Working Population (aged 18 to 64 years), 1988-1994 reported by Yassin et al (30)

Urinary cadmium level | Estimated prevalence (N)
≥15 µg/L | 0.0028% (3,907)
≥10 µg/L | 0.06% (78,471)
≥5 µg/L | 0.42% (551,000)
≥3 µg/L | 1.68% (2,350,000)


Figure 1. Overall schematic of causal models for (a) Specific Aim 1, (b) Specific Aim 2, and (c) Specific Aim 3


Chapter 2

Relationship between urinary cadmium and occupation in smokers from the Multiethnic Cohort Study

Introduction

Although cigarette smoking is an established major risk factor for lung cancer, only 11-24% of smokers will develop the disease over their lifetime (5,51). As with any multifactorial disease, the examination of individual risk factors for lung cancer in smokers has been insufficient to account for the entire burden of the disease. Further, lung cancer risk is known to differ by race/ethnicity, by sex, and by socioeconomic status (9,52,53). However, the mechanisms behind these inter-individual differences are not fully understood. While exposure to certain tobacco-specific lung carcinogens may be somewhat similar among smokers, the intake of lung carcinogens that are not specific to tobacco may vary drastically because of concurrent exposures from other sources, such as occupational and environmental exposures.

Cadmium (Cd), a constituent of cigarette smoke and an environmental and industrial pollutant, is classified by the International Agency for Research on Cancer (IARC) as a Group 1 known human lung carcinogen (13,20). Although Cd has been recognized as an occupational health hazard for decades and occupational standards and guidelines for permissible Cd exposure limits have been put in place, workers in a wide variety of occupations are still potentially at risk for Cd exposure (16,28). The Occupational Safety and Health Administration (OSHA) estimates that more than 512,000 U.S. workers are potentially exposed to Cd and Cd compounds in the workplace (29). Consistent with OSHA’s estimates, an analysis of cross-sectional data from the National Health and Nutrition Examination Survey (NHANES III, 1988-1994) found that approximately 551,000 U.S. workers aged 18 to 64 years had urinary Cd concentrations above limits set by occupational standards or guidelines (28,30). However, the extent to which occupational Cd exposure adds to the risk in populations exposed to other major lung cancer risk factors, such as smoking, is not known. It is plausible that occupational exposure to lung carcinogens also found in cigarette smoke, such as Cd, contributes to the inter-individual variation in lung cancer risk observed among smokers. To our knowledge, no study has employed a biomarker-based approach to describe the relative contribution of occupational exposures to Cd body burden in a racially/ethnically diverse population of smokers.

In this study, we examined the relationship among levels of urinary Cd, a validated biomarker of long-term Cd exposure, self-reported longest-held occupational category, and race/ethnicity in smokers from the Multiethnic Cohort (MEC) Study. The results of this study will help determine the contribution of these factors to lung cancer risk in this and other cohorts of smokers.

Materials and Methods

Study Population

Details of the MEC Study have been published previously (12,54–56). Briefly, participants were recruited in Hawaii and California between 1993 and 1996. The cohort consists of 215,251 men and women aged 45 to 75 years at recruitment, primarily belonging to five racial/ethnic groups (African American, Japanese American, Latino, Native Hawaiian, and White). Approximately 10 years after cohort entry, a subset of participants provided a blood sample and either an overnight urine sample (participants recruited in Hawaii) or a first morning urine sample (participants recruited in California), and completed an epidemiologic questionnaire that included history of daily cigarette smoking during the past 2 weeks, smoking duration, and a medication record. The present study includes the subset of MEC participants who were lung cancer-free current smokers at the time of biospecimen collection and whose urine had been analyzed for the biomarker of internal smoking dose, total nicotine equivalents (TNE), as well as for other metabolites of tobacco carcinogens (12,55–60).

The Institutional Review Boards at the University of Southern California, University of Hawaii and University of Minnesota approved the study protocol.

Analysis of cadmium in urine

Urine samples were prepared by diluting 50 µL of urine with 250 µL of 2% nitric acid (trace-metal grade). Cd in the prepared urine samples was measured by inductively coupled plasma mass spectrometry (ICP-MS) (61) at the Wisconsin State Laboratory of Hygiene (WSLH), University of Wisconsin-Madison, which is certified for the analysis of Cd in biological and environmental samples. The average of three replicate readings was used for data interpretation. The analysts were blinded to the origin of all urine samples. Quality control measures were incorporated at different stages of the analysis to monitor analytical accuracy and ensure robust data: (i) multiple replicates of randomly selected urine samples were blindly inserted throughout the sample set; (ii) negative control (2% nitric acid method blanks) and positive control (urine with known concentrations of Cd) samples were prepared by the University of Minnesota (UMN) laboratory and added to the set; and (iii) WSLH instrument performance controls were included with each batch of samples. The method limit of quantification (LOQ) was 0.02 ng/mL Cd.

Analysis of creatinine and nicotine intake biomarkers

Urinary creatinine was quantified using a colorimetric microplate assay (CRE34-K01) purchased from Eagle Bioscience (Nashua, NH). The urinary biomarker of nicotine intake, total nicotine equivalents (TNE; the sum of nicotine and its metabolites in urine: nicotine, cotinine, trans-3’-hydroxycotinine, nicotine N-oxide, and their corresponding glucuronide conjugates), was quantified previously by liquid chromatography-tandem mass spectrometry (LC-MS/MS) (55).


Occupational exposure categories

Occupational exposure was captured through two self-reported questionnaire items: one on the occupational category in which the participant worked longest, and one on the history of industries and occupations in which the participant was employed for more than 10 years (Figure 2).

[Figure 2]

Occupations and industries reported by the participants were reviewed, and those reported in the literature to be associated with risk of Cd exposure were identified (13,34,36,62). The occupation titles from the questionnaire posing a risk for Cd exposure included: ‘laborer or farm worker, factory worker or machine operator, and craftsperson’. The industry titles from the questionnaire posing a risk for Cd exposure included: ‘metal production or processing, mining, quarrying, rock crushing, or cement manufacturing, plastic production or processing, gasoline refining or distribution, rubber or tire manufacturing, shipyard work, farming, furniture making or woodworking, automotive repair, pesticide production and paint production or use’.

Based on the participants’ combined responses to the two questions pertaining to the industry and occupation in which they worked longest (Figure 2), participants were grouped into four occupational exposure categories (Table 2): ‘Likely exposed’, ‘Possibly exposed’, ‘Not likely exposed’, and ‘Unknown exposure’. For example, if both the industry and the occupation reported by a participant were known sources of Cd exposure, the participant was categorized as ‘Likely exposed’. If only one of the reported titles (either industry or occupation) was a known source of Cd exposure, the participant was categorized as ‘Possibly exposed’. If neither title was a known source of Cd exposure (including when one was not a known source and the other was not reported), the participant was categorized as ‘Not likely exposed’; if neither the industry nor the occupation was reported, the participant was categorized as ‘Unknown exposure’. The specific categorization by reported industry and occupation title, and the number of participants in each occupational exposure category, are provided in Supplemental Tables S1-S4.

[Table 2]

Statistical Analysis

Descriptive statistics (e.g., frequencies, medians, and interquartile ranges: 25th and 75th percentiles) were used to describe participant characteristics across occupational exposure categories. Epidemiologic data from the questionnaires at cohort entry and at urine collection were used to estimate years of smoking when it was missing at the time of urine collection, as described previously (55,60). Of the 1,977 participants with urinary Cd measurements, 21 were excluded from the analyses because of missing values (20 were missing education and 1 had insufficient information to compute smoking duration), resulting in 1,956 participants with complete biomarker measurements and covariate data (63).

Urinary Cd and urinary creatinine were log-transformed using the natural logarithm to achieve approximately normal distributions, and log-transformed urinary Cd was used as the dependent variable. Univariate analyses were used to identify factors associated with urinary Cd, including age at urine collection, sex (male and female), self-reported race/ethnicity (African American, Native Hawaiian, Japanese American, Latino, and White), body mass index (BMI) at urine collection, smoking duration (years of smoking), average cigarettes per day, smoking dose (urinary TNE), creatinine (natural log), maximum education (≤8th grade, 9th-12th grade, vocational school/some college, and ≥ graduated college), and use of medications that may affect urine output at collection (yes/no by drug class: antidiabetic, antihypertensive diuretic, and specifically hydrochlorothiazide/Dyazide/Lasix). Variables that were significant (p<0.05) in the univariate analyses were included in the regression analysis.
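To make the outcome preparation concrete, the following Python sketch applies the non-detect rule used in this chapter (one-half of the 0.02 ng/mL LOQ) and then the natural-log transformation. It assumes, hypothetically, that non-detects are coded as `None` or as values below the LOQ; the function name `prepare_log_cd` is illustrative and is not part of the original Stata workflow.

```python
import math
from typing import Optional, Sequence

LOQ_NG_PER_ML = 0.02  # method limit of quantification for urinary Cd (ng/mL)


def prepare_log_cd(values: Sequence[Optional[float]],
                   loq: float = LOQ_NG_PER_ML) -> list[float]:
    """Return ln(urinary Cd), substituting LOQ/2 for non-detects.

    Assumption for this sketch: a non-detect is coded as None or as a
    value below the LOQ.
    """
    prepared = []
    for v in values:
        if v is None or v < loq:
            v = loq / 2.0  # 0.01 ng/mL when the LOQ is 0.02 ng/mL
        prepared.append(math.log(v))
    return prepared


print(prepare_log_cd([None, 0.6, 1.2]))
```

Only the substituted value and the log transform are shown; the univariate screening itself would then be run on the returned values.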

A censored multiple linear regression model (tobit regression) (64) was used to estimate the adjusted percent change in urinary Cd associated with each occupational exposure category, with 95% confidence intervals (CIs) to characterize precision, while adjusting for confounding variables as presented in the directed acyclic graph in Supplemental Figure S1. Urinary creatinine (natural log) was modeled as an independent variable, in addition to age, sex, race/ethnicity, education, smoking dose (TNE), and smoking duration (average years of smoking), so that Cd concentration comparisons were adjusted for urine output, urine collection methods, and demographic differences (65). Two multivariable models were fitted with the following predictors of urinary Cd: (1) Model 1: age at urine collection, sex, race/ethnicity, creatinine (natural log), and education; and (2) Model 2: the Model 1 covariates plus smoking dose (TNE) and duration (average years of smoking). For ease of interpretation, beta coefficients were back-transformed to the original scale to express the percent change in urinary Cd per one-unit increment of each independent variable: urinary Cd percent change = (e^β − 1) × 100. Estimated geometric means with 95% CIs are also presented. Kruskal-Wallis tests were used for comparisons across categories where appropriate. For statistical analysis, samples with non-detectable Cd were assigned a value equal to one-half of the LOQ (0.02 ng/mL), i.e., 0.01 ng/mL.
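The back-transformation from the log scale to percent change can be sketched in Python as follows. The function names are hypothetical, and the Wald-type interval (β ± 1.96 × SE, then exponentiated) is an assumption about how the confidence limits were obtained, not a documented detail of the analysis.

```python
import math


def percent_change(beta: float) -> float:
    """Percent change in urinary Cd implied by a coefficient on the ln scale."""
    return (math.exp(beta) - 1.0) * 100.0


def percent_change_ci(beta: float, se: float,
                      z: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI: back-transform the Wald limits of beta (assumed)."""
    return percent_change(beta - z * se), percent_change(beta + z * se)


def beta_from_percent(pct: float) -> float:
    """Inverse transform: the ln-scale coefficient implied by a percent change."""
    return math.log1p(pct / 100.0)
```

For example, `beta_from_percent(18.6)` gives the coefficient implied by the 18.6% difference reported for the ‘Likely exposed’ category, and `percent_change` maps it back. Because the transformation is monotone, back-transformed limits stay in the same order as the limits on the log scale, but they are not symmetric around the point estimate.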

All data analysis was performed using Stata-IC statistical software (version 14; StataCorp LLC, College Station, TX, USA).
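A tobit model treats log urinary Cd values at or below a threshold as left-censored rather than as exact observations, so its likelihood differs from that of ordinary least squares. The Python sketch below writes out that log-likelihood. It does not reproduce the Stata analysis, and the censoring limit (for example, the natural log of the LOQ) and the helper names are assumptions for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

# (covariate vector, ln urinary Cd or None, censored flag)
Row = Tuple[Sequence[float], Optional[float], bool]


def _norm_logpdf(z: float) -> float:
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_logcdf(z: float) -> float:
    # Phi(z) = 0.5 * erfc(-z / sqrt(2)); adequate for moderate z in a sketch
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(beta: Sequence[float], sigma: float,
                 rows: Sequence[Row], left_limit: float) -> float:
    """Log-likelihood of a left-censored (tobit) linear model for ln(urinary Cd)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ll = 0.0
    for x, y, censored in rows:
        mu = sum(b * xi for b, xi in zip(beta, x))  # linear predictor x'beta
        if censored:
            # Only known: the latent value lies at or below the limit
            ll += _norm_logcdf((left_limit - mu) / sigma)
        else:
            # Observed value contributes the normal density on the log scale
            ll += _norm_logpdf((y - mu) / sigma) - math.log(sigma)
    return ll
```

In practice, β and σ would be estimated by maximizing this function, for instance by passing its negative to a numerical optimizer such as `scipy.optimize.minimize`. The `math.erfc` form of the normal CDF can underflow for very negative z, so a more robust version would use a dedicated log-CDF routine such as `scipy.stats.norm.logcdf`.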

Results

Main characteristics and biomarkers of the study population, stratified by occupational exposure category, are summarized in Table 3. A total of 1,956 smokers aged 46 to 87 years at urine collection were included in the analysis. The overall median urinary Cd concentration was 0.60 ng/mL, ranging from below the LOQ to 6.0 ng/mL across all exposure categories. In the univariate analyses of potential predictors, sex, reported race/ethnicity, creatinine (natural log), maximum education, the urinary biomarker of nicotine dose (TNE), and years of smoking were associated with urinary Cd. Age at urine collection did not reach statistical significance (p=0.875); however, based on evidence from the literature, age was included in the models as a predictor of urinary Cd (35,39,66). Medication use at urine collection (antidiabetic use, antihypertensive diuretic use, and hydrochlorothiazide/Dyazide/Lasix use) was not associated with urinary Cd levels in this cohort and was therefore not included in further analyses. Overall, 0.77% of participants had urinary Cd measurements below the LOQ (Table 3).

Median urinary Cd levels by occupational exposure category are presented in Table 3 and Supplemental Figure S2 (urinary Cd on the natural log scale). Median levels differed significantly across categories (p<0.001), being highest among participants in the ‘Likely exposed’ category (0.78 ng/mL) and lowest among those in the ‘Not likely exposed’ category (0.54 ng/mL). Notably, participants in the ‘Likely exposed’ and ‘Possibly exposed’ categories were predominantly male (82% and 72%, respectively) and reported ≤12th grade as their highest education (67% and 56%, respectively). In contrast, the ‘Not likely exposed’ category was predominantly female (60%), and vocational school and/or some college was its most commonly reported education level (39%). Participants in the ‘Likely exposed’ category had a longer median duration of smoking (44.5 years), a higher median number of cigarettes per day (12), and a higher median TNE level (43.8 nmol/mL) than any other occupational exposure category.
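Kruskal-Wallis tests were used for comparisons across categories, and a comparison of medians such as this one is the kind of analysis they support. A minimal Python sketch of the H statistic follows, using average ranks for ties and the standard tie correction. It is an illustration only, and the conversion of H to a p-value is left out.

```python
from typing import Sequence


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (average ranks for ties, with tie correction).

    Assumes every group is non-empty and not all pooled values are identical.
    """
    # Pool observations, remembering which group each came from
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie block
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied; H is undefined")
    return h / correction
```

The p-value would come from referring H to a chi-square distribution with k − 1 degrees of freedom for k groups. If SciPy is available, `scipy.stats.kruskal` returns both the statistic and the p-value.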

In this cohort, the racial/ethnic composition differed significantly across occupational exposure categories (Table 3). Participants in the ‘Likely exposed’ and ‘Possibly exposed’ categories included higher proportions of Latinos (40% and 33% of all participants in those categories, respectively), whereas the ‘Not likely exposed’ category included higher proportions of Japanese Americans (35%) and Whites (26%), and the ‘Unknown exposure’ category included higher proportions of Latinos (30%) and African Americans (19%). The distribution of occupational exposure categories also varied within each racial/ethnic group (Figure 3). However, there was no evidence of an interaction between occupational exposure category and race/ethnicity on urinary Cd levels (p=0.934) in any of the analyses.

[Table 3]

[Figure 3]

Compared with the ‘Not likely exposed’ category, unadjusted geometric mean urinary Cd levels were 34.6% higher (95% CI: 18.1%, 53.5%) in the ‘Likely exposed’ category, 16.8% higher (95% CI: 6.4%, 28.2%) in the ‘Possibly exposed’ category, and 4.6% higher (95% CI: -4.5%, 14.6%) in the ‘Unknown exposure’ category. After adjustment for predictors of urinary Cd (reported race/ethnicity, sex, age, creatinine (natural log), and maximum education; Model 1, Table 4), participants in the ‘Likely exposed’ category had 18.6% (95% CI: 6.5%, 32.0%) higher geometric mean urinary Cd than participants in the ‘Not likely exposed’ category. A similar pattern was observed for the ‘Possibly exposed’ category, with 8.7% (95% CI: 0.6%, 17.4%) higher geometric mean urinary Cd than the ‘Not likely exposed’ category. After further adjustment for the smoking variables TNE and years of smoking (Model 2, Table 4), results were similar: the ‘Likely exposed’ and ‘Possibly exposed’ categories had 12.1% (95% CI: 1.2%, 24.3%) and 7.7% (95% CI: 0.1%, 16.0%) higher urinary Cd, respectively, than the ‘Not likely exposed’ category. Comparing the models with (Model 2) and without (Model 1) smoking variables, smoking-related variables thus accounted for 34.7% and 10.9% of the elevation in urinary Cd (relative to the ‘Not likely exposed’ category) observed in the ‘Likely exposed’ and ‘Possibly exposed’ categories, respectively (Table 4).
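The 34.7% and 10.9% figures read as the percent attenuation of the Model 1 estimate after the smoking variables are added, that is, (Model 1 − Model 2) / Model 1 × 100 on the percent-change scale. That formula is our interpretation rather than one stated explicitly in the text. A minimal Python sketch:

```python
def percent_attenuation(pct_without: float, pct_with: float) -> float:
    """Share (%) of the excess urinary Cd explained by the added covariates.

    pct_without: percent change vs. reference without smoking variables (Model 1)
    pct_with:    percent change vs. reference with smoking variables (Model 2)
    """
    if pct_without == 0:
        raise ValueError("Model 1 shows no excess to attenuate")
    return (pct_without - pct_with) / pct_without * 100.0


# Rounded Model 1 / Model 2 estimates from Table 4
for label, m1, m2 in [("Likely exposed", 18.6, 12.1), ("Possibly exposed", 8.7, 7.7)]:
    print(f"{label}: {percent_attenuation(m1, m2):.1f}%")
```

With the rounded Table 4 estimates this prints 34.9% and 11.5%. Slightly different values, such as the reported 34.7% and 10.9%, would be expected if the calculation used unrounded estimates.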

[Table 4]

Because of the multiethnic nature of this population and known racial/ethnic differences in lung cancer risk, we also investigated differences in urinary Cd by race/ethnicity. After adjustment for all predictors of urinary Cd (age, sex, creatinine (natural log), maximum education, TNE, average years of smoking, and occupational exposure category), geometric mean urinary Cd levels were significantly higher in Latinos (geometric mean 0.86 ng/mL; 95% CI: 0.81, 0.91), Native Hawaiians (0.84 ng/mL; 95% CI: 0.78, 0.89), and Japanese Americans (0.81 ng/mL; 95% CI: 0.77, 0.86) than in Whites (0.74 ng/mL; 95% CI: 0.70, 0.79). The somewhat higher levels of urinary Cd in African Americans (0.81 ng/mL; 95% CI: 0.75, 0.86) compared with Whites were not statistically significant. These results are illustrated graphically in Figure 4.

[Figure 4]

Discussion

While Cd has long been recognized as carcinogenic to humans (IARC Group 1), occupational Cd exposure remains a public health concern. In this sample of MEC Study participants who were current smokers at the time of urine collection, we observed that participants in broad occupational categories with potential for Cd exposure had demonstrably higher levels of urinary Cd after accounting for a number of covariates, including the smoking measures of internal smoking dose (as measured by urinary TNE) and duration (years of smoking). This suggests that occupational exposure to this known lung carcinogen is an important contributor to urinary Cd levels in smokers and should be considered when evaluating lung cancer risk among smokers in this cohort. This study also provided evidence of differences in urinary Cd levels by race/ethnicity, with Latinos having the highest geometric mean urinary Cd levels after adjustment for age, sex, creatinine (natural log), education, occupational exposure category, and smoking measures (TNE and average years of smoking), which may indicate different levels of risk across these groups.

Urinary Cd levels in this study are within the range recently reported for U.S. adult cigarette smokers aged 50 years and older in the 2015-2016 NHANES data (27). However, compared with levels reported for cigarette smokers in earlier studies, the overall geometric mean urinary Cd level in our study is up to almost 2-fold higher. For example, reports using 1999-2010 NHANES data found geometric mean values between 0.32 and 0.42 ng/mL in smokers, whereas the unadjusted overall geometric mean in our study was 0.60 ng/mL (95% CI: 0.58, 0.62) (30,36,67), that is, about 1.4 to 1.9 times higher. These differences are likely attributable to our older study population (median age 64 years; range 46-87), whereas other studies included a wider range of ages (e.g., 18 to 64 years). Because urinary Cd reflects long-term Cd exposure, the literature has consistently shown that urinary Cd levels increase with age, which likely contributes to the higher levels observed in this study (24,30,31,36).
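The geometric mean and the fold comparison with the earlier NHANES estimates can be sketched in Python as follows. The function names are illustrative, and only the summary values quoted in this paragraph are used: 0.60 ng/mL in this study and 0.32-0.42 ng/mL in the 1999-2010 NHANES reports.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """exp(mean of natural logs); requires a non-empty list of positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def fold_difference(ours: float, other: float) -> float:
    """How many times larger `ours` is than `other`."""
    return ours / other


# Overall unadjusted geometric mean here vs. the NHANES 1999-2010 range
for nhanes_gm in (0.32, 0.42):
    print(f"{fold_difference(0.60, nhanes_gm):.1f}-fold")
```

The loop prints 1.9-fold and 1.4-fold, the range spanned by the comparison. Because the geometric mean is computed on the log scale, it matches the back-transformed means from the log-linear models rather than the arithmetic mean of the raw concentrations.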

This study advances prior research on occupational Cd exposure by using a biomarker-based approach to account for the contribution of smoking to urinary Cd levels. In our analysis, we used the urinary biomarker TNE, which accounts for more than 90% of the daily nicotine intake in cigarette smokers (55,68,69). TNE therefore serves as an objective measure of cigarette smoke exposure, as opposed to the self-reported behavior (e.g., average cigarettes per day) used in most other studies. Most reported studies of occupational exposure to Cd investigated small populations exposed to relatively high levels of environmental or industrial Cd (13,62). For example, IARC’s initial classification of Cd as a human lung carcinogen (Group 1) drew on studies such as those of workers at a cadmium recovery plant in the United States (20). Cigarette smoking is an important potential confounder in evaluating the carcinogenicity of Cd in the workplace: workers in occupations that pose a risk for Cd exposure who also smoke cigarettes may have increased combined Cd exposure, adding to their risk of developing lung cancer. However, these occupational studies could not fully account for the confounding effects of smoking because of the difficulty of collecting accurate smoking data for each participant: they either did not collect this information or relied on self-reported smoking status and/or smoking history, which is prone to recall bias.

Since the initial release of the IARC Monograph in 1993, few epidemiological studies of adverse health effects associated with occupational and environmental exposure to Cd have been conducted in the United States (13). As noted above, those that have been conducted focus on highly exposed populations (e.g., from environmental or industrial exposure), some lack the detail needed to account for contributions from smoking, and very few use urinary Cd to investigate adverse health outcomes. While human exposure to Cd can be assessed using either blood or urine, Cd accumulates in the kidneys, where it has a much longer half-life (up to 38 years); urinary Cd is therefore a better measure of long-term exposure to Cd (14,22,35). In contrast, blood Cd is recognized as a biomarker of recent exposure because it has a much shorter half-life (3-4 months) (35). The most recent U.S. studies of occupational exposure and urinary Cd analyzed NHANES data; however, information was limited because NHANES did not capture occupational variables after the 2004 survey year (30,36,39). Our findings are consistent with previously reported results that higher levels of urinary Cd are associated with certain types of occupations (13,30,40). For example, in our study, smokers categorized as ‘Likely exposed’ reported working as laborers, factory workers, craftspersons, or small business owners in industries such as automotive repair, metal production and processing, and mining, quarrying, rock crushing, or cement manufacturing (Supplemental Table S1). Analyses of NHANES data have reported higher urinary Cd levels among vehicle mechanics, transportation workers, and workers in the repair service, paint, and metal industries (30,36). Our findings agree and suggest that smokers working in occupations and industries that are potential sources of Cd exposure may be at higher risk for lung cancer than smokers employed in occupations that are not likely sources, and that these workers should be targeted for public health interventions.

Looking across occupational exposure categories, participants in the ‘Possibly exposed’ and ‘Likely exposed’ categories in our study smoked more (as reflected by urinary TNE), smoked for a longer duration, and reported lower levels of education (Table 3). Specifically, 67% and 56% of participants in the ‘Likely exposed’ and ‘Possibly exposed’ categories, respectively, reported ≤12th grade as their highest level of education. This agrees with many published reports that smoking is correlated with lower educational attainment, and that people with lower educational attainment have higher lung cancer incidence than those with a college education (10,70). Nevertheless, after adjustment for education, urinary TNE, and average years of smoking in the multivariable regression analysis, urinary Cd levels in smokers still varied significantly among the occupational exposure categories (Table 4).

The findings of this study must be interpreted in light of some limitations. First, we did not directly measure individual Cd exposure in each occupational exposure category. Occupational exposure categories with potential for Cd exposure were based on self-reported questionnaire data, and there was no definitive measure of the time worked in each profession; therefore, not all occupational exposure details were captured in the analyses, which could lead to exposure misclassification. However, published data support our categorization of certain occupations as posing a risk for Cd exposure, and this was corroborated in this study (13,30,36). Second, our study included a single measurement of urinary Cd, and urine collection methods differed between recruitment sites (overnight samples in Hawaii, first morning samples in California). A single sample may not represent participants’ ongoing exposure to Cd. To account for the differing collection methods, we adjusted for urinary creatinine, and our urinary Cd results are consistent with 2015-2016 NHANES urinary Cd levels for smokers aged 50 years and older, as noted above.

The main strength of our study is the use of a well-characterized cohort of smokers from the MEC with extensive epidemiologic data and measurements of tobacco constituent biomarkers. Specifically, this study used the urinary biomarker of internal smoking dose (TNE) to account for recent exposure to cigarette smoke, in addition to years of smoking (55,68,71). TNE allowed us to accurately assess individual contributions from cigarette smoke and to identify inter-individual differences in urinary Cd that are independent of the amount smoked. Alternatives such as self-reported cigarettes per day are prone to measurement error because they do not reflect individual differences in smoking behavior, including the number of puffs taken per cigarette or the amount inhaled. TNE captures differences in smoking intensity and does not depend on accurate self-reporting of the number of cigarettes smoked. Furthermore, this study expands upon previously published studies in this cohort that observed racial/ethnic differences in lung cancer risk due to smoking, by identifying other sources of exposure to cancer-causing chemicals (9,12,55–60). There is extensive evidence on the associations of both smoking and occupational exposures with Cd; however, to our knowledge, no study has investigated the inter-individual variation of urinary Cd levels in relation to multiple factors known to contribute to or modify them (e.g., cigarette smoking, occupational exposures, and individual characteristics) in a population of smokers. Further, even with an imperfect measure of occupational Cd exposure, we were able to detect a relationship between occupational categories with potential for Cd exposure and the urinary biomarker of Cd beyond the contribution from smoking.


In conclusion, the findings of this study indicate that broad occupational categories with potential for Cd exposure make measurable contributions to urinary Cd levels and that this association can be detected in current smokers at the time of urine collection, independent of smoking measures. These findings support the hypothesis that combined Cd exposure from occupational sources and smoking can be detected in urinary Cd and may contribute to differences in smoking-related lung cancer risk. The findings also underscore the importance of accounting for occupational Cd exposure when evaluating risk differences: if occupational exposures are not accounted for, critical information on variability in exposure to such carcinogens, and on the identification of high-risk groups, may be missed. Occupational exposures, along with smoking, are a modifiable source of Cd exposure. Additional prevention efforts are needed to reduce and prevent Cd exposure in the workplace, with priority given to people in the ‘Likely exposed’ and ‘Possibly exposed’ occupational categories. Furthermore, our study provided evidence of differences in urinary Cd levels by race/ethnicity that persisted after adjustment for occupational exposure category, smoking measures (TNE and average years of smoking), age at urine collection, sex, creatinine (natural log), and education. Future research is needed to evaluate the association between urinary Cd and lung cancer risk in smokers, accounting for occupational sources of Cd exposure.


Figure 2. MEC Study Initial Questionnaire (1993-1996; page 22) (72). Work history questions from the questionnaire administered to participants at cohort entry.


Table 2. Categorization of participants based on their combined responses to two questions regarding the industry and occupation in which they worked longest (Figure 2)a

Industry indicator | Occupational indicator | Exposure category
Yes | Yes | Likely exposed
Yes | No | Possibly exposed
Yes | Unknown | Possibly exposed
No | Yes | Possibly exposed
No | No | Not likely exposed
No | Unknown | Not likely exposed
Unknown | Yes | Possibly exposed
Unknown | No | Not likely exposed
Unknown | Unknown | Unknown exposure

a Exception for small business owners: if a small business owner worked in an industry known to be a source of Cd exposure, the participant was grouped as ‘Likely exposed’.
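The decision rules in Table 2 can be written as a small lookup function. The Python sketch below encodes each indicator as `True` (known Cd source), `False` (not a known source), or `None` (unknown or not reported); this encoding, the function name, and the `small_business_owner` flag are hypothetical choices for illustration.

```python
from typing import Optional

Indicator = Optional[bool]  # True = known Cd source, False = not a source, None = unknown


def exposure_category(industry: Indicator, occupation: Indicator,
                      small_business_owner: bool = False) -> str:
    """Map the two questionnaire indicators to an occupational exposure category."""
    # Footnote a: small business owners in a Cd-related industry count as 'Likely exposed'
    if small_business_owner and industry is True:
        return "Likely exposed"
    if industry is True and occupation is True:
        return "Likely exposed"
    if industry is True or occupation is True:
        return "Possibly exposed"  # exactly one indicator is a known source
    if industry is None and occupation is None:
        return "Unknown exposure"
    return "Not likely exposed"  # at least one 'No' and no 'Yes'


print(exposure_category(False, None))
```

Checking the `True` cases first mirrors the table: any known source outranks a missing answer, and ‘Unknown exposure’ is reserved for participants who reported neither title.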


Table 3. Main characteristics and biomarkers of study population by four occupational exposure categories (n=1,956) a

Variable | Not likely exposed (N=1,106) | Possibly exposed (N=343) | Likely exposed (N=149) | Unknown exposure (N=358)
Sex, N (%)
  Females | 660 (60) | 97 (28) | 26 (18) | 248 (69)
  Males | 446 (40) | 246 (72) | 123 (82) | 110 (31)
Race/ethnicity, N (%)
  African American | 147 (13) | 45 (13) | 20 (13) | 68 (19)
  Native Hawaiian | 155 (14) | 58 (17) | 21 (14) | 60 (17)
  Japanese American | 382 (35) | 96 (28) | 39 (26) | 67 (19)
  Latinos | 130 (12) | 113 (33) | 60 (40) | 108 (30)
  Whites | 292 (26) | 31 (9) | 9 (6) | 55 (15)
Maximum education, N (%)
  ≤ 12th grade | 266 (24) | 192 (56) | 100 (67) | 227 (63)
  Vocational school/Some college | 430 (39) | 117 (34) | 44 (30) | 90 (25)
  ≥ Graduated college | 410 (37) | 34 (10) | 5 (3) | 41 (11)
Median (25th, 75th percentile)
  Age at urine collection, years | 62.9 (58.9, 68.8) | 63.8 (59.6, 69.7) | 64.3 (59.9, 69.6) | 65.3 (60.2, 70.8)
  Average years of smoking | 41.5 (34.5, 46.5) | 43.5 (34.5, 47.5) | 44.5 (35.5, 48.0) | 43.5 (34.5, 47.0)
  Average cigarettes per day | 11 (7, 20) | 11 (7, 20) | 12 (8, 20) | 10 (5, 20)
  Urinary TNE, nmol/mL | 31.0 (19.4, 50.4) | 35.2 (20.5, 53.6) | 43.8 (23.9, 67.8) | 33.1 (17.6, 52.3)
  Urinary creatinine, mg/dL | 71.2 (42.5, 119.1) | 88.0 (53.7, 143.7) | 108.9 (62.7, 157.6) | 74.5 (42.0, 128.5)
  Urinary Cd, ng/mL | 0.54 (0.36, 0.96) | 0.72 (0.42, 1.08) | 0.78 (0.42, 1.38) | 0.60 (0.36, 0.96)
Urinary Cd below LOQ, N (%) | 10 (0.9) | 1 (0.3) | 1 (0.7) | 3 (0.8)

a Values are presented as the number (percentage) of participants in each exposure category, N (%), or as median (25th, 75th percentile).


Figure 3. Proportion of participants assigned to occupational exposure categories overall and within each race/ethnicity

[Figure 3: proportion of participants (y-axis, 0.00 to 0.80) assigned to each occupational exposure category (Not likely, Possibly, Likely, Unknown) for all participants and for Whites, African Americans, Japanese Americans, Native Hawaiians and Latinos (x-axis)]

Table 4. Multivariable adjusted analysis of the association between urinary cadmium and occupational exposure categories compared to the reference category (‘Not likely exposed’)

| Occupational exposure category | Model 1a: adjusted change in urinary cadmium, %∆ (95% CI) | Model 2b: adjusted change in urinary cadmium, %∆ (95% CI) |
|---|---|---|
| Not likely exposed (N=1,106) | reference | reference |
| Possibly exposed (N=343) | 8.7% (0.6%, 17.4%) | 7.7% (0.11%, 16.0%) |
| Likely exposed (N=149) | 18.6% (6.5%, 32.0%) | 12.1% (1.2%, 24.3%) |
| Unknown exposure (N=358) | -0.5% (-7.6%, 7.1%) | -0.9% (-7.6%, 6.4%) |

aAdjusted for reported race/ethnicity, sex, age at urine collection, creatinine (natural log), and maximum education.
bFurther adjusted for urinary TNE and average years of smoking.
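Because the models use natural-log-transformed urinary Cd as the outcome, a regression coefficient β for an exposure category corresponds to a percent change in geometric mean urinary Cd of (exp(β) - 1) × 100. The table does not state the conversion explicitly, so the sketch below assumes this standard back-transformation; `percent_change` and `beta_from_percent` are illustrative helper names.

```python
import math


def percent_change(beta: float) -> float:
    """Percent change in geometric mean urinary Cd for a coefficient on the ln(Cd) scale."""
    return (math.exp(beta) - 1.0) * 100.0


def beta_from_percent(pct: float) -> float:
    """Inverse transformation: recover the ln-scale coefficient from a percent change."""
    return math.log1p(pct / 100.0)


# The 'Likely exposed' Model 1 estimate of 18.6% implies a beta of about 0.171.
# Confidence limits are back-transformed the same way, one limit at a time.
beta = beta_from_percent(18.6)
print(round(beta, 3))                  # 0.171
print(round(percent_change(beta), 1))  # 18.6
```

Since the transformation is monotonic, applying it separately to the lower and upper limits of a CI for β yields the CI for the percent change, which is why the intervals in the table are asymmetric around the point estimates.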


Figure 4. Racial/ethnic-specific analysis of the association between urinary cadmium and race/ethnicity, adjusted for sex, age at urine collection, creatinine (natural log), maximum education, urinary TNE, average years of smoking and occupational exposure category

[Figure 4: geometric mean (95% CI) urinary cadmium levels, ng/mL (y-axis, 0.65 to 0.95) for All (n=1,956), Whites (n=387), African Americans (n=280), Japanese Americans (n=584), Native Hawaiians (n=294) and Latinos (n=411) (x-axis)]

Table 5. (Supplemental Table S1) Number of participants categorized as ‘Likely exposed’ to cadmium in the workplace (N=151)

Table 6. (Supplemental Table S2) Number of participants categorized as ‘Possibly exposed’ to cadmium in the workplace (N=347)

Table 7. (Supplemental Table S3) Number of participants categorized as ‘Not likely exposed’ to cadmium in the workplace (N=1,116)

Table 8. (Supplemental Table S4) Number of participants categorized as having ‘Unknown exposure’ to cadmium in the workplace (N=363)

Figure 5. (Supplemental Figure S1) Directed acyclic graph for the association between urinary cadmium and self-reported occupation

Figure 6. (Supplemental Figure S2) Median levels of urinary cadmium (natural log of ng/mL) by occupational exposure category. The box represents the interquartile range (25th to 75th percentile), the dark line across the box represents the median (50th percentile), the bottom and top whiskers represent the 1st and 99th percentiles, and the circles above and below the whiskers represent outliers (>1.5 and <3 times the interquartile range)

Chapter 3

Genome-wide and candidate gene association study of urinary cadmium levels in smokers

Introduction

Lung cancer is the leading cause of cancer death in the United States, with an estimated 142,670 deaths expected in 2019 (1). While cigarette smoking is an established risk factor for lung cancer, causing up to 90% of lung cancer-related deaths in the United States, not all smokers will develop the disease over their lifetime (4,5). It is estimated that 24% of male smokers and 11% of female smokers will develop lung cancer by the age of 85 (5). In an effort to identify what makes some smokers more susceptible to this disease than others, investigators in the Multiethnic Cohort (MEC) Study have demonstrated that, at similar levels of smoking, lung cancer risk varies substantially by race/ethnicity, with African American and Native Hawaiian smokers having the highest risk, followed by White, Japanese American and Latino smokers (9,73). This suggests that demographic and genetic factors may influence tobacco carcinogen intake and metabolism, DNA damage and DNA repair. One tobacco-related carcinogen that may be important in the development of lung cancer is cadmium (Cd).

Cadmium, a heavy metal found in cigarette smoke, is a well-established carcinogen and, as a result, was classified by the International Agency for Research on Cancer (IARC) as a Group 1 known human lung carcinogen (13). However, the factors that influence Cd absorption, distribution, storage and elimination are not well understood. Several recent studies illustrate the potential role that genetics may play in Cd toxicity, suggesting that a number of cellular mechanisms are activated in response to Cd exposure (41–43,46–48,74), in particular the induction of metallothionein (MT) and glutathione (GSH)-related genes. These genes participate in a number of cellular processes, including the regulation of proteins that protect cells against oxidative damage and against non-essential and excessive essential metals. These proteins bind and sequester metal ions, decreasing the availability of metals that could otherwise engage in toxic interactions in the body (75). Furthermore, studies have shown that polymorphisms in the MT2A gene are associated with differences in Cd concentrations in blood and in the human renal cortex, with individuals who have low MT expression having higher Cd levels. Other studies have found that genes within the glutathione S-transferase (GST) family, a superfamily of Phase 2 detoxification enzymes that catalyze the intracellular formation of GSH-metal conjugates, are associated with Cd toxicity (42,43). For example, racial/ethnic differences in the prevalence of GSTM1 genotypes suggest an increased risk of disease in African Americans who lack GSTM1 or have reduced GSTM1 activity and are exposed to DNA-damaging agents such as cigarette smoke (43). Thus, it is plausible that genetic variation could alter Cd body burden, including urinary Cd levels, and susceptibility to Cd toxicity in smokers. To date, only two genome-wide association studies (GWAS) have examined the association between common genetic variants and Cd levels; using blood Cd measurements, these studies identified associations with several SNPs within genes with suggested functions related to the absorption and metabolism of Cd (49,50). For example, one study identified an association between blood Cd and two SNPs within the LACTB2 gene, a protein-coding gene suggested to be involved in the transport of metal ions (50). However, to our knowledge, no GWAS has investigated the association between common genetic variants and urinary Cd, a validated biomarker of long-term Cd exposure.

In this study, urinary Cd levels were quantified in smokers from the MEC Study. The main aim was to identify common genetic variants associated with urinary Cd using a GWAS approach. A secondary aim was to evaluate the association between urinary Cd and single nucleotide polymorphisms (SNPs) previously reported in the literature to be associated with Cd, as well as SNPs located within genes with biological plausibility for a role in Cd absorption, distribution, metabolism and elimination.


Methods

Statistical analysis methods

For this analysis, 1,977 participants were retained. Demographic and biomarker data were summarized across race/ethnicity groups using frequencies, medians and interquartile ranges (25th and 75th percentiles), or geometric means (GM) and 95% confidence intervals (CI). Urinary Cd and urinary creatinine were transformed using the natural logarithm to approximate a normal distribution; log-transformed urinary Cd was therefore used as the dependent variable. The Kruskal-Wallis test was used for comparisons across categories where appropriate, and the Wilcoxon-Mann-Whitney test was used to compare biomarkers in each race/ethnicity group with those in Whites. Two multivariable linear models were used: (1) adjusted for age at urine collection, sex, race/ethnicity and creatinine (natural log), and (2) further adjusted for smoking dose (urinary TNE) and duration (average years of smoking). Additionally, effect modification by race/ethnicity and sex was tested using multiplicative interaction terms, because urinary Cd levels have previously been shown to differ significantly across these factors (24,25,31,35,36,40). For statistical analysis, samples with non-detectable Cd levels were assigned a value equal to ½ the instrument limit of quantification (LOQ; 0.02 ng/mL).
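The non-detect substitution and the log-scale summaries described above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes non-detects are coded as `None`, reads 0.02 ng/mL as the LOQ itself (the sentence could also be read as 0.02 ng/mL being the substituted value), and computes an unadjusted geometric mean with a normal-approximation 95% CI on the log scale.

```python
import math
import statistics
from typing import List, Optional, Tuple

LOQ_NG_ML = 0.02  # assumption: 0.02 ng/mL is the LOQ, so non-detects become 0.01 ng/mL


def substitute_nondetects(values: List[Optional[float]],
                          loq: float = LOQ_NG_ML) -> List[float]:
    """Replace non-detects (coded as None) with one-half of the LOQ."""
    return [loq / 2.0 if v is None else v for v in values]


def geometric_mean_ci(values: List[float],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Geometric mean and 95% CI from the mean and standard error of ln(values)."""
    logs = [math.log(v) for v in values]
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(mean_log),
            math.exp(mean_log - z * se_log),
            math.exp(mean_log + z * se_log))


# Hypothetical urinary Cd values (ng/mL), one of them a non-detect.
cd = substitute_nondetects([0.54, None, 0.72, 0.78, 0.60])
gm, lower, upper = geometric_mean_ci(cd)
```

Model-adjusted geometric means, such as those reported by race/ethnicity, would instead be obtained by exponentiating model-predicted means of ln(Cd) and their confidence limits; the unadjusted version above only shows the back-transformation step.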

Genotyping, quality control and imputation

Details of genotyping, quality control and imputation have been published previously (58). In brief, blood lymphocyte DNA from a total of 2,418 current smokers was genotyped using the Illumina Human1M-Duo BeadChip (1,199,187 SNPs). Genotyping quality control consisted of removing individual samples with ≥2% of genotypes not called, SNPs with a call rate ≤98%, known duplicate samples, samples with close relatives based on estimated identity-by-descent (IBD) status, and samples with conflicting or indeterminate sex. To extend the genotype analysis, variants were imputed using SHAPEIT (76) and IMPUTE2 (77) with reference files from the 1000 Genomes Project (78) Phase I integrated variant set (March 2012). This extended the analysis to 11,892,802 genome-wide variants (1,131,426 genotyped and 10,761,376 imputed). Poorly imputed SNPs, defined as those with an IMPUTE2 info score <0.30 and a minor allele frequency (MAF) ≤1% in any MEC racial/ethnic group, were excluded. As a result, a total of 5,944,091 SNPs were used for the analysis.
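The two call-rate filters can be sketched as below. Assumptions: genotypes are held in memory as a samples-by-SNPs list with `None` for an uncalled genotype, the sample filter is applied before the SNP filter (the text does not state the order), and the duplicate, relatedness (IBD) and sex checks are omitted. In practice, PLINK's `--mind` and `--geno` options perform comparable filtering on its own file formats; the function below is a hypothetical illustration.

```python
from typing import List, Optional, Sequence, Tuple

Genotype = Optional[int]  # allele count 0/1/2, or None if the genotype was not called


def call_rate_qc(genotypes: Sequence[Sequence[Genotype]],
                 max_sample_missing: float = 0.02,
                 min_snp_call_rate: float = 0.98) -> Tuple[List[int], List[int]]:
    """Return indices of samples and SNPs that pass the call-rate filters."""
    n_snps = len(genotypes[0])
    # Drop samples with >= 2% of genotypes not called.
    kept_samples = [
        i for i, row in enumerate(genotypes)
        if sum(g is None for g in row) / n_snps < max_sample_missing
    ]
    # Among retained samples, drop SNPs with a call rate <= 98%.
    kept_snps = []
    for j in range(n_snps):
        called = sum(genotypes[i][j] is not None for i in kept_samples)
        if kept_samples and called / len(kept_samples) > min_snp_call_rate:
            kept_snps.append(j)
    return kept_samples, kept_snps
```

Applying the sample filter first means that a few very poorly genotyped samples do not drag otherwise good SNPs below the 98% call-rate cutoff.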

The first 10 principal components (PCs) were previously estimated using a random sample of 19,059 autosomal SNPs with frequency ≥2% across the five racial/ethnic groups to capture population substructure (58). These 10 PCs were included as covariates in the models to adjust for heterogeneity in the data due to population admixture.

Genome-wide approach

A total of 1,977 current smokers with complete genotype and phenotype data were included in the analyses. Analyses were adjusted for potential confounders as presented in the conceptual model in Supplemental Figure S3. To test the association of each genetic variant with urinary Cd levels, the GWAS analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), smoking dose (urinary TNE) and the top 10 leading PCs. Allele dosage was the primary explanatory variable. Specifically, we tested whether carriers of each genotyped or imputed SNP had different urinary Cd levels from non-carriers (0 copies). Because log-transformed urinary Cd was modeled as a continuous outcome, the association analysis yields a beta coefficient and 95% CI for the change in log-transformed urinary Cd per copy of the tested allele. Genome-wide significance was defined by a Bonferroni-corrected 5% threshold of p < 8.4 x 10-9 (0.05/5,944,091 SNPs) to control the family-wise Type I error rate. The genomic inflation factor (lambda, λ) was used to assess model adequacy and systematic inflation of the SNP test statistics in the GWAS analyses.
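The significance threshold and the genomic inflation factor can be illustrated with the standard library alone. The sketch assumes λ is computed in the usual way, as the median of the observed 1-df chi-square statistics (obtained here by converting two-sided p-values) divided by the expected median under the null (about 0.455); the function names are illustrative.

```python
import statistics
from statistics import NormalDist
from typing import Sequence

STD_NORMAL = NormalDist()


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic inflation factor from two-sided p-values (each in (0, 1]) of 1-df tests."""
    # Z^2 for a two-sided p-value; using the lower tail keeps tiny p-values finite.
    chisq = [STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = STD_NORMAL.inv_cdf(0.25) ** 2  # median of chi-square(1), ~0.4549
    return statistics.median(chisq) / expected_median


print(f"{bonferroni_threshold(0.05, 5_944_091):.2e}")  # 8.41e-09
```

A λ close to 1.00 indicates that the bulk of the test statistics follow the null distribution, so residual population stratification or other systematic bias is unlikely to be inflating the results.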


Candidate SNP approach

Fifteen candidate SNPs were selected a priori based on evidence from previous studies showing an association between specific SNPs and Cd biomarkers in humans (Table 9). In addition, a total of 29 candidate genes were identified a priori through a review of the literature. These genes have suggestive biological plausibility for involvement in the absorption, distribution, storage and elimination of heavy metals, including Cd: the metallothionein (MT) gene family, metal-regulatory transcription factor 1 (MTF1), and the glutathione S-transferase (GST) gene family (Table 10). The NCBI Entrez Gene database (https://www.ncbi.nlm.nih.gov/gene) was used to identify the genomic coordinates of each gene based on the GRCh37 genome build (79). SNPs within 5 kb upstream and downstream of each candidate gene were selected for analysis, and SNPs that overlapped by genomic coordinates were removed. A final list of 1,169 SNPs, comprising the MT genes (439 SNPs), the MTF1 gene (61 SNPs), the GST genes (662 SNPs), and 7 additional SNPs from the literature (the other 8 literature SNPs were within the genomic coordinates of the candidate genes), was extracted from our GWAS results. For the candidate SNP analysis, a Bonferroni-corrected 5% significance threshold of p < 4.28 x 10-5 (0.05/1,169 SNPs) was used.
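The ±5 kb window selection and the resulting Bonferroni threshold can be sketched as follows. Gene coordinates and SNP positions in the example are made up for illustration; in the study they came from the NCBI Entrez Gene database (GRCh37) and the imputed GWAS results. The sketch assigns each SNP to the first matching gene window so that a SNP falling in overlapping windows is counted only once, which is one plausible reading of removing SNPs that overlapped by genomic coordinates; `select_candidate_snps` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

FLANK_BP = 5_000  # 5 kb upstream and downstream of each gene


def select_candidate_snps(genes: Dict[str, Tuple[str, int, int]],
                          snps: List[Tuple[str, str, int]],
                          flank: int = FLANK_BP) -> Dict[str, str]:
    """Map each SNP inside any gene +/- flank to the first gene window it falls in."""
    selected: Dict[str, str] = {}
    for rsid, chrom, pos in snps:
        if rsid in selected:
            continue  # already counted via an overlapping window or a duplicate row
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                selected[rsid] = gene
                break
    return selected


# Hypothetical coordinates, for illustration only.
genes = {"GENE_A": ("16", 100_000, 110_000), "GENE_B": ("16", 112_000, 120_000)}
snps = [("rs_a", "16", 96_000), ("rs_b", "16", 113_000),
        ("rs_c", "16", 200_000), ("rs_d", "1", 105_000)]
hits = select_candidate_snps(genes, snps)
threshold = 0.05 / len(hits)  # Bonferroni threshold over the selected SNPs
```

Applied to the 1,169 SNPs retained in this study, the same division gives 0.05/1,169 ≈ 4.28 x 10-5.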

PLINK software was used for the association analysis (80). Manhattan plots and QQ plots were drawn using R software version 3.1.2 (81).

Table 9. List of 15 candidate SNPs selected a priori for the analysis based on evidence from previous studies showing their relationship with Cd biomarkers in humans

| SNP ID | Chromosome | Nearest Gene | Literature reference |
|---|---|---|---|
| rs4653329 | 1 | MTF1 | Adams, S. et al., 2015 (83) |
| rs10014145 | 4 | SLC39A8 | Rentschler, G. et al., 2014 (84) |
| rs233804 | 4 | SLC39A8 | Rentschler, G. et al., 2014 (84) |
| rs9350504 | 6 | CD109 | Ng, E. et al., 2015 (49) |
| rs17574271 | 8 | DLGAP1 | Borné, Y. et al., 2016 (50) |
| rs4872479 | 8 | SLC39A14 | Rentschler, G. et al., 2014 (84) |
| rs870215 | 8 | SLC39A14 | Rentschler, G. et al., 2014 (84) |
| rs12681420 | 8 | XKR9 | Borné, Y. et al., 2016 (50) |
| rs1695 | 11 | GSTP1 | Khansakorn, N. et al., 2012 (42) |
| rs28366003 | 16 | MT2A | Lei, L. et al., 2012 (85); Kayaalti, Z. et al., 2010 (74); Adams, S. et al., 2015 (83) |
| rs10636 | 16 | MT2A | Lei, L. et al., 2012 (85); Adams, S. et al., 2015 (83) |
| rs8044719 | 16 | MT1A | Adams, S. et al., 2015 (83) |
| rs11076161 | 16 | MT1A | Lei, L. et al., 2012 (85) |
| rs1599823 | 16 | MT1B | Adams, S. et al., 2015 (83) |
| rs4784706 | 16 | MT gene region | Adams, S. et al., 2015 (83) |

[Table 10]

Results

Characteristics of study population

A total of 1,977 smokers (285 African American, 296 Native Hawaiian, 588 Japanese American, 418 Latino and 390 White) were included in the analysis (Table 11). Overall, smoking differed significantly across race/ethnicity groups, with Whites having the highest median number of cigarettes per day (20 CPD) and Latinos the lowest (7.5 CPD). In contrast, median urinary TNE levels were highest among African Americans (44.6 nmol/mL) and lowest among Japanese Americans (27.3 nmol/mL). When stratified by sex and race/ethnicity, these patterns persisted, except that African American males, along with Latino males, had the lowest CPD (10 CPD).

[Table 11]

Median urinary Cd levels differed significantly among the five race/ethnicity groups (p<0.001; Table 12). African American smokers had the highest median urinary Cd level (0.84 ng/mL), followed by Latino (0.72 ng/mL), Native Hawaiian (0.57 ng/mL), Japanese American (0.54 ng/mL) and White smokers (0.48 ng/mL). Urinary Cd also differed significantly across race/ethnicity groups within each sex (p<0.001 for both sexes). This ordering was largely preserved within each sex, with the exception of Whites: among males, Native Hawaiian and White smokers (0.60 ng/mL for both) had higher urinary Cd levels than Japanese American smokers (0.54 ng/mL), and among females, White and Japanese American smokers had the lowest levels (0.48 ng/mL for both).
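As an illustration of the Kruskal-Wallis comparison behind these p-values, the sketch below computes the tie-corrected H statistic from pooled ranks. For the p-value it uses the closed-form chi-square survival function, which exists only for even degrees of freedom; that covers the five-group comparison here (4 degrees of freedom) but not, for example, a two-group test. The study presumably used standard statistical software; this is not its code, and the function names are illustrative.

```python
import math
from itertools import chain
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value (k - 1 df)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3.0 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    h /= correction
    return h, chi2_sf_even_df(h, len(groups) - 1)
```

Because the test works on ranks, it compares distributions (and hence medians, under similar shapes) without assuming the log-normality used in the regression models.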

[Table 12]


Geometric means of urinary Cd levels by race/ethnicity and sex are given in Table 13. In Model 1, they were adjusted for age at urine collection, sex and creatinine (natural log). The highest geometric mean levels were in Latinos (0.84 ng/mL; 95% CI: 0.79, 0.89) and Native Hawaiians (0.82 ng/mL; 95% CI: 0.76, 0.88); the lowest were in Whites (0.75 ng/mL; 95% CI: 0.71, 0.80). When Model 1 was further adjusted for total nicotine equivalents (Model 2), similar results were seen across race/ethnicity groups. The results of Model 2 are illustrated graphically in Supplemental Figure S4. Sex-specific geometric means of urinary Cd by race/ethnicity were similar to the overall geometric means. Notably, females had higher geometric mean urinary Cd levels than males in all race/ethnicity groups. The results of Model 2 by sex are illustrated graphically in Supplemental Figure S5.

Genome-wide association study

Quantile-quantile plots of the -log10 p-values for the GWAS of urinary Cd (Figure 7) showed no evidence of systematic bias (λ = 1.00). No SNP reached genome-wide significance for association with urinary Cd at our Bonferroni-corrected 5% threshold of p < 8.4 x 10-9 (Figure 8). The Manhattan plot shows clusters of SNPs in high linkage disequilibrium (LD) on chromosomes 2, 9 and 11; however, none reached genome-wide significance. The SNP with the lowest p-value for association with urinary Cd was rs673456 on chromosome 11, an intron variant close to the TENM4 gene (p=3.47 x 10-7), followed by rs187840540 (p=4.62 x 10-7), an intergenic variant on chromosome 6, and rs144158780 (p=1.10 x 10-6), an intron variant on chromosome 19 close to the DNM2 gene. SNPs with p < 5 x 10-5 in the GWAS are listed in Table 14. In additional GWAS analyses, we removed creatinine from the model (Supplemental Figure S6) and added all predictors of urinary Cd identified in our previous study (82) (Supplemental Figure S7); the results differed only marginally from those presented in Figures 7 and 8 and Table 14.


[Figure 7]

[Figure 8]

[Table 14]

Candidate SNP association study

The results for the fifteen SNPs identified from previous studies showing an association between a specific SNP and a Cd biomarker are presented in Table 15. Additional analyses were conducted on the 1,169 candidate SNPs identified a priori in genes hypothesized to be involved in the biological mechanisms of Cd exposure (Table 10 and Table 9) (42,49,83–86). The top 100 associations among the SNPs in the candidate genes are presented in Table 16. All analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), smoking dose (urinary TNE) and the top 10 leading principal components. None of the candidate SNPs was associated with urinary Cd in our sample of smokers at the Bonferroni-corrected significance level of p < 4.28 x 10-5.

Table 15. Candidate SNP analysis results for the 15 specific SNPs that were found to be associated with cadmium in previous studiesa

| SNP | chr | position | A1 | A2 | FRQ | PLINK R2 | BETA | SE | P | MAF AFR | MAF AMR | MAF ASN | MAF EUR | IMPUTE Info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4653329 | 1 | 38285893 | G | T | 0.224 | 1.040 | 0.012 | 0.021 | 0.573 | 0.114 | 0.262 | 0.229 | 0.294 | 1.039 |
| rs10014145 | 4 | 103200577 | A | G | 0.222 | 1.060 | 0.016 | 0.022 | 0.471 | 0.337 | 0.199 | 0.108 | 0.325 | 1.054 |
| rs233804 | 4 | 103212916 | C | A | 0.204 | 1.032 | 0.039 | 0.023 | 0.088 | 0.211 | 0.207 | 0.080 | 0.302 | 1.029 |
| rs9350504 | 6 | 74457830 | T | C | 0.206 | 1.107 | 0.038 | 0.024 | 0.111 | 0.130 | 0.130 | 0.371 | 0.012 | 1.115 |
| rs17574271 | 8 | 3970124 | T | C | 0.030 | 1.020 | -0.005 | 0.052 | 0.929 | 0.004 | 0.022 | 0.002 | 0.066 | 1.011 |
| rs4872479 | 8 | 22233636 | G | T | 0.103 | 0.984 | -0.006 | 0.031 | 0.857 | 0.289 | 0.072 | 0.082 | 0.053 | 0.975 |
| rs870215 | 8 | 22258137 | G | A | 0.150 | 1.096 | -0.005 | 0.026 | 0.846 | 0.496 | 0.130 | 0.170 | 0.049 | 1.107 |
| rs12681420 | 8 | 71573198 | A | G | 0.243 | 1.030 | -0.046 | 0.021 | 0.030 | 0.173 | 0.268 | 0.084 | 0.424 | 1.060 |
| rs1695 | 11 | 67352689 | A | G | 0.299 | 1.097 | -0.017 | 0.020 | 0.381 | 0.443 | 0.425 | 0.166 | 0.319 | 1.099 |
| rs8044719 | 16 | 56678865 | T | G | 0.843 | 1.045 | 0.018 | 0.025 | 0.480 | 0.411 | 0.144 | 0.115 | 0.143 | 1.056 |
| rs11076161 | 16 | 56673148 | A | G | 0.682 | 1.004 | 0.018 | 0.020 | 0.359 | 0.478 | 0.282 | 0.302 | 0.276 | 1.015 |
| rs1599823 | 16 | 56675817 | T | C | 0.582 | 1.014 | 0.014 | 0.019 | 0.456 | 0.344 | 0.373 | 0.484 | 0.354 | 1.020 |
| rs28366003 | 16 | 56642491 | A | G | 0.068 | 0.933 | 0.063 | 0.037 | 0.089 | 0.014 | 0.091 | 0.119 | 0.044 | 0.935 |
| rs10636 | 16 | 56643343 | G | C | 0.228 | 0.880 | 0.000 | 0.023 | 0.998 | 0.274 | 0.260 | 0.248 | 0.211 | 0.899 |
| rs4784706 | 16 | 56697581 | A | G | 0.857 | 0.999 | 0.032 | 0.027 | 0.238 | 0.279 | 0.144 | 0.086 | 0.162 | 1.011 |

aAnalyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components.

*NOTE: Abbreviations - SNP: single nucleotide polymorphism; chr: chromosome; position: base pair position on chromosome; A1: reference allele; A2: alternate allele, used for association testing; FRQ: frequency of A1; PLINK R2: PLINK R2 quality metric of model; BETA: effect estimate per alternate allele carried; SE: standard error of effect estimate; P: P-value for the association test; MAF AFR, AMR, ASN, EUR: minor allele frequency for African, Ad Mixed American, Asian and European samples in 1000 Genomes; IMPUTE Info: imputation quality score from IMPUTE2

[Table 16]
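The BETA, SE and P columns are linked through the Wald test. As a rough check, the sketch below recomputes a two-sided p-value and 95% CI from the tabulated BETA and SE using a normal approximation, which is reasonable with roughly 2,000 participants. The function name is illustrative; because BETA and SE are rounded to three decimals and PLINK's linear regression reports a t statistic, the recomputed p-values only approximate those in the table.

```python
from statistics import NormalDist
from typing import Tuple


def wald_summary(beta: float, se: float,
                 z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Two-sided p-value and 95% CI for a regression coefficient (normal approximation)."""
    z = abs(beta / se)
    p = 2.0 * NormalDist().cdf(-z)  # lower tail avoids precision loss for large z
    return p, beta - z_crit * se, beta + z_crit * se


# rs12681420 in Table 15: BETA = -0.046, SE = 0.021 (reported P = 0.030)
p, lower, upper = wald_summary(-0.046, 0.021)
```

For rs12681420 the recomputed p-value is close to the reported 0.030 and the CI excludes zero, i.e. a nominal association that is far from the Bonferroni-corrected threshold used for the candidate SNP analysis.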

Discussion

Previous studies in these smokers have demonstrated that lung cancer risk varies significantly by race/ethnicity, with African Americans having the highest risk and Japanese Americans the lowest, even at similar levels of smoking (9). These findings suggest that the differences could be due to demographic and genetic differences in the intake and metabolism of tobacco-related constituents and in DNA damage and repair. Biomarker studies of tobacco-related constituents corroborate these findings but only partially explain these differences (55–59,87). Therefore, our study sought to add to this body of literature by investigating demographic and genetic differences in urinary Cd levels in current smokers. In this sample of smokers from the MEC Study, we observed significant differences in urinary Cd levels by race/ethnicity, with Latino smokers having the highest geometric mean levels, followed by Native Hawaiian and African American smokers, after adjustment for age at urine collection, sex, creatinine (natural log) and TNE (Table 13). While the high urinary Cd levels in Native Hawaiians may reflect their lung cancer risk, the relatively high levels in Latinos and intermediate levels in Japanese Americans are not consistent with the risks found in previous studies. This further suggests a potential role for genetics in the metabolism and toxicity of Cd. Racial/ethnic differences in urinary Cd levels also varied by sex. Urinary Cd levels were higher among female than male smokers, consistent with previously reported results; this difference has been attributed to generally lower iron stores in females, which are thought to increase the absorption of Cd (24,25,31,35,88).

Our GWAS did not reveal any SNPs significantly associated with urinary Cd at the genome-wide level. Clusters of SNPs in high linkage disequilibrium (LD) were observed on chromosomes 2, 9 and 11, but none reached genome-wide significance (Figure 8 and Table 14). The SNP with the lowest p-value for association with urinary Cd was rs673456 on chromosome 11, an intron variant within the TENM4 gene region (p=3.47 x 10-7), followed by rs187840540 (p=4.62 x 10-7), an intergenic variant on chromosome 6, and rs144158780 (p=1.10 x 10-6), an intron variant on chromosome 19 within the DNM2 gene region (Table 14). None of these genes has an apparent biological role in the mechanisms of Cd exposure: the TENM4 gene encodes a protein involved in neuronal development, and the DNM2 gene provides instructions for making a protein that remodels cell membranes to form vesicles for endocytosis and contributes to the cell's structural framework (89).

There is previously published evidence of a relationship between specific SNPs and Cd biomarkers in humans; however, these studies predominantly used blood Cd measurements (Table 16). In addition, two published studies employed a GWAS approach to examine the relationship between common genetic variants and Cd; both also used blood Cd measurements and reported differing results (49,50). The first GWAS, of 949 participants from Uppsala, Sweden, demonstrated an association between serum Cd and locus 6q14.1 (49). The lead SNP, rs9350504, an intronic variant within the CD109 gene, was suggested to be responsible for increased clearance or decreased absorption of Cd, based on its negative beta coefficient. The second GWAS, of 4,432 women from Malmö, Sweden, did not find an association between any SNP and erythrocyte Cd in the whole sample (including smokers and never smokers) after adjusting for age and sex (50). When the researchers restricted their analysis to never smokers (N=1,728), they identified 13 SNPs in two independent regions: variants in high linkage disequilibrium at locus 8q13.3 and one variant at locus 18p11.31. The lead SNP at locus 8q13.3 was rs12681420, an intron variant within the XKR9 gene. The other significant variant was rs17574271, an intron variant within the DLGAP1 gene on chromosome 18. The researchers suggested that these variants are related to Cd metabolism, absorption and transport, although the mechanisms are not clear (50). Specific SNPs from the previously identified literature and from the two published GWAS, with the exception of rs17574271 on chromosome 18 (not genotyped or imputed in our dataset), were included in our candidate SNP analyses (Table 10). However, we did not find a statistically significant association between these SNPs of interest and urinary Cd (Table 15. Candidate SNP analysis results for the 15 specific SNPs that were found to be associated with cadmium in previous studiesa

Minor Allele Frequency IMPUTE SNP chr position A1 A2 FRQ PLINK R2 BETA SE P AFR AMR ASN EUR INFO rs4653329 1 38285893 G T 0.224 1.040 0.012 0.021 0.573 0.114 0.262 0.229 0.294 1.039 rs10014145 4 103200577 A G 0.222 1.060 0.016 0.022 0.471 0.337 0.199 0.108 0.325 1.054 rs233804 4 103212916 C A 0.204 1.032 0.039 0.023 0.088 0.211 0.207 0.080 0.302 1.029 rs9350504 6 74457830 T C 0.206 1.107 0.038 0.024 0.111 0.130 0.130 0.371 0.012 1.115 rs17574271 8 3970124 T C 0.030 1.020 -0.005 0.052 0.929 0.004 0.022 0.002 0.066 1.011 rs4872479 8 22233636 G T 0.103 0.984 -0.006 0.031 0.857 0.289 0.072 0.082 0.053 0.975 rs870215 8 22258137 G A 0.150 1.096 -0.005 0.026 0.846 0.496 0.130 0.170 0.049 1.107 rs12681420 8 71573198 A G 0.243 1.030 -0.046 0.021 0.030 0.173 0.268 0.084 0.424 1.060 rs1695 11 67352689 A G 0.299 1.097 -0.017 0.020 0.381 0.443 0.425 0.166 0.319 1.099 rs8044719 16 56678865 T G 0.843 1.045 0.018 0.025 0.480 0.411 0.144 0.115 0.143 1.056 rs11076161 16 56673148 A G 0.682 1.004 0.018 0.020 0.359 0.478 0.282 0.302 0.276 1.015 rs1599823 16 56675817 T C 0.582 1.014 0.014 0.019 0.456 0.344 0.373 0.484 0.354 1.020 rs28366003 16 56642491 A G 0.068 0.933 0.063 0.037 0.089 0.014 0.091 0.119 0.044 0.935 rs10636 16 56643343 G C 0.228 0.880 0.000 0.023 0.998 0.274 0.260 0.248 0.211 0.899 rs4784706 16 56697581 A G 0.857 0.999 0.032 0.027 0.238 0.279 0.144 0.086 0.162 1.011 aAnalyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components.

*NOTE: Abbreviations - SNP: single nucleotide polymorphism SE: standard error of effect estimate chr: chromosome P: P-value for the association test position: genomic position on chromosome or AFR: minor allele frequency for African base pair position samples in 1000 Genomes A1: reference allele AMR: minor allele frequency for Ad Mixed A2: alternate allele, used for association testing American samples in 1000 Genomes FRQ: frequency of A1 ASN: minor allele frequency for Asian samples PLINK R2: Plink R2 quality metric of model in 1000 Genomes BETA: effect estimate per alternate allele EUR: minor allele frequency for European carried samples in 1000 Genomes IMPUTE Info: Imputation quality score from IMPUTE2 Table 16). We also did not reveal any statistically significant associations with SNPs identified in candidate genes (Table 9. List of 15 candidate SNPs selected a priori for the analysis based on evidence from previous studies showing their relationship with Cd biomarkers in humans

SNP ID Chromosome Nearest Gene Literature reference

56

rs4653329 1 MTF1 Adams, S. et al., 2015 (83) rs10014145 4 SLC39A8 Rentschler, G. et al., 2014 (84) rs233804 4 SLC39A8 Rentschler, G. et al., 2014 (84) rs9350504 6 CD109 Ng, E. et al., 2015 (49) rs17574271 8 DLCAP1 Borné Y et al., 2016 (50) rs4872479 8 SLC39A14 Rentschler, G. et al., 2014 (84) rs870215 8 SLC39A14 Rentschler, G. et al., 2014 (84) rs12681420 8 XKR9 Borné Y et al., 2016 (50) rs1695 11 GSTP1 Khansakorn, N. et al., 2012 (42) Lei, L. et al. (85), 2012, Kayaalti, rs28366003 16 MT2A Z. et al., 2010 (74); Adams, S. et al., 2015 (83) Lei, L. et al., 2012 (85); rs10636 16 MT2A Adams, S. et al., 2015 (83) rs8044719 16 MT1A Adams, S. et al., 2015 (83) rs11076161 16 MT1A Lei, L. et al., 2012 (85) rs1599823 16 MT1B Adams, S. et al., 2015 (83) rs4784706 16 MT gene region Adams, S. et al., 2015 (83)

57

Table 1010 List of 15 candidate SNPs selected a priori for the analysis based on evidence from previous studies showing their relationship with Cd biomarkers in humans

SNP ID Chromosome Nearest Gene Literature reference

rs4653329 1 MTF1 Adams, S. et al., 2015 (83) rs10014145 4 SLC39A8 Rentschler, G. et al., 2014 (84) rs233804 4 SLC39A8 Rentschler, G. et al., 2014 (84) rs9350504 6 CD109 Ng, E. et al., 2015 (49) rs17574271 8 DLCAP1 Borné Y et al., 2016 (50) rs4872479 8 SLC39A14 Rentschler, G. et al., 2014 (84) rs870215 8 SLC39A14 Rentschler, G. et al., 2014 (84) rs12681420 8 XKR9 Borné Y et al., 2016 (50) rs1695 11 GSTP1 Khansakorn, N. et al., 2012 (42) Lei, L. et al. (85), 2012, Kayaalti, rs28366003 16 MT2A Z. et al., 2010 (74); Adams, S. et al., 2015 (83) Lei, L. et al., 2012 (85); rs10636 16 MT2A Adams, S. et al., 2015 (83) rs8044719 16 MT1A Adams, S. et al., 2015 (83) rs11076161 16 MT1A Lei, L. et al., 2012 (85) rs1599823 16 MT1B Adams, S. et al., 2015 (83) rs4784706 16 MT gene region Adams, S. et al., 2015 (83)

58

Table 10) which were hypothesized to have biological relevancy to Cd (

59

). These contrasting results are likely due to the differences in the type of biomarker used and the mechanisms the body uses to handle the transport, storage and elimination of Cd. While human exposure to Cd can be assessed using either blood or urine, Cd is known to accumulate in the kidneys with a much longer half-life (up to 38 years) as compared to blood with a much shorter half-life (3-4 months) (14,22,35). As such, urinary Cd is a better measure of long- term exposure to Cd and blood is recognized as a short-term measure of exposure (or current exposure). When Cd enters the body, it is taken up by blood, bound to proteins (such as MT) and transported to the liver. From here, Cd is then transported to the kidney where it is stored and slowly excreted through urine (90). Therefore, genetic associations found with blood Cd are reflective of the active transport mechanism of Cd and were not seen with urinary Cd because the release of Cd via urine excretion pertains to the storage of Cd and may be driven by other genes related to kidney function and storage. Furthermore, some of the associations with specific SNPs identified in literature did use urinary Cd, and we still did not find a statistically significant association in our population. This is likely due to the differences in the populations used to identify those associations and levels of urinary Cd as compared to our population. For example, Lei et al. found an association with higher levels of urinary Cd and two SNPs, rs28366003 and rs10636, in a population of Southeastern Chinese men and women who were exposed to high levels of Cd through rice contamination. These participants had much higher median levels of urinary Cd (median: 5.38 μg/g creatinine; range: 1.24–19.9 μg/g creatinine) as compared to our participants (85). Alternatively, Adams et al. 
found an association between the same SNPs and a few others however he reported associations that showed lower levels of urinary Cd in his population of women only in New Mexico and Seattle (mean±SD: 0.46± 0.54 ng/mL and 0.25±0.30 ng/mL, respectively) who had lower urinary Cd levels as compared to Lei et al (83).
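The contrast between the two biomarkers can be made concrete with a small calculation. The sketch below assumes simple single-compartment, first-order elimination, which is a simplification of real Cd toxicokinetics. It uses the half-lives quoted above: 38 years for the kidney (the upper bound) and 3.5 months for blood (an assumed midpoint of the 3-4 month range). For a single dose, it reports the fraction still present after a given time.

```python
import math

# Half-lives quoted in the text: 38 years is the upper bound for kidney;
# 3.5 months is an assumed midpoint of the 3-4 month range for blood.
HALF_LIFE_YEARS = {"kidney": 38.0, "blood": 3.5 / 12.0}


def fraction_remaining(half_life_years: float, elapsed_years: float) -> float:
    """Fraction of a single dose left after elapsed_years, assuming
    first-order (exponential) elimination: exp(-ln(2) * t / t_half)."""
    k = math.log(2) / half_life_years  # elimination rate constant, 1/year
    return math.exp(-k * elapsed_years)


for compartment, t_half in HALF_LIFE_YEARS.items():
    for years in (1, 5, 10):
        frac = fraction_remaining(t_half, years)
        print(f"{compartment:>6}: {years:>2} y -> {frac:.3f} of dose remaining")
```

Under these assumptions, roughly 98% of a kidney burden remains after one year, versus under 10% in blood. This is why a kidney-derived measure such as urinary Cd integrates exposure over many years.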

To the best of our knowledge, this is the first study to employ a GWAS approach to identify common genetic variants associated with urinary Cd.


One limitation of this study is that urine collection methods differed: participants in Hawaii provided an overnight urine sample, whereas participants in California provided a first-morning urine sample. Although 24-hour urine samples would have been preferable, including creatinine (natural log) as an independent variable in our models is expected to attenuate differences between the two collection methods (65).

In conclusion, the findings of this study provide evidence of differences in urinary Cd levels by race/ethnicity among smokers. In particular, Latinos and Native Hawaiians had relatively higher urinary Cd levels than other racial/ethnic groups in the MEC at similar levels of smoking. This suggests that other factors may influence Cd exposure, which warrants further investigation. Neither our GWAS nor our candidate SNP analyses identified a statistically significant association with urinary Cd in this population. As lung cancer is a multifactorial disease, it is important to understand the influence of genetics on exposure to tobacco-related carcinogens, and how inter-individual differences may contribute to susceptibility. Data on blood Cd could provide additional information on exposure in these current smokers. Future studies are warranted to further investigate the role of genetics in urinary Cd levels.


Table 9. List of 15 candidate SNPs selected a priori for the analysis based on evidence from previous studies showing their relationship with Cd biomarkers in humans

| SNP ID | Chromosome | Nearest gene | Literature reference |
|---|---|---|---|
| rs4653329 | 1 | MTF1 | Adams, S. et al., 2015 (83) |
| rs10014145 | 4 | SLC39A8 | Rentschler, G. et al., 2014 (84) |
| rs233804 | 4 | SLC39A8 | Rentschler, G. et al., 2014 (84) |
| rs9350504 | 6 | CD109 | Ng, E. et al., 2015 (49) |
| rs17574271 | 8 | DLGAP1 | Borné, Y. et al., 2016 (50) |
| rs4872479 | 8 | SLC39A14 | Rentschler, G. et al., 2014 (84) |
| rs870215 | 8 | SLC39A14 | Rentschler, G. et al., 2014 (84) |
| rs12681420 | 8 | XKR9 | Borné, Y. et al., 2016 (50) |
| rs1695 | 11 | GSTP1 | Khansakorn, N. et al., 2012 (42) |
| rs28366003 | 16 | MT2A | Lei, L. et al., 2012 (85); Kayaalti, Z. et al., 2010 (74); Adams, S. et al., 2015 (83) |
| rs10636 | 16 | MT2A | Lei, L. et al., 2012 (85); Adams, S. et al., 2015 (83) |
| rs8044719 | 16 | MT1A | Adams, S. et al., 2015 (83) |
| rs11076161 | 16 | MT1A | Lei, L. et al., 2012 (85) |
| rs1599823 | 16 | MT1B | Adams, S. et al., 2015 (83) |
| rs4784706 | 16 | MT gene region | Adams, S. et al., 2015 (83) |


Table 10. List of the 29 candidate genes identified a priori from literature that shows biological plausibility for being involved in the absorption, distribution, metabolism and elimination of urinary Cd

| Gene | Gene name | Chr | Genomic coordinates^a |
|---|---|---|---|
| Metal-responsive transcription factor-1 | | | |
| MTF1 | Metal-responsive transcription factor-1 | 1 | 38,275,239-38,325,292 |
| Glutathione S-transferase (GST) gene family | | | |
| GSTM4 | Glutathione S-Transferase Mu 4 | 1 | 110,198,698-110,208,123 |
| GSTM2 | Glutathione S-Transferase Mu 2 | 1 | 110,210,644-110,252,171 |
| GSTM1 | Glutathione S-Transferase Mu 1 | 1 | 110,230,418-110,251,661 |
| GSTM5 | Glutathione S-Transferase Mu 5 | 1 | 110,254,864-110,318,050 |
| GSTM3 | Glutathione S-Transferase Mu 3 | 1 | 110,276,554-110,284,384 |
| GSTA2 | Glutathione S-Transferase Alpha 2 | 6 | 52,614,885-52,628,367 |
| GSTA1 | Glutathione S-Transferase Alpha 1 | 6 | 52,656,178-52,668,708 |
| GSTA3 | Glutathione S-Transferase Alpha 3 | 6 | 52,761,437-52,774,496 |
| GSTA4 | Glutathione S-Transferase Alpha 4 | 6 | 52,842,746-52,860,178 |
| GSTP1 | Glutathione S-Transferase Pi 1 | 11 | 67,351,066-67,354,131 |
| GSTZ1 | Glutathione S-Transferase Zeta 1 | 14 | 77,787,227-77,797,940 |
| GSTT2 | Glutathione S-Transferase Theta 2 | 22 | 24,322,339-24,326,106 |
| GSTT1 | Glutathione S-Transferase Theta 1 | 22 | 24,376,133-24,384,680 |
| Metallothionein (MT) gene family | | | |
| MT4 | Metallothionein 4 | 16 | 56,598,961-56,602,869 |
| MT3 | Metallothionein 3 | 16 | 56,622,986-56,625,000 |
| MT2A | Metallothionein 2A | 16 | 56,642,111-56,643,409 |
| MT1L | Metallothionein 1L | 16 | 56,651,388-56,652,730 |
| MT1E | Metallothionein 1E | 16 | 56,659,387-56,661,024 |
| MT1M | Metallothionein 1M | 16 | 56,666,145-56,667,898 |
| MT1JP | Metallothionein 1J, pseudogene | 16 | 56,669,651-56,670,998 |
| MT1A | Metallothionein 1A | 16 | 56,672,578-56,673,999 |
| MT1DP | Metallothionein 1D, pseudogene | 16 | 56,677,617-56,678,698 |
| MT1CP | Metallothionein 1C, pseudogene | 16 | 56,682,160-56,683,426 |
| MT1B | Metallothionein 1B | 16 | 56,685,811-56,687,116 |
| MT1F | Metallothionein 1F | 16 | 56,691,606-56,694,610 |
| MT1G | Metallothionein 1G | 16 | 56,700,643-56,701,977 |
| MT1H | Metallothionein 1H | 16 | 56,703,726-56,705,041 |
| MT1X | Metallothionein 1X | 16 | 56,716,336-56,718,108 |

^a Based on the GRCh37 genome build (79). Abbreviation: Chr, chromosome.
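Table 16 assigns some SNPs to a candidate gene even though they fall slightly outside the coordinates listed above. For example, rs150303824 at chr6:52,631,801 is labelled GSTA2, which ends at 52,628,367. This suggests that flanking sequence was included around each gene. The sketch below shows one way such an assignment could be done. The 5 kb flank is a hypothetical value chosen for illustration, because the window actually used is not stated in this section. Only a subset of the genes in Table 10 is included.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # GRCh37 coordinates, from Table 10
    end: int


# Subset of Table 10 (GRCh37).
CANDIDATE_GENES = [
    Gene("MTF1", "1", 38_275_239, 38_325_292),
    Gene("GSTA2", "6", 52_614_885, 52_628_367),
    Gene("GSTA1", "6", 52_656_178, 52_668_708),
    Gene("MT1F", "16", 56_691_606, 56_694_610),
    Gene("MT1G", "16", 56_700_643, 56_701_977),
    Gene("MT1H", "16", 56_703_726, 56_705_041),
]

FLANK_BP = 5_000  # hypothetical flanking window, not taken from the study


def genes_for_snp(chrom: str, pos: int, flank: int = FLANK_BP) -> List[str]:
    """Return every candidate gene whose [start - flank, end + flank]
    interval contains the SNP position (a SNP may map to several genes)."""
    return [
        g.symbol
        for g in CANDIDATE_GENES
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]


print(genes_for_snp("6", 52_631_801))           # rs150303824 -> ['GSTA2']
print(genes_for_snp("16", 56_701_827))          # rs12448654 -> ['MT1G', 'MT1H']
print(genes_for_snp("6", 52_631_801, flank=0))  # no flank -> []
```

With no flank, rs150303824 falls outside every listed gene. rs12448654 maps to two overlapping windows, which matches the combined "MT1G/MT1H" label used in Table 16.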


Table 11. Main characteristics and biomarkers of study participants stratified by race/ethnicity and sex (n=1,977)

| Median (IQR) | African American | Native Hawaiian | Japanese American | Latino | White |
|---|---|---|---|---|---|
| All | n=285 | n=296 | n=588 | n=418 | n=390 |
| Age, years | 64.5 (59.8, 69.1) | 61.0 (56.9, 65.9) | 63.4 (59.1, 69.8) | 65.7 (61.7, 70.8) | 62.5 (59.2, 69.3) |
| BMI, kg/m² | 27 (23.7, 31.1) | 26.7 (23.8, 30.6) | 24.4 (21.9, 26.9) | 26.5 (24.0, 29.6) | 24.7 (22.0, 27.9) |
| Cigarettes per day | 10 (5, 18) | 15 (9, 20) | 12 (10, 20) | 7.5 (4, 12) | 20 (10, 20) |
| Years of smoking | 37.5 (34.5, 46.5) | 37.5 (33.5, 46.5) | 43.5 (35.5, 46.5) | 43.5 (34.5, 48.0) | 44.5 (35.5, 46.5) |
| TNE, nmol/mL | 44.6 (28.3, 71.6) | 30.1 (19.2, 46.3) | 27.3 (15.7, 42.9) | 32.5 (20.8, 53.6) | 35.9 (21.9, 57.5) |
| Creatinine, mg/dL | 113.4 (66.7, 167.8) | 74.6 (40.0, 115.8) | 67.6 (39.3, 113.0) | 92.0 (56.6, 141.8) | 62.7 (39.1, 109.1) |
| Males | n=88 | n=109 | n=344 | n=221 | n=173 |
| Age, years | 64.5 (58.9, 66.6) | 63.3 (58.4, 69.0) | 63.3 (59.2, 69.6) | 66.3 (62.6, 71.7) | 62.4 (59.2, 68.2) |
| BMI, kg/m² | 27 (23.4, 29.0) | 26.5 (24.3, 30.3) | 24.7 (22.7, 27.4) | 25.8 (23.7, 28.5) | 25.8 (23.3, 27.9) |
| Cigarettes per day | 10 (5, 20) | 19 (10, 20) | 15 (10, 20) | 10 (5, 15) | 20 (15, 25) |
| Years of smoking | 37.5 (35.5, 46.5) | 45.5 (34.5, 47.5) | 44.5 (36.5, 47.5) | 45.5 (35.5, 50.0) | 44.5 (35.5, 47.5) |
| TNE, nmol/mL | 44.6 (28.9, 77.7) | 33.3 (22.1, 54.7) | 30.0 (17.3, 47.8) | 34.6 (21.9, 57.8) | 39.9 (24.6, 70.8) |
| Creatinine, mg/dL | 139.1 (90.0, 197.9) | 84.6 (53.7, 156.5) | 86.2 (50.4, 133.9) | 106.5 (66.2, 160.7) | 82.9 (47.3, 138.1) |
| Females | n=197 | n=187 | n=244 | n=197 | n=217 |
| Age, years | 64.5 (60.1, 71.5) | 60.5 (56.4, 64.5) | 63.8 (59.0, 69.9) | 65.5 (60.9, 69.8) | 62.7 (59.2, 69.6) |
| BMI, kg/m² | 27 (23.9, 32.1) | 26.8 (23.8, 31.1) | 23.5 (20.6, 26.5) | 27.2 (24.2, 30.6) | 23.9 (21.0, 27.7) |
| Cigarettes per day | 10 (5, 15) | 12 (8, 20) | 10 (7, 15) | 6 (4, 10) | 15 (7, 20) |
| Years of smoking | 37.5 (34.5, 46.5) | 36.5 (32.5, 45.5) | 36.5 (34.5, 46.5) | 36.5 (32.5, 45.5) | 44.5 (35.5, 46.5) |
| TNE, nmol/mL | 44.6 (27.8, 66.6) | 28.8 (17.9, 44.1) | 22.1 (14.2, 35.7) | 31.7 (17.8, 50.1) | 31.2 (20.5, 49.3) |
| Creatinine, mg/dL | 94 (60.1, 154.3) | 61.6 (37.2, 98.5) | 48.4 (33.3, 72.9) | 76.0 (48.7, 113.5) | 47.7 (33.3, 78.6) |

Age: age at urine collection; BMI: body mass index; TNE: total nicotine equivalents.


Table 12. Median and interquartile range (IQR) for urinary cadmium, ng/mL

| Race/ethnicity | N, all | Median (IQR), all | p** | N, males | Median (IQR), males | p** | N, females | Median (IQR), females | p** |
|---|---|---|---|---|---|---|---|---|---|
| African American | 285 | 0.84 (0.48, 1.26) | <0.001 | 88 | 0.90 (0.48, 1.32) | <0.001 | 197 | 0.78 (0.48, 1.26) | <0.001 |
| Native Hawaiian | 296 | 0.57 (0.36, 0.96) | 0.0189 | 109 | 0.60 (0.42, 1.08) | 0.0531 | 187 | 0.54 (0.30, 0.96) | 0.1436 |
| Japanese American | 588 | 0.54 (0.30, 0.84) | 0.5534 | 344 | 0.54 (0.30, 0.90) | 0.6489 | 244 | 0.48 (0.30, 0.75) | 0.8555 |
| Latino | 418 | 0.72 (0.42, 1.08) | <0.001 | 221 | 0.78 (0.48, 1.08) | <0.001 | 197 | 0.66 (0.42, 1.08) | <0.001 |
| White | 390 | 0.48 (0.30, 0.78) | | 173 | 0.60 (0.36, 0.90) | | 217 | 0.48 (0.30, 0.72) | |
| p* | | <0.001 | | | <0.001 | | | <0.001 | |

* p-value from the Kruskal-Wallis test. ** p-value from the Wilcoxon-Mann-Whitney test, compared with Whites.
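The pairwise comparisons with Whites in Table 12 use the Wilcoxon-Mann-Whitney test. Below is a minimal standard-library sketch of that test using the usual normal approximation with the standard correction for tied values. Ties are common when concentrations are reported to two decimals. The input values are made up for illustration, and this section does not state which software the study actually used.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation,
    tie-corrected variance, no continuity correction).
    Returns (U statistic for x, p-value)."""
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which x is larger; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u, p


# Made-up urinary Cd values (ng/mL), for illustration only.
group = [0.84, 0.90, 1.26, 0.48, 1.08, 0.72, 0.96, 1.32]
reference = [0.48, 0.30, 0.60, 0.36, 0.78, 0.42, 0.48, 0.54]
u_stat, p_value = mann_whitney_u(group, reference)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

For small samples an exact permutation p-value is usually preferred. With several hundred participants per group, as in Table 12, the normal approximation is standard.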


Table 13. Geometric mean (95% CI) of urinary cadmium stratified by race/ethnicity and sex (n=1,977)

| | N | Model 1^a geometric mean (95% CI) | Model 2^b geometric mean (95% CI) |
|---|---|---|---|
| All | | | |
| African American | 285 | 0.82 (0.76, 0.88) | 0.80 (0.75, 0.85) |
| Native Hawaiian | 296 | 0.82 (0.76, 0.88) | 0.84 (0.79, 0.90) |
| Japanese American | 588 | 0.77 (0.73, 0.81) | 0.81 (0.77, 0.85) |
| Latinos | 418 | 0.84 (0.79, 0.89) | 0.87 (0.82, 0.92) |
| Whites | 390 | 0.75 (0.71, 0.80) | 0.74 (0.69, 0.78) |
| p-value^c | | 0.045 | <0.001 |
| Males | | | |
| African American | 88 | 0.75 (0.69, 0.81) | 0.74 (0.68, 0.79) |
| Native Hawaiian | 109 | 0.75 (0.70, 0.81) | 0.78 (0.72, 0.83) |
| Japanese American | 344 | 0.71 (0.67, 0.75) | 0.75 (0.71, 0.79) |
| Latinos | 221 | 0.77 (0.72, 0.82) | 0.80 (0.75, 0.85) |
| Whites | 173 | 0.69 (0.65, 0.74) | 0.68 (0.63, 0.72) |
| p-value^c | | 0.046 | <0.001 |
| Females | | | |
| African American | 197 | 0.90 (0.84, 0.96) | 0.87 (0.82, 0.93) |
| Native Hawaiian | 187 | 0.90 (0.83, 0.97) | 0.92 (0.86, 0.99) |
| Japanese American | 244 | 0.85 (0.79, 0.90) | 0.89 (0.83, 0.94) |
| Latinos | 197 | 0.92 (0.86, 0.98) | 0.95 (0.89, 1.01) |
| Whites | 217 | 0.83 (0.77, 0.88) | 0.80 (0.75, 0.86) |
| p-value^c | | 0.044 | <0.001 |

^a Model 1: adjusted for age, sex, and creatinine (natural log). ^b Model 2: further adjusted for urinary TNE. ^c Global p-value.
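Geometric means such as those in Table 13 come from analysing urinary Cd on the log scale and back-transforming. The unadjusted sketch below shows that back-transformation for a single group. The mean and standard error of ln(Cd) give the geometric mean exp(mean) and the interval exp(mean ± 1.96 × SE). The covariate adjustment behind Models 1 and 2 is not reproduced here; one possibility would be least-squares means from a regression, but that is an assumption. The input values are made up.

```python
import math
import statistics
from typing import Sequence, Tuple


def geometric_mean_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted geometric mean with a normal-approximation 95% CI,
    computed on the natural-log scale and back-transformed."""
    logs = [math.log(v) for v in values]
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (math.exp(mean_log),
            math.exp(mean_log - z * se_log),
            math.exp(mean_log + z * se_log))


# Made-up urinary Cd values (ng/mL), for illustration only.
sample = [0.30, 0.48, 0.54, 0.60, 0.72, 0.84, 0.96, 1.26]
gm, lo, hi = geometric_mean_ci(sample)
print(f"geometric mean {gm:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric after back-transformation. This is the pattern seen in the confidence limits of Table 13.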


Figure 7. Quantile-quantile plot of observed and expected -log10-transformed p-values from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles. Analysis was adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components.

Figure 8. Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles. Analysis was adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components. The blue line marks p = 10^-6 for reference.
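A Q-Q plot like Figure 7 compares the observed -log10 p-values with those expected under the null hypothesis, where p-values are uniformly distributed. A common one-number summary of the same information is the genomic inflation factor λ. It is the median 1-df chi-square statistic implied by the observed p-values, divided by the null median (about 0.455). λ is not reported in this section, so the sketch below uses synthetic, evenly spread null p-values, for which λ should equal 1.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10 p pairs for a Q-Q plot, using the
    uniform plotting positions (rank - 0.5) / n for the expected values."""
    observed = sorted(pvalues)
    n = len(observed)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(observed)]


def genomic_inflation(pvalues: Sequence[float]) -> float:
    """lambda_GC: median 1-df chi-square implied by two-sided p-values,
    divided by the null median."""
    chi2 = [NormalDist().inv_cdf(1.0 - p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic null p-values evenly spread over (0, 1).
n_snps = 999
null_p = [(i + 0.5) / n_snps for i in range(n_snps)]
print(f"lambda = {genomic_inflation(null_p):.3f}")
print(qq_points(null_p)[:3])
```

On the same -log10 scale, the blue reference line in Figure 8 at p = 10^-6 sits at 6. Values of λ well above 1 usually point to residual population stratification, which the adjustment for principal components is intended to control.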


Table 14. Top GWAS results (p < 5 × 10^-5) for the association between urinary cadmium and genotyped and imputed alleles in smokers, ordered by chromosome^a

| chr | SNP | Position | A1 | A2 | FRQ | PLINK R2 | BETA | SE | P | AFR MAF | AMR MAF | ASN MAF | EUR MAF | IMPUTE Info | SNP function^b | Nearest gene^b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | rs9660585 | 5826721 | G | C | 0.182 | 1.209 | -0.114 | 0.026 | 8.34E-06 | 0.274 | 0.111 | 0.171 | 0.067 | 1.230 | | |
| 2 | rs112558199 | 205853168 | T | G | 0.050 | 0.957 | 0.211 | 0.046 | 3.80E-06 | 0.089 | 0.030 | 0.040 | 0.028 | 0.943 | intron variant | PARD3B |
| 2 | rs76219483 | 205875740 | C | T | 0.038 | 1.043 | 0.234 | 0.050 | 3.27E-06 | 0.020 | 0.022 | 0.035 | 0.019 | 1.038 | intron variant | PARD3B |
| 2 | rs77463136 | 238900800 | G | A | 0.043 | 1.017 | 0.195 | 0.044 | 7.71E-06 | 0.010 | 0.044 | 0.025 | 0.049 | 1.022 | intron variant | UBE2F |
| 2 | rs72977049 | 238923619 | T | G | 0.045 | 1.014 | 0.196 | 0.043 | 4.64E-06 | 0.016 | 0.044 | 0.025 | 0.049 | 1.019 | intron variant | UBE2F |
| 6 | rs187840540 | 58083263 | A | G | 0.312 | 0.622 | 0.123 | 0.024 | 4.62E-07 | 0.315 | 0.298 | 0.406 | 0.267 | 0.618 | intergenic variant | |
| 6 | rs151158528 | 58083289 | A | G | 0.317 | 0.640 | 0.112 | 0.024 | 3.32E-06 | 0.309 | 0.298 | 0.423 | 0.261 | 0.637 | intergenic variant | |
| 9 | rs4977463 | 18925401 | T | C | 0.407 | 1.052 | -0.083 | 0.018 | 7.35E-06 | 0.197 | 0.312 | 0.365 | 0.314 | 1.055 | intron variant | SAXO1 |
| 9 | rs11791623 | 118075574 | C | T | 0.065 | 1.067 | 0.170 | 0.036 | 2.21E-06 | 0.033 | 0.072 | 0.014 | 0.162 | 1.066 | non coding transcript exon variant | DEC1 |
| 9 | rs11787738 | 118075812 | A | C | 0.064 | 1.062 | 0.170 | 0.036 | 2.73E-06 | 0.033 | 0.072 | 0.014 | 0.160 | 1.062 | non coding transcript exon variant | DEC1 |
| 9 | chr9:118075914:I | 118075914 | A | AT | 0.065 | 1.056 | 0.167 | 0.036 | 4.09E-06 | 0.035 | 0.072 | 0.014 | 0.162 | 1.056 | | |
| 9 | rs34159304 | 118076374 | G | A | 0.065 | 1.070 | 0.169 | 0.036 | 2.23E-06 | 0.033 | 0.072 | 0.014 | 0.162 | 1.068 | non coding transcript exon variant | DEC1 |
| 9 | rs13290828 | 118076785 | C | T | 0.066 | 1.069 | 0.165 | 0.036 | 3.65E-06 | 0.033 | 0.072 | 0.019 | 0.162 | 1.074 | intron variant | DEC1 |
| 9 | rs11787534 | 118087739 | C | A | 0.067 | 1.066 | 0.171 | 0.035 | 1.36E-06 | 0.039 | 0.072 | 0.018 | 0.161 | 1.073 | intron variant | DEC1 |
| 9 | chr9:118088080:D | 118088080 | TG | T | 0.063 | 1.055 | 0.163 | 0.036 | 7.83E-06 | 0.039 | 0.066 | 0.018 | 0.156 | 1.064 | | |
| 9 | rs13283967 | 118088081 | G | A | 0.064 | 1.063 | 0.165 | 0.036 | 5.42E-06 | 0.039 | 0.066 | 0.018 | 0.158 | 1.071 | intron variant | DEC1 |
| 9 | chr9:140367406:I | 140367406 | C | CA | 0.069 | 0.661 | 0.196 | 0.044 | 8.34E-06 | 0.108 | 0.102 | 0.026 | 0.067 | 0.653 | | |
| 11 | rs7120358 | 78590340 | G | A | 0.174 | 1.092 | 0.113 | 0.025 | 5.30E-06 | 0.307 | 0.218 | 0.018 | 0.278 | 1.076 | | |
| 11 | rs7120488 | 78590380 | G | A | 0.160 | 1.029 | 0.120 | 0.026 | 5.04E-06 | 0.266 | 0.177 | 0.018 | 0.248 | 1.013 | intron variant | TENM4 |
| 11 | rs504245 | 78625883 | G | A | 0.177 | 1.083 | 0.118 | 0.025 | 2.04E-06 | 0.185 | 0.232 | 0.030 | 0.310 | 1.060 | intron variant | TENM4 |
| 11 | rs673456 | 78629612 | G | A | 0.172 | 1.070 | 0.128 | 0.025 | 3.47E-07 | 0.183 | 0.229 | 0.032 | 0.303 | 1.052 | intron variant | TENM4 |
| 11 | chr11:78637734:D | 78637734 | TTC | T | 0.219 | 1.139 | 0.116 | 0.026 | 5.46E-06 | 0.451 | 0.268 | 0.012 | 0.315 | 1.123 | | |
| 11 | chr11:89993305:I | 89993305 | T | TA | 0.313 | 0.997 | 0.091 | 0.020 | 8.44E-06 | 0.344 | 0.298 | 0.385 | 0.191 | 0.998 | | |
| 11 | rs112250004 | 90009663 | A | G | 0.327 | 1.008 | 0.090 | 0.020 | 5.82E-06 | 0.366 | 0.318 | 0.399 | 0.210 | 1.006 | intron variant | DISC1FP1 |
| 11 | rs12807358 | 101075736 | C | T | 0.081 | 1.007 | -0.145 | 0.032 | 7.80E-06 | 0.029 | 0.116 | 0.080 | 0.082 | 1.002 | | |
| 11 | rs34712605 | 101078316 | G | A | 0.080 | 1.009 | -0.146 | 0.032 | 7.40E-06 | 0.029 | 0.116 | 0.079 | 0.082 | 1.005 | | |
| 11 | rs35253160 | 101078512 | T | C | 0.080 | 1.004 | -0.145 | 0.033 | 9.77E-06 | 0.029 | 0.116 | 0.079 | 0.077 | 1.000 | | |
| 11 | rs114909569 | 101082437 | G | C | 0.080 | 1.009 | -0.146 | 0.032 | 7.36E-06 | 0.029 | 0.116 | 0.079 | 0.082 | 1.005 | | |
| 11 | rs35401637 | 101083619 | A | G | 0.080 | 1.009 | -0.146 | 0.032 | 7.35E-06 | 0.029 | 0.116 | 0.079 | 0.082 | 1.005 | | |
| 11 | rs4121759 | 101090857 | G | A | 0.080 | 1.009 | -0.146 | 0.032 | 7.18E-06 | 0.029 | 0.116 | 0.079 | 0.082 | 1.006 | | |
| 11 | rs12790750 | 101093285 | T | G | 0.080 | 1.009 | -0.146 | 0.032 | 7.18E-06 | 0.029 | 0.116 | 0.079 | 0.082 | 1.006 | | |
| 11 | rs71476672 | 101095354 | T | C | 0.080 | 1.009 | -0.146 | 0.032 | 7.22E-06 | 0.029 | 0.116 | 0.080 | 0.082 | 1.005 | | |
| 11 | rs74659211 | 101096291 | G | A | 0.081 | 1.007 | -0.145 | 0.032 | 7.74E-06 | 0.029 | 0.119 | 0.080 | 0.083 | 1.003 | | |
| 11 | rs34391750 | 101096526 | A | G | 0.080 | 1.008 | -0.146 | 0.032 | 7.17E-06 | 0.029 | 0.116 | 0.080 | 0.082 | 1.005 | | |
| 11 | rs35978545 | 101097425 | C | T | 0.081 | 1.004 | -0.144 | 0.032 | 9.14E-06 | 0.033 | 0.113 | 0.080 | 0.082 | 0.998 | | |
| 11 | rs35538677 | 101098195 | C | T | 0.082 | 1.007 | -0.144 | 0.032 | 7.98E-06 | 0.029 | 0.113 | 0.080 | 0.082 | 1.003 | | |
| 11 | rs34223959 | 101099655 | G | T | 0.080 | 1.007 | -0.146 | 0.033 | 7.67E-06 | 0.029 | 0.113 | 0.080 | 0.082 | 1.004 | | |
| 11 | rs34604180 | 101112704 | G | C | 0.081 | 1.012 | -0.146 | 0.032 | 6.06E-06 | 0.031 | 0.116 | 0.079 | 0.082 | 1.006 | | |
| 11 | rs12793035 | 101117482 | A | G | 0.081 | 1.012 | -0.144 | 0.032 | 7.97E-06 | 0.031 | 0.116 | 0.079 | 0.082 | 1.007 | | |
| 12 | rs112150049 | 15770028 | A | G | 0.044 | 1.045 | -0.213 | 0.046 | 3.67E-06 | 0.169 | 0.030 | 0.039 | 0.017 | 1.072 | | |
| 13 | rs1547149 | 22476288 | A | G | 0.359 | 0.999 | -0.084 | 0.019 | 8.44E-06 | 0.313 | 0.356 | 0.428 | 0.354 | 0.999 | | |
| 14 | rs4906282 | 103619513 | C | T | 0.615 | 0.999 | 0.084 | 0.019 | 5.75E-06 | 0.370 | 0.475 | 0.330 | 0.425 | 1.001 | | |
| 14 | rs34475295 | 103635359 | T | C | 0.364 | 1.038 | 0.087 | 0.019 | 3.43E-06 | 0.185 | 0.227 | 0.413 | 0.418 | 1.052 | | |
| 14 | rs57594879 | 103657430 | T | C | 0.350 | 1.020 | 0.087 | 0.019 | 4.83E-06 | 0.130 | 0.224 | 0.409 | 0.402 | 1.026 | | |
| 16 | rs149945080 | 1280873 | A | T | 0.052 | 0.514 | 0.254 | 0.056 | 5.69E-06 | 0.065 | 0.041 | 0.054 | 0.046 | 0.521 | 2KB Upstream Variant | TPSB2 |
| 19 | rs144158780 | 10932459 | G | A | 0.020 | 0.615 | 0.392 | 0.080 | 1.10E-06 | 0.035 | 0.030 | 0.021 | 0.015 | 0.619 | intron variant | DNM2 |
| 19 | rs7253141 | 35927203 | A | G | 0.192 | 1.065 | 0.105 | 0.023 | 7.88E-06 | 0.274 | 0.124 | 0.287 | 0.067 | 1.066 | 2KB Upstream Variant | LOC101927522 |

^a Analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components. ^b SNP function and nearest gene information from dbSNP (https://www.ncbi.nlm.nih.gov/snp/).

*Note: Abbreviations: SNP, single nucleotide polymorphism; chr, chromosome; position, genomic (base pair) position on the chromosome; A1, reference allele; A2, alternate allele, used for association testing; FRQ, frequency of A1; PLINK R2, PLINK R2 quality metric of the model; BETA, effect estimate per alternate allele carried; SE, standard error of the effect estimate; P, p-value for the association test; AFR, AMR, ASN and EUR MAF, minor allele frequencies for African, Admixed American, Asian and European samples in 1000 Genomes; IMPUTE Info, imputation quality score from IMPUTE2.
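Two quick checks help in reading Table 14. First, the smallest p-value in the table, 3.47 × 10^-7 (rs673456), is above the conventional genome-wide significance threshold of 5 × 10^-8. This is consistent with the absence of a significant GWAS finding. Second, assume the outcome was modelled as natural-log urinary Cd; this is an assumption, because the transformation is not stated in this section. Under it, each BETA converts to an approximate percent difference in urinary Cd per copy of the alternate allele as 100 × (exp(BETA) - 1). The sketch applies both checks to a few rows copied from Table 14.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

# (SNP, BETA, SE, P) copied from Table 14.
TOP_HITS = [
    ("rs673456", 0.128, 0.025, 3.47e-7),
    ("rs187840540", 0.123, 0.024, 4.62e-7),
    ("rs144158780", 0.392, 0.080, 1.10e-6),
    ("rs9660585", -0.114, 0.026, 8.34e-6),
]


def percent_change(beta: float) -> float:
    """Percent difference in the outcome per allele, assuming BETA is on
    the natural-log scale (an assumption, not stated in the source)."""
    return 100.0 * (math.exp(beta) - 1.0)


for snp, beta, se, p in TOP_HITS:
    lo = percent_change(beta - 1.96 * se)
    hi = percent_change(beta + 1.96 * se)
    status = "genome-wide significant" if p < GENOME_WIDE_ALPHA else "not genome-wide significant"
    print(f"{snp}: {percent_change(beta):+.1f}% (95% CI {lo:+.1f}, {hi:+.1f}), {status}")
```

Under that assumption, the lead SNP rs673456 would correspond to roughly 14% higher urinary Cd per alternate allele, but it does not reach genome-wide significance.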


Table 15. Candidate SNP analysis results for the 15 specific SNPs that were found to be associated with cadmium in previous studies^a

| SNP | chr | Position | A1 | A2 | FRQ | PLINK R2 | BETA | SE | P | AFR MAF | AMR MAF | ASN MAF | EUR MAF | IMPUTE Info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs4653329 | 1 | 38285893 | G | T | 0.224 | 1.040 | 0.012 | 0.021 | 0.573 | 0.114 | 0.262 | 0.229 | 0.294 | 1.039 |
| rs10014145 | 4 | 103200577 | A | G | 0.222 | 1.060 | 0.016 | 0.022 | 0.471 | 0.337 | 0.199 | 0.108 | 0.325 | 1.054 |
| rs233804 | 4 | 103212916 | C | A | 0.204 | 1.032 | 0.039 | 0.023 | 0.088 | 0.211 | 0.207 | 0.080 | 0.302 | 1.029 |
| rs9350504 | 6 | 74457830 | T | C | 0.206 | 1.107 | 0.038 | 0.024 | 0.111 | 0.130 | 0.130 | 0.371 | 0.012 | 1.115 |
| rs17574271 | 8 | 3970124 | T | C | 0.030 | 1.020 | -0.005 | 0.052 | 0.929 | 0.004 | 0.022 | 0.002 | 0.066 | 1.011 |
| rs4872479 | 8 | 22233636 | G | T | 0.103 | 0.984 | -0.006 | 0.031 | 0.857 | 0.289 | 0.072 | 0.082 | 0.053 | 0.975 |
| rs870215 | 8 | 22258137 | G | A | 0.150 | 1.096 | -0.005 | 0.026 | 0.846 | 0.496 | 0.130 | 0.170 | 0.049 | 1.107 |
| rs12681420 | 8 | 71573198 | A | G | 0.243 | 1.030 | -0.046 | 0.021 | 0.030 | 0.173 | 0.268 | 0.084 | 0.424 | 1.060 |
| rs1695 | 11 | 67352689 | A | G | 0.299 | 1.097 | -0.017 | 0.020 | 0.381 | 0.443 | 0.425 | 0.166 | 0.319 | 1.099 |
| rs8044719 | 16 | 56678865 | T | G | 0.843 | 1.045 | 0.018 | 0.025 | 0.480 | 0.411 | 0.144 | 0.115 | 0.143 | 1.056 |
| rs11076161 | 16 | 56673148 | A | G | 0.682 | 1.004 | 0.018 | 0.020 | 0.359 | 0.478 | 0.282 | 0.302 | 0.276 | 1.015 |
| rs1599823 | 16 | 56675817 | T | C | 0.582 | 1.014 | 0.014 | 0.019 | 0.456 | 0.344 | 0.373 | 0.484 | 0.354 | 1.020 |
| rs28366003 | 16 | 56642491 | A | G | 0.068 | 0.933 | 0.063 | 0.037 | 0.089 | 0.014 | 0.091 | 0.119 | 0.044 | 0.935 |
| rs10636 | 16 | 56643343 | G | C | 0.228 | 0.880 | 0.000 | 0.023 | 0.998 | 0.274 | 0.260 | 0.248 | 0.211 | 0.899 |
| rs4784706 | 16 | 56697581 | A | G | 0.857 | 0.999 | 0.032 | 0.027 | 0.238 | 0.279 | 0.144 | 0.086 | 0.162 | 1.011 |

^a Analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components.

*Note: Abbreviations: SNP, single nucleotide polymorphism; chr, chromosome; position, genomic (base pair) position on the chromosome; A1, reference allele; A2, alternate allele, used for association testing; FRQ, frequency of A1; PLINK R2, PLINK R2 quality metric of the model; BETA, effect estimate per alternate allele carried; SE, standard error of the effect estimate; P, p-value for the association test; AFR, AMR, ASN and EUR MAF, minor allele frequencies for African, Admixed American, Asian and European samples in 1000 Genomes; IMPUTE Info, imputation quality score from IMPUTE2.
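Table 15 shows one candidate SNP, rs12681420, with a nominal p-value of 0.030. This section does not state how multiple testing was handled. The sketch below therefore illustrates one common option, a Bonferroni correction, purely as an assumption. With 15 candidate SNPs the per-test threshold is 0.05/15 ≈ 0.0033. With the 1,169 SNPs in candidate genes (Table 16) it is about 4.3 × 10^-5. Under either threshold, none of the reported p-values would pass; the smallest p-value in Table 16 is 3.89 × 10^-4.

```python
from typing import Dict, List

ALPHA = 0.05

# P-values for the 15 candidate SNPs, copied from Table 15.
TABLE15_P: Dict[str, float] = {
    "rs4653329": 0.573, "rs10014145": 0.471, "rs233804": 0.088,
    "rs9350504": 0.111, "rs17574271": 0.929, "rs4872479": 0.857,
    "rs870215": 0.846, "rs12681420": 0.030, "rs1695": 0.381,
    "rs8044719": 0.480, "rs11076161": 0.359, "rs1599823": 0.456,
    "rs28366003": 0.089, "rs10636": 0.998, "rs4784706": 0.238,
}


def bonferroni_hits(pvalues: Dict[str, float], n_tests: int, alpha: float = ALPHA) -> List[str]:
    """SNPs whose p-value falls below the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return [snp for snp, p in pvalues.items() if p < threshold]


nominal = [snp for snp, p in TABLE15_P.items() if p < ALPHA]
print("nominal p < 0.05:", nominal)
print("Bonferroni, 15 tests:", bonferroni_hits(TABLE15_P, n_tests=len(TABLE15_P)))

# Smallest p-value among the 1,169 candidate-gene SNPs (Table 16).
print("Table 16 smallest p passes Bonferroni:", 3.89e-4 < ALPHA / 1169)
```

Many SNPs in Table 16 are in strong linkage disequilibrium; for example, long runs of GSTA2 variants have near-identical estimates. Bonferroni is therefore conservative here. Permutation testing or an effective-number-of-tests correction are common, less conservative alternatives.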


Table 16. Top 100 candidate SNP analysis results (by p-value) for the 1,169 SNPs identified in genes that were hypothesized to be associated with cadmium^a

| Candidate gene region | SNP | chr | Position | A1 | A2 | FRQ | PLINK R2 | BETA | SE | P | AFR MAF | AMR MAF | ASN MAF | EUR MAF | IMPUTE Info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSTA2 | rs150303824 | 6 | 52631801 | C | T | 0.808 | 0.918 | 0.094 | 0.026 | 3.89E-04 | 0.128 | 0.373 | 0.037 | 0.367 | 0.911 |
| GSTA2 | rs137880383 | 6 | 52631802 | A | G | 0.813 | 0.929 | 0.084 | 0.027 | 0.002 | 0.089 | 0.373 | 0.042 | 0.359 | 0.923 |
| MTF1 | rs112496727 | 1 | 38304767 | C | T | 0.013 | 0.838 | 0.265 | 0.087 | 0.002 | 0.059 | 0.011 | 0.012 | 0.013 | 0.837 |
| GSTA2 | rs138908399 | 6 | 52631827 | A | G | 0.715 | 0.748 | 0.072 | 0.024 | 0.003 | 0.218 | 0.442 | 0.147 | 0.433 | 0.745 |
| GSTA2 | rs2608622 | 6 | 52613931 | C | T | 0.873 | 1.093 | 0.082 | 0.029 | 0.004 | 0.043 | 0.216 | 0.011 | 0.305 | 1.089 |
| GSTA2 | rs2070774 | 6 | 52628035 | C | T | 0.815 | 1.077 | 0.066 | 0.024 | 0.005 | 0.029 | 0.238 | 0.136 | 0.363 | 1.081 |
| GSTT2 | rs140239 | 22 | 24329151 | C | T | 0.138 | 0.634 | -0.094 | 0.034 | 0.005 | 0.053 | 0.152 | 0.070 | 0.186 | 0.626 |
| GSTA2 | rs2749013 | 6 | 52627165 | A | G | 0.816 | 1.068 | 0.065 | 0.024 | 0.006 | 0.043 | 0.238 | 0.136 | 0.362 | 1.071 |
| GSTA2 | rs2608639 | 6 | 52632213 | C | T | 0.808 | 1.039 | 0.065 | 0.024 | 0.006 | 0.033 | 0.238 | 0.143 | 0.367 | 1.043 |
| MT1X | rs56213321 | 16 | 56720665 | T | C | 0.083 | 0.967 | -0.089 | 0.033 | 0.007 | 0.010 | 0.088 | 0.072 | 0.123 | 0.968 |
| MT1X | chr16:56720446:D | 16 | 56720446 | GCC | G | 0.083 | 0.967 | -0.089 | 0.033 | 0.007 | 0.010 | 0.088 | 0.072 | 0.123 | 0.968 |
| GSTA2 | rs2749010 | 6 | 52626572 | C | T | 0.817 | 1.084 | 0.064 | 0.024 | 0.007 | 0.029 | 0.301 | 0.136 | 0.362 | 1.087 |
| MT1X | rs72784753 | 16 | 56720881 | C | T | 0.083 | 0.967 | -0.089 | 0.033 | 0.007 | 0.010 | 0.088 | 0.072 | 0.123 | 0.968 |
| GSTA2 | rs2608631 | 6 | 52626269 | G | A | 0.817 | 1.086 | 0.064 | 0.024 | 0.007 | 0.029 | 0.301 | 0.136 | 0.362 | 1.088 |
| GSTA2 | rs2608630 | 6 | 52625913 | T | A | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.033 | 0.238 | 0.136 | 0.362 | 1.079 |
| GSTA2 | rs2608625 | 6 | 52623095 | C | T | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.080 |
| GSTA2 | rs1009063 | 6 | 52623505 | T | A | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.080 |
| GSTA2 | rs1009064 | 6 | 52623490 | G | C | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.080 |
| GSTA2 | rs2749026 | 6 | 52630635 | A | G | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.363 | 1.080 |
| GSTA2 | rs2608627 | 6 | 52625041 | C | T | 0.818 | 1.077 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.080 |
| GSTA2 | rs2749008 | 6 | 52626120 | C | T | 0.818 | 1.078 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.081 |
| GSTA2 | rs2749014 | 6 | 52627322 | G | A | 0.818 | 1.078 | 0.064 | 0.024 | 0.007 | 0.029 | 0.238 | 0.136 | 0.362 | 1.081 |
| GSTA1 | rs1051775 | 6 | 52658962 | T | C | 0.218 | 1.033 | -0.061 | 0.023 | 0.009 | 0.059 | 0.301 | 0.138 | 0.434 | 1.035 |
| GSTA4 | rs17614871 | 6 | 52843758 | T | A | 0.145 | 1.050 | 0.065 | 0.025 | 0.009 | 0.067 | 0.177 | 0.119 | 0.270 | 1.043 |
| GSTM2/GSTM1 | rs4147565 | 1 | 110231777 | G | A | 0.253 | 0.724 | -0.071 | 0.027 | 0.010 | 0.022 | 0.127 | 0.351 | 0.164 | 0.730 |
| GSTT2 | rs35936339 | 22 | 24328687 | T | G | 0.168 | 0.664 | -0.080 | 0.031 | 0.010 | 0.045 | 0.188 | 0.066 | 0.253 | 0.659 |
| MT2A | rs1008766 | 16 | 56639800 | G | C | 0.328 | 1.052 | -0.054 | 0.021 | 0.011 | 0.445 | 0.379 | 0.124 | 0.449 | 1.052 |
| GSTM2/GSTM1 | rs737497 | 1 | 110231592 | T | C | 0.317 | 0.640 | -0.066 | 0.026 | 0.011 | 0.173 | 0.246 | 0.400 | 0.203 | 0.644 |
| MT2A | rs1862849 | 16 | 56639494 | G | A | 0.328 | 1.053 | -0.053 | 0.021 | 0.011 | 0.445 | 0.379 | 0.124 | 0.449 | 1.053 |
| GSTT2 | rs140230 | 22 | 24328548 | A | G | 0.108 | 0.631 | -0.096 | 0.038 | 0.011 | 0.026 | 0.146 | 0.046 | 0.149 | 0.623 |
| GSTA2 | rs2749019 | 6 | 52628551 | A | G | 0.811 | 1.066 | 0.059 | 0.023 | 0.012 | 0.067 | 0.240 | 0.140 | 0.364 | 1.070 |
| GSTA1 | rs6932500 | 6 | 52665671 | G | A | 0.212 | 1.069 | -0.058 | 0.023 | 0.012 | 0.059 | 0.296 | 0.136 | 0.427 | 1.072 |
| GSTA2 | rs2749033 | 6 | 52632479 | A | G | 0.839 | 1.023 | 0.069 | 0.027 | 0.012 | 0.065 | 0.260 | 0.025 | 0.365 | 1.021 |
| GSTA1 | rs62412860 | 6 | 52654550 | G | C | 0.209 | 1.079 | -0.058 | 0.023 | 0.012 | 0.033 | 0.296 | 0.136 | 0.443 | 1.082 |
| GSTA1 | rs145747437 | 6 | 52652103 | C | T | 0.209 | 1.078 | -0.058 | 0.023 | 0.012 | 0.037 | 0.296 | 0.136 | 0.442 | 1.080 |
| GSTA1 | rs1986661 | 6 | 52652340 | T | C | 0.209 | 1.080 | -0.058 | 0.023 | 0.013 | 0.033 | 0.296 | 0.136 | 0.443 | 1.082 |
| MT2A | rs7188169 | 16 | 56638232 | G | A | 0.328 | 1.056 | -0.052 | 0.021 | 0.013 | 0.445 | 0.379 | 0.124 | 0.450 | 1.055 |
| GSTT2 | rs139489364 | 22 | 24328816 | A | T | 0.141 | 0.651 | -0.082 | 0.033 | 0.013 | 0.035 | 0.166 | 0.073 | 0.219 | 0.646 |
| GSTA1 | rs4147615 | 6 | 52657321 | T | C | 0.221 | 1.067 | -0.056 | 0.022 | 0.013 | 0.124 | 0.298 | 0.136 | 0.426 | 1.067 |
| GSTT2 | rs365158 | 22 | 24323849 | T | C | 0.412 | 0.783 | -0.054 | 0.022 | 0.014 | 0.396 | 0.439 | 0.241 | 0.450 | 0.770 |
| GSTA1 | rs58912740 | 6 | 52669759 | G | C | 0.208 | 1.060 | -0.057 | 0.023 | 0.014 | 0.059 | 0.293 | 0.131 | 0.420 | 1.061 |
| GSTA1 | rs11969435 | 6 | 52664956 | C | T | 0.221 | 1.069 | -0.055 | 0.022 | 0.015 | 0.122 | 0.298 | 0.136 | 0.427 | 1.069 |
| GSTA1 | rs10948723 | 6 | 52666405 | T | C | 0.221 | 1.069 | -0.055 | 0.022 | 0.015 | 0.122 | 0.298 | 0.136 | 0.427 | 1.069 |
| GSTA1 | rs10948722 | 6 | 52663813 | A | G | 0.207 | 1.088 | -0.056 | 0.023 | 0.015 | 0.033 | 0.293 | 0.136 | 0.427 | 1.090 |
| GSTA2 | rs2254050 | 6 | 52622814 | C | T | 0.812 | 1.066 | 0.057 | 0.023 | 0.016 | 0.079 | 0.243 | 0.136 | 0.362 | 1.066 |
| GSTA2 | rs2749005 | 6 | 52621433 | T | G | 0.810 | 1.067 | 0.056 | 0.023 | 0.017 | 0.110 | 0.243 | 0.136 | 0.363 | 1.067 |
| GSTA4 | rs3756980 | 6 | 52851979 | A | G | 0.186 | 1.066 | 0.054 | 0.023 | 0.017 | 0.102 | 0.180 | 0.122 | 0.264 | 1.055 |
| GSTA2 | rs2749024 | 6 | 52630587 | T | A | 0.813 | 1.073 | 0.056 | 0.023 | 0.017 | 0.063 | 0.240 | 0.136 | 0.363 | 1.076 |

^a Analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components. SNPs are listed in order of increasing p-value.


Table 16 (continued). Top 100 candidate SNP analysis results (by p-value) for the 1,169 SNPs identified in genes that were hypothesized to be associated with cadmium^a

| Candidate gene region | SNP | chr | Position | A1 | A2 | FRQ | PLINK R2 | BETA | SE | P | AFR MAF | AMR MAF | ASN MAF | EUR MAF | IMPUTE Info |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSTA2 | rs2749029 | 6 | 52631188 | A | G | 0.814 | 1.073 | 0.056 | 0.023 | 0.017 | 0.063 | 0.240 | 0.136 | 0.362 | 1.077 |
| GSTM2/GSTM1 | rs111633257 | 1 | 110235186 | C | T | 0.092 | 0.708 | 0.093 | 0.039 | 0.017 | 0.248 | 0.102 | 0.018 | 0.145 | 0.736 |
| GSTA2 | rs2608629 | 6 | 52625794 | G | A | 0.809 | 1.072 | 0.054 | 0.023 | 0.019 | 0.112 | 0.243 | 0.136 | 0.362 | 1.070 |
| GSTM5/GSTM3 | rs55976937 | 1 | 110277837 | A | T | 0.031 | 0.970 | -0.123 | 0.052 | 0.019 | 0.033 | 0.022 | 0.012 | 0.024 | 0.964 |
| GSTA2 | rs2207951 | 6 | 52611118 | C | A | 0.834 | 1.111 | 0.058 | 0.025 | 0.020 | 0.037 | 0.254 | 0.094 | 0.365 | 1.111 |
| GSTA2 | rs2608624 | 6 | 52622475 | G | T | 0.810 | 1.070 | 0.053 | 0.023 | 0.021 | 0.112 | 0.243 | 0.136 | 0.363 | 1.069 |
| GSTA2 | rs2749007 | 6 | 52625963 | A | T | 0.810 | 1.072 | 0.053 | 0.023 | 0.021 | 0.112 | 0.243 | 0.136 | 0.362 | 1.071 |
| MT2A | rs1610216 | 16 | 56642284 | A | G | 0.374 | 0.961 | 0.048 | 0.021 | 0.022 | 0.421 | 0.290 | 0.423 | 0.168 | 0.957 |
| GSTA2 | rs2207949 | 6 | 52610584 | T | C | 0.827 | 1.110 | 0.055 | 0.024 | 0.024 | 0.098 | 0.254 | 0.098 | 0.365 | 1.108 |
| GSTA4 | rs45460093 | 6 | 52850831 | C | T | 0.173 | 1.010 | 0.053 | 0.024 | 0.025 | 0.077 | 0.174 | 0.124 | 0.259 | 1.005 |
| GSTT2 | rs34083567 | 22 | 24328683 | A | G | 0.167 | 0.640 | -0.070 | 0.031 | 0.026 | 0.049 | 0.188 | 0.070 | 0.251 | 0.636 |
| GSTA2 | rs2748999 | 6 | 52612668 | T | A | 0.827 | 1.097 | 0.055 | 0.025 | 0.026 | 0.100 | 0.254 | 0.094 | 0.365 | 1.094 |
| MT1G/MT1H | rs12448654 | 16 | 56701827 | T | C | 0.100 | 0.962 | -0.067 | 0.030 | 0.028 | 0.020 | 0.113 | 0.079 | 0.136 | 0.971 |
| MT4 | rs145695903 | 16 | 56600685 | A | G | 0.113 | 0.671 | -0.075 | 0.034 | 0.030 | 0.142 | 0.138 | 0.121 | 0.104 | 0.674 |
| GSTA2 | rs2207950 | 6 | 52610630 | A | G | 0.827 | 1.107 | 0.053 | 0.025 | 0.030 | 0.098 | 0.254 | 0.094 | 0.365 | 1.103 |
| GSTA1 | rs11961624 | 6 | 52673388 | G | T | 0.206 | 1.027 | -0.051 | 0.024 | 0.031 | 0.098 | 0.276 | 0.128 | 0.402 | 1.027 |
| GSTA1 | rs72936066 | 6 | 52673387 | T | G | 0.206 | 1.027 | -0.051 | 0.024 | 0.031 | 0.098 | 0.276 | 0.128 | 0.402 | 1.027 |
| GSTM2/GSTM1 | rs182902643 | 1 | 110225432 | G | A | 0.239 | 0.780 | -0.051 | 0.024 | 0.032 | 0.337 | 0.199 | 0.220 | 0.239 | 0.787 |
| GSTA2 | rs2749012 | 6 | 52627050 | C | G | 0.560 | 0.677 | 0.047 | 0.022 | 0.034 | 0.311 | 0.412 | 0.446 | 0.433 | 0.673 |
| GSTZ1 | rs61990305 | 14 | 77784267 | T | C | 0.197 | 1.031 | -0.048 | 0.023 | 0.035 | 0.014 | 0.282 | 0.241 | 0.145 | 1.039 |
| GSTT1 | rs2739345 | 22 | 24375989 | G | A | 0.574 | 0.808 | -0.047 | 0.022 | 0.035 | 0.183 | 0.301 | 0.350 | 0.161 | 0.819 |
| GSTT2 | rs140240 | 22 | 24329435 | C | T | 0.236 | 0.806 | -0.054 | 0.026 | 0.035 | 0.037 | 0.276 | 0.108 | 0.368 | 0.798 |
| GSTA2 | rs4715316 | 6 | 52628998 | C | T | 0.795 | 1.055 | 0.048 | 0.023 | 0.035 | 0.191 | 0.254 | 0.142 | 0.363 | 1.053 |
| GSTA4 | rs642851 | 6 | 52842106 | A | C | 0.643 | 1.017 | -0.041 | 0.019 | 0.036 | 0.498 | 0.492 | 0.266 | 0.458 | 1.017 |
| GSTA2 | rs4715317 | 6 | 52629010 | T | G | 0.787 | 1.026 | 0.047 | 0.023 | 0.037 | 0.179 | 0.254 | 0.142 | 0.379 | 1.022 |
| GSTM2/GSTM1 | rs187753552 | 1 | 110226548 | A | G | 0.046 | 0.567 | -0.119 | 0.057 | 0.037 | 0.010 | 0.019 | 0.068 | 0.029 | 0.564 |
| GSTA1 | rs2397104 | 6 | 52652593 | C | A | 0.225 | 1.057 | -0.047 | 0.022 | 0.037 | 0.146 | 0.307 | 0.136 | 0.443 | 1.058 |
| MT1B/MT1F | rs56093186 | 16 | 56689685 | G | A | 0.324 | 1.059 | -0.041 | 0.020 | 0.037 | 0.081 | 0.428 | 0.315 | 0.457 | 1.062 |
| GSTT2 | rs184144393 | 22 | 24327424 | C | T | 0.055 | 0.594 | -0.108 | 0.052 | 0.038 | 0.020 | 0.050 | 0.025 | 0.067 | 0.584 |
| GSTT2 | rs436740 | 22 | 24323878 | G | A | 0.564 | 0.886 | -0.042 | 0.020 | 0.040 | 0.252 | 0.412 | 0.416 | 0.335 | 0.876 |
| GSTA3 | rs10948730 | 6 | 52771402 | C | A | 0.177 | 1.069 | -0.049 | 0.024 | 0.041 | 0.022 | 0.199 | 0.098 | 0.281 | 1.057 |
| GSTA3 | rs13197742 | 6 | 52772678 | T | A | 0.174 | 1.066 | -0.049 | 0.024 | 0.042 | 0.016 | 0.196 | 0.091 | 0.281 | 1.054 |
| GSTT1 | rs140316 | 22 | 24388274 | C | T | 0.697 | 0.658 | -0.050 | 0.025 | 0.043 | 0.431 | 0.251 | 0.296 | 0.136 | 0.658 |
| GSTA3 | rs6916414 | 6 | 52766805 | T | C | 0.177 | 1.068 | -0.048 | 0.024 | 0.045 | 0.016 | 0.191 | 0.100 | 0.281 | 1.056 |
| GSTT1 | rs4630 | 22 | 24376322 | G | A | 0.567 | 0.817 | -0.045 | 0.022 | 0.045 | 0.189 | 0.296 | 0.362 | 0.154 | 0.827 |
| GSTA4 | rs367836 | 6 | 52843131 | G | T | 0.650 | 1.031 | -0.039 | 0.019 | 0.045 | 0.496 | 0.489 | 0.257 | 0.453 | 1.031 |
| GSTA3 | rs45516398 | 6 | 52759828 | C | A | 0.177 | 1.070 | -0.048 | 0.024 | 0.046 | 0.016 | 0.196 | 0.100 | 0.280 | 1.058 |
| GSTA4 | rs423475 | 6 | 52844583 | G | T | 0.645 | 1.034 | -0.039 | 0.019 | 0.046 | 0.468 | 0.481 | 0.257 | 0.453 | 1.037 |
| GSTA3 | rs2295605 | 6 | 52761898 | T | C | 0.177 | 1.070 | -0.048 | 0.024 | 0.046 | 0.016 | 0.196 | 0.100 | 0.280 | 1.058 |
| GSTT1 | rs2844008 | 22 | 24378805 | C | T | 0.572 | 0.798 | -0.045 | 0.023 | 0.046 | 0.183 | 0.293 | 0.372 | 0.154 | 0.807 |
| GSTT1 | rs140317 | 22 | 24388642 | A | G | 0.647 | 0.765 | -0.049 | 0.025 | 0.046 | 0.218 | 0.276 | 0.301 | 0.140 | 0.773 |
| MT4 | rs12922677 | 16 | 56600661 | G | A | 0.044 | 0.659 | -0.108 | 0.054 | 0.046 | 0.126 | 0.080 | 0.046 | 0.034 | 0.666 |
| GSTA4 | rs612483 | 6 | 52839958 | T | C | 0.632 | 1.060 | -0.038 | 0.019 | 0.047 | 0.433 | 0.478 | 0.257 | 0.467 | 1.057 |
| GSTM1/GSTM2 | chr1:110228903:I | 1 | 110228903 | G | GTTCT | 0.193 | 0.689 | -0.056 | 0.028 | 0.048 | 0.077 | 0.191 | 0.236 | 0.119 | 0.691 |
| GSTA4 | rs673197 | 6 | 52844257 | A | G | 0.647 | 1.034 | -0.038 | 0.019 | 0.050 | 0.470 | 0.481 | 0.257 | 0.450 | 1.037 |
| GSTA4 | rs644100 | 6 | 52842371 | A | C | 0.649 | 1.030 | -0.038 | 0.019 | 0.050 | 0.488 | 0.486 | 0.257 | 0.453 | 1.032 |

^a Analyses were adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), TNE and the top 10 leading principal components. SNPs are listed in order of increasing p-value.


*NOTE: Abbreviations. SNP: single nucleotide polymorphism; chr: chromosome; position: base-pair position on the chromosome; A1: reference allele; A2: alternate allele, used for association testing; FRQ: frequency of A1; PLINK R2: PLINK R2 quality metric of the model; BETA: effect estimate per alternate allele carried; SE: standard error of the effect estimate; P: p-value for the association test; AFR, AMR, ASN and EUR: minor allele frequency in the African, Admixed American, Asian and European samples of 1000 Genomes, respectively; IMPUTE INFO: imputation quality score from IMPUTE2.


Figure 9. (Supplemental Figure S3) Conceptual model for the association between urinary cadmium and genetic variants


Figure 10. (Supplemental Figure S4) Geometric means of urinary cadmium levels by race/ethnicity, after adjusting for age at urine collection, sex, creatinine (natural log) and smoking dose (urinary TNE) (Model 2b)

[Figure: geometric mean urinary Cd (ng/mL; y-axis 0.00 to 1.00) by race/ethnicity (x-axis): African American, Native Hawaiian, Japanese American, Latinos and Whites; group sizes labeled in the figure: n=418, n=390, n=285, n=296 and n=588]


Figure 11. (Supplemental Figure S5) Geometric means of urinary cadmium levels by race/ethnicity and sex, after adjusting for age at urine collection, sex, creatinine (natural log) and TNE (Model 2b).

[Figure: geometric mean urinary Cd (ng/mL; y-axis 0.00 to 1.20) for males and females by race/ethnicity (x-axis): African American, Native Hawaiian, Japanese American, Latinos and Whites]


Figure 12. (Supplemental Figure S6) QQ-plot and Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles. The analysis was adjusted for the covariates of the main model (age at urine collection, sex, reported race/ethnicity, TNE and the top 10 leading principal components), with creatinine removed. The blue line marks p = 10^-6 for reference.


Figure 13. (Supplemental Figure S7) QQ-plot and Manhattan plot from the GWAS analysis investigating the association between urinary cadmium and genotyped and imputed alleles. The analysis was adjusted for age at urine collection, sex, reported race/ethnicity, creatinine (natural log), education, TNE, smoking duration, occupational exposure categories and the top 10 leading principal components. The blue line marks p = 10^-6 for reference.


Chapter 4

The association between urinary cadmium levels and risk of lung cancer in smokers from a prospective cohort study

Introduction

On January 11, 1964, the U.S. Surgeon General released a groundbreaking report titled ‘Smoking and Health’, which alerted the country to the causal association between cigarette smoking and lung cancer (91). This report was the first of its kind and set the trajectory for public health campaigns and research aimed at understanding the carcinogenic constituents of tobacco and reducing the adverse health effects of smoking. In the more than 55 years since the initial report, numerous studies have confirmed the causal association between tobacco use and a variety of adverse health effects, the strongest of which is lung cancer. Yet recent surveys report that an estimated 34.3 million U.S. adults were still smoking in 2017 (92). Moreover, an estimated 228,150 new cases of lung cancer (116,440 in men and 111,710 in women) are expected to be diagnosed in the U.S. in 2019, and lung cancer remains the leading cause of cancer-related death in both men and women (1).

While cigarette smoking is the major risk factor for lung cancer, it is estimated that only 11% of male and 24% of female smokers will develop the disease over their lifetime (4,5). There is a critical need to better understand the factors contributing to inter-individual susceptibility to lung cancer. Such research could improve our understanding of tobacco-related carcinogenesis and generate new insights for lung cancer prevention strategies. Racial/ethnic differences in lung cancer incidence provide powerful clues for understanding this variation in lung cancer risk among smokers. For example, studies using data from the Multiethnic Cohort (MEC) Study have shown that, for the same number of cigarettes smoked, African American and Native Hawaiian smokers had a significantly higher risk of lung cancer than Whites, while Latinos and Japanese Americans had a lower risk (9,73). These differences in the risk of smoking-related lung cancer may be attributable to variability in the uptake and metabolism of the wide range of harmful chemical constituents in cigarette smoke, some of which are potent carcinogens. Therefore, studies have sought to understand these factors and how they contribute to the observed variation in lung cancer risk. Some studies have explored variation in nicotine metabolism across racial/ethnic groups to understand differences in smoking topography and exposure to tobacco carcinogens. These studies revealed that smokers who metabolize nicotine faster smoke more intensively; thus, after adjustment for cigarette consumption, these individuals have greater nicotine exposure and, consequently, greater exposure to tobacco carcinogens (12,52,55,93). In addition, previous epidemiological studies of smokers have demonstrated that, independent of smoking intensity and duration, certain cigarette smoke constituents are associated with lung cancer risk. For example, biomarkers of nicotine uptake and of exposure to nitrosamines, volatile organic compounds and polycyclic aromatic hydrocarbons have been associated with lung cancer risk in smokers (60,93–95). Although these findings give great insight into differences in lung cancer risk among smokers, they do not fully explain the racial/ethnic differences in susceptibility described above, suggesting that other factors may influence lung cancer risk in smokers. One carcinogenic constituent of interest in tobacco and cigarette smoke is cadmium (Cd).

Cadmium is a heavy metal classified by the International Agency for Research on Cancer (IARC) as a human lung carcinogen (Group 1) (13). In addition to being a constituent of cigarette smoke, Cd is an industrial and environmental pollutant, and for smokers and occupationally exposed workers the primary route of exposure to Cd is inhalation. While the amount of Cd drawn from one cigarette is relatively small (~0.05 µg), long-term exposure to and accumulation of Cd from cigarette smoke and other sources is a health concern (22). The association between Cd exposure and lung cancer has been well established in humans and rodents (13,16), and toxicological and occupational studies confirm that Cd is a respiratory toxicant and lung carcinogen (14). Given that Cd is a known lung carcinogen, differences in susceptibility to lung cancer among smokers could be explained, at least in part, by variation in Cd exposure. However, to our knowledge, no studies have examined urinary Cd, a long-term biomarker of Cd exposure, in relation to lung cancer risk in smokers, as has been done for the biomarkers described above.

Our previous studies of these smokers, presented in Chapters 2 and 3, demonstrated that urinary Cd levels were associated with smoking intensity and differed by race/ethnicity, and that occupational exposure to Cd contributed to their urinary Cd levels (63,82). Accounting for these factors, in this study we explored differences in urinary Cd between smokers in the MEC Study who developed lung cancer and those who remained free of lung cancer.

Methods

Follow-up and identification of lung cancer cases and deaths

Participants’ follow-up began at their age at urine collection (the sample in which urinary Cd and TNE were measured) and ended at the earliest of the following events: (1) diagnosis of lung cancer; (2) death; or (3) end of follow-up, December 31, 2016. Lung cancer cases were identified through linkages to two statewide registries of the National Cancer Institute's (NCI) Surveillance, Epidemiology and End Results (SEER) Program: the Hawaii Tumor Registry and the California State Cancer Registry. Cases were defined by International Classification of Diseases for Oncology, Third Edition (ICD-O-3) and ICD-10 code C34 (malignant neoplasm of bronchus and lung). Deaths were identified by linkage to the state death certificate files in Hawaii and California, and to the National Death Index (NDI) for deaths occurring in other states. By the end of follow-up (median 12.4 years), 89 lung cancer cases had been identified among the 1,956 participants who were current smokers at urine collection and had complete urinary biomarker measurements and epidemiological data.
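The follow-up rules above can be sketched in code. This is a minimal illustration, not the study's data-processing code: the `Participant` record and its field names are hypothetical, and ages are computed with 365.25-day years as a simplifying assumption. The function returns the entry age (urine collection), the exit age (earliest of lung cancer diagnosis, death or December 31, 2016) and a lung cancer event indicator, which is the input a Cox model with age as the time scale requires.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# End of follow-up stated in the text; event-free participants are censored here.
END_OF_FOLLOW_UP = date(2016, 12, 31)


@dataclass
class Participant:
    """Hypothetical record layout used only for this illustration."""
    birth_date: date
    urine_date: date
    lung_cancer_date: Optional[date] = None
    death_date: Optional[date] = None


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, assuming 365.25-day years."""
    return (end - start).days / 365.25


def follow_up(p: Participant) -> Tuple[float, float, int]:
    """Return (entry_age, exit_age, event) with age as the time scale.

    Follow-up starts at urine collection and ends at the earliest of lung
    cancer diagnosis (event = 1), death (event = 0) or the end of
    follow-up (event = 0).
    """
    candidates = [(END_OF_FOLLOW_UP, 0)]
    if p.lung_cancer_date is not None:
        candidates.append((p.lung_cancer_date, 1))
    if p.death_date is not None:
        candidates.append((p.death_date, 0))
    # Earliest date wins; if a diagnosis shares its date with another exit,
    # the diagnosis is kept as the event (-1 sorts before 0).
    exit_date, event = min(candidates, key=lambda c: (c[0], -c[1]))
    entry_age = years_between(p.birth_date, p.urine_date)
    exit_age = years_between(p.birth_date, exit_date)
    return entry_age, exit_age, event


if __name__ == "__main__":
    # Hypothetical participant diagnosed with lung cancer during follow-up.
    case = Participant(date(1938, 5, 1), date(2002, 3, 15),
                       lung_cancer_date=date(2009, 2, 1))
    print(follow_up(case))
```

Using age rather than time since urine collection as the time scale means each participant contributes risk-set membership only from the age at which the urine sample was taken (delayed entry).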


Statistical analysis

We evaluated the association of urinary Cd levels with lung cancer incidence using Cox proportional hazards models to estimate hazard ratios (HR) and 95% confidence intervals (95% CI), with age as the time scale and follow-up beginning at urine collection. Because the distributions of urinary Cd and urinary creatinine were right-skewed, both were natural log-transformed to approximate a normal distribution. Cox proportional hazards models were adjusted for potential confounders identified in the directed acyclic graph in Supplemental Figure S8. Two models were used for the analyses: Model 1 adjusted for age at urine collection (months, continuous), sex (male/female), race/ethnicity (African American, Japanese American, Latino, Native Hawaiian and White), urinary creatinine (continuous, natural log), education (≤12th grade, vocational school/some college, ≥graduated college) and occupational exposure category (Likely exposed, Possibly exposed, Not Likely exposed and Unknown exposure) (82); Model 2 further adjusted for smoking dose (urinary TNE) and duration (average years of smoking; continuous). Urinary creatinine was included as an independent covariate rather than used for creatinine standardization (dividing urinary Cd by creatinine), because this approach has been shown to produce less biased effect estimates (65). Urinary Cd was modeled as a continuous, natural log-transformed variable, and we present the HR for lung cancer associated with a one-unit increase in ln(urinary Cd), which corresponds to an e-fold (approximately 2.7-fold) increase in urinary Cd. For further analyses, study participants were categorized into quartiles according to the distribution of urinary Cd in the study population (cut at the 25th, 50th and 75th percentiles), with the lowest quartile (quartile 1) as the reference category. Quartile 1 was <0.42 ng/mL, quartile 2 was 0.42–0.60 ng/mL, quartile 3 was 0.61–1.02 ng/mL and quartile 4 was >1.02 ng/mL.
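The core of a Cox model with age as the time scale can be illustrated with a from-scratch fit for a single covariate, such as ln(urinary Cd). This is a teaching sketch under simplifying assumptions, not the Stata/R code used for the analysis: it handles one unadjusted covariate only, uses the Breslow approximation for tied event ages, and treats each participant as at risk from the age at urine collection (delayed entry) until the exit age. The function names and the data in the usage example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cox_terms(beta: float, entry: Sequence[float], exit_: Sequence[float],
              event: Sequence[int], x: Sequence[float]) -> Tuple[float, float, float]:
    """Breslow log partial likelihood, score and information for one covariate.

    With age as the time scale, subject j is in the risk set at event age t
    if entry[j] < t <= exit_[j] (left truncation at urine collection).
    """
    loglik = score = info = 0.0
    for i, failed in enumerate(event):
        if not failed:
            continue
        t = exit_[i]
        s0 = s1 = s2 = 0.0
        for j in range(len(x)):
            if entry[j] < t <= exit_[j]:
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] ** 2
        mean = s1 / s0  # risk-weighted mean covariate in the risk set
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return loglik, score, info


def fit_cox(entry, exit_, event, x, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving; returns (beta, standard error)."""
    beta = 0.0
    ll, u, info = cox_terms(beta, entry, exit_, event, x)
    for _ in range(max_iter):
        step = u / info
        new = cox_terms(beta + step, entry, exit_, event, x)
        # The log partial likelihood is concave; halve the step if it overshoots.
        while new[0] < ll and abs(step) > 1e-12:
            step /= 2.0
            new = cox_terms(beta + step, entry, exit_, event, x)
        beta += step
        ll, u, info = new
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Hypothetical data: ages at urine collection and exit, event flags, ln(Cd).
    entry = [60, 61, 62, 63, 64, 60, 61, 62]
    exit_ = [70, 68, 75, 72, 66, 74, 69, 71]
    event = [1, 0, 1, 0, 1, 1, 0, 0]
    ln_cd = [0.5, -0.3, -0.2, 0.1, 0.4, -0.6, 0.2, -0.1]
    beta, se = fit_cox(entry, exit_, event, ln_cd)
    lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
    print(f"HR per 1-unit ln(Cd): {math.exp(beta):.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

The exponentiated coefficient is the HR per one-unit increase in ln(Cd); adding the Model 1 and Model 2 covariates would turn the scalar Newton step into a matrix version of the same update.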
Sensitivity analyses using tertiles and quintiles of urinary Cd showed a consistently increasing risk across increasing quantiles. The proportional hazards assumption was examined visually by plotting Schoenfeld residuals against time and formally by testing for a non-zero slope in this relationship (96). The proportional hazards assumption was met for all variables of interest.
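The residual check described above can be sketched for a single covariate. This simplified version is an illustration under stated assumptions rather than the exact test cited in reference (96): it computes unscaled Schoenfeld residuals at a given coefficient (the covariate value of the participant diagnosed at age t minus the risk-weighted mean of that covariate among those at risk at t) and fits an ordinary least-squares slope of the residuals on age, where a slope far from zero suggests that the hazard ratio changes over time. The function names are hypothetical; standard software implements scaled-residual versions of this test, such as `estat phtest` in Stata and `cox.zph()` in the R survival package.

```python
import math
from typing import List, Sequence, Tuple


def schoenfeld_residuals(beta: float, entry: Sequence[float], exit_: Sequence[float],
                         event: Sequence[int], x: Sequence[float]) -> List[Tuple[float, float]]:
    """Unscaled Schoenfeld residuals as (event age, residual) for one covariate."""
    out = []
    for i, failed in enumerate(event):
        if not failed:
            continue
        t = exit_[i]
        # Risk set on the age scale: entered before t and still under follow-up at t.
        at_risk = [j for j in range(len(x)) if entry[j] < t <= exit_[j]]
        weights = [math.exp(beta * x[j]) for j in at_risk]
        weighted_mean = sum(w * x[j] for w, j in zip(weights, at_risk)) / sum(weights)
        out.append((t, x[i] - weighted_mean))
    return out


def slope_test(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """OLS slope of residual on time and its t statistic.

    Needs at least 3 points that do not lie exactly on a line.
    """
    n = len(points)
    ts = [p[0] for p in points]
    rs = [p[1] for p in points]
    t_bar, r_bar = sum(ts) / n, sum(rs) / n
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sum((t - t_bar) * (r - r_bar) for t, r in zip(ts, rs)) / sxx
    sse = sum((r - r_bar - slope * (t - t_bar)) ** 2 for t, r in zip(ts, rs))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se
```

In practice the residuals would be computed at the fitted coefficient and the slope judged against a reference distribution; the t statistic here is only a rough guide.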

All analyses were performed using Stata-IC Statistical Software (version 14; StataCorp LLC, College Station, TX, USA) and R Software (version 3.6.0; R Core Team, Vienna, Austria).

Results

A total of 1,956 MEC participants, with a median age of 63.7 years at urine collection, were included in this study. Table 17 presents the characteristics of the study population by status at the end of follow-up (December 31, 2016). At the end of follow-up, 1,427 participants were alive without a lung cancer diagnosis, 440 had died from other causes and 89 had been diagnosed with lung cancer. The median age at the end of follow-up for all participants was 75.2 years (IQR: 71.0, 80.5). Among lung cancer cases, the median interval between urine collection and diagnosis was 6.9 years (IQR: 5.1, 8.9). By race/ethnicity, the proportion diagnosed with lung cancer was highest among African Americans (7.5%), followed by Native Hawaiians (6.8%) and Whites (4.1%). The proportion still alive without a lung cancer diagnosis was highest among Japanese Americans (79.8%), followed by Latinos (74.0%) and Native Hawaiians (71.1%). The proportion alive without lung cancer increased with education (67.1% for ≤12th grade, 76.4% for vocational school/some college and 77.6% for college graduates), whereas the proportion who died from other causes was highest among those with ≤12th grade education (27.9%) and the proportion diagnosed with lung cancer was highest among those with vocational school/some college (5.1%). Participants diagnosed with lung cancer reported more years of smoking (median: 46.5) and more cigarettes per day (median: 15), and had higher urinary TNE (median: 34.7 nmol/mL) and urinary Cd (median: 0.78 ng/mL) than participants not diagnosed with lung cancer. Participants alive without lung cancer at the end of follow-up were younger at urine collection (median: 62.2 years), reported fewer years of smoking (median: 37.5) and fewer cigarettes per day (median: 10), and had lower urinary TNE (median: 31.9 nmol/mL) than those who died from other causes or developed lung cancer, as well as lower urinary Cd (median: 0.60 ng/mL) than lung cancer cases. By occupational category, the proportion diagnosed with lung cancer was highest among participants likely exposed to Cd in the workplace (6.0%), followed by those possibly exposed (5.2%). The proportion alive without a lung cancer diagnosis was highest among those in occupations not likely to involve Cd exposure (75.9%). The distribution of log-transformed urinary Cd by status at the end of follow-up is shown in Supplemental Figure S9.

[Table 17]

Overall, higher urinary Cd levels were significantly associated with increased lung cancer risk, suggesting a dose-response relationship (Table 18). A one-unit increase in ln(urinary Cd) was associated with a 79.4% increase in the hazard of lung cancer (95% CI: 37.5%, 134.1%) after adjustment for age, sex, race/ethnicity, creatinine (natural log), education and occupational Cd exposure (Model 1). After further adjustment for smoking measures (TNE and smoking duration), the association remained (68.9% increase; HR = 1.69; 95% CI: 1.26, 2.26). In this fully adjusted model (Supplemental Table S6), and consistent with previous findings in the full MEC Study, African American and Native Hawaiian smokers had the highest risk of lung cancer compared with White smokers (HR: 1.78; 95% CI: 0.90, 3.50 and HR: 1.46; 95% CI: 0.69, 3.07, respectively), although both confidence intervals included 1. Smokers with the highest level of education had the lowest estimated risk of lung cancer compared with those with the lowest level of education (HR: 0.61; 95% CI: 0.30, 1.26). Further, consistent with their higher urinary Cd levels, smokers who worked in occupations likely or possibly exposed to Cd had a somewhat higher, though not statistically significant, risk of lung cancer than those in occupations not likely to involve Cd exposure (HR: 1.04; 95% CI: 0.41, 2.59 and HR: 1.13; 95% CI: 0.60, 2.13, respectively). A sensitivity analysis using non-transformed urinary Cd also demonstrated a positive association (HR per 1 ng/mL increase: 1.35; 95% CI: 1.06, 1.72; Supplemental Table S7).
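Because urinary Cd enters the models as ln(Cd), the reported HR applies to an e-fold (about 2.7-fold) increase in urinary Cd, and it can be re-expressed for other contrasts. The short sketch below uses only the Model 2 estimate from Table 18 (HR 1.69; 95% CI: 1.26, 2.26). It converts an HR to a percent change in the hazard, rescales it to a doubling of urinary Cd (the HR raised to the power ln 2), and recovers the approximate standard error of ln(HR) from the confidence limits, assuming a Wald interval on the log scale. The function names are illustrative.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def percent_change(hr: float) -> float:
    """Percent change in the hazard implied by a hazard ratio."""
    return (hr - 1.0) * 100.0


def rescale_log_hr(hr_per_log_unit: float, fold_change: float) -> float:
    """HR for a given fold change in Cd when the model uses ln(Cd).

    exp(beta * ln(fold)) equals hr_per_log_unit ** ln(fold).
    """
    return hr_per_log_unit ** math.log(fold_change)


def se_log_hr_from_ci(lower: float, upper: float) -> float:
    """Approximate SE of ln(HR) from a Wald 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_975)


if __name__ == "__main__":
    hr, lo, hi = 1.69, 1.26, 2.26  # Model 2, Table 18
    print(f"Percent change per e-fold increase: {percent_change(hr):.1f}%")
    print(f"HR per doubling of urinary Cd: {rescale_log_hr(hr, 2.0):.2f}")
    print(f"Approximate SE of ln(HR): {se_log_hr_from_ci(lo, hi):.3f}")
```

With the rounded table values this gives roughly a 69% increase per e-fold rise, an HR of about 1.44 per doubling of urinary Cd and a standard error of about 0.149; small differences from the percentages in the text reflect rounding of the table values.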

[Table 18]

The proportion of participants diagnosed with lung cancer increased monotonically across quartiles of urinary Cd (3.3%, 3.6%, 4.9% and 6.8%; p=0.043), with 30 cases in the highest (fourth) quartile (Table 19). Across racial/ethnic groups, the proportion of participants in the fourth quartile of urinary Cd (>1.02 ng/mL) was highest among African Americans (37.5%), followed by Latinos (27.5%) and Native Hawaiians (23.1%). By education, the proportion in the fourth quartile was highest among participants with vocational school/some college (24.8%) and lowest among college graduates (18.2%). By occupation, the proportion in the fourth quartile was highest among participants likely to have worked in an occupation with Cd exposure (32.9%). Participants in the fourth quartile also reported smoking more cigarettes per day (median: 12, vs. 10 in each of the lower quartiles). Urinary TNE increased significantly with increasing quartile of urinary Cd (p<0.001), with the highest levels in the fourth quartile (median: 58.2 nmol/mL).
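The quartile grouping and the case percentages in Table 19 can be reproduced with a few lines. This sketch assumes urinary Cd values are recorded to 0.01 ng/mL, so that the reported ranges (0.42–0.60 and 0.61–1.02 ng/mL) are contiguous; the helper names are hypothetical. `statistics.quantiles` is shown as one way to derive 25th, 50th and 75th percentile cutpoints from raw data, although its default interpolation method may not match the percentile definition used by the study's software.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def quartile_cutpoints(values: Sequence[float]) -> List[float]:
    """25th, 50th and 75th percentiles (Python's default 'exclusive' method)."""
    return statistics.quantiles(values, n=4)


def cd_quartile(cd_ng_ml: float) -> int:
    """Assign the study quartile using the cutpoints reported in the text."""
    if cd_ng_ml < 0.42:
        return 1
    if cd_ng_ml <= 0.60:
        return 2
    if cd_ng_ml <= 1.02:
        return 3
    return 4


# (participants, lung cancer cases) per quartile, taken from Table 19.
TABLE_19: Dict[int, Tuple[int, int]] = {1: (569, 19), 2: (474, 17), 3: (471, 23), 4: (442, 30)}


def percent_with_lung_cancer(table: Dict[int, Tuple[int, int]]) -> Dict[int, float]:
    """Percentage of participants in each quartile diagnosed with lung cancer."""
    return {q: round(100.0 * cases / n, 1) for q, (n, cases) in table.items()}


if __name__ == "__main__":
    print(percent_with_lung_cancer(TABLE_19))  # {1: 3.3, 2: 3.6, 3: 4.9, 4: 6.8}
```

These are crude percentages of participants diagnosed during follow-up; they do not account for differing follow-up time, which the Cox models handle.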

[Table 19]

Similar to the analysis using continuous urinary Cd, the categorical analysis suggested a dose-response relationship between urinary Cd and lung cancer risk. Table 20 provides HRs of lung cancer by quartile of urinary Cd for three models (unadjusted, Model 1 and Model 2), with the lowest quartile of urinary Cd (<0.42 ng/mL) as the reference category. The multivariable-adjusted HR for lung cancer increased with increasing quartile of urinary Cd in all models. Relative to the lowest quartile, the adjusted HR for the highest quartile was 3.45 (95% CI: 1.73, 6.89; Model 2) and for the third quartile was 2.02 (95% CI: 1.05, 3.89). The HR for the second quartile relative to the first was 1.21 (95% CI: 0.62, 2.36), which was not statistically significant (p=0.477). Further adjustment for smoking variables, urinary TNE and years of smoking (Model 2) did not alter the direction of the associations between quartiles of urinary Cd and lung cancer risk (Table 20).

[Table 20]

Discussion

The results of our study demonstrate that higher urinary Cd levels, measured in urine samples collected before lung cancer diagnosis, were associated with increased lung cancer risk in our sample of smokers. The strong positive association persisted after adjustment for occupational Cd exposure, internal smoking dose (TNE), years of smoking and other potential risk factors, suggesting that urinary Cd is an independent predictor of lung cancer risk in smokers. Our findings suggest that this biomarker may help identify the smokers most susceptible to this disease.

In this same sample of smokers from the MEC Study, we previously observed significant differences in urinary Cd levels by race/ethnicity, with Latino smokers having the highest geometric mean levels, followed by Native Hawaiian smokers, after accounting for smoking (63). We also demonstrated that occupational exposure to Cd contributed to urinary Cd levels in these smokers and could partially explain the higher levels in Latinos (82). While the high urinary Cd levels in Native Hawaiians may be consistent with their elevated lung cancer risk, the relatively high levels in Latinos and intermediate levels in Japanese Americans are not consistent with the lower risks found in these groups in previous studies (9,73). These findings suggest that additional factors make some smokers more sensitive than others to the carcinogenic effects of Cd exposure. For example, although Latinos had the highest urinary Cd levels, they had the lowest risk of lung cancer in our population of smokers. It is plausible that Latinos are less sensitive to the damaging effects of Cd whereas Native Hawaiians are more susceptible, which would support a role for biological mechanisms. Our previous study investigated the association between common genetic variants and urinary Cd in these same smokers but did not find a strong signal that could explain these differences by race/ethnicity (63). Nonetheless, building on the findings of our previous studies, we showed a strong positive association between urinary Cd and lung cancer risk, independent of smoking and occupational exposure, in our sample of smokers.

The association between Cd exposure and lung cancer risk has been established previously and was the basis for IARC's classification of Cd as a Group 1 human lung carcinogen (13). However, the human evidence was based on relatively few studies, and the literature has focused on populations in areas of high environmental exposure or on occupational populations with high exposure (97,98). Furthermore, these studies had several noteworthy limitations, including an inability to account for multiple factors that influence Cd exposure, the use of incomplete exposure measures, the lack of a precise measure of smoking (if any) and relatively small sample sizes. For example, a study of male Cd recovery workers demonstrated a positive association with overall lung cancer risk but could not account for confounding by smoking (97), so the risk attributable to smoking could not be separated from that attributable to Cd. The present study was able to account, in part, for the primary factors known to contribute to urinary Cd, namely occupational Cd exposure, smoking dose (urinary TNE) and smoking duration (years of smoking), and found a 1.69-fold higher hazard of lung cancer (95% CI: 1.26, 2.26) per one-unit increase in ln(urinary Cd). In the quartile analysis, the hazard of lung cancer increased with each quartile of urinary Cd, with the highest hazard in the highest quartile (HR: 2.76; 95% CI: 1.40, 5.42). This strong positive association provides evidence that urinary Cd levels could serve as a marker to help identify the smokers most susceptible to lung cancer.

Strengths of this study include the prospective design, using a well-characterized multiethnic study population with a long follow-up time (maximum 20 years) to assess lung cancer incidence. The cohort is regularly linked to NCI SEER registries, state death certificate files and the National Death Index, and uses follow-up questionnaires, which helps reduce the differential loss to follow-up common in large prospective studies (72). In addition, urinary Cd was measured in urine samples from current smokers in a large multiethnic cohort, collected before lung cancer diagnosis (a median of 6.9 years before diagnosis among cases). As cigarette smoking is the major risk factor for lung cancer, accurate measures of tobacco exposure are essential to gauge risk. However, assessment of smoking exposure by questionnaire is subject to misclassification and recall bias (99). All participants in this study had a biomarker of smoking intensity (TNE) measured, which allowed us to adjust more accurately for confounding due to variation in nicotine metabolism and, thus, in internal nicotine dose. Furthermore, in addition to demographic and smoking variables (urinary TNE and years of smoking), which previous studies identified as associated with urinary Cd levels in these smokers, we adjusted for occupational Cd exposure category (Likely exposed, Possibly exposed, Not Likely exposed and Unknown exposure) in the multivariable analyses (82).

The findings of this study must be interpreted in light of some limitations. This study included only a modest number of lung cancer cases (n=89), and longer follow-up is warranted. In addition, not all factors that could confound the relationship between urinary Cd and lung cancer risk may have been included. For example, diet can be a source of Cd in the general non-smoking population, and dietary contributions were not included in this analysis. Diet was considered for the analyses; however, because all participants were current smokers at urine collection, the dietary contribution to urinary Cd would likely be small relative to that of smoking. In addition, diet was assessed on average 10 years before the urine sample was collected and therefore is unlikely to reflect dietary intake at urine collection. Misclassification of occupational Cd exposure is also possible, because occupation was captured by a questionnaire with broad occupational categories and exposure to Cd was not directly ascertained. Further studies should address these limitations by employing rigorous measures of dietary, occupational and environmental exposures to Cd.


In conclusion, this study demonstrated a positive association between urinary Cd and increased risk of lung cancer in current smokers from the MEC Study, independent of smoking dose, smoking duration and occupational Cd exposure. These findings further support urinary Cd as an independent predictor of lung cancer risk in smokers and suggest that this biomarker may help identify the smokers most susceptible to this disease.


Table 17. Characteristics of the study population by status at the end of follow-up, December 31, 2016 (n=1,956)(a)

Characteristic | Overall study participants (N=1,956) | Alive without lung cancer (N=1,427) | Death from other causes (N=440) | Lung cancer cases (N=89)
Follow-up time, years | 12.4 (10.9, 13.3) | 12.9 (12.0, 13.6) | 7.5 (4.6, 10.6) | 6.9 (5.1, 8.9)
Age at urine collection | 63.7 (59.3, 69.5) | 62.2 (58.6, 67.0) | 69.0 (63.2, 74.5) | 66.2 (62.2, 72.7)
Sex, N (%)
  Male | 925 | 643 (69.5) | 239 (25.8) | 43 (4.6)
  Female | 1,031 | 784 (76.0) | 201 (19.5) | 46 (4.5)
Race, N (%)
  African American | 280 | 182 (65.0) | 77 (27.5) | 21 (7.5)
  Native Hawaiian | 294 | 209 (71.1) | 65 (22.1) | 20 (6.8)
  Japanese American | 584 | 466 (79.8) | 95 (16.3) | 23 (3.9)
  Latino | 411 | 304 (74.0) | 98 (23.8) | 9 (2.2)
  White | 387 | 266 (68.7) | 105 (27.1) | 16 (4.1)
Maximum education, N (%)
  ≤12th grade | 785 | 527 (67.1) | 219 (27.9) | 39 (5.0)
  Vocational school/Some college | 681 | 520 (76.4) | 126 (18.5) | 35 (5.1)
  ≥Graduated college | 490 | 380 (77.6) | 95 (19.4) | 15 (3.1)
Occupational Cd exposure, N (%)
  Likely exposed | 149 | 106 (71.1) | 34 (22.8) | 9 (6.0)
  Possibly exposed | 343 | 244 (71.1) | 81 (23.6) | 18 (5.2)
  Not Likely exposed | 1,106 | 840 (75.9) | 218 (19.7) | 48 (4.3)
  Unknown exposure | 358 | 237 (66.2) | 107 (29.9) | 14 (3.9)
Years of smoking | 43.5 (34.5, 46.5) | 37.5 (34.5, 46.5) | 45.5 (36.5, 50.0) | 46.5 (41.5, 50.0)
Average CPD | 10 (6, 20) | 10 (6, 20) | 12 (7, 20) | 15 (10, 20)
Urinary TNE, nmol/mL | 32.4 (19.6, 52.8) | 31.9 (18.3, 52.2) | 34.6 (20.8, 54.3) | 34.7 (24.3, 53.7)
Urinary Cd, ng/mL | 0.60 (0.36, 1.02) | 0.60 (0.36, 0.96) | 0.60 (0.42, 1.02) | 0.78 (0.42, 1.26)

CPD: cigarettes per day; TNE: total nicotine equivalents.
(a) Values are presented as number and percentage of participants in each row category, n (%), or as median (25th, 75th percentile) in each column category.


Table 18. Estimated hazard ratio (95% CI) of lung cancer per one-unit increase in natural log-transformed urinary cadmium among 1,956 participants from the MEC Study (89 cases)

Exposure | Unadjusted HR (95% CI) | Model 1(a) HR (95% CI) | Model 2(b) HR (95% CI)
Cadmium (natural log of ng/mL) | 1.38 (1.06, 1.81) | 1.79 (1.38, 2.34) | 1.69 (1.26, 2.26)

(a) Adjusted for race/ethnicity, sex, age at urine collection, creatinine (natural log), education and occupational exposure category.
(b) Further adjusted for urinary TNE and average years of smoking (Model AIC: 1271.35).


Table 19. Characteristics of the study population overall and by quartiles of urinary cadmium levels (N=1,956)(a)

Characteristic | Overall study participants (N=1,956) | Quartile 1: <0.42 ng/mL (N=569) | Quartile 2: 0.42–0.60 ng/mL (N=474) | Quartile 3: 0.61–1.02 ng/mL (N=471) | Quartile 4: >1.02 ng/mL (N=442)
Cases of lung cancer, N (%) | 89 | 19 (3.3) | 17 (3.6) | 23 (4.9) | 30 (6.8)
Sex, N (%)
  Male | 925 | 255 (27.6) | 206 (22.3) | 238 (25.7) | 226 (24.4)
  Female | 1,031 | 314 (30.5) | 268 (26.0) | 233 (22.6) | 216 (21.0)
Age at urine collection | 63.7 (59.3, 69.5) | 63.4 (59.0, 69.5) | 63.8 (59.4, 69.4) | 64.0 (59.5, 70.1) | 63.6 (59.3, 68.7)
Race, N (%)
  African American | 280 | 43 (15.4) | 60 (21.4) | 72 (25.7) | 105 (37.5)
  Native Hawaiian | 294 | 89 (30.3) | 74 (25.2) | 63 (21.4) | 68 (23.1)
  Japanese American | 584 | 208 (35.6) | 144 (24.7) | 139 (23.8) | 93 (15.9)
  Latino | 411 | 87 (21.2) | 92 (22.4) | 119 (29.0) | 113 (27.5)
  White | 387 | 142 (36.7) | 104 (26.9) | 78 (20.2) | 63 (16.3)
Maximum education, N (%)
  ≤12th grade | 785 | 197 (25.1) | 195 (24.8) | 210 (26.8) | 184 (23.4)
  Vocational/some college | 681 | 210 (30.8) | 157 (23.1) | 145 (21.3) | 169 (24.8)
  ≥Graduated college | 490 | 162 (33.1) | 122 (24.9) | 117 (23.9) | 89 (18.2)
Occupational Cd exposure, N (%)
  Likely exposed | 149 | 35 (23.5) | 25 (16.8) | 42 (28.2) | 49 (32.9)
  Possibly exposed | 343 | 78 (22.7) | 76 (22.2) | 102 (29.7) | 91 (26.5)
  Not Likely exposed | 1,106 | 358 (32.4) | 283 (25.6) | 252 (22.8) | 223 (20.2)
  Unknown exposure | 358 | 105 (29.3) | 96 (26.8) | 79 (21.1) | 83 (23.2)
Years of smoking | 43.5 (34.5, 46.5) | 38.5 (34.5, 46.5) | 42.5 (34.5, 46.5) | 44.5 (35.5, 47.0) | 43.5 (34.5, 46.5)
Average CPD | 10 (6, 20) | 10 (5, 20) | 10 (5, 20) | 10 (7, 20) | 12 (7.1, 20)
Urinary TNE, nmol/mL | 32.4 (19.6, 52.8) | 19.3 (11.3, 28.5) | 30.5 (20.2, 44.2) | 39.9 (27.4, 60.0) | 58.2 (39.8, 89.8)

CPD: cigarettes per day; TNE: total nicotine equivalents; Cd: cadmium.
(a) Values are presented as number and percentage of participants in each row category, n (%), or as median (25th, 75th percentile) in each column category.


Table 20. Estimated hazard ratio (95% CI) of lung cancer by quartiles of urinary cadmium levels among a subset of participants from the MEC Study (89 cases)

Urinary Cd quartile | N | Unadjusted HR (95% CI) | Model 1(a) HR (95% CI) | Model 2(b) HR (95% CI)
Quartile 1 (<0.42 ng/mL) | 569 | 1.00 (reference) | 1.00 (reference) | 1.00 (reference)
Quartile 2 (0.42–0.60 ng/mL) | 474 | 1.10 (0.57, 2.12) | 1.28 (0.65, 2.51) | 1.22 (0.62, 2.38)
Quartile 3 (0.61–1.02 ng/mL) | 471 | 1.49 (0.81, 2.75) | 2.15 (1.13, 4.10) | 2.03 (1.06, 3.90)
Quartile 4 (>1.02 ng/mL) | 442 | 2.06 (1.16, 3.67) | 3.85 (2.02, 7.33) | 3.52 (1.83, 6.78)

(a) Adjusted for race/ethnicity, sex, age at urine collection, creatinine, education and occupational exposure category.
(b) Further adjusted for urinary TNE and average years of smoking.


Figure 14. (Supplemental Figure S8) Directed acyclic graph (DAG) for the association between urinary cadmium and lung cancer risk


Figure 15. (Supplemental Figure S9) Median levels of urinary cadmium (natural log of ng/mL) at the time of urine collection, by participant status as of December 31, 2016: alive with no lung cancer (n=1,427), death from other causes (n=440), and lung cancer diagnosis (n=89). The box spans the interquartile range (25th to 75th percentile), the dark line across the box marks the median (50th percentile), the bottom and top whiskers mark the 1st and 99th percentiles, and the circles above and below the whiskers represent outliers (>1.5x and <3x the interquartile range).


Table 21. (Supplementary S6) Estimated hazard ratios (95% CI) of lung cancer by various characteristics, adjusted for log-transformed urinary cadmium and other covariates

| Characteristic | HR (95% CI)a |
|---|---|
| Females vs males | 0.74 (0.46, 1.22) |
| African American vs White | 1.78 (0.90, 3.50) |
| Native Hawaiian vs White | 1.46 (0.69, 3.07) |
| Japanese American vs White | 0.76 (0.40, 1.45) |
| Latino vs White | 0.38 (0.15, 0.91) |
| Vocational/some college vs ≤12th grade | 1.01 (0.60, 1.69) |
| ≥Graduated college vs ≤12th grade | 0.61 (0.30, 1.26) |
| Total nicotine equivalents (TNE) | 1.00 (0.99, 1.01) |
| Years of smoking | 1.06 (1.02, 1.10) |
| Likely exposed to Cd in the workplace vs not likely | 1.04 (0.41, 2.59) |
| Possibly exposed to Cd in the workplace vs not likely | 1.13 (0.60, 2.13) |
| Unknown exposure to Cd in the workplace vs not likely | 0.81 (0.43, 1.51) |

HR: hazard ratio; CI: confidence interval.
a Adjusted for race/ethnicity, sex, age at urine collection, education, total nicotine equivalents, years of smoking, urinary Cd (natural log), creatinine (natural log), and occupational Cd exposure.
Model AIC: 1271.35


Table 22. (Supplementary S7) Estimated hazard ratios (95% CI) of lung cancer by various characteristics, adjusted for continuous (non-transformed) urinary cadmium and other covariates

| Characteristic | HR (95% CI)a |
|---|---|
| Continuous cadmium (non-transformed) | 1.35 (1.06, 1.72) |
| Females vs males | 0.84 (0.52, 1.35) |
| African American vs White | 1.35 (1.06, 1.72) |
| Native Hawaiian vs White | 1.58 (0.77, 3.26) |
| Japanese American vs White | 0.75 (0.39, 1.42) |
| Latino vs White | 0.45 (0.18, 1.12) |
| Vocational/some college vs ≤12th grade | 1.03 (0.62, 1.71) |
| ≥Graduated college vs ≤12th grade | 0.63 (0.31, 1.29) |
| Total nicotine equivalents (TNE) | 1.00 (1.00, 1.01) |
| Years of smoking | 1.06 (1.03, 1.10) |
| Likely exposed to Cd in the workplace vs not likely | 1.24 (0.55, 2.80) |
| Possibly exposed to Cd in the workplace vs not likely | 1.21 (0.64, 2.27) |
| Unknown exposure to Cd in the workplace vs not likely | 0.80 (0.43, 1.49) |

HR: hazard ratio; CI: confidence interval.
a Adjusted for race/ethnicity, sex, age at urine collection, education, total nicotine equivalents, years of smoking, urinary Cd, creatinine, and occupational Cd exposure.
Model AIC: 1281.19
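The two supplementary models differ in how urinary Cd and creatinine enter the Cox model: natural-log transformed in Table 21 versus untransformed in Table 22. Assuming both were fit to the same 1,956 participants and 89 cases, their AICs can be compared directly, and the lower AIC of the log-transformed model (1271.35 vs 1281.19) favors that specification. The Python sketch below, a minimal illustration with hypothetical function names, computes Akaike's relative likelihood exp(-ΔAIC/2). It also shows how a per-unit HR from the untransformed model, assumed here to be expressed per 1 ng/mL, scales to other contrasts under the model's log-linear assumption.

```python
import math


def relative_likelihood(aic_best: float, aic_other: float) -> float:
    """Akaike's relative likelihood exp((AIC_best - AIC_other) / 2): how plausible
    the higher-AIC model is compared with the lower-AIC model."""
    return math.exp((aic_best - aic_other) / 2)


def hr_for_contrast(hr_per_unit: float, delta: float) -> float:
    """HR for a difference of `delta` units when the covariate enters the Cox model
    linearly on the log-hazard scale: exp(delta * log(HR_per_unit))."""
    return math.exp(delta * math.log(hr_per_unit))


if __name__ == "__main__":
    aic_log_cd = 1271.35  # Table 21: natural-log urinary Cd and creatinine
    aic_raw_cd = 1281.19  # Table 22: untransformed urinary Cd and creatinine
    print("Delta AIC:", round(aic_raw_cd - aic_log_cd, 2))
    print("Relative likelihood:", round(relative_likelihood(aic_log_cd, aic_raw_cd), 4))
    # Illustrative contrast between two quartile boundaries used in Table 20: 1.02 vs 0.42 ng/mL.
    print("HR for 0.60 ng/mL higher Cd:", round(hr_for_contrast(1.35, 1.02 - 0.42), 2))
```

With these inputs, ΔAIC is 9.84 and the relative likelihood of the untransformed model is about 0.007. Under the untransformed model, a 0.60 ng/mL higher urinary Cd corresponds to an HR of roughly 1.2.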


Chapter 5

Conclusions and Future Direction

Overall, our results indicate that urinary Cd is an important determinant of lung cancer risk in smokers from the MEC Study, and future studies investigating lung cancer risk will benefit from these findings. Specific Aim 1 demonstrated that, at similar levels of smoking, smokers in occupations with the potential for Cd exposure had higher levels of urinary Cd. This suggests that workers in those occupations who also smoke may be at increased risk for lung cancer and should be targeted for intervention measures. Specific Aim 2 did not find that differences in urinary Cd levels were attributable to common genetic variants, and additional studies are warranted. As stated above, Cd is stored in the kidney, so variants involved in the elimination of Cd may be found in genes related to kidney function. Additional measurements of Cd in blood could provide insight into differences in the active metabolic pathways. Specific Aim 3 demonstrated that urinary Cd levels were associated with lung cancer incidence among participants who were current smokers at the time of urine collection in the MEC Study, and this association persisted after adjustment for occupational Cd exposure, internal smoking dose (TNE), years of smoking, and other potential risk factors. Public health efforts to reduce Cd exposure, including tobacco cessation programs, reduction of the environmental and industrial impact of Cd, and education of smokers in occupations that pose a high risk of Cd exposure, may contribute to reducing lung cancer in the future.

Future research efforts will examine homozygous deletions of the GSTT1 and GSTM1 genes, specifically null genotypes (copy number: 0/0). Individuals with a null genotype lack the corresponding GST enzyme, and studies have suggested that the distribution of these genotypes varies among ethnic groups. For example, using this same sample of smokers, researchers from the MEC Study found that GSTT1 and GSTM1 null genotypes explained ~22% of the variation in levels of S-phenylmercapturic acid (SPMA), a specific urinary biomarker of benzene uptake, and that this proportion ranged across racial/ethnic groups from 14.4% in African Americans to 33.0% in Native Hawaiians (87). When the researchers combined the effects of the GSTT1 and GSTM1 null genotypes, they observed a 10-fold difference in SPMA levels by race/ethnicity. The lowest levels were in Japanese Americans, who mostly carried null genotypes for both deletions (43.8% and 50.4% of Japanese American participants had null genotypes (copy number: 0/0) for GSTT1 and GSTM1, respectively), and the highest were among African Americans, who were mainly wild-type for both (52.9% of African American participants carried 1/0 of GSTT1 and 47.6% carried 1/0 of GSTM1). Similar results were reported in another study that found an association between deletion of the GSTT1 and GSTM1 genes and urinary metabolites of 1,3-butadiene, a carcinogen in tobacco smoke (100). These findings support the hypothesis that GSTT1 and GSTM1 gene deletions could have corresponding effects on urinary Cd levels, because GSTs are thought to play a role in the detoxification of heavy metals such as Cd.

Furthermore, future research efforts will explore urinary Cd and lung cancer risk in a larger population of smokers. An alternative approach was used for Specific Aim 3: the original proposal was to develop a model combining the factors that contribute to urinary Cd levels, as determined in Specific Aims 1 and 2, to impute predicted urinary Cd for an additional ~8,000 MEC participants who were current smokers with genetic data but no urinary Cd measurements. However, access to these data was limited, because we found that urinary Cd was highly associated with smoking dose (urinary TNE), and the same measures that would have been used to impute urinary Cd had also been used to impute urinary TNE for the ~8,000 additional smokers. Therefore, the available data on 1,956 smokers with complete data were used to investigate the association between urinary Cd and lung cancer risk.

We hope the findings of this study will inspire additional studies by other research teams to better understand the factors contributing to inter-individual differences in exposure to, and urinary excretion of, Cd, and their association with lung cancer risk in smokers and in the general population.


Bibliography

1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019 Jan 8;69(1):7–34.
2. National Cancer Institute. Financial Burden of Cancer Care | Cancer Trends Progress Report, 2018 [Internet]. 2019 [cited 2019 Apr 1]. Available from: https://progressreport.cancer.gov/after/economic_burden
3. Mariotto AB, Robin Yabroff K, Shao Y, Feuer EJ, Brown ML. Projections of the Cost of Cancer Care in the United States: 2010-2020. JNCI J Natl Cancer Inst. 2011 Jan 19;103(2):117–28.
4. Centers for Disease Control and Prevention. What Are the Risk Factors for Lung Cancer? [Internet]. 2015 [cited 2017 Apr 11]. Available from: https://www.cdc.gov/cancer/lung/basic_info/risk_factors.htm
5. IARC. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. Tobacco Smoke and Involuntary Smoking. France; 2004.
6. American Cancer Society. Key Statistics for Small Cell Lung Cancer [Internet]. [cited 2019 Apr 1]. Available from: https://www.cancer.org/cancer/small-cell-lung-cancer/about/key-statistics.html
7. American Lung Association. Lung Cancer Fact Sheet [Internet]. 2016 [cited 2017 Jun 14]. Available from: http://www.lung.org/lung-health-and-diseases/lung-disease-lookup/lung-cancer/resource-library/lung-cancer-fact-sheet.html
8. American Lung Association. Too Many Cases, Too Many Deaths: Lung Cancer in African Americans [Internet]. 2010 [cited 2017 Jun 15]. Available from: http://www.lung.org/assets/documents/research/ala-lung-cancer-in-african.pdf
9. Haiman CA, Stram DO, Wilkens LR, Pike MC, Kolonel LN, Henderson BE, et al. Ethnic and Racial Differences in the Smoking-Related Risk of Lung Cancer. N Engl J Med. 2006;354(4):333–42.
10. Clegg LX, Reichman ME, Miller BA, Hankey BF, Singh GK, Lin YD, et al. Impact of socioeconomic status on cancer incidence and stage at diagnosis: selected findings from the surveillance, epidemiology, and end results: National Longitudinal Mortality Study. Cancer Causes Control. 2009 May 12;20(4):417–35.
11. Torre LA, Siegel RL, Jemal A. Lung Cancer Statistics. In: Advances in Experimental Medicine and Biology. Springer International Publishing Switzerland; 2016. p. 1–19.
12. Park SL, Tiirikainen MI, Patel YM, Wilkens LR, Stram DO, Le Marchand L, et al. Genetic determinants of CYP2A6 activity across racial/ethnic groups with different risks of lung cancer and effect on their smoking intensity. Carcinogenesis. 2016 Mar;37(3):269–79.
13. IARC. Cadmium and Cadmium Compounds. IARC Monogr. 2012;100C:121–45.
14. Agency for Toxic Substances and Disease Registry (ATSDR). Toxicological Profile: Cadmium [Internet]. 1999. Available from: https://www.atsdr.cdc.gov/toxprofiles/tp.asp?id=48&tid=15
15. Mannino DM. Urinary cadmium levels predict lower lung function in current and former smokers: data from the Third National Health and Nutrition Examination Survey. Thorax. 2004 Mar 1;59(3):194–8.
16. National Toxicology Program (NTP). Report on Carcinogens, Fourteenth Edition [Internet]. Research Triangle Park, NC: U.S.; 2016 [cited 2017 May 12]. Available from: https://ntp.niehs.nih.gov/go/roc14
17. Liu J, Qu W, Kadiiska MB. Role of oxidative stress in cadmium toxicity and carcinogenesis. Toxicol Appl Pharmacol. 2009 Aug 1;238(3):209–14.
18. Manca D, Ricard AC, Van Tra H, Chevalier G. Relation between lipid peroxidation and inflammation in the pulmonary toxicity of cadmium. Arch Toxicol. 1994;68(6):364–9.
19. Nawrot TS, Van Hecke E, Thijs L, Richart T, Kuznetsova T, Jin Y, et al. Cadmium-related mortality and long-term secular trends in the cadmium body burden of an environmentally exposed population. Environ Health Perspect. 2008 Dec;116(12):1620–8.
20. IARC. Working Group on the Evaluation of Carcinogenic Risks to Humans. Beryllium, cadmium, mercury, and exposures in the glass manufacturing industry. 1993;58.
21. Office of the Federal Register. ATSDR – Priority List of Hazardous Substances [Internet]. Federal Register 79:102. 2014 [cited 2017 Mar 8]. p. 30613. Available from: https://www.atsdr.cdc.gov/spl/
22. Järup L, Åkesson A. Current status of cadmium as an environmental health problem. Toxicol Appl Pharmacol. 2009;238:201–8.
23. Nordberg GF, Nogawa KA, Nordberg M. Cadmium. In: Handbook on the Toxicology of Metals. 2015. p. 667–716.
24. Paschal DC, Burt V, Caudill SP, Gunter EW, Pirkle JL, Sampson EJ, et al. Exposure of the U.S. population aged 6 years and older to cadmium: 1988-1994. Arch Environ Contam Toxicol. 2000;38(3):377–83.
25. Richter PA, Bishop EE, Wang J, Swahn MH. Tobacco Smoke Exposure and Levels of Urinary Metals in the U.S. Youth and Adult Population: The National Health and Nutrition Examination Survey (NHANES) 1999–2004. Int J Environ Res Public Health. 2009;1930–46.
26. Centers for Disease Control and Prevention. Fourth National Report on Human Exposure to Environmental Chemicals, Updated Tables, January 2017, Volume Two [Internet]. Atlanta; 2017 [cited 2017 Mar 7]. Available from: https://www.cdc.gov/biomonitoring/pdf/FourthReport_UpdatedTables_Volume2_Jan2017.pdf
27. Centers for Disease Control and Prevention. Fourth National Report on Human Exposure to Environmental Chemicals Special Analysis of Pooled Samples for Select Chemicals Background [Internet]. 2019 [cited 2019 Apr 2]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/17618673
28. Occupational Safety and Health Administration (OSHA). Title 29 CFR, Cadmium Standard §1910.1027 [Internet]. 1992 [cited 2017 Aug 2]. Available from: https://www.osha.gov/SLTC/cadmium/
29. Occupational Safety and Health Administration (OSHA). Occupational exposure to cadmium; proposed rule 29 CFR Part 1910: 4052-4147.
30. Yassin AS, Martonik JF. Urinary cadmium levels in the U.S. working population, 1988-1994. J Occup Environ Hyg. 2004;1(5):324–33.
31. Tellez-Plaza M, Navas-Acien A, Caldwell KL, Menke A, Muntner P, Guallar E. Reduction in cadmium exposure in the United States population, 1988-2008: The contribution of declining smoking rates. Environ Health Perspect. 2012 Feb;120(2):204–9.
32. Nazar R, Iqbal N, Masood A, Khan MIR, Syeed S, Khan NA. Cadmium Toxicity in Plants and Role of Mineral Nutrients in Its Alleviation. Am J Plant Sci. 2012;03(10):1476–89.
33. Nawrot T, Plusquin M, Hogervorst J, Roels HA, Celis H, Thijs L, et al. Environmental exposure to cadmium and risk of cancer: a prospective population-based study. Lancet Oncol. 2006 Feb;7(2):119–26.
34. Agency for Toxic Substances and Disease Registry (ATSDR). Toxicological Profile for Cadmium [Internet]. Agency for Toxic Substances and Disease Registry, Public Health Service, U.S. Department of Health and Human Services. 2012 [cited 2017 Sep 7]. Available from: https://www.atsdr.cdc.gov/toxprofiles/tp5.pdf
35. Järup L, Berglund M, Elinder C, Nordberg G, Vahter M. Health effects of cadmium exposure - a review of the literature and a risk estimate. Scand J Work Environ Health. 1998;24(3):1–51.
36. Riederer AM, Belova A, George BJ, Anastas PT. Urinary Cadmium in the 1999–2008 U.S. National Health and Nutrition Examination Survey (NHANES). Environ Sci Technol. 2013;47(2):1137–47.
37. Menke A, Muntner P, Silbergeld EK, Platz EA, Guallar E. Cadmium Levels in Urine and Mortality among U.S. Adults. Environ Health Perspect. 2009 Feb;117(2):190–6.
38. McElroy JA, Shafer MM, Hampton JM, Newcomb PA. Predictors of urinary cadmium levels in adult females. Sci Total Environ. 2007 Sep 1;382(2–3):214–23.
39. Tellez-Plaza M. Reduction in Cadmium Exposure in the United States Population, 1988-2008: The Contribution of Declining Smoking Rates. Env Heal Perspect. 2012;120:204–9.
40. Tellez-Plaza M, Navas-Acien A, Caldwell KL, Menke A, Muntner P. Reduction in Cadmium Exposure in the United States Population, 1988–2008. Env Heal Perspect. 2012;204(2):1–4.
41. Valko M, Morris H, Cronin M. Metals, Toxicity and Oxidative Stress. Curr Med Chem. 2005 May 1;12(10):1161–208.
42. Khansakorn N, Wongwit W, Tharnpoophasiam P, Hengprasith B, Suwannathon L, Chanprasertyothin S, et al. Genetic Variations of Glutathione S-Transferase Influence on Blood Cadmium Concentration. J Toxicol. 2012;2012:1–6.
43. Björkman L, Vahter M, Pedersen NL. Both the environment and genes are important for concentrations of cadmium and lead in blood. Environ Health Perspect. 2000;108(8):719–22.
44. Ford JG. Glutathione S-transferase M1 polymorphism and lung cancer risk in African-Americans. Carcinogenesis. 2000 Nov 1;21(11):1971–5.
45. Singhal RK, Anderson ME, Meister A. Glutathione, a first line of defense against cadmium toxicity. FASEB J. 1987 Sep;1(3):220–3.
46. Kayaaltı Z, Aliyev V, Söylemezoğlu T. The potential effect of metallothionein 2A -5 A/G single nucleotide polymorphism on blood cadmium, lead, zinc and copper levels. Toxicol Appl Pharmacol. 2011 Oct;256(1):1–7.
47. Klaassen CD, Liu J, Diwan BA. Metallothionein protection of cadmium toxicity. Toxicol Appl Pharmacol. 2009 Aug 1;238(3):215–20.
48. Nordberg GF. Cadmium and health in the 21st Century – historical remarks and trends for the future. BioMetals. 2004 Oct;17(5):485–9.
49. Ng E, Lind PM, Lindgren C, Ingelsson E, Mahajan A, Morris A, et al. Genome-wide association study of toxic metals and trace elements reveals novel associations. Hum Mol Genet. 2015 Aug 15;24(16):4739–45.
50. Borné Y, Söderholm M, Barregard L, Fagerberg B, Persson M, Melander O, et al. Genome wide association study identifies two loci associated with cadmium in erythrocytes among never-smokers. Hum Mol Genet. 2016 Jun 1;25(11):2342–8.
51. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress [Internet]. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. 2014 [cited 2019 Apr 25]. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24455788
52. Derby KS, Cuthrell K, Caberto C, Carmella SG, Franke AA, Hecht SS, et al. Nicotine Metabolism in Three Ethnic/Racial Groups with Different Risks of Lung Cancer. 2008;17(12):3526–35.
53. Le Marchand L, Wilkens LR, Kolonel LN. Ethnic differences in the lung cancer risk associated with smoking. Cancer Epidemiol Biomarkers Prev. 1992;1(2):103–7.
54. Kolonel LN, Henderson BE, Hankin JH, Nomura AMY, Wilkens LR, Pike MC, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol. 2000 Feb 15;151(4):346–57.
55. Murphy SE, Park S-SL, Thompson EF, Wilkens LR, Patel Y, Stram DO, et al. Nicotine N-glucuronidation relative to N-oxidation and C-oxidation and UGT2B10 genotype in five ethnic/racial groups. Carcinogenesis. 2014 Nov 1;35(11):2526–33.
56. Park SL, Carmella SG, Ming X, Vielguth E, Stram DO, Le Marchand L, et al. Variation in levels of the lung carcinogen NNAL and its glucuronides in the urine of cigarette smokers from five ethnic groups with differing risks for lung cancer. Cancer Epidemiol Biomarkers Prev. 2015 Mar;24(3):561–9.
57. Park SL, Carmella SG, Chen M, Patel Y, Stram DO, Haiman CA, et al. Mercapturic acids derived from the toxicants acrolein and crotonaldehyde in the urine of cigarette smokers from five ethnic groups with differing risks for lung cancer. PLoS One. 2015;10(6):124841.
58. Patel YM, Stram DO, Wilkens LR, Park S-SL, Henderson BE, Le Marchand L, et al. The contribution of common genetic variation to nicotine and cotinine glucuronidation in multiple ethnic/racial populations. Cancer Epidemiol Biomarkers Prev. 2015;24(1):119–27.
59. Patel YM, Park SL, Carmella SG, Paiano V, Olvera N, Stram DO, et al. Metabolites of the polycyclic aromatic hydrocarbon phenanthrene in the urine of cigarette smokers from five ethnic groups with differing risks for lung cancer. Costa M, editor. PLoS One. 2016 Jun 8;11(6):e0156203.
60. Park SL, Murphy SE, Wilkens LR, Stram DO, Hecht SS, Le Marchand L. Association of CYP2A6 activity with lung cancer incidence in smokers: The Multiethnic Cohort Study. Niaura R, editor. PLoS One. 2017 May 25;12(5):e0178435.
61. Centers for Disease Control and Prevention. Laboratory Procedure Manual for Urine Multi-Element ICP-DRC-MS Method: 3018.4 (15 element panel) and 3018A.3 (total arsenic) [Internet]. 2013. Available from: https://wwwn.cdc.gov/nchs/data/nhanes/2013-2014/labmethods/UM_UMS_UTAS_UTASS_H_MET.pdf
62. Isermann J, Prager HM, Ebbinghaus R, Janasik B, Wasowicz W, Dufaux B, et al. Urinary cadmium levels in active and retired coal miners. J Toxicol Environ Heal - Part A Curr Issues. 2017 Apr 18;80(7–8):405–10.
63. Cigan SS. Genome-wide and candidate gene association study of urinary cadmium levels in smokers from the Multiethnic Cohort Study. 2019;(in draft).
64. Tobin J. Estimation of Relationships for Limited Dependent Variables. Econometrica. 1958;26(1):24.
65. Barr DB, Wilder LC, Caudill SP, Gonzalez AJ, Needham LL, Pirkle JL. Urinary creatinine concentrations in the U.S. population: Implications for urinary biologic monitoring measurements. Environ Health Perspect. 2005 Feb;113(2):192–200.
66. Olsson IM, Bensryd I, Lundh T, Ottosson H, Skerfving S, Oskarsson A. Cadmium in blood and urine - Impact of sex, age, dietary intake, iron status, and former smoking - Association of renal effects. Vol. 110, Environmental Health Perspectives. 2002. p. 1185–90.
67. Adams SV, Newcomb PA. Cadmium blood and urine concentrations as measures of exposure: NHANES 1999-2010. J Expo Sci Environ Epidemiol. 2014;24(2):163–70.
68. Wang J, Liang Q, Mendes P, Sarkar M. Is 24h nicotine equivalents a surrogate for smoke exposure based on its relationship with other biomarkers of exposure? Biomarkers. 2011;16(2):144–54.
69. Benowitz NL, Dains KM, Dempsey D, Yu L, Jacob P. Estimation of nicotine dose after low-level exposure using plasma and urine nicotine metabolites. Cancer Epidemiol Biomarkers Prev. 2010;19(5):1160–6.
70. Campaign for Tobacco-Free Kids. Tobacco and Socioeconomic Status [Internet]. Washington, D.C.; 2016 [cited 2019 May 30]. Available from: www.tobaccofreekids.org
71. Hecht SS, Yuan J-M, Hatsukami D. Applying tobacco carcinogen and toxicant biomarkers in product regulation and cancer prevention. Chem Res Toxicol. 2010 Jun 21;23(6):1001–8.
72. University of Hawaii Cancer Center. Multiethnic Cohort Study (MEC) [Internet]. [cited 2018 Dec 15]. Available from: http://www.uhcancercenter.org/research/mec
73. Stram DO, Park SL, Haiman CA, Murphy SE, Patel Y, Hecht SS, et al. Racial/Ethnic Differences in Lung Cancer Incidence in the Multiethnic Cohort Study: An Update. JNCI J Natl Cancer Inst. 2019;111(8).
74. Kayaalti Z, Mergen G, Söylemezoǧlu T. Effect of metallothionein core promoter region polymorphism on cadmium, zinc and copper levels in autopsy kidney tissues from a Turkish population. Toxicol Appl Pharmacol. 2010;245(2):252–5.
75. Joseph P. Mechanisms of cadmium carcinogenesis. Toxicol Appl Pharmacol. 2009;238(3):272–9.
76. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat Methods. 2012 Feb 4;9(2):179–81.
77. Howie BN, Donnelly P, Marchini J. A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies. Schork NJ, editor. PLoS Genet. 2009 Jun 19;5(6):e1000529.
78. Altshuler DL, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, et al. A map of variation from population-scale sequencing. Nature. 2010 Oct 27;467(7319):1061–73.
79. NCBI. NCBI's genome browser for human (Homo sapiens) - Genome Data Viewer [Internet]. [cited 2019 Apr 30]. Available from: https://www.ncbi.nlm.nih.gov/genome/gdv/
80. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, et al. PLINK: Whole genome data analysis toolset. Am J Hum Genet [Internet]. 2007 [cited 2019 Mar 25];81:559–75. Available from: http://zzz.bwh.harvard.edu/plink/contact.shtml
81. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
82. Cigan SS, Park SL, Murphy SE, Alexander BH, Stepanov I. Relationship between urinary cadmium levels and occupation in smokers from the Multiethnic Cohort Study. 2019;(in draft).
83. Adams SV, Barrick B, Christopher EP, Shafer MM, Makar KW, Song X, et al. Genetic variation in metallothionein and metal-regulatory transcription factor 1 in relation to urinary cadmium, copper, and zinc. Toxicol Appl Pharmacol. 2015 Dec;289(3):381–8.
84. Rentschler G, Kippler M, Axmon A, Raqib R, Skerfving S, Vahter M, et al. Cadmium concentrations in human blood and urine are associated with polymorphisms in zinc transporter genes. Metallomics. 2014;6(4):885–91.
85. Lei L, Chang X, Rentschler G, Tian L, Zhu G, Chen X, et al. A polymorphism in metallothionein 1A (MT1A) is associated with cadmium-related excretion of urinary beta 2-microglobulin. Toxicol Appl Pharmacol. 2012 Dec;265(3):373–9.
86. Kayaaltı Z, Aliyev V, Söylemezoğlu T. The potential effect of metallothionein 2A −5 A/G single nucleotide polymorphism on blood cadmium, lead, zinc and copper levels. Toxicol Appl Pharmacol. 2011 Oct;256(1):1–7.
87. Haiman CA, Patel YM, Stram DO, Carmella SG, Chen M, Wilkens LR, et al. Benzene Uptake and Glutathione S-transferase T1 Status as Determinants of S-Phenylmercapturic Acid in Cigarette Smokers in the Multiethnic Cohort. Cai Q, editor. PLoS One. 2016 Mar 9;11(3):e0150641.
88. Vahter M, Berglund M, Åkesson A, Lidén C. Metals and Women's Health. Environ Res. 2002 Mar;88(3):145–55.
89. National Library of Medicine (US). Genetics Home Reference [Internet]. Bethesda (MD): The Library; 2013. Available from: https://ghr.nlm.nih.gov/gene/TENM4
90. Zalups RK, Ahmad S. Molecular handling of cadmium in transporting epithelia. Toxicol Appl Pharmacol. 2003 Feb;186(3):163–88.
91. US Department of Health and Human Services. The Surgeon General's Advisory Committee on Smoking and Health. 1964.
92. Wang TW, Asman K, Gentzke AS, Cullen KA, Holder-Hayes E, Reyes-Guzman C, et al. Tobacco Product Use Among Adults — United States, 2017. MMWR Morb Mortal Wkly Rep. 2018 Nov 9;67(44):1225–32.
93. Yuan JM, Koh WP, Murphy SE, Fan Y, Wang R, Carmella SG, et al. Urinary levels of tobacco-specific nitrosamine metabolites in relation to lung cancer development in two prospective cohorts of cigarette smokers. Cancer Res. 2009;69(7):2990–5.
94. Hecht SS, Murphy SE, Stepanov I, Nelson HH, Yuan J-M. Tobacco Smoke Biomarkers and Cancer Risk Among Male Smokers in the Shanghai Cohort Study. 2012;
95. Yuan JM, Nelson HH, Carmella SG, Wang R, Kuriger-Laber J, Jin A, et al. CYP2A6 genetic polymorphisms and biomarkers of tobacco smoke constituents in relation to risk of lung cancer in the Singapore Chinese Health Study. Carcinogenesis. 2017;38(4):411–8.
96. Grambsch PM, Therneau TM. Proportional Hazards Tests and Diagnostics Based on Weighted Residuals. Biometrika. 1994 Aug;81(3).
97. Sorahan T, Lancashire RJ. Lung cancer mortality in a cohort of workers employed at a cadmium recovery plant in the United States: an analysis with detailed job histories. Occup Environ Med [Internet]. 1997 Mar [cited 2017 Jun 22];54(3):194–201. Available from: http://www.ncbi.nlm.nih.gov/pubmed/9155781
98. Järup L, Bellander T, Hogstedt C, Spang G. Mortality and cancer incidence in Swedish battery workers exposed to cadmium and nickel. Occup Environ Med. 1998;55(11):755–9.
99. Gorber SC, Schofield-Hurwitz S, Hardt J, Levasseur G, Tremblay M. The accuracy of self-reported smoking: A systematic review of the relationship between self-reported and cotinine-assessed smoking status. Nicotine Tob Res. 2009 Jan;11(1):12–24.
100. Boldry EJ, Patel YM, Kotapati S, Esades A, Park SL, Tiirikainen M, et al. Genetic Determinants of 1,3-Butadiene Metabolism and Detoxification in Three Populations of Smokers with Different Risks of Lung Cancer. Cancer Epidemiol Biomarkers Prev. 2017 Jul;26(7):1034–42.


APPENDICES


APPENDIX A

University of Minnesota Institutional Review Board (IRB) Approval
