A Method for Identifying Prevalent Chemical Combinations in the U.S. Population Dustin F
Total Page:16
File Type:pdf, Size:1020Kb
A Section 508–conformant HTML version of this article Research is available at https://doi.org/10.1289/EHP1265. A Method for Identifying Prevalent Chemical Combinations in the U.S. Population Dustin F. Kapraun,1 John F. Wambaugh,1 Caroline L. Ring,1,2 Rogelio Tornero-Velez,3 and R. Woodrow Setzer1 1National Center for Computational Toxicology, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA 2Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee, USA 3National Exposure Research Laboratory, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA BACKGROUND: Through the food and water they ingest, the air they breathe, and the consumer products with which they interact at home and at work, humans are exposed to tens of thousands of chemicals, many of which have not been evaluated to determine their potential toxicities. Furthermore, while current chemical testing tends to focus on individual chemicals, the exposures that people actually experience involve mixturesof chemicals. Unfortunately, the number of mixtures that can be formed from the thousands of environmental chemicals is enormous, and testing all of them would be impossible. OBJECTIVES: We seek to develop and demonstrate a method for identifying those mixtures that are most prevalent in humans. METHODS: We applied frequent itemset mining, a technique traditionally used for market basket analysis, to biomonitoring data from the 2009–2010 cycle of the continuous National Health and Nutrition Examination Survey (NHANES) to identify combinations of chemicals that frequently co-occur in people. RESULTS: We identified 90 chemical combinations consisting of relatively few chemicals that occur in at least 30% of the U.S. population, as well as three supercombinations consisting of relatively many chemicals that occur in a small but nonnegligible proportion of the U.S. population. CONCLUSIONS: We demonstrated how FIM can be used in conjunction with biomonitoring data to narrow a large number of possible chemical combi- nations down to a smaller set of prevalent chemical combinations. https://doi.org/10.1289/EHP1265 Introduction of toxicity assessments focus on single chemicals, but unfortu- The ubiquitous use of man-made chemicals in consumer products nately, the effects of mixtures cannot always be determined using (Weschler 2009) and industrial processes (U.S. EPA 2014) leads simple additive assumptions (Berenbaum 1989). Thus, the to the potential for human exposure to large numbers of these National Research Council has suggested the need to shift away substances starting from the earliest stages of life (Carpenter from single chemical assessments in favor of mixtures testing et al. 1998). In fact, the U.S. Environmental Protection Agency (National Research Council 1994), and in response to this, the (EPA)’s Toxic Substances Control Act (TSCA) inventory now EPA has recommended that risk assessments be conducted using contains more than 84,000 chemical substances that may be in toxicity data on actual mixtures of concern or reasonably similar commercial use (Institute of Medicine 2014; U.S. Government mixtures (U.S. EPA 2000). Accountability Office 2013), and an estimated 30,000 of these At first glance, selecting mixtures to test seems an over- whelming prospect because of the sheer numbers. When consid- substances are produced at rates greater than one metric ton per n year (European Commission 2007). All humans, not just those ering a candidate pool of n chemicals, there are 2 − 1 possible inhabiting areas near major pollution centers, are now exposed to combinations. Thus, for a universe of 20 chemicals, the number thousands of chemicals through the air they breathe, the water of possible combinations is over one million, and the number of they drink, the food they eat, and the products they buy and use combinations doubles with each chemical we add to the candi- (Thornton et al. 2002). Furthermore, only a small fraction of the date pool. Fortunately, coexposure to environmental chemicals is chemicals known to be present in our environment have been suf- not purely random, but is subject to various structuring processes ficiently characterized in terms of their potential to cause human (Tornero-Velez et al. 2012), so we expect that the number of or ecological toxicity to support regulatory action (Judson et al. combinations of n chemicals that occur frequently in humans is n 2009; National Research Council 1984). Exacerbating this prob- likely much less than 2 − 1. To focus our assessment efforts, we lem of too many chemicals and insufficient data is the fact that therefore need to develop tools that can identify prevalent chemi- people in the real world are not exposed to individual chemicals cal mixtures. one at a time, but rather to mixtures of chemicals. The majority We suggest that the large number of mixtures that might be considered for toxicity testing be narrowed down to a relatively small number of mixtures of concern using a two-step process: first, identify combinations of chemicals that are most prevalent, Address correspondence to D.F. Kapraun, National Center for Computational and then identify the relative amounts (or concentrations) of the Toxicology, U.S. Environmental Protection Agency, Mail Drop D143-02, 109 fi T.W. Alexander Dr., Research Triangle Park, NC 27711 USA. Phone: (919) constituent chemicals to arrive at well-de ned mixtures. Note 541-4045. Fax: (919) 541-1194. Email: [email protected] that we make a distinction between chemical combinations and Supplemental Material is available online (https://doi.org/10.1289/EHP1265). chemical mixtures. In particular, we define a combination to be a The authors declare they have no actual or potential competing financial collection of chemicals that co-occur in an individual, whereas interests. we define a mixture to be a collection of chemicals that co-occur The views expressed in this manuscript are those of the authors and do in an individual in specific proportions as determined by their not necessarily represent the views or policies of the U.S. Environmental Protection Agency. concentrations in blood or urine. In this manuscript, we focus on Received 24 October 2016; Revised 17 April 2017; Accepted 19 April the first step in this process, identifying prevalent combinations 2017; Published 24 August 2017. of chemicals. Note to readers with disabilities: EHP strives to ensure that all journal Our approach for identifying combinations of chemicals that content is accessible to all readers. However, some figures and Supplemental are prevalent in humans relies on biomonitoring data, such as EHP Material published in articles may not conform to 508 standards due to those collected in the National Health and Nutrition Examination the complexity of the information being presented. If you need assistance accessing journal content, please contact [email protected]. Our staff Survey (NHANES). The National Center for Health Statistics will work with you to assess and meet your accessibility needs within (NCHS) of the Centers for Disease Control and Prevention con- 3 working days. ducts this ongoing survey of health metrics on a 2-y cycle, and Environmental Health Perspectives 087017-1 part of each survey cycle involves the investigation of approxi- those chemicals analyzed in NHANES Subsamples A, B, and mately 100 markers of chemical exposure in a representative C, respectively.) Because of this block structuring, information sample of the U.S. population (CDC 2016a). Since the beginning about the co-occurrence of chemicals from different groups is of the continuous NHANES program in 1999 (CDC 2016b), 265 not directly available from the NHANES data sets. chemicals in total have been included in NHANES biomonitoring Each of the chemical groups in NHANES 2009–2010 studies (Sobus et al. 2015). While these few hundred chemicals included four subgroups of chemicals, and depending on the only account for a very small fraction of the aforementioned tens types of chemicals in a given subgroup, laboratory analyses were of thousands of chemicals to which we may be exposed, conducted using either urine or blood spot (i.e., one-time) sam- NHANES currently provides the most comprehensive source of ples. Group A included (A1) metals; (A2) arsenics; (A3) perchlo- internal human exposure data. Therefore, we chose to use rate, nitrate, and thiocyanate; and (A4) phytoestrogens, all of NHANES biomonitoring data to explore methods for finding which were measured in spot urine samples. Group B included prevalent combinations of chemicals. (B1) environmental phenols; (B2) environmental pesticides; We propose that a market basket analysis technique known as (B3) phthalates; and (B4) polyaromatic hydrocarbons (PAHs), frequent itemset mining (FIM) (Borgelt 2012) can be used to nar- which were also all measured in urine. Finally, Group C row down the large number of possible combinations that can be included (C1) pyrethroids, herbicides, and organophosphate formed from a given pool of chemicals (such as the NHANES metabolites; (C2) polyfluoroalkyl chemicals; (C3) caffeine and chemicals) by identifying those combinations that are most prev- metabolites; and (C4) diethyltoluamide (DEET) and metabo- alent. While FIM has traditionally been applied to data sets lites. Most Group C chemicals were measured in urine, except describing consumer purchasing behavior (Agrawal and Srikant for the 12 polyfluoroalkyl chemicals (Subgroup C2), which 1994), it has been used in a variety of other contexts (Borgelt were measured in serum. We provide complete lists of the 2012). Recently, (Bell and Edwards 2014; 2015) applied FIM to chemicals included in Groups A, B, and C in Tables 1, 2, and NHANES data sets, but in their case, they sought to find associa- 3, respectively. More detailed information on these chemicals is tions between chemicals and diseases through association rules presented in Tables S1, S2, and S3. mining. So, although the number of peer-reviewed publications Age restrictions and excluded data. NHANES 2009–2010 utilizing NHANES biomarker data has increased steadily over included subjects of all ages, but only subjects aged 6 and older the last 10 y (Sobus et al.