Food and Chemical Toxicology xxx (2013) xxx–xxx

Food and Chemical Toxicology

journal homepage: www.elsevier.com/locate/foodchemtox

Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment

Lutz Edler a, Andy Hart b, Peter Greaves c, Philip Carthew d, Myriam Coulet e, Alan Boobis f, Gary M. Williams g, Benjamin Smith h,⇑

a German Cancer Research Centre (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
b The Food and Environment Research Agency – FERA, Sand Hutton, YO41 1LZ York, United Kingdom
c Department of Cancer Studies and Molecular Medicine, University of Leicester, LE2 7LX Leicester, United Kingdom
d Unilever, Colworth House Sharnbrook, MK44 1LQ Bedfordshire, United Kingdom
e Nestlé Research Centre, Vers-Chez-Les-Blanc, 1000 Lausanne, Switzerland
f Imperial College, Hammersmith Campus, Ducane Road, W12 0NN London, United Kingdom
g New York Medical College, Basic Science Building, Room 413, Valhalla, NY 10595, United States
h Firmenich, Rue de la Bergere 7, 1217-Meyrin 2, Switzerland

Article info

Article history: Available online xxxx

Keywords: Genotoxic carcinogen; Tumour relevance; Data selection; Benchmark Dose Modelling; Margin of Exposure; Uncertainty assessment

Abstract

This article addresses a number of concepts related to the selection and modelling of carcinogenicity data for the calculation of a Margin of Exposure. It follows up on the recommendations put forward by the International Life Sciences Institute – European branch in 2010 on the application of the Margin of Exposure (MoE) approach to substances in food that are genotoxic and carcinogenic. The aims are to provide practical guidance on the relevance of animal tumour data for human carcinogenic hazard assessment, appropriate selection of tumour data for Benchmark Dose Modelling, and approaches for dealing with the uncertainty associated with the selection of data for modelling and, consequently, the derived Point of Departure (PoD) used to calculate the MoE. Although the concepts outlined in this article are interrelated, the background expertise needed to address each topic varies. For instance, the expertise needed to make a judgement on biological relevance of a specific tumour type is clearly different to that needed to determine the statistical uncertainty around the data used for modelling a benchmark dose. As such, each topic is dealt with separately to allow those with specialised knowledge to target key areas of guidance and provide a more in-depth discussion on each subject for those new to the concept of the Margin of Exposure approach.

© 2013 ILSI Europe. Published by Elsevier Ltd. All rights reserved.
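The relationship between the PoD and the MoE can be illustrated with a minimal sketch. It assumes, as the article does, that the MoE is the ratio of a Point of Departure (here a BMDL for a 10% Benchmark Response) to an estimate of human exposure. The function name, the units (mg/kg body weight per day) and all numerical values are hypothetical and chosen only for illustration; they are not taken from any case study.

```python
def margin_of_exposure(pod: float, exposure: float) -> float:
    """Return the MoE as PoD divided by estimated human exposure.

    Both arguments must be in the same units (assumed here: mg/kg bw per day).
    """
    if pod <= 0 or exposure <= 0:
        raise ValueError("PoD and exposure must both be positive")
    return pod / exposure


# Hypothetical PoD: a BMDL for a 10% Benchmark Response (illustrative value only).
bmdl10 = 0.3

# Hypothetical exposure estimates for two consumer groups (illustrative values only).
exposures = {"mean consumer": 0.0005, "high consumer": 0.002}

# One MoE per exposure scenario; a different tumour data set or dose-response
# model would give a different BMDL and therefore different MoE values.
moes = {group: margin_of_exposure(bmdl10, dose) for group, dose in exposures.items()}

for group, moe in moes.items():
    print(f"{group}: MoE is about {moe:.0f}")
```

Because the MoE scales linearly with the PoD, any uncertainty in the choice of tumour data set or model carries through directly to the MoE, which is why the selection of data is treated at length in this article.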

Abbreviations: AIC, Akaike Information Criterion; BMD, Benchmark Dose; BMDL, Lower Benchmark Dose; BMDU, Upper Benchmark Dose; BMR, Benchmark Response; COC, Committee On Carcinogenicity; ECHA, European Chemicals Agency; EFSA, European Food Safety Authority; EPA, Environmental Protection Agency; EPRI, European Parliaments Research Initiative; FDA, Food and Drug Administration; IARC, International Agency for Research on Cancer; IPCC, Intergovernmental Panel on Climate Change; IPCS, International Programme on Chemical Safety; MoA, Mode of Action; MoE, Margin of Exposure; NCI, National Cancer Institute; NIOSH, National Institute for Occupational Safety and Health; NOAEL, No Observed Adverse Effect Level; NTP, National Toxicology Program; OECD, Organization for Economic Cooperation and Development; PoD, Point of Departure; PoD/RP, Point of Departure/Reference Point; WHO, World Health Organisation.

⇑ Corresponding author. Address: ILSI Europe, Avenue E. Mounier 83, Box 6, 1200 Brussels, Belgium. Tel.: +32 2 771 00 14.

E-mail addresses: [email protected] (L. Edler), [email protected] (A. Hart), [email protected] (P. Greaves), [email protected] (P. Carthew), [email protected] (M. Coulet), [email protected] (A. Boobis), [email protected] (G.M. Williams), benjamin.smith@firmenich.com, [email protected] (B. Smith).

0278-6915/$ - see front matter © 2013 ILSI Europe. Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.fct.2013.10.030

Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030

1. Introduction

In recent years, the Margin of Exposure (MoE) has been proposed as an alternative to low dose extrapolation as a means of providing advice to risk managers on the potential level of concern from exposure to chemicals that are genotoxic and carcinogenic and to help prioritise risk management actions (Benford et al., 2010; JECFA, 2005). The MoE is expressed as the ratio between an appropriate Point of Departure (PoD) on the dose–response curve for a tumour response and a relevant estimate of human exposure. In 2010, a special supplement volume was published in Food and Chemical Toxicology (Benford et al., 2010) entitled “Application of the Margin of Exposure (MoE) Approach to Substances in Food that are Genotoxic and Carcinogenic”. In 12 case studies, different chemicals found in food and acknowledged to be acting via a direct DNA-reactive mode of action were considered in order to provide guidance on how to apply the MoE approach to genotoxic and carcinogenic substances. A key conclusion, based on this work, was that “depending on the tumour endpoint that is selected to be used in this approach and the ways in which the data are analysed, it is possible to generate very different PoDs to be used in calculating the MoE, and, hence, in the value of the MoE itself. It is, therefore, essential that the selection of the cancer endpoint and mathematical treatment of the data are clearly described and justified if the results of the MoE approach are to be trusted and to be of value to risk managers” (Benford et al., 2010).

In the article by Benford et al. (2010), the PoD, or reference point, used for calculation of the MoE is the lower 95% confidence limit (BMDL) of the Benchmark Dose (BMD) producing the specified response (Benchmark Response or BMR) in a study in experimental animals. For substances that are both genotoxic and carcinogenic, the BMR is typically taken as a 10% increase in the incidence of tumours above the modelled background. The BMDL is used to establish a PoD that assures high (95%) confidence that the specified response will not be greater at that dose level (EFSA, 2009b). As pointed out by Benford and co-authors, however, the BMDL is heavily dependent on the risk assessors’ judgement as to the selection and treatment of the data modelled and on how the models are selected and employed. To address this issue, and further build on the framework for applying the MoE approach to substances that are genotoxic and carcinogenic, a new expert group was formed by ILSI Europe. The task of this group was to develop practical guidance for risk assessors on the factors to consider when selecting appropriate tumour data sets for Benchmark Dose Modelling and the subsequent derivation of a MoE. Specifically, the expert group focussed on the relevance of animal tumour data for human carcinogenic risk assessment, appropriate selection of tumour data for Benchmark Dose Modelling, and approaches for dealing with the uncertainty associated with the selection of data for modelling and, consequently, the derived PoD used to calculate the MoE.

A key consideration before deriving the MoE of any substance is to determine if the carcinogenic mode of action is relevant to humans. In the previous ILSI Europe exercise, clear evidence of a DNA-reactive mode of action, or at the very least the inability to negate a genotoxic MoA, was considered a necessary pre-requisite for deriving a MoE. Where clear evidence of DNA-reactivity is shown, human relevance cannot be discounted (Preston and Williams, 2005). Although there may be some exceptions where a more detailed consideration is required, in principle, genotoxicity is assumed to be a carcinogenic mode of action (MoA) that is likely to be of relevance to human health across the full range of the dose–response relationship (Williams et al., 2008). Conversely, for chemicals acting via other than direct DNA-reactive MoAs, it is considered that there will be threshold dose levels for the key events leading to the tumours (e.g. epigenetic alteration, cytotoxicity, protein interaction and cell proliferation), below which carcinogenicity is unlikely to arise.

Determining the appropriate tumour type for calculating the MoE of substances that are genotoxic and carcinogenic, however, is not always straightforward, as some of the worked examples previously published serve to illustrate (e.g. Carthew et al., 2010a). Benford et al. (2010) recommended that the strength of evidence for genotoxicity should be based on a weight of evidence approach comprising structural alerts, in vitro and in vivo data, read-across from related substances and, if available, human data. Such an approach places particular emphasis on the quality and relevance of genotoxicity data. Although not specifically addressed in this article, guidelines such as those recommended by the UK Committee on Mutagenicity of Chemicals in Food, Consumer Products and the Environment (UK Committee on Mutagenicity of Chemicals in Food, 2000) and the European Food Safety Authority Scientific Committee’s opinion on genotoxicity testing strategies applicable to food and feed safety assessment (EFSA, 2011a) do provide solid guidance on appropriate tiered testing strategies for identifying genotoxic substances, including assessment of their direct DNA-reactivity.

In addition to establishing a direct DNA-reactive MoA, consideration of the biological relevance of animal tumours to human cancer risk is key in the selection of tumour data for Benchmark Dose Modelling. This utilises what is commonly referred to as the “human relevance framework approach” (Boobis et al., 2006), which has evolved over the years from a simple consideration of the morphological and biological comparisons of rodent and human tumours to encompass a greater understanding of toxicity pathways and MoAs relevant to human carcinogenesis. Both the genotoxic MoA and the relevance to humans of an animal tumour response should be considered when selecting appropriate data sets for modelling and derivation of the BMDL.

Since the selection of data requires expert opinion regarding both the genotoxic MoA and relevance to human carcinogenesis, uncertainty enters into the choice of any data used to derive the BMDL. This uncertainty needs to be characterised by the risk assessor in such a way that it is transparent and can be taken into account by the risk manager (Codex Alimentarius Commission, 2011; EFSA, 2009b, 2011a). In the case of data selection for deriving a MoE for a genotoxic carcinogen, the source of uncertainty is reflected in two key decisions:

– Decisions on which tumours/studies should be considered for deriving the MoE.
– Decisions on the value of the PoD to be selected from the tumours/studies.

In the first of these, the uncertainties relate to a categorical question to include or exclude a particular study or tumour type. In the second, the uncertainty relates to a quantitative question, in particular, what value to take for the PoD.

It should be noted that a third source of uncertainty in the calculation of a MoE is related to the estimate of exposure. Due to the scope of this article, attention is focussed on the uncertainties inherent in the selection and modelling of tumour data. Similar approaches for assessing uncertainty can, of course, be applied to determine the uncertainty around exposure data used in the derivation of a MoE.

The present article summarizes the expert group review of the above issues and presents specific guidance under three main headings:

– Evaluating the human relevance of rodent tumours.
– Aspects of experimental design, statistical data analysis and dose–response modelling.
– Dealing with uncertainty.

Section 2 focusses on the first step, deciding which tumours/studies should be considered for deriving the MoE, and provides a rationale as to why specific tumour types may or may not be relevant for the assessment of genotoxic carcinogens. In particular, it reviews current knowledge, based on comparative pathology, of rodent tumour types that appear to lack relevance to human cancer hazard identification. Various modes of action have been linked to these tumour types, although the underlying mechanisms are often unclear (Boobis et al., 2008; Cohen, 2004; Hoenerhoff et al., 2009; Holsapple et al., 2006; Meek et al., 2003).

Section 3 (Aspects of statistical data analysis, experimental design and statistical uncertainty) looks at the second step, derivation of the PoD from the selected tumours/studies, and considers issues arising from the design of carcinogenicity bioassays, statistical data analysis and statistical uncertainty in the BMD approach. The statistical analysis of dose–response data aims to describe the dose–response relationship and to provide estimates of the response at selected doses together with a characterization of the size or degree of uncertainty. The BMD analysis specifically aims to determine a dose level at which there is high confidence in the PoD, usually the BMDL, at the specified BMR. This analysis depends not only on technical assumptions in the dose–response analysis and the statistical quality of the data available, but also on how the dose–response data were selected from the pool of available data, whether and how they were pre-processed, and how the BMD analysis itself was performed. A major source of uncertainty linked to the modelling is the choice of the dose–response model fitted to the data and the choice of the fitting criteria. Given the complexity of toxicological data, the selection of the data set for statistical analysis and model selection are the two most influential factors determining the size of the BMDL.

Finally, Section 4 of this report investigates new approaches for assessing the uncertainty in both steps of the process: selection of tumours/studies and derivation of the PoD. Consideration is given to uncertainties relating to the MoA, and additional uncertainties related to the toxicity studies that are not quantified by the BMDL. The approach to uncertainty analysis is illustrated by application to two of the original case studies presented already by Benford et al. (2010).

2. Evaluating the human relevance of rodent tumours

Although there have been several pathology reviews of rodent tumour types of questionable significance to humans (Alison et al., 1994; Cohen, 2004; Grasso and Golberg, 1966; Williams et al., 2008), much of the relevant information is scattered in the pathology literature. Here we attempt to summarize the current thinking on the relevance of rodent tumours for human cancer hazard identification and provide criteria for the selection of tumour data for MoE calculations. However, it is important to emphasise that assessment of the relevance of neoplasms in rodents cannot be defined in simple guidelines. Each study requires careful characterisation of tumour histopathology and discrimination between those neoplasms that occur spontaneously and those that may be induced by treatment. In general, however, tumours arising from non-genotoxic MoAs tend to occur at high doses where there is histological evidence of persistent cellular toxicity, exaggerated pharmacodynamic effects or other perturbations of homeostasis (Grasso, 1987). By contrast, the evidence from dosing a range of potent direct DNA-reactive carcinogens to rodents suggests that there is clear histological evidence of an increase in malignancy in induced tumours compared with tumours that develop spontaneously. Clear evidence of a malignant phenotype is the presence of metastases distant from the primary tumour site rather than cytological appearance alone, although the absence of metastases does not necessarily mean that the tumours did not arise by a genotoxic MoA. Furthermore, there is usually a much earlier age of onset of tumours induced by genotoxic chemicals compared with those that develop spontaneously or follow administration of non-DNA reactive chemicals.

2.1. Histopathological assessment of tumour types

There are several challenges to the assessment of the human relevance of neoplasms that are increased in treatment groups in rodent carcinogenicity studies. Tumour types are diverse in appearance and their biological properties. Histological tumour diagnosis deals with this diversity by a detailed process of pattern recognition and placing neoplasms into a restricted number of diagnostic categories. Although diagnostic categories for tumours in rodents have been agreed internationally, they can be misleading because similar terms are also used for the diagnosis of human neoplasms where they may not always represent tumours with the same biological properties. Neoplasms of one cell type may also show other differentiation patterns so that questions can arise about histogenesis and into which category they should be placed for statistical analysis.

The terms benign and malignant are used clinically for the prediction of a likely prognosis. However, these terms may represent artificial divisions, at least for many tumour types, of neoplastic development and progression in experimental rodent studies. In many cases, there is a clear sequence – hyperplasia–adenoma–carcinoma, but in some instances, there seems to be a direct transition from hyperplasia through to a malignant phenotype without an intermediate stage of benign neoplasms. This has been a cause of controversy in the context of hepatocellular lesions in rodents where the nature of the benign hepatic nodule has been a source of dissent and subject to changing terminology over the past 30 years.

These various factors have led to the development of criteria for the combination of tumour types according to organ and tissue, where it is and is not appropriate to combine them for statistical analysis.

The following text discusses the human relevance of specific tumour types and the modes of action by which they can be induced. Some are generally induced by non-genotoxic agents, others by both non-genotoxic (epigenetic) and genotoxic (DNA-reactive) carcinogens, and a few are induced primarily by genotoxic carcinogens. Since a carcinogen can have both genotoxic and epigenetic effects, the assessment of the potential MoAs for the tumours discussed is critical. Also, recommendations for combining tumour types for statistical analysis based on National Toxicology Program (NTP) criteria are summarised. For convenience, a summary table is also provided.

2.2. Mammary gland tumours

Mammary tumours are among the more common neoplasms induced by both genotoxic and non-genotoxic chemicals in rats (Davies and Monro, 1995; Gold et al., 2001). It has been known for many years that a number of potent carcinogens such as 7,12-dimethylbenzanthracene and 3-methylcholanthrene induce mammary tumours in rats. This has led to their application in animal models for human mammary cancer (Welsch, 1985). Although the mammary tumours induced by these DNA-reactive agents show variable differentiation patterns, they typically start to appear within weeks, or at the latest, after a few months of treatment (Geyer et al., 1951; Daniel and Prichard, 1964, 1961).

Mammary tumours also occur in rat carcinogenicity studies performed with chemicals that are not linked to increased cancer risk in humans. Mammary tumours are common in aging female rats and their incidence can be increased by a variety of factors that alter dopamine secretion over prolonged periods, notably administration of oestrogenic compounds. Dopamine is a regulatory inhibitor of the secretion of prolactin by the pituitary gland. Rodents are particularly sensitive to oestrogenic inhibition of dopamine secretion, which in turn, leads to prolactin release (Alison et al., 1994). Prolonged administration of agents with oestrogenic activity causes hyperplasia of prolactin-producing cells. Prolactin is also luteotrophic in rats which leads to an increase in progesterone (Alison et al., 1994). Hence, the synergic activities of these hormones are believed to lead to stimulation of mammary tissue which, if prolonged, leads to mammary tumour development. This is supported by the fact that the dopamine agonist bromocriptine, a potent inhibitor of prolactin secretion, was shown to practically eliminate the development of mammary neoplasms in rats (Richardson et al., 1984). This tumorigenic effect in the mammary gland is limited to rodents because prolactin is not luteotrophic in primates. This is supported by large epidemiological studies that have shown only a very small or no risk of breast cancer development following prolonged oestrogen therapy in women (Beral, 2003; Chlebowski et al., 2003; Speroff, 1999; Yager and Davidson, 2006).

Although the NTP recommended that mammary adenomas and fibroadenomas occurring in rodents are combined for statistical analysis, it is now generally accepted that these neoplasms may have a distinct histogenesis and should not be combined (Williams et al., 2008). However, it is recommended to combine the various subtypes of mammary carcinomas (McConnell et al., 1986).

2.3. Haemopoietic and lymphoid neoplasms

A number of agents, notably benzene, alkylating anti-cancer drugs and ionizing radiation, that induce lymphoma and leukaemias in rodent carcinogenicity studies have been linked to the development of haemopoietic neoplasms in humans (Gold et al., 2001; IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 1987). The increase in lymphoma or leukaemia in patients reported after prolonged immunosuppression is also recapitulated in rodent carcinogenicity studies (Caillard et al., 2005; Cohen et al., 1983; Krueger et al., 1971; Vial and Descotes, 2003). However, under certain circumstances development of increased numbers of lymphomas or leukaemias in rodent studies does not indicate cancer risk for humans. Lymphoma incidence in rodent studies is particularly variable because it varies with time and may be influenced by non-specific factors such as housing and diet (Greaves, 2012). A negative correlation between tumours and haemopoietic neoplasms in rats and mice has also been noted (Young and Gries, 1984). Over 60 years ago, oestrogenic agents of different types were shown to increase the incidence of lymphoid tumours in mice in a strain-dependent manner (Gardner et al., 1944). However, detailed reviews of the clinical data on the widespread use of oestrogen and combination therapies in humans have not shown any association with increased haemopoietic cancer risk (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 1987, 1999, 2007).

Mononuclear cell leukaemia (large granulocytic leukaemia) in Fischer 344 rats represents a special case. A treatment-induced increase in the incidence of this neoplasm in rat carcinogenicity studies is widely believed to lack relevance to identification of human cancer risk (Caldwell, 1999; Cohen, 2004; Williams et al., 2008). It is an unusual neoplasm because it is found only in significant numbers in aging Fischer 344 rats where it is an important cause of death or early termination in bioassays. The exact nature of the cell involved is unclear, but it appears to have some characteristics of a natural killer cell. A human counterpart probably does exist, but it is extremely rare (Thomas et al., 2007). Several organs are usually infiltrated by leukaemic cells, but spleen, liver and lungs are most frequently involved. The morphology of the leukaemic cells is variable, but well-differentiated cells show reniform nuclei and cytoplasmic granules of variable size. Its incidence among control Fischer 344 rats has shown an increase in 2 year NCI/NTP studies over the last two or three decades for reasons that are unclear (Thomas et al., 2007). Moreover, its incidence can be modified by experimental variables including diet and gavage using corn oil as a vehicle. Splenic ionising radiation and administration of agents toxic to the spleen reduce the incidence of this leukaemia and immunosuppression may increase it (Elwell et al., 1996). Only a few chemicals have produced a meaningful significant increase in this tumour type. Weight of evidence analysis suggests that even such increases are likely to be a species and strain-specific effect (Caldwell, 1999; Thomas et al., 2007). The high incidence of this neoplasm was one of the reasons for the NTP to switch to another strain of rat for carcinogenicity testing. Other factors included decreased fecundity, sporadic seizures and idiopathic [...] (Dinse et al., 2010; King-Herbert and Thayer, 2006; King-Herbert et al., 2010).

2.4. Lung tumours

Rodents, especially mice, have a notoriously variable incidence of lung alveolar or bronchiolar tumours which is highly strain dependent (Haseman et al., 1998). In contrast to humans, where carcinoma is more common in the lungs, hyperplasia and adenoma are the main proliferative lesions of the mouse lung (Nikitin et al., 2004). These mouse neoplasms rarely metastasize. The high incidence and the inherent variability of pulmonary adenoma and adenocarcinoma incidence in conventional mouse carcinogenicity bioassays sometimes give rise to statistically significant differences between control and treatment groups. Therefore, there is considerable risk in over-interpretation of small group differences in conventional mouse bioassays. These tumours are often small compared to the size of the lungs and particular consideration needs to be given to tissue sampling procedures as well as age-standardization and historical control incidence. Moreover, they show a spectrum of histological appearances and there is no sharp distinction between benign and malignant tumours. Combining the incidence rates for adenoma and carcinoma is, therefore, usually the most appropriate option for statistical analysis (Annex 1).

A considerable number of widely employed therapeutic agents of different classes have produced small increases in the incidence of pulmonary tumours in carcinogenicity studies performed in mice, but these have usually been considered of little significance to humans (Davies and Monro, 1995). It is the rat’s, rather than the mouse’s, lung that appears more sensitive to tumour induction by agents considered to be lung carcinogens in humans (Hahn et al., 2007).

2.5. Vascular tumours

A more recently recognised group of neoplasms, which may not have relevance for human cancer hazard identification, are those that arise in the vasculature, termed angiomas and angiosarcomas or haemangiomas and haemangiosarcomas.

DNA-reactive agents, such as vinyl chloride and related compounds which cause cancer in humans, produce highly malignant sarcomas relatively rapidly in rodents following continuous dosing (Wright et al., 1991). By contrast, a number of non-DNA reactive pharmaceuticals and other chemicals have also produced angiomas and angiosarcomas in rodent studies, predominantly those conducted in the mouse but also in the rat, without this being associated with human cancer risk (Cohen et al., 2009). These tumours develop at a time when similar neoplasms develop in control animals where they can be quite common. The pharmaceuticals that have this property represent a diverse group which includes olanzapine, troglitazone and pregabalin (Cohen et al., 2009; Duddy et al., 1999; Kakiuchi-Kiyota et al., 2009). For example, pregabalin, a structural derivative of the inhibitory neurotransmitter γ-aminobutyric acid (GABA), was shown to produce a dose-dependent increase in the incidence of haemangiosarcomas above control incidence in both B6C3F1 and CD-1 mice at all dose levels tested, even at the lowest dose where the exposure to the drug was similar to the human exposure at the maximum recommended dose. In all these cases, the tumours were of late onset and occurred in small numbers. Histologically, they resembled those that occur in rodents spontaneously. Tumours of this type develop spontaneously in rodents, particularly in mice, although less commonly in rats (Greaves and Barsoum, 1990; Poteracki and Walsh, 1998).

Why these non-DNA reactive agents increase angioma and angiosarcoma incidence in mice or rats is unclear. However, they appear to influence growth regulatory pathways that underlie the genetic predisposition of rodents to vascular tumours. It has been shown that the proliferative rate of endothelial cells is lower in humans than in rodents, which has been given as an explanation for the sensitivity of rodent endothelial cells compared with those in humans (Ohnishi et al., 2007). Moreover, in humans, angiogenesis and endothelial cell proliferation are well regulated and vascular proliferation as a result of prolonged stimulation is not associated with vascular tumour development. It has been suggested that in the epigenetic induction of haemangiosarcoma in rodents, there is a convergence of multiple events such as haemolysis, decreased respiration and adipocyte growth that leads to dysregulated angiogenesis or erythropoiesis as a result of hypoxia and macrophage activation. These events lead to the release of angiogenic growth factors and cytokines that stimulate endothelial cell proliferation, which, if sustained, provide the milieu that can lead to vascular tumour formation (Cohen et al., 2009).

2.6. Alimentary tract tumours

2.6.1. Forestomach tumours

In rodents, the forestomach occupies about two-thirds of the proximal area of the stomach and is lined by keratinized squamous epithelium. The forestomach acts as a storage organ releasing relatively undigested food into the glandular stomach in response to energy demand (Gartner and Pfaff, 1979). Hence, in carcinogenicity bioassays, the forestomach mucosa is often exposed to xenobiotics mixed in undigested food and chemicals for longer periods than elsewhere in the [...]. A large number of chemicals that show no reactivity with DNA, including the food antioxidant butylated hydroxyanisole, produce hyperplasia and eventually neoplasms in the squamous mucosa of the forestomach of rodents after prolonged administration of high doses (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 2003). It is generally agreed that this type of change is an adaptive response to prolonged local cytotoxicity that has no relevance for humans in the normal utilization of these drugs or other chemicals (Cohen, 2004).

2.6.2. Intestinal neoplasms

Another pathological phenomenon which appears to be of doubtful relevance for human carcinogen risk assessment is the development of glandular gastrointestinal neoplasms as a consequence of oral administration of poorly absorbed and irritant materials that produce chronic inflammation of the mucosa. This [...] hyperplasia. Subsequent analysis of these findings has indicated that these tumours were likely to have arisen due to an interaction between a reduction in fluid intake and a lowering of pancreatic secretion induced by hydrogen peroxide along with the irritant effects of dietary roughage in treated mice. This phenomenon lacks relevance for humans after exposure to doses resulting from the normal use of hydrogen peroxide (DeSesso et al., 2000).

2.6.3. Hepatocellular tumours

The liver is the most common target organ identified in rodent carcinogenicity studies for environmental and industrial chemicals as well as therapeutic agents (Davies and Monro, 1995; Gold et al., 1991). In only a few cases have the many xenobiotics that produce liver tumours in rodents been shown to have any tumorigenic effect in humans (Williams, 1997). Application of the IPCS MoA framework has been proposed as a means of addressing this issue (Holsapple et al., 2006).

The response of the liver to xenobiotics that are potent genotoxins and linked to cancer risk in humans is different from that to non-DNA reactive chemicals. This response has been studied for several decades in the context of experimental models for carcinogenesis. Pioneering work by Druckrey et al. (1967), followed by others (Peto et al., 1984, 1991), showed that as the dose of a DNA reactive carcinogen increases, so does the number of tumours, and the time to tumour development or latent period decreases. Strain differences in response to these chemical agents occur but appear to be limited to tumour latency (Takahashi et al., 2002). The short latent period is reflected in the pathology of the induced tumours, which typically show a highly malignant and metastatic phenotype that often occurs within 6 months.

In contrast, the pathological and biological features of tumours induced by non-DNA reactive carcinogens differ markedly from those induced by DNA reactive carcinogens and are generally considered to lack relevance for cancer risk in humans (Grasso, 1987; Williams, 1997). Hepatic nodule and tumour formation appears to be a phenotypical response of the rodent liver to diverse changes induced by xenobiotics, bacterial infection, obesity, dietary deficiency, high [...] diets or portacaval [...] (Ghoshal and Farber, 1984; Mikol et al., 1983; Rogers et al., 2004; Weinbren and Washington, 1976). These changes are often associated with liver weight increases. Changes include hepatocellular hypertrophy, induction of metabolising enzymes which may represent adaptive responses (Williams and Iatropoulos, 2002) or prolonged inflammation. All strains of mouse appear to be particularly responsive to these effects compared with rats, even those that have a low background incidence of hepatic neoplasms (Greaves, 1996).

In the two well-studied examples, clofibrate and phenobarbitone, foci of aberrant cells and tumours develop slowly and generally show a benign phenotype.
Moreover, significant increases in phenomenon has been best studied following administration of neoplasms occur only towards the end of 2 year studies or even high doses of degraded carrageenans, amylopectin sulphate or sul- after 2 years (Greaves et al., 1986; Jones et al., 2009; Nagai and Far- phated dextrans. All these substances produce adenomas and car- ber, 1999; Reddy and Qureshi, 1979; Svoboda and Azarnoff, 1979). cinomas in the rodent colon which are closely associated with In addition, they are usually seen only at higher doses where there persistent chronic inflammation of the glandular mucosa (Fath has been prolonged perturbation of hepatocellular homeostasis et al., 1984; Ishioka et al., 1985, 1987; Marcus and Watt, 1971; leading to adaptive responses that enhance tumorigenesis (Wil- Sharratt et al., 1970). Flat and polypoid neoplastic lesions develop liams and Iatropolos, 2002). A similar pattern has been seen with and malignancy is closely associated with more intense levels of the 20 or more cases of widely used pharmaceutical agents that inflammation (Clapper et al., 2007). However, doses needed to pro- produce hepatic nodules and neoplasms in rats, mice or both spe- duce sufficient inflammation to induce neoplasms are far higher cies (Davies and Monro, 1995), but are considered to lack relevance than any exposure likely to be achieved in humans. In the absence for human cancer risk assessment. of prolonged and severe inflammation, tumours do not occur. The separation of these two different responses of the rodent li- In this context, it is also worth remarking that the epithelial tu- ver to xenobiotics requires a critical assessment of the histopathol- mours induced in mice in the proximal duodenum by administra- ogical features within the liver. 
Unfortunately, pathological tion of large doses of oral hydrogen peroxide solution originally evaluation has been clouded over the years by disputes among reported by Ito and colleagues (Ito et al., 1981) were also associ- pathologists about the precise nature and terminology used in ated with epithelial erosion, prolonged inflammation and reactive the diagnosis of nodular lesions of the rodent liver. Whilst the Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 6 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx current classification: focus, adenoma, carcinoma is now widely following administration of not only some therapeutic agents but agreed and used in most laboratories, some caution is merited be- chemicals such as D-limonene, certain gasoline products, 1,4 cause these terms almost certainly embrace lesions of different dichlorobenzene, dimethyl methylphosphonate and the solvents biological types. For example, the non-invasive nodule induced hexachloroethane, isophorone, pentachloroethane, tert-butyl alco- by genotoxic carcinogens, termed adenoma, typically shows con- hol and tetrachloroethylene (Lock and Hard, 2004). siderable cellular pleomorphism and mitotic activity. This lesion A low incidence of renal tubular neoplasms develops in male is different to the more uniform, well-differentiated nodule also rats after prolonged treatment with agents that produce these called adenoma, which is induced by non-genotoxic agents. This changes. 
It is believed that binding of these chemicals or their latter type may even regress following withdrawal of treatment metabolites to a2u globulin occurs and prevents its normal lyso- (Greaves et al., 1986; Malarkey et al., 1995). somal elimination, leading to cellular overload and a process of in- creased individual tubular cell necrosis and proliferation. This in 2.6.4. Biliary tumours turn leads to development of tubular neoplasms. As a result, it is In the rat liver, bile duct proliferation (cholangiofibrosis or ade- believed that this form of neoplasms development is both sex- nofibrosis) associated with oval cell proliferation and intestinal and species-specific and does not indicate human cancer risk metaplasia of proliferating duct cells, is associated with adminis- (Baetcke et al., 1991). tration of hepatotoxins, notably polychlorinated biphenyls, furan Another complicating factor in the interpretation of treatment- and chloroform, as well as a number of therapeutic agents (Duns- induced increases in renal tubular neoplasms in rats is the pres- ford et al., 1985; Farber, 1963; Gregory et al., 1983; Jamison et al., ence of a spontaneous chronic renal disease termed chronic pro- 1996; Kimbrough et al., 1973; Maronpot et al., 1991; McGuire gressive nephropathy. A distinctive form of renal tumour is found et al., 1986; Sirica, 1996; Tryphonas et al., 1986). The spectrum in rats with advanced chronic nephropathy and chemicals that en- of histological changes observed following treatment with these hance the development of this disease may also increase the inci- chemicals includes hepatocellular vacuolation, degeneration, dence of renal adenomas and carcinomas.(Hard et al., 2008, 2011; necrosis, chronic inflammation, hepatocellular regeneration, zones Travlos et al., 2011). These tumours are not relevant to humans. of marked fibrosis and oval cell proliferation and also the presence of glandular structures resembling intestinal epithelium. Taken to- 2.7.2. 
Urothelial (transitional cell) neoplasms gether, these are the histological manifestation of prolonged hepa- Chemically-induced bladder calculi or microcrystals in the rat tic injury. Under these circumstances, when the homeostatic or mouse bladder lumen lead to inflammation, hyperplasia and response mechanisms are overwhelmed, stem cells are activated an increase in cellular proliferation and eventually to urothelial to proliferate and these contribute to the glandular differentiation neoplasms. If doses of chemicals are administered below the (Thorgeirsson, 1996). amount necessary for calculus formation, no effects on the urothe- Unfortunately, these reactive changes can be so florid that there lium are produced (Clayson et al., 1995; Cohen, 1995, 1998). The is a tendency to over-diagnose these lesions as cholangiocarcino- rat, particularly the male rat, appears to be particularly susceptible mas. The term cholangiocarcinoma, intestinal type has been coined to the formation of urinary precipitate (Cohen et al., 2000). The because of the frequent occurrence of metaplastic glandular cells overall evidence supports the concept that calculi are the primary of intestinal type (Elmore and Sirica, 1993). These are probably factor in the induction of cell proliferation and eventually of tu- not malignant neoplasms because they are totally unlike the mours through the passage of urine-borne growth factors into malignant counterpart found in humans as they do not spread out- the damaged urothelium (Cohen et al., 2000). Other relevant fac- side the liver itself or produce metastatic deposits. Indeed, they are tors include alterations in urinary pH, osmolality, urinary and uro- also distinct from those malignant hepatocellular tumours induced thelial metal ion imbalance and rat bladder infection. These by DNA-reactive carcinogens that sometimes show focal glandular tumours are not relevant to humans in the absence of significant or bile duct differentiation patterns. 
Therefore, this form of prolif- induced inflammatory or reactive alterations in the urothelium. erative lesion does not indicate a human cancer hazard, although clearly it is indicative of hepatotoxicity. 2.8. Female genital tract

2.7. Urinary tract 2.8.1. Endometrial carcinomas Reproductive senescence in aged female rats is thought to be 2.7.1. Renal tumours in rats caused by hypothalamic aging rather than the loss of ovarian A number of non-genotoxic chemicals have been shown to pro- responsiveness as in women. The former leads to a reduction in duce increases in renal neoplasms in rats and mice through sus- dopamine, an increase in prolactin, persistence of corpora lutea tained stimulation of cell proliferation in the renal tubule as a and elevated progesterone levels. Dopamine agonists, such as bro- regenerative response to direct cytotoxicity or indirectly through mocriptine, inhibit prolactin secretion which leads to oestrogen a process of lysosomal overload (Hard, 1998; Lock and Hard, dominance and persistent oestrus which leads to endometrial 2004). One of these processes considered to lack human relevance stimulation and eventually endometrial carcinoma (Richardson is when a chemical interferes with a2u globulin. This is a low et al., 1984). Such effects appear to be specific to rats (Alison molecular weight protein that is synthesised predominantly in et al., 1994). Bromocriptine has dissimilar effects in humans (Rich- the liver of the male rat and which normally passes freely through ardson et al., 1984). It has been used therapeutically for over the glomerulus into the renal tubule. It is partly endocytosed by 30 years and uterine carcinomas have not been reported in treated proximal renal tubular cells to form eosinophilic lysosomal drop- patients (Misbin, 2009). lets. Certain chemicals that interfere with this process produce excessive accumulation of this protein in the proximal tubular cell 2.8.2. Ovarian tumours in mice and give rise to a condition termed hyaline droplet nephropathy in Tubulostromal tumours are common in rodent species but male but not female rats. It is characterised by the presence of uncommon in humans (Alison and Maronpot, 1987). 
It has been eosinophilic (hyaline) droplets in the cytoplasm of proximal tu- shown in mice that these tumours can result in response to oocyte bules associated with granular cast formation, degenerative alter- loss from a variety of causes including that induced by xenobiotics. ations and increased cell turnover. This a2u globulin-related Oocyte loss leads to reduction in oestrogen feedback to the hypo- nephropathy is a condition reported to develop in male rats thalamus, stimulation of pituitary follicle stimulating hormone Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx 7 and luteinising hormone release which results in hyperplasia of where their incidence depends on age, sex and strain (Barsoum ovarian stromal cells and eventually neoplasms. It is argued that et al., 1985; Berkvens et al., 1980; Kovacs et al., 1977; McComb the mouse is particularly sensitive to this phenomenon so ovarian et al., 1984). Although mostly of a benign phenotype, they repre- tumours in mice are irrelevant to human cancer risk assessment sent a significant cause of death in rat carcinogenicity studies be- (Alison et al., 1994; French, 1989; Stitzel et al., 1989). cause as they enlarge, they compress adjacent brain and neural structures. It is believed that prolonged elevation of circulating 2.8.3. Smooth muscle tumours of the mesovarium prolactin is important in the pathogenesis of these tumours in rats. 
Prolonged treatment of female rats with sympathomimetic With aging, there is gradual loss of dopaminergic neurons in the drugs has been reported to result in the development of benign hypothalamus of rats resulting in decreased dopamine secretion. smooth muscle neoplasms (leiomyomas) in supporting tissues of The loss of the negative dopaminergic control mechanisms on lac- the ovary (mesovarium) (Amemiya et al., 1984; Gopinath and Gib- totrophs causes chronic stimulation of these cells, inducing high son, 1987; Jack et al., 1983; Nelson et al., 1972; Nelson and Kelly, mitotic activity, hypertrophy, hyperplasia, and the development 1971). It has been shown that these leiomyomas neither progress of pituitary as well as mammary tumours (see below). Xenobiotics nor regress following a 44-week post-dose recovery period, subse- and other experimental factors, which interfere with these control quent to 80 weeks of treatment with salbutamol and were pre- mechanisms, may alter the incidence of pituitary tumours in car- vented by concurrent administration of the b-blocker, cinogenicity studies. Overfeeding accelerates the reduction of propranolol (Gopinath and Gibson, 1987). The tumour develop- hypothalamic dopamine activity, increases pituitary prolactin re- ment relates to the presence of specific b2 adrenoceptors in mesov- lease, and results in the early development of potentially fatal pitu- arial smooth muscle cells and the pharmacological potency of the itary gland tumours in rats (Molon-Noblot et al., 2003). Conversely administered b-receptor agonists (Colbert et al., 1991). 
The precise dietary restriction prevents these age-related decreases in hypo- mechanisms are unclear, but it has been suggested that the pro- thalamic dopaminergic activity without reducing the responsive- longed and intense activation of these receptors mediate smooth ness of pituitary cells to hypothalamic hormones (Molon-Noblot muscle proliferation (Apperley et al., 1978; Gopinath and Gibson, et al., 2003). Xenobiotics may act in analogous ways. Administra- 1987). No such tumours are reported in women treated with this tion of oestrogens and oral contraceptive steroids are believed to class of drug (Poynter et al., 1978). induce pituitary tumours in rats and mice through inhibition of dopamine secretion which leads to prolactin release. There is no 2.9. Endocrine system evidence of this occurring in humans (Lloyd, 1983; Lumb et al., 1985; Russfield, 2013). Older comparative pathology studies in Endocrine tissues have only a limited repertoire of responses to prostate cancer patients receiving diethylstilboestrol for as long stimuli, one of which is endocrine secretion and proliferation. This as 4 years failed to show any of the histological changes in the pitu- probably accounts for the frequent occurrence of endocrine hyper- itary gland that accompany prolonged oestrogen administration in plasia and neoplasms in high dose carcinogenicity studies where rats and mice (Dekker and Russfield, 1963). prolonged and exaggerated stimulation of endocrine tissue occurs. This unique sensitivity of the rat to pituitary tumour develop- Rodents appear to be particularly sensitive and hormonal control ment is the basis for the lack of relevance of these tumours for mechanisms are sometimes different from those in humans (Ca- the identification of carcinogenic hazard for humans. Although less pen, 2001). 
Review of the marketed human drugs in the Physicians’ well-documented, the mouse is, like the rat, sensitive to the induc- Desk Reference in 1994 showed that of 101 agents out of 241 that tion of pituitary tumours by xenobiotics (Nilsson and Bierke, produced such tumours in rodents, at least 15 were reported to 1997). produce thyroid tumours in rats, nine were linked to adrenal tu- mours in rats and pituitary adenomas developed in eight rat and 2.9.2. Thyroid follicular tumours four mouse bioassays (Davies and Monro, 1995). None of these The comparative aspects of thyroid pathology and mechanisms have been linked to the development of endocrine tumours in pa- involved in thyroid tumour development have been extensively tients. Hence, from a comparative pathology perspective, the neo- studied (Capen, 1997, 1998, 2001; McClain, 1995): The rodent thy- plasms in endocrine organs and hormone-responsive tissues in roid is especially sensitive to physiological perturbations, which if chronic rodent studies following treatment with high doses of sustained, increase the development of thyroid tumours. Com- many xenobiotics have been generally considered by pathologists pared with humans, thyroxin metabolism takes place more rapidly to have limited significance for human safety assessment (Capen, in rats because of the absence of thyroxin-binding globulin in the 2001). circulation. Moreover, rats are more sensitive to thyroid peroxidase It should be noted that the distinction between endocrine inhibition than humans (Alison et al., 1994). hyperplasia, benign and malignant neoplasms in rodents based The wide variety of drugs, other chemicals, and physiologic per- on histological criteria is not clear cut. 
The distinction on morpho- turbations, which increase thyroid tumour development, appear to logical grounds between benign and malignant neoplasms is based act through secondary (indirect) mechanisms, i.e., through pro- almost solely on nuclear pleomorphism and penetration of endo- longed hypersecretion of thyroid stimulating hormone. The activa- crine cells into the surrounding tissues and blood vessels. In hu- tion of the thyroid gland during the treatment of rodents with mans, these histological features may be seen in benign lesions substances that stimulate thyroxin catabolism in the liver is (Lewis, 1996). It is probable, therefore, that a number of the endo- well-known. The rat is particularly sensitive because of the in- crine carcinomas diagnosed in rodents would not be considered creases in glucuronyl transferase activity which can occur follow- malignant by pathologists involved in human diagnostic pathol- ing administration of high doses of hepatic enzyme inducers or ogy. This difficulty is overcome by the recommendation to com- agents that simply increase liver cell mass. Humans are far less bine adenomas and carcinomas found in the same endocrine sensitive to these effects for there is no evidence of a significant in- organ in the assessment of rodent carcinogenicity studies (McCon- crease in thyroid stimulating hormone (TSH) even following treat- nell et al., 1986). ment with high doses of very powerful enzyme inducers, such as rifampicin (Capen, 2001). Moreover increases in TSH produced by 2.9.1. Pituitary tumours in rats other mechanisms in humans are accompanied by hypertrophy There is a large body of published work on the pathology of rat of the thyroid gland rather than the hyperplasia seen in rodents pituitary tumours because they are so common in this species (Cohen, 2004). Thus, treatment-related increases in follicular Please cite this article in press as: Edler, L., et al. 
Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 8 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx thyroid tumours in rats by such MoAs do not indicate cancer risk in et al., 1999; IARC Working Group on the Evaluation of Carcinogenic humans. Risks to Humans, 2003). Although gastric enterochromaffin neoplasms can occur in hu- 2.9.3. Adrenal medullary tumours mans through a similar mechanism in patients with severe auto- The most common neoplasm of the adrenal medulla observed immune chronic atrophic gastritis (Burkitt and Pritchard, 2006; in rodent carcinogenicity bioassays is the pheochromocytoma. A Solcia et al., 2000), this MoA is threshold sensitive and does not ap- pheochromocytoma is a tumour of neoplastic pheochromocytes pear to occur with the therapeutic use of agents that inhibit gastric or chromaffin cells considered to derive from cells of neural crest secretion. These drugs have been used extensively for many years origin. Although immunohistochemical analysis of rat pheochro- and gastric biopsies in long-term clinical trials have not shown an mocytomas as well as biochemical analysis suggested that unlike increase in the incidence of carcinoid tumours, dysplasia or gastric their human counterparts they secrete only noradrenalin, recent carcinomas (IARC Working Group on the Evaluation of Carcino- studies with mRNA analyses have suggested that rat pheochromo- genic Risks to Humans, 2003; Lamberts et al., 2001). Thus, these cytomas also form adrenalin (Greim et al., 2009). 
Some strains of findings in rodents are believed to be an effect of exaggerated rat are particularly liable to develop focal hyperplasia and neo- pharmacodynamic activity and not an indicator of carcinogenic plasms of the adrenal medulla both spontaneously with advancing potential. age and following administration of xenobiotics. Reported inci- dences of spontaneous proliferative lesions among different rat 2.9.5. Leydig cell tumours in rats strains are up to nearly 90% (Tischler and DeLellis, 1988). Males A number of rat strains show a striking predisposition to devel- are more susceptible than females and dietary factors are impor- op Leydig (interstitial) cell tumours with advancing age. They are tant. There is a reported association with severe chronic renal dis- common in the Fischer 344 rat strain where they are reported to ease in rats, ascribed to disturbed calcium balance (Nyska et al., occur in over 70% of aging males, in association with age-related 1999). A variety of agents have been shown to potentiate the declines in plasma testosterone and increases in gonadotrophin development of these tumours in rat carcinogenicity bioassays levels. This high incidence in Fischer 344 rats is reflected by the notably lactose and polyols, such as sorbitol, mannitol, xylitol wide range of xenobiotics which are capable of increasing the inci- and lactitol (Baer, 1988; Tischler et al., 1996). It has been suggested dence of Leydig cell tumours when administered at high does for that these substances inhibit catecholamine synthesis, possibly by long periods of time. A detailed review of comparative physiology interference with calcium balance, which in turn, results in com- and pathology indicated that rats are quantitatively far more sen- pensatory medullary hyperplasia and eventually neoplasms sitive to the development of Leydig cell tumours than men. It was (Boelsterli and Zbinden, 1985). 
A MoA involving inhibition of cate- argued that Leydig cell luteinizing hormone releasing hormone cholamine activity is supported by the observation that the widely (gonadotropin-releasing hormone) receptors are unique to rats. used b-blockers tenormin and timolol have also been associated Rats also have over 10 times more luteinizing hormone receptors with an increase in pheochromocytoma development in rat carcin- than men (Cook et al., 1999). It was concluded that for compounds ogenicity studies without this proving a cancer risk to patients de- that induce Leydig cell tumours in rats there is little or no evidence spite prolonged periods of treatment (Greaves, 2012). of cancer hazard in humans (Baer, 1992; Cook et al., 1999; Prentice This exquisite sensitivity of the rat medulla, the innocuous nat- and Meikle, 1995). ure of some of the precipitating agents, and the lack of direct hu- Another tumour closely associated with the high background man counterpart suggest that these lesions usually have little or incidence of Leydig cell tumours in Fischer 344 rats is the tunica no relevance to human safety when produced in rats following vaginalis mesothelioma. Hormone imbalance brought about by administration of high doses of xenobiotics. perturbations of the endocrine system is believed to be a key fac- tor leading to its induction. A review of a number of carcinogenic- 2.9.4. Gastrointestinal carcinoid (neuroendocrine) tumours ity studies in the Fischer 344 rat has suggested that this tumour is The hyperplasia of gastric endocrine cells and development of also a male Fischer 344 rat specific event and chemicals that carcinoid-like neoplasms was first reported in the stomach of rats induce only tunica vaginalis mesothelioma in the male Fischer treated with omeprazole over 20 years ago (Ekman et al., 1985; 344 rat in a typical rat carcinogenicity bioassay are likely to be Havu et al., 1990). 
Omeprazole inhibits gastric acid secretion by irrelevant in human cancer risk assessment (Maronpot et al., blocking the enzyme H+, K+-ATPase, the proton pump of the pari- 2009). etal cells. Similar findings have been reported in both rats and mice with other drugs of the same type as well as potent and long-acting 2.9.6.

H2 receptor antagonists and other agents that produce achlorhyd- Most rodent pancreatic tumours are derived from acinar cells ria. A number of reviews have been published. Rats appear partic- whereas human pancreatic neoplasms are of ductular origin. Tu- ularly sensitive to these effects (Betton et al., 1988; Betton and mours of islet cells also occur and these are rare in humans. Few Salmon, 1984; Fossmark et al., 2008; IARC Working Group on the genotoxic carcinogens have induced pancreatic tumours. Factors Evaluation of Carcinogenic Risks to Humans, 2003; Poynter et al., influencing cholecystokinenin levels have a neoplastic effect in ro- 1986, 1985). It is postulated that omeprazole and these other dent pancreas (Gumbmann et al., 1989). agents produce prolonged inhibition of acid secretion in the rat (and mouse), which causes activation and hyperplasia of gastrin 2.10. Mesenchymal tumours cells and marked hypergastrinaemia. Hypergastrinaemia, in turn, stimulates enterochromaffin cells of the fundus, which in time re- 2.10.1. Solid state tumorigenesis sults in enterochromaffin hyperplasia and eventually neoplasms In rats and mice, repeated subcutaneous injection of agents not (Chen and Hakanson, 2003; Hakanson et al., 1986). generally considered carcinogenic, may give rise to sarcomas (mes- Certain other chemicals, such as the herbicides alachlor and enchymal tumours) around the injection sites. Such agents include butachlor, also produce similar gastric neuroendocrine tumours concentrated solutions of glucose and other sugars, sodium chlo- in rodents by an indirect hormonal mechanism involving parietal ride, certain water-soluble food colourings and surfactants, carbo- cell loss and decreased gastric pH leading to hypergastrinaemia, xymethycellulose and macromolecular dextrans (Carter, 1970; enterochromaffin hyperplasia and eventually neoplasms (Heydens Grasso and Golberg, 1966; Hooson et al., 1973). The sarcomas are

usually reported as fibrosarcomas or malignant fibrohistiocytic tumours and occasionally of other types such as rhabdomyosarcomas. Some of these materials, such as macromolecular iron dextrans, have been used therapeutically in humans by the parenteral route for many years without evidence of tumour induction. As humans appear insensitive to these local tumorigenic effects, the development of sarcomas at injection sites is not considered to indicate potential cancer hazard to humans.

Likewise, in rodents, subcutaneous implantation of inert plastics and other materials of certain dimensions and surface characteristics can give rise to similar sarcomas around implantation sites, the so-called ‘Oppenheimer effect’ or ‘solid state carcinogenesis’ (Autian, 1973; Brand et al., 1976; Oppenheimer et al., 1953). This phenomenon remains unexplained, for it does not fit easily into conventional concepts of tumour initiation, promotion and progression. Preneoplastic foci are only identified with difficulty in the fibroblastic proliferation that normally surrounds the implants (Kirkpatrick et al., 2000). Humans appear insensitive to these effects, for epidemiological studies have shown little or no evidence of carcinogenesis in association with medical or prosthetic implants of various types (Bryant and Brasher, 1995; Friis et al., 1997; Sunderman, 1989).

It has been shown that the early tissue response to injection or implantation of inert plastics and the various subcutaneously injected non-DNA reactive carcinogenic agents is different to that occurring following injection of potent DNA reactive carcinogens or oncogenic viruses. Whereas non-DNA reactive carcinogenic agents elicit proliferation of fibroblasts and extensive collagen deposition, the DNA reactive carcinogens appear to inhibit connective tissue repair and produce morphologically abnormal fibroblasts (Hooson et al., 1973; Westwood et al., 1979). DNA reactive carcinogens are also typically associated with other forms of neoplasms, sometimes at distant sites (Grasso and Golberg, 1966). Moreover, sarcomas start to appear rapidly in rodents, often before 20 weeks, following injection or inoculation of potent DNA reactive carcinogens (Chesterman et al., 1966; Davenport et al., 1941; Hooson et al., 1973; Tokiwa et al., 1987; Westwood et al., 1979). This is in contrast to the 50 weeks or more taken for sarcomas to develop in rats around inert plastic implants and non-DNA reactive chemicals. Hence, histopathological evaluation of the nature of the tissue response and the time to tumour development can be useful in the evaluation of mesenchymal tumours induced by injection or implantation of xenobiotics.

2.11. Other neoplasms

In contrast to the neoplasms discussed above, significant increases in the tumour types in rodents discussed in this section are more commonly associated with administration of genotoxic chemicals or likely human carcinogens (see Annex 1).

2.11.1. Tumours of epidermis and adnexa including Zymbal’s gland

A number of different reviews of the NTP database have shown that the epidermis along with Zymbal’s gland, a modified sebaceous gland in the outer ear of rats, but not mice or humans, tend to be targets for potent genotoxic carcinogens (Ashby and Tennant, 1991; Gold et al., 1993, 2001). These tumours are far less commonly induced in rodents by non-genotoxic substances (Davies and Monro, 1995). Moreover, the NTP database suggests that agents that produce tumours in Zymbal’s gland are strongly associated with tumour development not only in the skin but also in preputial glands in males and clitoral and mammary glands in females. Evaluation of tumourigenicity of such agents needs to consider the overall pattern of tumour development in skin and accessory glands.

2.11.2. Harderian gland tumours

The harderian gland is an organ in the orbit of the rat, mouse and hamster as well as many other terrestrial vertebrates but which is not found in humans. It is composed of tubuloalveolar endpieces with wide lumens but devoid of an intra-glandular duct system. It secretes lipid-containing substances by a merocrine mechanism. In rodents, the secretions contain variable amounts of porphyrin pigment and the gland itself contains extremely high levels of porphyrins. More prominent porphyrin deposits are observed in female rodents. Adenomas and carcinomas in this organ are found far more commonly in the mouse than the rat, although their incidence is variable and strain dependent. Reported incidences vary between 0.5% and nearly 15% (Eiben, 2001; Krinke et al., 2001). An increase in incidence of harderian gland tumours has been reported in mice following ionizing radiation and administration of a number of chemicals (Gold et al., 2001; Haseman and Lockhart, 1993; Krinke et al., 2001). Although tumour responses in the mouse harderian gland may be regarded as unique to this species, they are usually associated with treatment-induced tumours in other organs and also tumour development in rats (Gold et al., 1993). Hence, such responses cannot be viewed in isolation or regarded as unique to the mouse.

2.11.3. Brain tumours

Relatively few xenobiotics produce brain neoplasms (neuroectodermal or microglial tumours) in rodent studies (Gold et al., 2001). The only cases of therapeutic agents producing a tumorigenic effect in the rodent brain are some anticancer alkylating agents, such as cyclophosphamide, procarbazine and chlorambucil (Davies and Monro, 1995; Gold et al., 1993). Other chemicals that have been shown to produce brain tumours in rodent studies also tend to be mutagens and produce tumours in other organs (Gold et al., 1993). In humans, no underlying causes have been identified for the majority of malignant gliomas, and the only established risk factor is exposure to ionizing radiation. There is little evidence for an association with head injury, foods containing N-nitroso compounds, exposure to electromagnetic fields and occupational hazards (Fisher et al., 2007; Wen and Kesari, 2008). The industrial chemical acrylonitrile produced brain tumours in rats but not mice (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 1999). There is evidence for a MoA involving oxidative stress (Jiang et al., 1998; Whysner et al., 1998). The epidemiology of acrylonitrile reveals no increase in cancer in humans (IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 1999).

Although the prevalence of brain tumours in rodents used in carcinogenicity bioassays is generally very low, incidence between strains and laboratories and different studies can be very variable (Haseman et al., 1998). The rat appears to develop brain neoplasms spontaneously with advancing age more commonly than mice. Whereas incidences in rats generally range from 0% to 5%, incidences of up to 10% have been reported in some laboratories (Tucker, 1997). Prevalence increases with age, and they are more common in males. However, they can occasionally be seen as early as 6 months in rat studies (Son and Gopinath, 2004). Moreover, these tumours can be very small and their detection closely related to sampling procedures.

Rat brain tumours have been generally diagnosed as astrocytomas. More recently, however, immunohistochemical studies of acrylonitrile-induced brain tumours have supported an origin from microglia (Kolenda-Roberts et al., 2013). Microglia are of histiocytic derivation, which may inform the MoA assessment.

This variability, together with the inconsistency and paucity of histological brain sectioning, particularly in older studies, has the potential to confound interpretation of small group differences in tumour incidence at the end of 2-year studies.

Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030

Here, consideration of dose–response, age of onset and pattern of differentiation are particularly important in the evaluation of group differences in brain tumours in rodent studies (Koestner, 1986).

3. Aspects of statistical data analysis, experimental design and statistical uncertainty

Existing guidance for risk assessment of substances that are genotoxic and carcinogenic aims at assessing the quality of animal carcinogenicity data (EMEA, 2006; EPA, 2005; FDA, 2008; OECD, 2008, 2010) and focuses on aspects such as the choice of species/strain, doses used, duration of study, tissue examination, diagnostic criteria and statistical analysis methods (Williams et al., 2008). Such guidance may be used to assess the overall quality of a study when a decision has to be made on whether that study is of sufficient quality to be considered for risk assessment; however, it rarely provides sufficient advice on the selection of data for dose–response modelling such as the Benchmark Dose (BMD) approach. Although good study quality is often an indication of good data quality, the quality of tumour data remains a separate issue when its suitability for dose–response modelling has to be judged. For basic elements of the BMD approach and its application, see e.g. EFSA (2009b) and Benford et al. (2010).

This section looks specifically at the use of the BMD approach in deriving a PoD or reference point (RP) for calculating the MoE of genotoxic and carcinogenic substances, and discusses issues arising from the design of carcinogenicity bioassays, data quality and statistical analysis.

3.1. Suitability of data for modelling

3.1.1. Utility and quality aspects of data from an in vivo toxicity study

General criteria for rating the quality of toxicity studies have been proposed by Klimisch et al. (1997) and were further specified recently by Schneider et al. (2009) concerning test substance identification, test organism characterization, design of the study, documentation of the results, and plausibility of results. Most relevant for the use of the BMD approach are the number of animals per experimental group and the range and spread of concentration/dose levels. When selecting dose–response data one should consider, in particular, species, strain and sex differences in sensitivity to the test substance in target organs. Those need to be documented in study reports such that the BMD analysis can take this information into account.

Most relevant for the size and precision of the BMD and BMDL are the number and the range of concentration/dose levels in the study. In principle, these are defined by the design of the study but also by the type of dosing, i.e., whether the amount of compound administered is adjusted for the individual animal’s body weight (b.w.) or not. Adjusted exposure is usually measured as intake, e.g., in terms of mg substance per kg b.w. per day. Otherwise, dosing may be expressed in terms of concentrations of the compound in the medium administered (e.g., as amount per kg of feed ingested per day or as a concentration in drinking water, typically expressed in units of ppm). Factors for converting chemical substance concentrations in feed or drinking water into daily doses in studies on rats and mice have been suggested by EFSA (2012). Most suitable for dose–response evaluations would be dose data adjusted for the animal’s body weight over the whole duration of the experiment, i.e., from the start of treatment until end of study or death of the animal, performed at regular periods, e.g., weekly. Preferably, one would use individual weight data for the analysis, or possibly data per cage, e.g., when dosing was through drinking water. However, summary data (i.e., means with standard deviations per group) would in most cases suffice. In some studies, adjustment is performed during the conduct of the bioassay, e.g. by calculating the target dose weekly in units of mg per actual individual body weight such that the planned exposure is maintained at the same level throughout the study. Availability of such information in the key publication of the bioassay or in available supplementary information is a clear indication of study quality.

The choice and suitability of the route of exposure in a carcinogenicity bioassay are essential considerations in human risk assessment. Oral exposure has been identified as the gold standard for administration when assessing the risk of substances which could be ingested by humans, in particular, when assessing food risks. It should be noted that the oral route itself can be modified in various ways. The compound can be mixed in feed or drinking water as mentioned above, but it can also be given directly by intra-gastric gavage. All these modifications may affect the precision of the dose values used in the analysis and hence the estimation of the BMD as well as the calculation of the BMDL. When routes of exposure other than that of interest for humans are applied, measures must be taken to calculate the respective oral intake equivalents. For example, when dose–response data from dermal exposure are the only data available for a substance to be assessed for food risks, one has to consider absorption, retention and kinetics of that substance in the animal and account for this in the BMD analysis. Likewise, different measures will be needed when other routes, e.g., intra-venous or intra-peritoneal, are chosen. A fully developed physiologically-based pharmacokinetic model for the substance and the animal species could be used for route-to-route extrapolations. However, such models have been developed only in a few situations (e.g., for dioxins and then only for a few animal strains).

It should be noted that a dose–response analysis can also be based on an internal dose which is usually expressed as a body burden in terms of concentrations in specified critical tissue (e.g., blood, brain or adipose tissue). Since the BMD approach is based on formal mathematical modelling, the dose metric itself does not affect the estimation of the BMD and the calculation of the BMDL. However, the way by which, in each study, the dose measure is derived from the information available may have a large influence on the precision, e.g., when concentration data are transformed into intake data. When adjusting for the animal’s body weight, the measure could strongly depend on the availability and the precision of the body weight data.

Studies following internationally recognized guidelines may be considered as being of sufficient quality, and preferably, these should be used for the BMD approach. However, labelling a study as a guideline study should not obviate the need for its thorough examination. There is always some flexibility in guidelines, which allows for designs less suitable for dose–response analysis, in particular, when the primary study objective has been hazard identification. Therefore, compliance with the criteria of guideline studies does not necessarily mean that the study has a high quality for the BMD approach and, vice versa, one cannot assume that a non-guideline study is not suitable for the BMD approach. Generally agreed criteria to judge the quality of a study for dose–response and BMD analysis are difficult to establish and, at present, not available. However, quality standards regarding the test material used in the bioassay, the genetic origin and stability of animal strains and the health status of the animals at the start of the experiment have been laid out in guidelines and should not be different for data used for BMD analysis than for any other purpose. Risks from failure to comply with those standards should have been considered already when planning key studies and again when their data are used for dose–response analysis and risk characterization. Box 1 lists major points to be considered when assessing the choice of the test substance in a study to be used for dose–response assessment. The answer to these questions has direct consequences on the uncertainty of the outcome considered in more detail in chapter 4.

Box 1 Major criteria for characterization of a test substance.

– Is the substance tested representative for the material to which human exposure occurs?
– Is the purity given?
– Was the stability tested (including in feed where appropriate)?
– Are the impurities/contaminants characterized (where appropriate)?
– What solvents were used (if appropriate)?

3.1.2. Suitability of data of an animal carcinogenicity study

In principle, a carcinogenicity bioassay is carried out according to international standards, applies fixed protocols and typically is conducted under strict GLP guidelines. Taking the OECD Guidelines TG 451 (OECD, 2008) as an example, the protocol specifies the following criteria:

– At least one strain of rats and one of mice of both sexes;
– At least two doses, with one being a maximally tolerated dose;
– 50 or more animals per sex and per dose group;
– Assessment of approximately 40 tissues from typical sites of tumour induction;
– Availability of historical control data.

However, studies with only two dose groups and a control group often do not provide suitable data for a full BMD analysis since not all models established for the BMD approach (see EFSA, 2009b) can be fitted with sufficient statistical quality in such cases (see below in this section).

Critical for the calculation of the tumour incidence is the classification of an animal as tumour bearing or non-tumour bearing. Since the number of animals per dose group is limited and may in practice be as low as 10 animals per dose group (rarely reaching the size of n = 50 often used in cancer bioassays of the U.S. National Toxicology Program; see e.g. Rieth and Starr, 1989; Gaylor, 2005), any error in the assignment of the cancer endpoint (even in only one or two animals per study) can change the BMD and BMDL values appreciably since the total number of animals is an important mathematical parameter in the BMD analysis. Premature death and reduced survival rates could lower the size of the study further. Although this can be accounted for by using the so-called effective sample sizes per dose group instead of the original sizes, it could bias the BMD analysis when the amount of the reduction depends on the dose (see the remarks on intercurrent mortality below in Section 3.8.2).

There have been several examples in the past where a re-examination of the tumour pathology changed the conclusion on tumour incidence. This can occur when additional sections are examined. It is also important to specify the tumour endpoint precisely when both malignant and benign tumour data are available (e.g., when liver is the target organ for carcinogenicity and hepatocellular carcinoma and adenoma have been observed or when additional subtypes of tumours in the liver were identified); see chapter 2 for detailed biological considerations regarding the biological information content of cancer incidence data. Furthermore, each individual animal must be precisely evaluated at necropsy using the most appropriate methods. Biological precision differs fundamentally from statistical precision, which depends on the number of doses and a sufficient number of animals per dose group. When using the BMD approach, the power of the statistical analysis, given the design of the study, depends primarily on the total number of animals used and a sufficient number of dose groups tested. For an extended discussion of dose selection, see Rhomberg et al. (2007). A reasonable balance between the number of animals per dose group and the number of dose groups is essential, see also Section 3.5. When using the BMD approach, statistical precision is technically determined through mathematical calculations during model fitting and usually expressed in terms of statistical confidence intervals, including the one-sided confidence limit BMDL. Model fitting is, however, challenging when only a few dose groups are available. Although one can estimate the BMD and calculate the BMDL even with only two dose groups and a control, such an analysis would be limited by the fact that model selection would then be restricted and not all models suitable for a BMD analysis in general – and available in software – would be applicable; and, if applicable, only with incomplete statistical quality. In fact, some models would be equivalent to the so-called full model (see Section 3.6) such that the uncertainty of the BMD could not be addressed correctly, even when a BMDL is calculated formally. Furthermore, the different approaches implemented in different software solutions may add uncertainty.

Other challenges arise when there are multiple endpoints in one study, when outcomes from different studies are to be combined, or when information on background incidence is to be incorporated. Combined analysis using covariates (covariate adjusted BMD analysis) is preferred over an approach which pools the data, since the latter may lead to biased and imprecise BMD/L values. Covariate adjusted BMD analysis needs, however, specific considerations, both statistical and biological, as discussed briefly below in Section 3.8.3.

When bioassay protocols deviate from guidelines or common practice (e.g., in their definition of target organ carcinogenicity or of what is the tumour endpoint), the quality of the outcome of the BMD analysis may become questionable, at least if those deviations were not specified in advance, e.g., in the study design. This applies in particular to the analysis of rare tumours. Although one may still perform a dose–response analysis, estimate the BMD and calculate the BMDL, the relevance of the outcome must be evaluated together with the biological and toxicological quality of the bioassay. Since positive controls are usually not used in carcinogenicity studies, not least for animal welfare considerations, small numbers of tumours may indicate low incidence, but also low assay sensitivity, e.g., when the dose range was restricted to doses that are possibly too low. When dosing in the experiment or the experiment itself is terminated before the standard lifespan of the animals, observed incidences must be adjusted for time of actual dosing and the standard lifespan (i.e. 2 years for rats), which is not straightforward, see also Section 3.8.2. Similarly, estimates of the dose administered should also be adjusted when the animals are for practical reasons dosed for less than 7 days per week, to provide BMD and BMDL values for lifetime exposure as required for cancer risk assessment. Obviously, all such adjustment needs detailed information from the course of the bioassay described in the study report or publication. Box 2 lists major points to be considered when assessing the study design and reporting of a dose–response experiment with direct consequences on the uncertainty discussed in Section 4.

Box 2 Major questions for assessing study design and reporting.

– Was the study performed following accepted GLP procedures, internationally recognised guidelines or GLP equivalent procedures?
– Were species, strain, sex and the administration of the test substance adequate and adequately described?
– Was dose selection adequate (i.e. regarding absolute dose-level, dose spacing, number of dose-groups)?
– Was the group size adequate?
– Was the study duration over the standard life-span of the species and, if not, were the outcomes adjusted?
– Was feed (or water, if applicable) consumption measured and, if not, were conversion factors used and are those adequate?

3.2. Selection of data for dose–response evaluation

When carcinogenicity is the critical effect, the dose–response curve should be monotone-increasing based on a reasonable number of dose groups. Because of statistical variation, observed tumour incidences often deviate from monotonicity, in particular, when the total number of animals per dose group is small and as such the precision of the estimate of incidence at the individual dose groups is low. Therefore, the observed shape of the dose–response curve alone is not a sufficient criterion for selecting dose–response data. Depending on numbers of animals per dose group, the number of dose groups and the number and type of models available, situations are conceivable where a statistically justified fit can be obtained (at least with some models) even when the curve (e.g., as seen by visual inspection) deviates from monotonicity. Such deviation may, however, be implausible for biological reasons, e.g., when available information on the MoA would contradict the type and extent of qualitative changes in response over a certain dose region. Hence, such data may not be suitable for a dose–response analysis and also not for the BMD approach. At the same time, it may be very difficult to set a biological limit on the degree of monotonicity necessary to comply with the MoA assumed for carcinogenicity of the substance considered. Even more difficult would be to translate that biological constraint into a statistical limit on the goodness-of-fit for selecting models. Therefore, in risk assessment practice, the biological relevance of deviations from monotonicity must be weighed against the statistical significance of model fitting such that statistical model selection does not overrule available biological dose–response information. This section discusses issues of this complex relationship between data selection and model selection and reviews practical proposals for data selection.

The more the data exhibit a convincing association between exposure and cancer incidence, the more evident would be the outcome of a dose–response and BMD analysis. There have been attempts to qualify data through pre-screening dose–response data for their suitability for dose–response analysis and the BMD approach, e.g., by establishing criteria for the selection of datasets (see e.g., Davis et al., 2011). To this end, decision trees were constructed that are based on a number of criteria using both qualitative and quantitative statistical elements, and criteria for the data examination (e.g., requiring statistical significance when using a trend test) were formulated. This results mostly in a stepwise determination of a final model in which one BMD/L is then determined. This approach does not make strict use of the standard goodness-of-fit criteria based on likelihood theory and also uses the Akaike Information Criterion (AIC, see EFSA (2009b)) and so-called local goodness-of-fit methods for model selection. It should be noted that such a stepwise approach differs from what is recommended in this document and also in EFSA (2009b). It has also been suggested to base goodness-of-fit on a series of statistical hypothesis tests (realized for example in the BMDS software of US EPA when applying nested exponential models to continuous data). Although such a strategy adds to objectivity and transparency when deciding on the final model to be used, it is not without arbitrariness due to the multiplicity of testing and the absence of statistical rules of how to set the significance level for a sequence of goodness-of-fit tests. When there are several studies and a multitude of tumour sites and all data sets would be formally tested (e.g. using a trend test), it is obvious that because of such multiple testing, one must expect more false positive test results than what is controlled by the nominal statistical significance level. Accounting for such multiple testing is difficult since a significance level set too small may lead to rejection of data sets which contain relevant dose–response information for a BMD analysis.

Another issue is the choice of the statistical test procedure itself when testing for the presence of a dose–response relationship. Using a trend test, which is a very sensitive statistical tool, could be misleading, such as when all effects remain below the initially chosen level of the BMR, since one may be tempted to lower the BMR to effect sizes "detected" by the trend test. In principle, a data-driven determination of the size of the BMR is not advisable. Similarly, when using a trend test, one may accept data where the response rates do not change for some doses but are high at a few very high doses since the test may claim statistical significance in that case. However, such data may signal specific biological effects which are not appropriate for dose–response analysis. In such and similar cases, a pre-inspection of dose–response data using only statistical testing may become difficult and indefensible.

A particular problem is presented by dose–response curves which increase slowly at low doses or exhibit highly variable responses such that a positive trend can hardly be established. Falsely ruling out those data sets from a BMD analysis would likely also rule out some BMDL values informing on the possible risks. It should be noted that it is difficult to predict in general whether the BMD estimated from a slowly increasing curve would lead to an over- or underestimation of the PoD since the BMD depends also on the level of the BMR chosen.

When the whole dose–response curve stays below the chosen BMR value (10% by default for tumour incidences), the BMD and usually also the BMDL would be undefined (numerically their value would be infinity when analysed statistically) or they may range near the highest doses of that experiment, in particular, when that dose showed a response near or higher than the BMR. In both cases, a PoD derived from the BMD approach should be interpreted with caution and the uncertainty should be indicated in the risk characterization and when calculating a MoE. Decreasing the level of the BMR from 10% to a lower value only for modelling reasons is not recommended in general, except when biological reasons support the deviation or when the design of the study was aimed at such very low incidences explicitly. In practice, a study with low incidences at all doses examined may indicate a serious deficiency of the design and would therefore not be eligible for a BMD analysis.

Similar consequences may arise when the data exhibit high variability. This happens particularly when the total number of animals per dose group is small and the observed incidence has a large standard deviation such that the observed dose–response curve appears irregular. However, high variability may also be the result of increased biological variability, e.g., in cases where tumours depend on hormone or immune system-related pathways or the sample sizes are in the moderate range (n = 10–20). Even if the range of responses observed is large enough to obtain an estimate of the BMD, the goodness-of-fit may then still be insufficient. Because of inherent imprecision in the BMD estimate, the BMDL may then be orders of magnitude lower than the BMD. In the latter case, one would not use this study for the establishment of a PoD. EFSA (2009b) actually advises that a BMD analysis where the BMD and the associated BMDL values differ by an order of magnitude should not be used for risk characterization. In addition, one can impose a restriction that BMDLs should not be an order of magnitude lower than the lowest dose used in the cancer bioassay, see also Section 3.4.

Such situations – slowly increasing dose–response curves and/or high variability of response – lead to uncertainty of the MoE which may only be reduced by more data, either through more dose groups or through more animals per dose group. As a rule, a study suitable for the BMD approach should include doses with a sufficiently high level of response, preferably with at least one dose at which a response in the range of 50% or higher is expected. Raising the significance level of the BMDL (i.e., using 10% instead of 5% and as such allowing for a narrower confidence interval) may help to formally calculate the BMDL, but introduces further arbitrariness and is not recommended.

It is obvious that the considerations given above to one dose–response data set have implications when several data sets (from one or several studies) are analysed. Data sets with a slowly increasing dose–response curve and/or with high variability of response could generate a set of low BMDL values to be considered together with much higher BMDL values obtained from other data sets, see Section 3.7.

The BMD approach is a general method of fitting dose–response models and one may tentatively apply it to any dose–response data for determining whether a BMD estimate with a confidence interval can be obtained for the identification of a PoD. This idea makes use of a general equivalence of the existence of a dose–response relationship and the complete and exhaustive characterization of dose–response data by the BMD approach: considering the BMR and the level of the confidence interval of the BMD (BMDL, BMDU) as free "parameters", the application of the BMD approach is nothing other than a full assessment of the dose–response information content in the data set. Varying the level of the BMR, e.g., between 0% and the maximum possible BMR, corresponds to varying the dose between 0 and the largest dose considered. Varying the level of the confidence interval is related to varying the degree of allowed statistical uncertainty of modelling. Therefore, the difference

analysis, and as such, it resembles the pre-screening approach using a trend test discussed above and which is not recommended for establishing a PoD.

In conclusion, it is noted that when the BMD software is not sufficiently robust, e.g., when it comes to analysing sparse data sets or dose–response data with complex shape, an in-depth examination of the utility of studies is recommended before dose–response analysis is performed technically. In particular instances, toxicologists and modellers may diverge from each other in their judgement on the suitability of the dose–response data. Current practice for data selection is such that data sets on endpoints not showing dose–response relationships are normally not used for deriving a BMD. The decision to disregard data and/or endpoints for modelling is made by visual inspection of the data or, more formally, by including them in the BMD analysis, in which case the decision is made on a statistical basis. In view of the aforementioned difficulties, a combination of biological and statistical expertise can help prevent false selection or false discarding of data. In the interests of transparency, the reasons for excluding any data set should be stated clearly.

3.3. Key steps of the BMD approach

The application of the BMD approach, i.e., estimating the BMD and establishing the BMDL as a PoD, can be summarised (EFSA, 2009a) in four main steps:

I. Specification of type of dose–response data
II. Specification of the BMR
III. Selection of candidate dose–response model(s)
IV. Identification of acceptable models.

In general, BMD and BMDL values for carcinogenicity are calculated for an extra risk specified through the level of the Benchmark Response (BMR) and a set of agreed models is fitted using appropriate software. The Scientific Opinion of the EFSA (2009b) proposed a BMR = 10% as default for carcinogenicity data addressing, in particular, experimental animal data where the number of animals per dose group would usually be not larger than n = 50, but often smaller.

The four steps of the BMD approach listed above should be accompanied by appropriate reporting, not only of the results finally obtained, but also of all relevant information that would allow other risk assessors to judge and possibly repeat the analysis. A comprehensive scheme for reporting has been compiled in EFSA (2009a) and reiterated in EFSA (2011b).
Besides providing between the BMD and the BMDL and between the BMD and the information on all of the endpoints it is advisable to justify any BMDU or the difference between BMDL and BMDU, is a measure decisions made during the BMD analysis, e.g. regarding model for the suitability of the BMD approach to that data set. When selection, and to include in the report information on data sets applying the BMD approach in this manner to a set of studies and studies that were not used. The stepwise and decision tree and data sets, those suitable for the BMD approach would be iden- based procedure of Davis et al. (2010) differs from the EFSA ap- tified. Because of this equivalence, one may set up a data screening proach in that it uses an adaptive approach to find the best fitting program, easy to implement in current software, for the time effi- model in contrast to the EFSA approach, which is based on finding cient screening of many data sets and many potential endpoints, in all models which are compatible with the dose–response data, i.e., particular, when the set of models used is somehow restricted in those with an acceptable fit, once the data have been selected. advance either by statistical or by biological arguments. However, The BMD approach accounts for the statistical variability of the in practice, this approach is unlikely to relieve the risk assessor dose–response data by calculating the confidence bounds. Primar- from the obligation of examining the utility of the data for dose–re- ily, statistical uncertainty of the BMD is addressed through its con- sponse analysis in detail. Furthermore, it must be noted that this fidence interval ranging from the lower bound (the BMDL) to the approach is not without assumptions and depends on the criteria upper bound (the BMDU). 
The default confidence level is 95% used to decide on the acceptability (e.g., on the maximum allowed (one-sided) and so the interval (BMDL, BMDU) is a two-sided con- distance between BMD and BMDL) and on the significance level of fidence interval of the BMD at the level of 90% which can also be a goodness-of-fit test chosen, notwithstanding the need of having used as a means to express the variability of the BMD. It should the appropriate software at hand. Finally, such an automatic statis- be noted that the BMDL or the BMDL – BMDU interval covers tical screening of all data available would rarely inform why a spe- exclusively uncertainty of the underlying data. Properties of the cific study and its potential endpoints were not amenable to a BMD design of a study, e.g., the choice of the number and location of Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 14 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx dose levels and also the sample sizes of each dose level, are covered defining optimal designs do not exist for BMD modelling. Indeed, by the BMDL and BMDU only as far as the models fitted to these for establishing an optimal design one would have to know in ad- data allow for. Therefore, general design issues may add to the vance the ‘‘true’’ biological model in order to determine the opti- uncertainty of the BMD and as such also of the BMDL. mal number and location of the dose levels and sample sizes for Another type of uncertainty arises when different models are the BMD analysis. 
Although statistical approaches exist to define fitted to the data and when some models fit equally well but result approximate optimal designs, those require a certain amount of in different BMDs and BMDLs. This reflects model uncertainty. To knowledge on the shape of the dose–response, which may often take that into account, various models need to be fitted to the same not be available when designing a carcinogenesis bioassay. At pres- dataset. Selecting the BMDL of the best-fitting model is likely to ent, designs of carcinogenicity studies are mostly based on agreed underestimate the uncertainty in the BMD approach, while select- guidelines and experience, and their design parameters are often ing the model with the lowest BMDL generally results in an over- not optimally chosen for a BMD analysis. As a consequence, no sta- estimate of the risk. The model averaging approach (see e.g., tistically sound rules exist to assess the relationship between the Wheeler and Bailer, 2007, 2008), characterises the uncertainty in size of the BMR and design parameters. Obviously, if dose levels the value of the BMD on statistical grounds. It takes account of were chosen too low, the effect size could become so low that the relative likelihood of alternative models and hence the model the response would not meet the BMR of 10% for most or even average BMDL is expected to be numerically higher than the lowest for all doses chosen, see also Section 3.2. Then, no BMD and BMDL BMDL value resulting from applying a suite of models. That ap- can be determined and the modelling may not be useful at all. If proach was used by Benford et al. 
(2010) when an expert group the dose levels are chosen too high, such that at most or all dose working in close collaboration with WHO/IPCS and EFSA carried levels the incidence is near or equal to 100%, a BMD would be very out case studies on 12 different chemicals and investigated the difficult to estimate since the observations would merely indicate application of the MoE approach to genotoxic carcinogens in food; that it is lower than the lowest dose but could be close to zero. see in particular Section 3.6. This aspect has also been discussed In the extreme, when the dose–response relationship jumps from among risk assessors when selecting the lowest BMDL from the a control or a low dose with an effect still below the BMR to an ef- set of all accepted models and not the BMDL of the lowest BMD. fect well above the BMR at the next higher dose, parsimonious Therefore, selecting the lowest BMDL accounts for the largest pos- modelling would essentially reduce to linear interpolation be- sible uncertainty within the set of models chosen at the higher end tween two adjacent doses. Those cases have led to recommenda- of possible risks and is a conservative approach as it treats all ac- tions that, studies with more dose levels and less animals per cepted models as equal. dose should be used for BMD analysis. It should be noted that this would usually not much reduce the total number of animals per 3.4. Specification of a BMR study when the same precision of estimation is kept in modelling. All these observations emphasize that the BMD approach is design When BMD modelling was first introduced in risk assessment in sensitive. Deficiencies in study design usually result in low BMDLs 1984, it was already advocated that the BMR should be set equal to but may also make calculation of a BMDL impossible. 
In the latter a low but measurable response level reflecting an effect that is neg- situation, the conclusion can only be that the database is too poor ligible or non-adverse. The default BMR for cancer incidence in ani- for a dose–response analysis based on the BMD approach. mal studies with an extra risk level of 10% is far from any Finally, it should be noted that choosing the BMR as a level of acceptable human risk level which historically ranges between extra risk makes the approach relatively robust against variations 104 and 106, corresponding to BMRs between 0.01% and of the response in the control since that BMR is defined as a spec- 0.0001%. Setting the BMR equal to 10% and as such at or near the ified increase in incidence over the modelled background, see EFSA limit of sensitivity of most cancer bioassays (usually with sample (2009a). Clearly, if the background level is very high, the sensitivity sizes not larger than n = 50 per dose group) must, therefore, be of the approach is reduced. In this case, information on historical viewed as a compromise between measurability and relevance. controls could be used to assess the quality of the study as indi- Choosing the BMR too low would normally result in an extrap- cated in Section 4.1. Otherwise, historical control information is olation outside the range of the observed data and induce severe not used in the BMD approach. model dependence of the BMDL. Such a low BMR could let different models return drastically different BMD and BMDL values, reduc- 3.5. Selection of dose–response models ing confidence in the modelling per se. This could be characterized as a situation where the risk assessment would be driven by the There exists a standard set of dose–response models for risk models fitted to the data and not by the data. 
On the other hand, assessment of carcinogens in available software packages (e.g., there have also been tendencies to define the BMR even higher BMDS, PROAST) which can be used for the BMD analysis. Recom- than 10% where effects were more likely, e.g., at response levels mended are eight dose–response models: of 25% and even 50% (see e.g., Sanner et al., 2001) and Gold et al. (1998) who recommended two dose descriptors, T25 and TD50, – Probit respectively, for the ranking of carcinogens and for risk assess- – Log-Probit ment. Such an option would be reasonable from statistical grounds, – Logistic when several dose groups exhibit zero response and when rather – Log–logistic large incidences, say of 50% or higher were seen at all higher doses – Weibull (see Section 3.2). However, such data sets contain limited dose re- – Multistage sponse information and may fail the goodness-of-fit criteria re- – Quantal-Linear quired for the BMD approach, at least for some models. The – Gamma-Multihit BMR = 10% for carcinogenicity studies should be seen for what it is, a default where deviations from this value should be docu- This set is thought to be flexible enough to cover a wide range of mented and justified. dose–response relationships of tumour incidence (EFSA, 2009a). The design of the study has the strongest influence on the esti- Whether one should already restrict the set of models used at mate of the model parameters in statistical modelling (see Sec- the beginning of a BMD analysis is questionable, since it is, per tion 3.1.2). Unfortunately, statistical methods developed for se, a rather small set of models and may just cover most of the Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. 
shapes of dose–response data observed in carcinogenicity studies. All these models are defined by a structural form, the model equation, and the model parameters. The parameters are naturally constrained, such that the response values range between 0% and 100% and the response itself, i.e., tumour incidence, increases monotonically with dose. Obviously then, the slope of a dose–response model should always be positive.

By imposing additional constraints on model parameters, one can restrict the possible range of modelling. A frequently used restriction aims to exclude dose–response curves which have a steep slope near or at the origin (i.e. at dose = 0). In fact, some models have a mathematically infinite slope at the origin, i.e. the curve starts at d = 0 as a vertical line but bends immediately afterwards. Those curves can be excluded by restricting the slope at d = 0 to be not higher than a fixed value, say 1. This has been implemented as an option in the BMDS software. It has been noted in manuals for the BMDS software (see e.g. EFSA, 2011b; EPA, 2012b) that constraining the slope parameter would be a preferred option to exclude an infinite slope at dose zero, since that would be biologically implausible. However, it has also been argued that this option should be avoided and that the full range of model parameters should be allowed for each model. For the log–logistic, Weibull and Gamma models such constraints translate into constraints of the shape parameter, usually denoted by c, such that c > 1 is often considered as a default constraint. It was also suggested not to restrict the slope of the log-probit model, allowing for a very rapidly rising tumour incidence. Others (e.g., Wheeler and Bailer, 2007) argued that relaxing the lower bound of the shape parameter in the Weibull model from 1 to 0.5 would allow models to be somewhat supralinear and would improve the statistical properties of the model averaged BMDL. That approach was used by Benford et al. (2010) in model averaging.

It should be noted that so far, no criteria have been developed to guide risk assessors in the use of constraints. This may be hampered by an inconsistent terminology in the parameterization of the model families implemented in BMD software but also by an intrinsic difficulty in visualizing how response depends on both the dose and the values of the parameters in a model. As a default, it is recommended not to constrain the model parameters as long as there are no convincing biological arguments available. This is supported from a statistical point of view in keeping the space of the model parameters as wide as possible in order to avoid fitting problems near (multiple) boundaries defined by restrictions imposed on model parameters.

3.6. Identification of acceptable models and assessing model fit

When fitting several models to a dataset, one may select

(a) the “best”-fitting model
(b) a range of “sufficiently” fitting models
(c) an “average model” BMDL determined through model averaging.

When selecting the “best” fitting model, researchers use a metric, e.g., a goodness-of-fit index or a likelihood based criterion, to identify a model from which the BMD and the BMDL would be derived. Choosing the model with the largest p-value of the goodness-of-fit test could be such an approach. The maximum likelihood value of a model fit would be the preferred index when the set of models represents a “nested” model family, such that this value could be compared on sound statistical grounds. Since the set of models identified for genotoxic carcinogens in Section 3.5 does not constitute a nested family, determination of the best fitting model is, in a strict statistical sense, impossible, and this approach has not been recommended (EFSA, 2009a); therefore, any model chosen would simply depend on the metric defining what is “best”.

Another model selection method is to find a set of models which all exhibit a fit to the data that appears acceptable to risk assessors and may be judged to be “sufficiently” good. In that case, acceptability of a model is statistically assessed using the likelihood value associated with the fitted model versus the full model as well as versus the reduced model. The full model is the model that does not assume any dose–response function (its parameters are simply the frequencies observed per dose level). Its log-likelihood is, therefore, identical for each model fit as long as the same data set is used. The same holds for the reduced model, which reflects no dose relationship and where a straight line parallel to the dose axis representing the mean response in the study is fitted. The fit of an acceptable model should be statistically significantly better than that of the reduced model (p < 0.05) and not significantly worse than that of the full model (p > 0.05). In cases where none of the models pass these criteria, visual inspection of the data may show that some models still adequately describe the observed dose–response. In that case, the decision to accept a particular model needs to acknowledge the high level of uncertainty in the BMD and the BMDL value.

It should be noted that the choice of the significance level of 0.05 above is based more on convention than on statistical reasoning related to risk assessment and should therefore be considered as a default value. Deviations to lower or higher values may be justified when either many or only few data are available, respectively. When analysing one data set, the general principle of the BMD approach as developed by EFSA (2009a) is to find all models that are compatible with the data, i.e., those with an acceptable fit where the comparison is with the full model and a goodness-of-fit test based on the profile likelihood. When only non-nested models are available, an acceptable model should, in principle, provide a reasonable description of the dose–response data, according to a goodness-of-fit test with a p-value greater than 0.05 when using the profile maximum likelihood method and the default p-value. Once all selected models have been fitted, a series of statistical judgements are made to assess their fit. When none of the models passes the goodness-of-fit test, but visual inspection indicates an adequate fit, assumptions for modelling may be violated and one may either reject some or all of the data or accept a lower p-value. Having a range of “sufficiently” fitting models available, risk assessors have often preferred to choose the lowest BMDL value amongst those available. This may overestimate the uncertainty of the BMD and, as such, also the risk of the genotoxic carcinogen, see Section 3.7.

An alternative selection method is the more recently developed Bayesian model averaging of Wheeler and Bailer (2007), where the estimates of the different models are combined through a weighted average of the dose–response models, and where the weights reflect the relation of the fitted curves to the observed data. In Bayesian model averaging, these weights represent each model's posterior probability of being the true model given the data observed. As a consequence, this method assumes that the true model is one of the models in the family of models being averaged. As such, the BMDL calculated from model averaging reflects both the sampling variability and model uncertainty and is expected to be numerically higher than the lowest BMDL of the underlying set of models. Benford et al. (2010) applied the Bayesian model averaging approach of Wheeler and Bailer (2007) to a selected number of genotoxic carcinogens such that at first each of seven models (Probit, Logistic, Log-Probit, Log–Logistic, Multi-Stage, Weibull, Gamma) was fit separately to each data set and then model averaging was performed as follows:
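
As a rough illustration of the weighting and averaging at the core of this procedure (steps ii to iv in the list that follows), the sketch below combines two already fitted curves and solves for the dose at which the averaged extra risk reaches the BMR. It is not the mabmd implementation. The AIC-type penalty is an assumption, since the text only says that the log-likelihoods are penalised for the number of estimated parameters. The two example curves, their parameter values and log-likelihoods, and all function names are likewise hypothetical. The bootstrap of step v, which requires refitting every model to each simulated data set, is left out.

```python
import math

def logistic(a, b):
    """Logistic curve P(d) = 1 / (1 + exp(-(a + b*d)))."""
    return lambda d: 1.0 / (1.0 + math.exp(-(a + b * d)))

def quantal_linear(g, b):
    """Quantal-linear curve P(d) = g + (1 - g) * (1 - exp(-b*d))."""
    return lambda d: g + (1.0 - g) * (1.0 - math.exp(-b * d))

def aic_weights(log_likelihoods, n_params):
    """Step ii: weights proportional to exp(-AIC/2), AIC = -2*logL + 2*k.
    The AIC penalty is an assumption made for this sketch."""
    aic = [-2.0 * ll + 2.0 * k for ll, k in zip(log_likelihoods, n_params)]
    best = min(aic)  # subtract the minimum for numerical stability
    raw = [math.exp(-(a - best) / 2.0) for a in aic]
    total = sum(raw)
    return [r / total for r in raw]

def average_response(curves, weights, dose):
    """Step iii: weighted average of the predicted responses at one dose."""
    return sum(w * f(dose) for f, w in zip(curves, weights))

def extra_risk(curves, weights, dose):
    """Extra risk of the averaged curve relative to its background at dose 0."""
    p0 = average_response(curves, weights, 0.0)
    return (average_response(curves, weights, dose) - p0) / (1.0 - p0)

def model_average_bmd(curves, weights, bmr=0.10, upper=1000.0, tol=1e-9):
    """Step iv: bisection for the dose whose averaged extra risk equals bmr.
    Assumes monotonically increasing curves and that bmr is reached by `upper`."""
    if extra_risk(curves, weights, upper) < bmr:
        raise ValueError("BMR not reached within the dose range searched")
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if extra_risk(curves, weights, mid) < bmr:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical fitted curves and maximised log-likelihoods, for illustration only.
curves = [logistic(-3.0, 0.05), quantal_linear(0.05, 0.01)]
weights = aic_weights([-52.3, -51.8], [2, 2])
bmd = model_average_bmd(curves, weights, bmr=0.10, upper=200.0)
```

With equal numbers of parameters, the curve with the higher log-likelihood receives the larger weight, and for a single quantal-linear curve the routine reproduces the closed-form BMD of −ln(1 − BMR)/b. The steps as reported for the analysis of Benford et al. (2010) were: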

i. For each model, assess the log-likelihood values associated with the best fit;
ii. The log-likelihood values are considered as a measure of the relative goodness of fit of the models. After penalising for the number of estimated parameters in the model, these are used to compute the weights;
iii. For any particular dose (not necessarily an applied dose) a model average response is calculated, by taking the predicted responses for each model, and calculating a weighted average of them, with weights from point ii. In this way, models that fit well are taken into account more heavily compared to models that fit less well;
iv. Find the dose where the model average response is equal to the BMR. This dose defines the model average BMD;
v. The BMDL was then calculated as the lower 5th percentile of the cumulative distribution of 2000 BMDs obtained from a bootstrap type of simulation. To this end, using the average model from step iii, the expected proportion of animals with tumours was computed for each dose, and 2000 data sets were “bootstrapped” by taking the actual dose levels and sample sizes of the experiment and sampling the tumour-bearing animals from a binomial distribution with a probability equal to the expected proportion per dose. Finally, an average model was fitted to each of the 2000 data sets, generating the cumulative distribution of the BMDs.

The calculations used the software package mabmd described in Wheeler and Bailer (2008).

3.7. Determination of the BMDL as the PoD

The determination of a PoD follows a stepwise process accounting for (i) the selected tumour responses, (ii) the selected studies where those have been investigated, and (iii) the accepted models of each study data set.

When analysing the dose–response data of a carcinogenicity bioassay, the BMD analysis would ideally be performed on one critical tumour response identified as the most relevant for the risk assessment, typically the one of highest carcinogenic potency that then likely results in the lowest BMD and BMDL values. For that tumour response, one estimates the BMD for each of the accepted models and calculates the respective BMDLs. The values obtained from the accepted models determine the range of BMDL values reflecting differences between those models. As outlined above, one may require that the range of those BMDL values should not exceed one order of magnitude as an acceptance rule to establish the set of accepted BMDLs. In cases where one most relevant or most sensitive tumour cannot be identified, e.g., when, for several tumour types, a genotoxic MoA cannot be excluded, one would proceed in the same way for each of the selected tumour responses, then identify in the hazard characterization step for that study a set of tumour-specific PoDs. Pragmatically, one may define a study BMDL by taking the lowest BMDL as PoD to provide a conservative approach. Alternatively, one would keep those tumour-specific PoDs separate, calculate tumour-specific MoEs for that study accordingly, and assess them together with their uncertainties in a comprehensive risk characterization step accounting for the carcinogenic MoA and its relevance for humans (see Section 4 and the case studies).

In risk assessment, it is common practice to characterize a tumour response in one data set by one value only, and, as such, selecting the lowest BMDL as the PoD is a natural option and considered to be conservative, as it will lead to the lowest MoE for that data set. If more than one data set is available for the same tumour response, e.g., from several studies, one may examine whether these data sets can be combined for a comprehensive BMD analysis with study as a covariate, which again would then allow the selection of a lowest BMDL as PoD from a range of BMDLs. However, when the data sets are too heterogeneous, e.g., when originating from different studies performed at different sites and times and, in particular, when performed under different designs, one may decide against such a combination and choose as PoD the lowest BMDL obtained from all acceptable models of all data sets available. This allows an overall BMDL of that study to be defined. When different studies are sufficiently comparable in terms of design for the (same) tumour response, but combined analysis is difficult, one may calculate the average of the lowest BMDLs over the studies and use that as the PoD. However, for a meaningful presentation of the overall data one may also decide to report such studies separately.

Selecting the lowest BMDL tends to be conservative, but is recommended until more advanced methods, e.g., more probabilistically based ones such as (Bayesian) model averaging, have been fully developed and validated.

3.8. Selected aspects of data analysis

3.8.1. Non-standard carcinogenicity assays

The gold standard study design for determining lifetime cancer risk for genotoxic carcinogens has been the 2-year continuously dosed carcinogenicity bioassay. In practice, available data for a BMD analysis may not comply with this when, e.g., the duration of the study is shorter (e.g., lasting less than the standard life span of the animal species) or when dosing is non-continuous (e.g. 5 days/week, see also Section 3.1). For “synchronizing” the different designs, the dose metric then needs to be adjusted to the long-term bioassay, i.e., for rats, a 2-year study. Studies with substantially shorter duration (e.g., only 50 weeks or even only 90 days), in which cancer incidence has been reported together with other toxicities and where carcinogenicity may not have been the primary endpoint, have to be considered with great caution for use in BMD analysis and establishing a PoD and MoE since the time of observation may be too short for tumorigenesis. Low overall tumour incidence, e.g., compared to historical controls, and the occurrence of precursors, e.g., pre-neoplastic lesions and adenomas only, may be taken as an indication of such a premature study. In that case, it may be difficult to adjust the data and a BMD analysis is then not possible for carcinogenesis as endpoint.

When different dose metrics were used in different studies and/or for different endpoints (e.g., administered dose per initial kg b.w., average administered dose per weekly kg b.w., average initial internal dose, peak internal dose, cumulative lifetime dose), toxicokinetic modelling could be applied if sufficient data are available. Such modelling would normally require individual animal data. Adjustment for type of exposure (e.g., diet, drinking water, and gavage) would require detailed data on feeding, food/water consumption, body weight and measured content/concentration of the chemical in diet and water in order to estimate a dose for the BMD analysis, see also Section 3.1.1. Using default assumptions about food or water consumption and animal weight in cases where only summary data are available (e.g. weekly mean body weight of dose group, weekly consumption per cage) would be a source of uncertainty.

3.8.2. Intercurrent mortality

The evaluation of overall tumour incidence and time-dependent tumour incidence may be confounded by intercurrent mortality in long-term carcinogenicity bioassays (Gart et al., 1986). This can bias comparisons between dose groups when different proportions of animals die in those groups during the course of the study. When premature death is due to toxicity from an effect unrelated to the critical endpoint, then intercurrent mortality could be adjusted by calculating adjusted incidences for each dose group. A simple adjustment is the poly-k method proposed by Bailer and Portier (1988), which modifies the denominator in the quantal estimate of incidence using the total number of animal years at risk (see also EPA, 2012b; Portier et al., 1986; Bieler and Williams, 1993).

More complex is the situation when causes of intercurrent death (e.g., due to death from pneumonia or infectious diseases) are not independent from the development of tumours and/or death from tumours of interest. Such competing risks complicate statistical analyses (see e.g. Cohen et al., 2000; Satagopan et al., 2004) and additional uncertainty remains with the BMD analysis when performed for the chosen critical tumour endpoint only. Notice, however, that this problem is neither specific to the BMD approach nor to carcinogenicity as critical endpoint but occurs in the evaluation of any critical endpoint when its occurrence competes with the occurrence of other effects which potentially prohibit its observation during the long-term bioassay. Time-to-tumour models and calculation of adjusted incidences could be used for estimation of the BMD and calculating the BMDL (see e.g., Gart et al., 1986). For this purpose, the US EPA recently issued the software “MSW” (“Multistage Weibull Time-to-Tumour Model”) for time-to-tumour data (see EPA, 2012a).

3.8.3. Combination of studies

Advantages and the prerequisites for combining studies have already been addressed above in the context of calculating a common BMDL from several studies. Combination is, in particular, an issue when the amount of available data in single studies is limited and substudies or subgroups (e.g., males and females) can be combined such that a larger database can be used so that the BMD approach has higher power and the BMD estimate is more precise. As a consequence of combining studies, the confidence interval of the BMD would become narrower and the BMDL values become higher than when calculated for each subgroup separately. Furthermore, statistical variation of the BMD estimates between the different studies could be evaluated statistically in that joint analysis. However, the risk assessor may wish to examine whether it is appropriate to combine the dose–response analysis of subgroups on biological and statistical grounds. The preferred BMD analysis to combine subgroups would then be a dose–response analysis with covariates that describe the subgrouping, i.e.

subpopulations and assess heterogeneity and differences between them. On the other hand, human exposure data are often more imprecise than those from well-designed animal experiments, which adds to the uncertainty of calculated BMDLs.

Besides distinguishing between prospective and retrospective study designs, it is important to examine whether the study design was experimental or observational. In experimental designs, individuals are exposed to predetermined doses where the numbers of individuals per dose group are usually balanced. Such dose information is different from that from cross-sectional or survey designs where a large sample or a whole population is screened for exposure and where the observed effects are recorded. As a rule, individual exposure is dispersed over a very large range of doses, where only a few individuals share the same dose. At the extreme, all dose groups could have the size n = 1. At the same time, the responses in observational studies may exhibit a large variability and are often only reported as data aggregated in a few dose groups, e.g., five quintiles where the lowest quintile group serves as reference group of background exposure but may combine low exposed and unexposed individuals. Aggregation also affects the highest dose group, which may comprise a wide dose range.

Not all study designs reporting human data provide sufficient data to apply the BMD approach. This is, in particular, the case for epidemiological case-control studies where, by design, cancer cases are recorded first and later matched with non-diseased controls for calculating odds ratios for risk identification. This is in contrast to the aims and the type of data generated for risk characterization. Studies reporting odds ratios may, therefore, only be used for the BMD approach when the data available describe both the prevalence and the exposure for the whole population and when the study provides sufficient details on its design and realization. Unfortunately, such information is rarely available and, as such, case-control studies may usually not be suitable for the BMD approach.

Finally, carcinogenic response in human studies may be confounded by factors which interfere with dose–response much more than what is known from well-designed experimental studies. Here, co-exposure to other chemicals may be a prominent factor. Failure to take confounding factors into account may result in either under- or overestimation of the BMD and, as such, lead to biased BMDL values. The inclusion of confounding factors in the
a covariate adjusted BMD approach requires multivariable regression modeling where analysis. If subgroups differ only in the background response or the availability of the appropriate software becomes critical. It maximum potencies, a covariate adjusted analysis would result should also be noted that unknown factors are more frequent in in a higher overall BMDL value than when choosing the lowest sin- human studies and generate additional variability, in particular, gle study BMDL. Nevertheless, risk assessors may be advised to ex- when compared to experiments where the subjects are prospec- plore any differences between the BMDLs of the single studies and tively randomized to dose groups. judge if other sources of variation (e.g., differences in design and In summary, dose range and study design, response/effect type, conduct of the study) would not exclude a combined analysis. risk specification, effect size determination, model selection, and One should also note that a covariate adjusted BMD analysis inves- quality assessment of the outcome are important issues when tigates several data sets together and examines during the analysis the BMD approach is applied to human data (EFSA, 2011b). Simi- in which aspects (specified by model parameters) the data can be larities between analysing human and animal data were found combined or not. Therefore, it does not automatically lead to one for handling the response type, the effect size determination and single ‘‘combined ‘‘ BMDL value but to a (small) set of BMDL values quality assessment. One may use lower BMRs for human tumour from which an overall PoD has to be selected, e.g., by using the or cancer incidence data than the default BMR = 10%, but that lowest of such BMDLs again. should be on a case-by-case basis.

3.9. Applying the BMD approach to human data 4. Dealing with uncertainty The BMD approach is a method sufficiently general to analyse dose–response data from animals, humans, as well as ecological Vastly different PoDs and MoEs may be generated, depending or environmental data. An obvious reason for its usefulness when on the selection and the analysis of carcinogenicity data (Benford applied to human data is that then interspecies-extrapolation be- et al. 2010). This implies substantial uncertainty regarding the comes unnecessary for risk characterisation; see also EFSA MOE, which needs to be characterised by the risk assessor so that (2009b, 2011b). Furthermore, when human information is avail- it is transparent and can be taken into account by risk managers able one may be able to model the dose–response in sensitive (Codex Alimentarius Commission, 2011; EFSA, 2009a). Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 18 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx
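
How strongly the choice of data set can affect the outcome may be easier to see with a small calculation. The sketch below assumes the usual definition MoE = PoD / estimated human exposure, with both quantities in the same units. The tumour names, BMDL values and exposure estimate are purely hypothetical and are not taken from any study or from the case studies in the Annexes.

```python
# Hypothetical tumour-specific BMDL10 values (ug/kg bw per day); illustrative only.
bmdl_by_tumour = {
    "liver adenoma": 12000.0,
    "forestomach papilloma": 3000.0,
    "lung carcinoma": 45000.0,
}

# Hypothetical estimate of human dietary exposure (ug/kg bw per day).
exposure = 0.5


def margin_of_exposure(pod, exposure):
    """Return MoE = PoD / exposure; both must be positive and in the same units."""
    if pod <= 0 or exposure <= 0:
        raise ValueError("PoD and exposure must be positive")
    return pod / exposure


# One MoE per tumour-specific PoD.
moes = {tumour: margin_of_exposure(bmdl, exposure)
        for tumour, bmdl in bmdl_by_tumour.items()}

# The lowest BMDL gives the lowest (most conservative) MoE.
lowest_tumour = min(moes, key=moes.get)

# Ratio of largest to smallest MoE: a crude indicator of how much
# the selection of the tumour data set matters.
spread = max(moes.values()) / min(moes.values())

for tumour, moe in sorted(moes.items(), key=lambda item: item[1]):
    print(f"{tumour}: MoE = {moe:,.0f}")
print(f"Most conservative: {lowest_tumour}; spread = {spread:.0f}-fold")
```

With these made-up inputs the MoEs differ 15-fold between tumour types, which is the kind of difference that the uncertainty evaluation in this section is meant to make transparent.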

According to Benford et al. (2010), key issues causing uncertainty include the biological relevance of an observed tumour type for humans, the quality of data for deriving a BMDL, the dose range considered and the selection of a BMR. These issues cause uncertainty at two key steps in the derivation of an MoE for a genotoxic carcinogen: when deciding which tumours and studies to consider and when interpreting the resulting PoDs and MoEs. The following sections describe approaches for addressing uncertainty in these two key steps of the assessment. In the first step, the uncertainties relate to categorical questions – e.g., whether the MoA causing a particular tumour involves genotoxicity. This requires weighing the relevant evidence and associated uncertainties to assess the likelihood that genotoxicity is involved. In the second step, the uncertainty relates to a quantitative question, namely which values to take for the BMD and BMDL. The latter are continuous variables estimated by dose–response modelling that vary according to which choices are made, in particular, regarding the choice of model, transformations of doses and responses, the confidence level chosen, and the size of the effect level that is defined as the critical BMR. This requires identifying relevant sources of uncertainty and evaluating their impact on estimation of the BMD. Different approaches are required for evaluating uncertainty for these two types of question (Hart et al., 2010), as described below. Both approaches are illustrated in case studies on Sudan 1 and PhIP, in Annexes 2 and 3 respectively.

4.1. Uncertainty about whether the MoA for a tumour involves genotoxicity

Toxicologists typically address questions about MoA using a weight of evidence approach, weighing the evidence for and against a genotoxic MoA (Boobis et al., 2008). Transparency requires clear documentation of the weight of evidence assessment. The approach proposed here is to summarise the reasoning in a tabular format, listing the lines of evidence, identifying their strengths and uncertainties, and showing how they contribute to the overall conclusion. The approach has been adapted from that suggested by Hart et al. (2010), which is similar in concept to the graphical ‘evidence maps’ proposed by Schütz et al. (2008) and the weight of evidence approach of Suter and Cormier (2011).

Transparency also requires that an expression of uncertainty is included when communicating the outcome of the weight of evidence assessment. Whether the MoA for a tumour involves genotoxicity is a “yes/no” question. To express uncertainty, the assessor needs to indicate the relative likelihood of these two alternative answers. Probability is the natural scale for expressing this relative likelihood. In this context, it is used to represent the degree to which one proposition is considered more likely than the other. For example, stating that there is an 80% chance that the MoA for a particular tumour involves genotoxicity implies a 20% chance for a non-genotoxic MoA and means that the MoA is judged to be four times more likely to involve genotoxicity than not. This is a form of expression that is common in some areas of forecasting, e.g., the ‘chance of rain tomorrow’ in some television weather forecasts. An alternative is to express likelihood using verbal scales (e.g., low, medium, high, etc.). However, such terms are interpreted in different ways by different people (see Theil, 2002, for a meta-analysis of relevant studies). This makes it difficult to ensure that the verbal expression of likelihood is interpreted by risk managers, or the public, in the way that was intended by the assessor. It is even difficult to ensure that different assessors working together (e.g., on an expert committee) interpret the same terms in the same way. This problem cannot be resolved by using standard terms with verbal definitions, as this simply transfers the problem of interpretation to the definitions.

Defining a standard scale of likelihood terms quantitatively, in terms of probabilities, should improve the consistency of their use and interpretation. An example of such a scale is the approach established by the Intergovernmental Panel on Climate Change (IPCC, 2005), reproduced in Table 1 (see e.g. Mastrandrea et al., 2010). Note that the probability intervals are overlapping (e.g., the range 99–100% is a subset of 90–100% and both are subsets of the range 66–100%).

Table 1
Proposed terms for expressing different degrees of likelihood (from Mastrandrea et al., 2010).

| Standard term | Probability |
|---|---|
| Virtually certain | 99–100% probability |
| Very likely | 90–100% probability |
| Likely | 66–100% probability |
| About as likely as not | 33–66% probability |
| Unlikely | 0–33% probability |
| Very unlikely | 0–10% probability |
| Exceptionally unlikely | 0–1% probability |

We propose that the scale in Table 1 could be used for expressing uncertainty about whether the MoA for a tumour involves genotoxicity, keeping in mind that a chemical can have both genotoxic and epigenetic effects. To avoid confusion with informal use of the same terms in other contexts, it would be advisable to capitalise the first letters of each word when using the probability scale, and present it together with its defined probability range, e.g., ‘Very Likely (90–100%)’. If an assessor wishes to express a range of probabilities that is not represented by any of the defined terms (e.g. 60–80%), they can do so numerically.

Whether the result is expressed quantitatively or qualitatively, expressing uncertainty in a weight of evidence conclusion requires a subjective probability judgement. Probabilities assessed in this way should be reported as such, to distinguish them from statistical probabilities calculated from data.

Assessors may use various strategies to judge probability. If using a scale of defined probability terms (e.g. Table 1), the assessor could simply consider which term best describes their assessment of the weight of evidence. A more structured approach is to start by identifying a baseline or ‘prior’ probability based on the assessor’s knowledge of similar assessments, and then adjust this upwards or downwards to reflect the evidence specific to the current assessment. A third option, which is especially suitable for novel problems where precedents are lacking, is to start with a very wide baseline probability (e.g. 0–100%) and gradually narrow it down by considering the available evidence.

If the chemical belongs to a class of chemicals which are known to have a higher or lower proportion with genotoxic MoA, then this proportion may be taken as the baseline probability. The baseline probability can then be adjusted upwards or downwards for the chemical under assessment, based on the evidence available. If the substance-specific data used in the assessment are strong and unequivocal, then the baseline probability will have negligible influence on the outcome.

Naturally, assessors will be uncertain when making subjective probability judgements as described above, due to limitations they recognise in the available information and their own expertise. This uncertainty should be expressed, for example by giving a range for the estimated probability rather than a point estimate.

In making an assessment, all relevant evidence should be considered. This will include experimental or observational studies as well as other types of evidence, such as theoretical or experimental evidence on the MoA. ‘Line of evidence’ is used below as a general term to refer to any type of relevant evidence. When there are large numbers of studies, it may be sufficient to
summarise similar studies with similar conclusions as a single line of evidence, provided that any additional weight implied by the consistency of their results is taken into account.

Finally, it is important to document the assessment in a transparent way, so that others can see what evidence was considered and how it was evaluated. For this purpose, it is recommended to summarise the assessment in a tabular format, as illustrated in Table 2. The table content should be concise, providing a readily accessible overview of the assessment, and is not a substitute for more detailed discussion of the type normally found in toxicological assessments, which should be provided in the text accompanying the table.

Based on the above considerations, the following approach is proposed:

1. Write the question to be addressed in the header of the assessment table (Table 2). The question considered here is whether a particular tumour type seen in a particular study was caused by a MoA involving genotoxicity.
2. Identify lines of evidence that contribute to answering the question. List them in the table template, using the format shown in Table 2. Keep the table text brief: summarise each line of evidence as concisely as possible, including its key strengths, weaknesses or uncertainties. If a more detailed narrative is required, provide this in text sections accompanying the table.
3. Evaluate how each line of evidence would influence the judgement of the probability that the tumour was caused by a MoA involving genotoxicity, relative to the prior probability. Record that evaluation in the right hand column of the table. We suggest using symbols for this purpose: up arrows for lines of evidence which push your probability towards 100% and down arrows for those pushing towards 0%. The meaning of the symbols must be defined, to communicate the assessment as clearly as possible. It is suggested to define them on an ordinal scale, e.g., as representing a small, medium or large influence on the probability, such as:

   ↑↑↑ or ↓↓↓: strong influence on the probability
   ↑↑ or ↓↓: intermediate influence on the probability
   ↑ or ↓: minor influence on the probability
   (symbol not legible in the source): negligible influence in either direction
   ↓/↑ (or other pairs of symbols): influence is uncertain but within the indicated range

   The symbols provide an indication of the relative impact of the different lines of evidence, to help the assessor and others form judgements on the overall conclusion. It is also important to indicate when the contribution of a line of evidence is uncertain (e.g. ↓/↑), or is negligible.
4. In assessments where read-across from other chemicals in the same class is considered useful, estimate the proportion of those chemicals in the class which are considered to have a genotoxic MoA and use this proportion as baseline probability. Take account of any uncertainty affecting this judgement by expressing the probability as a range (e.g. 30–50%). Enter this as a separate row in the table, together with a brief summary of the evidence it is based on.
5. Make a judgement about the overall answer to the question to be addressed, taking careful account of all the studies or lines of evidence. This judgement should not be done by any simplistic aggregation, such as counting the numbers of symbols. If using a baseline probability, start with this and consider how much it might be changed by each line of evidence in turn to arrive at your final probability. Guard against the natural tendency to anchor too strongly on your baseline probability. If not using a baseline probability, review your evaluation of the individual lines of evidence and express your judgement of the probability that the tumour was caused by a genotoxic MoA. Alternatively, assume a baseline probability of 0–100% (= no information), and consider how much it might be changed by each line of evidence in turn to arrive at a final probability. Give a range rather than a point estimate for the final probability, to express your uncertainty about the judgements you have made (e.g., due to limitations in the information available, or in your expertise on the issues in question).
6. There is a natural tendency to be overly confident in expert judgements. Therefore, review your final probability, and consider whether the range should be wider. Ask yourself why higher or lower probabilities are not reasonable, adjust your range if appropriate and document your reasoning.
7. Enter your final assessment of the probability range in the bottom right hand cell of the table. In the cell to the left of this, enter a brief summary of the reasoning that led you to this estimate. Optionally, present the final conclusion using a standard verbal phrase for a defined range of probabilities, from Table 1.

Table 2
Tabular format proposed for summarising weight of evidence assessments of the probability that a tumour is caused by a MoA involving genotoxicity. Symbols and terms used must be defined in the table legend, or accompanying text or tables (see text above). Numbers in square brackets, e.g. [1], refer to steps in the description of the approach (see text above).

| Overall question: insert question text here (e.g. was tumour type X in study Y caused by a MoA involving genotoxicity?) [1] | Influence on conclusion |
|---|---|
| Baseline assessment using evidence on other chemicals of the same class (when available and appropriate) [4]. Summarise here the evidence and reasoning for your baseline probability. | Enter here your baseline probability range for the assessment, as a percentage (X–Y%) |
| Line of evidence 1 – summarise the line of evidence as concisely as possible, including its key strengths, weaknesses and uncertainties [2]. | Enter here your assessment of the influence of the line of evidence on your assessment of the probability, using the symbols defined in the text (e.g. ↑↑) [3] |
| Insert rows for additional lines of evidence as needed [2] | |
| Overall conclusion: insert here a brief summary of the reasoning that led to your conclusion on the probability of the tumour in question being caused by a MoA involving genotoxicity. [7] | Express here your conclusion in the form of a probability range, and the associated standard phrase if applicable (see Table 1) [5,6] |

Examples of this approach may be found in the accompanying case studies (Annexes 2 and 3).

Note that the approach described in this section can be applied to other categorical questions involved in the assessment of genotoxicity, e.g., whether a tumour observed in rodents is relevant to humans (see Section 2).
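
The optional mapping from a judged probability range to a standard phrase in Table 1 can be illustrated with a short sketch. Because the intervals in Table 1 overlap, the sketch assumes one possible convention, namely reporting the narrowest term whose interval fully contains the judged range; this convention and the function name `standard_term` are illustrative assumptions, not part of the proposed approach.

```python
# Likelihood scale from Table 1 as (term, lower %, upper %).
# Terms are capitalised as suggested in the text.
LIKELIHOOD_SCALE = [
    ("Virtually Certain", 99, 100),
    ("Very Likely", 90, 100),
    ("Likely", 66, 100),
    ("About As Likely As Not", 33, 66),
    ("Unlikely", 0, 33),
    ("Very Unlikely", 0, 10),
    ("Exceptionally Unlikely", 0, 1),
]


def standard_term(low, high):
    """Return the narrowest Table 1 term containing the range [low, high] (in %).

    Returns None when no single term covers the range; the range should
    then be reported numerically. Choosing the narrowest term is an
    assumption of this sketch.
    """
    if not 0 <= low <= high <= 100:
        raise ValueError("expected 0 <= low <= high <= 100")
    matches = [(name, lo, hi) for name, lo, hi in LIKELIHOOD_SCALE
               if lo <= low and high <= hi]
    if not matches:
        return None
    # min() keeps the first entry in list order when widths are tied.
    name, lo, hi = min(matches, key=lambda term: term[2] - term[1])
    return f"{name} ({lo}\u2013{hi}%)"


for judged in [(92, 98), (70, 95), (60, 80)]:
    term = standard_term(*judged)
    label = term if term is not None else "no standard term; report numerically"
    print(f"{judged[0]}-{judged[1]}%: {label}")
```

A judged range of 60–80% is covered by none of the defined terms, so the sketch returns no phrase, which matches the advice above to express such ranges numerically.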

4.2. Uncertainty about the value of the BMD for a given tumour

Uncertainties affecting the estimation of the BMD may be characterised by identifying the potential range of values the BMD could take, and their relative likelihoods. This may be done quantitatively, qualitatively, or by a combination of these methods.

The BMD approach accounts for the statistical uncertainty of the dose–response data from a carcinogenicity study by calculating the lower 95% confidence bound (BMDL) of the BMD (see Section 3). The width of the corresponding two-sided 90% confidence interval (BMDL, BMDU) quantifies the uncertainty arising from variability of the underlying data. Aspects of study design, e.g., the choice of the number and location of dose levels and also the sample size at each dose level, are covered by the BMDL and BMDU insofar as the fitted models allow.

More general aspects of study design and quality may add to the uncertainty of the BMD but are not quantified by the BMDL. Further sources of uncertainty are due to assumptions made during the modelling itself. Uncertainty based on the selection of the BMR can be assessed by repeating the calculations with alternative BMR values, e.g., 5% or 1%, provided that no other criteria are violated (see Section 3.4). Further modelling uncertainty arises from the choice of models based on the specific criteria used to find acceptable models and to select a study BMDL (see Section 3.6), and could be assessed by repeating the calculations with alternative criteria. If model averaging is used, this captures uncertainty about which of the models that are averaged may be correct, but further uncertainty may remain about the potential relevance of other possible models that are not included in the ‘model space’ for averaging. Other uncertainties may arise on a case-by-case basis, from specific issues affecting the studies available for the assessment (for some examples, see the case studies in Annexes 2 and 3). In principle, it would be possible to include such additional uncertainties in the calculation of the BMDL if they can be quantified probabilistically, but this requires specialised statistical methods that are not included in the software currently used for calculating BMDs. If the quality of the study is too poor to support any estimate of the BMD, it should be excluded from the assessment (see Section 3).

Every assessment will leave some uncertainties unquantified, so these need to be considered qualitatively or semi-quantitatively. Methods for this have been reviewed by Hart et al. (2010). Several of these involve listing the uncertainties in a table together with scores or narratives which characterise their individual and combined impacts on the quantitative estimates in question (e.g. ECHA, 2008; EFSA, 2006; EPRI, 2009). These are the basis of the approach recommended below, for expressing uncertainties associated with the derivation of the BMD and BMDL for a potential genotoxic carcinogen. The aim of this approach is to evaluate how different the BMDL might be if it were possible to resolve or correct for all the additional uncertainties that are not quantified by the BMDL itself. As for the categorical questions (Section 4.1), it is important to document the assessment of uncertainties in a transparent way. For this purpose, it is recommended to use a tabular format, as illustrated in Table 3.

Table 3
Tabular format recommended for evaluating unquantified uncertainties affecting the derived BMDL. If symbols or words are used in the right hand column they must be defined in the table legend, in accompanying text or in a diagram. Letters in square brackets, e.g. [a], refer to steps in the description of the approach (see text).

| Sources of uncertainty additional to those represented by the BMDL | Evaluation of uncertainty |
|---|---|
| Uncertainty 1: very briefly describe the uncertainty and your evaluation of how different the BMDL might be if this uncertainty was resolved or corrected for [a] | Record here your evaluation as a range of numbers, symbols or words [b,c,f] |
| Insert more rows for additional uncertainties, as needed [a] | |
| Overall assessment: verbal description of your assessment of the overall unquantified uncertainty affecting the BMDL and a very brief explanation of how it is derived from the individual uncertainties [e] | Record here your evaluation of the overall unquantified uncertainty as a range of numbers, symbols or words [d,f] |

Steps for assessing additional uncertainties affecting the BMDL are as follows:

a. Systematically consider all aspects of the study that generated the data and the methods used to calculate the BMDL, and identify any potential sources of uncertainty, i.e., elements of the design, conduct and evaluation of a study or analysis of the data that might have affected the calculated BMDL. List the uncertainties in a table, using the template shown in Table 3.
b. Consider each tabulated source of uncertainty in turn, and evaluate how much the BMDL might change if that uncertainty was resolved or corrected for. For example, if the actual concentrations in test diet were thought to be significantly lower than the reported values, consider how much lower the BMDL might be if it was calculated using the actual concentrations. Express your judgement about this by using pairs of numbers, symbols or words to cover the range in which you are reasonably sure the adjustment for each uncertainty would lie. For example, giving a range of 0.5–3 for a particular uncertainty would represent a judgement that resolving this uncertainty would change the estimated BMDL by a factor between 0.5 and 3, i.e. the BMDL might be reduced by a factor of 2 or increased by a factor of 3. Record your evaluations for all the identified uncertainties in the right hand column of the table (for a template, see Table 3), and add to the left hand column a very brief explanation of the basis for your estimate of the impact of each uncertainty. Where more detailed explanation is needed, this should be provided in text sections accompanying the table.
c. If symbols or words are used to express the impact of the uncertainties, it is important to define their meaning, preferably using a quantitative scale, so that the assessor’s evaluation of the uncertainties is transparent and can be interpreted by others without ambiguity. When defining the scale, make it wide enough to accommodate the largest uncertainties in the assessment. Set the intervals for different symbols in a way that you find helpful to differentiate smaller and larger uncertainties. A multiplicative (logarithmic) scale is likely to be appropriate. An example is shown below.

[Image 1: example of a scale for expressing the impact of uncertainties]

d. Review all the evaluated uncertainties in the table and form a judgement about their overall, combined impact, i.e. how different the BMDL might be if all the uncertainties were resolved. This should not be done by any simplistic aggregation, such as counting the numbers of symbols. Consider carefully how the different uncertainties would combine to change the BMDL, i.e. if all the additional uncertainties were resolved, what is the greatest range in which the ‘corrected’ BMDL might lie. As part of this, it is important to consider potential dependencies between different uncertainties. In particular, is it reasonable to aggregate the maximum magnitudes of all the uncertainties, or would that exaggerate their combined effect? Express your assessment of the combined uncertainty using numbers or symbols, in the same way as the individual uncertainties (e.g. 0.5–5, 0.5 times to 5 times, or /++). If the overall uncertainty is large, expressing it may require extension of the scale.
e. Also express the outcome in words as a short narrative, including a very brief explanation of how you derived it from the individual uncertainties. This will assist readers in understanding your conclusion, and may also be useful to include in the conclusion or summary of the assessment. Any terms used to express a relative likelihood or subjective probability in a narrative (e.g. ‘unlikely’) should be defined (e.g., see Table 1) to ensure the meaning of your conclusion is as unambiguous as possible. Where more detailed explanation is needed, this should be provided in text sections accompanying the table.
f. There is a natural tendency to be overly confident in expert judgements. Therefore, before finalising your assessment, review your evaluation of each individual uncertainty and the overall uncertainty, and consider whether any of the ranges should be wider. Adjust your assessment if appropriate and document your reasoning in the table. If the uncertainty is very wide, it is important to make this transparent by assigning a wide range in the assessment (e.g. /+++).
g. Present the evaluation of unquantified uncertainties together with the BMD and BMDL and explain that the overall uncertainty is expressed partly by the BMDL and partly by your evaluation of unquantified uncertainties. Both should be taken into account in subsequent steps of the assessment. It is not proposed to use the evaluation of unquantified uncertainties to adjust the BMDL. Instead, the evaluation of unquantified uncertainties should be taken into account when interpreting the MoE.

4.3. Evaluation of uncertainty in different parts of the assessment

The approaches described in Sections 4.1 and 4.2 can be used to evaluate uncertainties affecting different steps in selecting the PoD for a MoE assessment, as illustrated in Fig. 1. This includes uncertainties regarding the MoA of observed tumours and the derivation of a PoD, as discussed above, but also other uncertainties, e.g., regarding the human relevance of tumours observed in rodents (see Section 2). Furthermore, it may be helpful to break down the assessment of MoA into several steps. In case studies on Sudan 1 and PhIP (Annexes 2 and 3), it was found helpful to consider first whether the chemical is directly genotoxic in general and then consider, as a second step, whether particular observed tumours were caused by a MoA involving genotoxicity.

In many carcinogenicity studies, the sites where a genotoxic carcinogen causes tumours are not necessarily concordant between species. For this reason, a tumour observed in a tissue not relevant to humans may nevertheless be relevant as a measure of potency for causation of tumours in other sites. Therefore, a MoE should be calculated unless none of the observed tumours is considered relevant to humans. The likelihood of this can be considered after assessing the human relevance of each tumour individually, as indicated in Fig. 1.

Fig. 1. Illustration of different steps where uncertainty may be evaluated when selecting a PoD for use in calculating a MoE for a specific tumour identified as critical in carcinogenesis bioassays. Plus and minus symbols indicate the influence of evidence on probabilities at each step, and the influence of one probability on the next.

5. Conclusions

The above sections provide practical guidance on the selection of data for deriving a BMDL and help address the concern raised by Benford et al. (2010) with respect to the derivation of the BMDL and subsequent calculation of the MoE. Incorporating this framework into standard practice for BMD data selection and modelling, along with a transparent evaluation of the uncertainties surrounding the data selection and derived PoD, will increase the consistency and reliability of assessments and improve the basis for risk management.

A suggested framework for incorporating the above advice into any assessment is outlined in Box 3.

This publication was coordinated by Mr Massimo Ambrosio, Scientific Project Manager at ILSI Europe. The expert group received funding from the ILSI Europe Risk Assessment of Genotoxic Carcinogens Task Force. Industry members of this task force are listed on the ILSI Europe website at www.ilsi.eu. For further information about ILSI Europe, please email [email protected] or call +32 2 771 00 14. The opinions expressed herein and the conclusions of this publication are those of the authors and do not necessarily represent the views of ILSI Europe nor those of its member companies.

Appendix A. Supplementary material

Box 3 Suggested framework for selection of data for Benchmark Dose modelling and derivation of a Margin of Exposure.

1.
Does animal carcinogenicity data of sufficient quality exist for the substance that could potentially allow Supplementary data associated with this article can be found, in modelling and derivation of a BMDL? the online version, at http://dx.doi.org/10.1016/j.fct.2013.10.030. – Are there one or more studies conducted in accor- dance with appropriate guidelines with a clear dose–response relationship? (see Section 3) References 2. If not, are the data sufficient to put a lower bound on the PoD? Alison, R., Maronpot, R., 1987. Comparative Ovarian Pathology Conference: Papers 3. Assessment of whether the tumour data are relevant Presented at a Round Table Conference held in Research Triangle Park on September 24–25, 1985. Research Triangle Park, pp. 5–130. and appropriate for assessing human cancer risk Alison, R.H., Capen, C.C., Prentice, D.E., 1994. Neoplastic lesions of questionable – Assessment of mode of action – How likely is a direct significance to humans. Toxicol. Pathol. 22, 179–186. DNA reactive mechanism for this substance? Amemiya, K., Kudoh, M., Suzuki, H., Saga, K., Hosaka, K., 1984. Toxicology of mabuterol. Arzneimittelforschung 34, 1680–1684. – Strength of evidence based on structural alerts, Apperley, G.H., Brittain, R.T., Coleman, R.A., Kennedy, I., Levy, G.P., 1978. in vitro studies, in vivo studies & read-across from Characterization of the beta-adrenoceptors in the mesovarium of the rat related substances – How likely is it that the [proceedings]. Br. J. Pharmacol. 63, 345P–346P. Ashby, J., Tennant, R.W., 1991. Definitive relationships among chemical structure, observed tumours were caused by a non-threshol- carcinogenicity and mutagenicity for 301 chemicals tested by the U.S. NTP. ded genotoxic mechanism? Mutat. Res 257, 229–306. – How likely are the observed tumours to be of rele- Autian, J., 1973. The new field of plastics toxicology–methods and results. Crit. Rev. Toxicol. 2, 1–40. vance to humans? (see Section 2) Baer, A., 1992. 
Significance of Leydig cell neoplasia in rats fed lactitol or lactose. J. – Take account of the uncertainties affecting these Am. Coll. Toxicol. 11, 189–207. judgements (see Section 4) Baer, A., 1988. Sugars and adrenomedullary proliferative lesions: the effects of 4. Modelling of appropriate tumour data & ranking of lactose and various polyalcohols. Int. J. Toxicol. 7, 71–81. Baetcke, K.P., Hard, G.C., Rodgers, S.R., McGaughy, R.E., Tahan, L.M., 1991. Alpha2u- appropriate BMDLs Globulin: Association with Chemically Induced Renal Toxicity and Neoplasia in – Model all relevant datasets, including individual the Male Rat. Environmental Protection Agency Office of Research and treatment-related tumours and the indicence of Development, Washington, DC, USA. Bailer, A.J., Portier, C.J., 1988. Effects of treatment-induced mortality and tumor- treatment-related tumours (see Section 3) induced mortality on tests for carcinogenicity in small samples. Biometrics 44, – Rank BMDLs based on magnitude and statistical 417–431. model fit (exclude those where the data are inade- Barsoum, N.J., Moore, J.D., Gough, A.W., Sturgess, J.M., de, l.I., 1985. Morphofunctional investigations on spontaneous pituitary tumors in Wistar quate to make inferences about the dose–response rats. Toxicol. Pathol. 13, 200–208. relationship) (see Section 4) Benford, D., Bolger, P.M., Carthew, P., Coulet, M., DiNovi, M., Leblanc, J.C., Renwick, – Additional uncertainty – do additional uncertainties A.G., Setzer, W., Schlatter, J., Smith, B., Slob, W., Williams, G., Wildemann, T., 2010. Application of the Margin of Exposure (MOE) approach to substances in exist that may affect the BMD/BMDL? (see Section 4) food that are genotoxic and carcinogenic. Food Chem. Toxicol. 48 (Suppl. 1), pp. 5. Selection of the BMDL for calculating the MoE S2–24. – Select the most appropriate BMDL to serve as the Beral, V., 2003. Breast cancer and hormone-replacement therapy in the Million Women Study. Lancet 362, 419–427. 
PoD for calculating an MoE. This may not necessar- Berkvens, J.M., Van Nesselrooy, J., Kroes, R., 1980. Spontaneous tumours in the ily be the lowest BMDL, depending on the likelihood pituitary gland of old Wistarrats: a morphological and immunocytochemical of genotoxic MOA, relevance to humans and uncer- study. J. Pathol. 130, 179–191. tainties in the modelling of the BMD/BMDL. Betton, G.R., Dormer, C.S., Wells, T., Pert, P., Price, C.A., Buckley, P., 1988. Gastric ECL- cell hyperplasia and carcinoids in rodents following chronic administration of – Take account of any additional uncertainties affect- H2-antagonists SK&F 93479 and oxmetidine and omeprazole. Toxicol. Pathol. ing the BMD/BMDL (identified in step 4) when inter- 16, 288–298. preting the MoE. Betton, G.R., Salmon, G.K., 1984. Pathology of the forestomach in rats treated for 1 year with a new histamine H2-receptor antagonist, SK&F 93479 trihydrochloride. Scand. J Gastroenterol. (Suppl. 101), 103–108. Bieler, G.S., Williams, R.L., 1993. Ratio estimates, the delta method, and quantal response tests for increased carcinogenicity. Biometrics. 49, 793–801. Boelsterli, U.A., Zbinden, G., 1985. Early biochemical and morphological changes of the rat adrenal medulla induced by xylitol. Arch. Toxicol. 57, 25–30. Boobis, A.R., Cohen, S.M., Dellarco, V., McGregor, D., Meek, M.E., Vickers, C., Conflict of Interest Willcocks, D., Farland, W., 2006. IPCS framework for analyzing the relevance of a cancer mode of action for humans. Crit. Rev. Toxicol. 36, 781–792. The authors declare that there are no conflicts of interest. Boobis, A.R., Doe, J.E., Heinrich-Hirsch, B., Meek, M.E., Munn, S., Ruchirawat, M., Schlatter, J., Seed, J., Vickers, C., 2008. IPCS framework for analyzing the relevance of a noncancer mode of action for humans. Crit. Rev. Toxicol. 38, 87– Acknowledgments 96. Brand, K.G., Johnson, K.H., Buoen, L.C., 1976. Foreign body tumorigenesis. Crit. Rev. Toxicol. 4, 353–394. 
This work was conducted by an expert group of the European Bryant, H., Brasher, P., 1995. Breast implants and breast cancer – reanalysis of a branch of the International Life Sciences Institute (ILSI Europe). linkage study. N. Engl. J. Med. 332, 1535–1539. Please cite this article in press as: Edler, L., et al. Selection of appropriate tumour data sets for Benchmark Dose Modelling (BMD) and derivation of a Margin of Exposure (MoE) for substances that are genotoxic and carcinogenic: Considerations of biological relevance of tumour type, data quality and uncertainty assessment. Food Chem. Toxicol. (2013), http://dx.doi.org/10.1016/j.fct.2013.10.030 L. Edler et al. / Food and Chemical Toxicology xxx (2013) xxx–xxx 23

Burkitt, M.D., Pritchard, D.M., 2006. Review article: pathogenesis and management of gastric carcinoid tumours. Aliment. Pharmacol. Ther. 24, 1305–1320.
Caillard, S., Dharnidharka, V., Agodoa, L., Bohen, E., Abbott, K., 2005. Posttransplant lymphoproliferative disorders after renal transplantation in the United States in era of modern immunosuppression. Transplantation 80, 1233–1243.
Caldwell, D.J., 1999. Review of mononuclear cell leukemia in F-344 rat bioassays and its significance to human cancer risk: a case study using alkyl phthalates. Regul. Toxicol. Pharmacol. 30, 45–53.
Capen, C.C., 1997. Mechanistic data and risk assessment of selected toxic end points of the thyroid gland. Toxicol. Pathol. 25, 39–48.
Capen, C.C., 1998. Correlation of mechanistic data and histopathology in the evaluation of selected toxic endpoints of the endocrine system. Toxicol. Lett. 102–103, 405–409.
Capen, C.C., 2001. Overview of structural and functional lesions in endocrine organs of animals. Toxicol. Pathol. 29, 8–33.
Carter, R.L., 1970. Induced subcutaneous sarcomata: their development and critical appraisal. In: Roe, F.J.C. (Ed.), Metabolic Aspects of Food Safety. Oxford, pp. 569–591.
Carthew, P., DiNovi, M., Setzer, R.W., 2010a. Application of the margin of exposure (MoE) approach to substances in food that are genotoxic and carcinogenic: example furan (CAS No. 110-00-9). Food Chem. Toxicol. 48 (Suppl. 1), S69–S74.
Chen, D., Hakanson, H., 2003. Mechanism of Gastric ECL-Cell Tumour Induction: Observations in Rats. International Agency for Research on Cancer, Lyon, France, pp. 121–131.
Chesterman, F.C., Harvey, J.J., Dourmashkin, R.R., Salaman, M.H., 1966. The pathology of tumors and other lesions induced in rodents by virus derived from a rat with Moloney leukemia. Cancer Res. 26, 1759–1768.
Chlebowski, R.T., Hendrix, S.L., Langer, R.D., Stefanick, M.L., Gass, M., Lane, D., Rodabough, R.J., Gilligan, M.A., Cyr, M.G., Thomson, C.A., Khandekar, J., Petrovitch, H., McTiernan, A., 2003. Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: the Women’s Health Initiative Randomized Trial. JAMA 289, 3243–3253.
Clapper, M.L., Cooper, H.S., Chang, W.C., 2007. Dextran sulfate sodium-induced colitis-associated neoplasia: a promising model for the development of chemopreventive interventions. Acta Pharmacol. Sin. 28, 1450–1459.
Clayson, D.B., Fishbein, L., Cohen, S.M., 1995. Effects of stones and other physical factors on the induction of rodent bladder cancer. Food Chem. Toxicol. 33, 771–784.
Codex Alimentarius Commission, 2011. Working Principles for Risk Analysis in the Framework of Codex Alimentarius.
Cohen, S.M., 1998. Urinary bladder carcinogenesis. Toxicol. Pathol. 26, 121–127.
Cohen, S.M., 1995. Role of urinary physiology and chemistry in bladder carcinogenesis. Food Chem. Toxicol. 33, 715–730.
Cohen, S.M., 2004. Human carcinogenic risk evaluation: an alternative approach to the two-year rodent bioassay. Toxicol. Sci. 80, 225–229.
Cohen, S.M., Arnold, L.L., Cano, M., Ito, M., Garland, E.M., Shaw, R.A., 2000. Calcium phosphate-containing precipitate and the carcinogenicity of sodium salts in rats. Carcinogenesis 21, 783–792.
Cohen, S.M., Erturk, E., Skibber, J.L., Bryan, G.T., 1983. Azathioprine induction of lymphomas and squamous cell carcinomas in rats. Cancer Res. 43, 2768–2772.
Cohen, S.M., Storer, R.D., Criswell, K.A., Doerrer, N.G., Dellarco, V.L., Pegg, D.G., Wojcinski, Z.W., Malarkey, D.E., Jacobs, A.C., Klaunig, J.E., Swenberg, J.A., Cook, J.C., 2009. Hemangiosarcoma in rodents: mode-of-action evaluation and human relevance. Toxicol. Sci. 111, 4–18.
Colbert, W.E., Wilson, B.F., Williams, P.D., Williams, G.D., 1991. Relationship between in vitro relaxation of the costo-uterine smooth muscle and mesovarial leiomyoma formation in vivo by beta-receptor agonists. Arch. Toxicol. 65, 575–579.
Cook, J.C., Klinefelter, G.R., Hardisty, J.F., Sharpe, R.M., Foster, P.M., 1999. Rodent Leydig cell tumorigenesis: a review of the physiology, pathology, mechanisms, and relevance to humans. Crit. Rev. Toxicol. 29, 169–261.
Daniel, P.M., Prichard, M.M., 1964. Three types of mammary tumour induced in rats by feeding with DMBA. Br. J. Cancer 13, 513–520.
Daniel, P.M., Prichard, M.M., 1961. The production of mammary tumours in rats feeding with 3-methylcholanthrene. Br. J. Cancer 15, 828–832.
Davenport, H.A., Savage, J., Dirstine, M., Queen, F.B., 1941. Induction of tumors in rats by carcinogens in various lipids. Cancer Res. 1, 821–824.
Davies, T., Monro, A., 1995. Marketed human pharmaceuticals reported to be tumorigenic in rodents. J. Am. Coll. Toxicol. 14, 90–107.
Davis, J.A., Gift, J.S., Zhao, Q.J., 2011. Introduction to benchmark dose methods and U.S. EPA’s benchmark dose software (BMDS) version 2.1.1. Toxicol. Appl. Pharmacol. 254, 181–191.
Dekker, A., Russfield, A., 1963. Pituitary tropic hormone studies and morphologic observations in carcinoma of the prostate. Cancer 16, 743–750.
DeSesso, J.M., Lavin, A.L., Hsia, S.M., Mavis, R.D., 2000. Assessment of the carcinogenicity associated with oral exposures to hydrogen peroxide. Food Chem. Toxicol. 38, 1021–1041.
Dinse, G.E., Peddada, S.D., Harris, S.F., Elmore, S.A., 2010. Comparison of NTP historical control tumor incidence rates in female Harlan Sprague Dawley and Fischer 344/N Rats. Toxicol. Pathol. 38, 765–775.
Druckrey, H., Preussmann, R., Ivankovic, S., Schmahl, D., 1967. Organotropic carcinogenic effects of 65 various N-nitroso-compounds on BD rats. Z. Krebsforsch. 69, 103–201.
Duddy, S.K., Gorospe, S.M., Bleavins, M.R., de, l.I., 1999. Spontaneous and thiazolidinedione-induced B6C3F1 mouse hemangiosarcomas exhibit low ras oncogene mutation frequencies. Toxicol. Appl. Pharmacol. 160, 133–140.
Dunsford, H.A., Maset, R., Salman, J., Sell, S., 1985. Connection of ductlike structures induced by a chemical hepatocarcinogen to portal bile ducts in the rat liver detected by injection of bile ducts with a pigmented barium gelatin medium. Am. J. Pathol. 118, 218–224.
ECHA, 2008. Guidance on Information Requirements and Chemical Safety Assessment. Chapter R.19. Uncertainty Analysis.
EFSA, 2006. Guidance of the Scientific Committee on a Request from EFSA Related to Uncertainties in Dietary Exposure Assessment. EFSA Journal, 438.
EFSA, 2009a. Guidance of the Scientific Committee on Transparency in the Scientific Aspects of Risk Assessment carried out by EFSA. Part 2: General Principles, European Food Safety Authority, 1051.
EFSA, 2009b. Use of the Benchmark Dose Approach in Risk Assessment. EFSA Journal, 1150.
EFSA, 2011a. European Food Safety Authority Scientific Committee’s Opinion on Genotoxicity Testing Strategies Applicable to Food and Feed Safety Assessment.
EFSA, 2011b. Use of BMDS and PROAST Software Packages by EFSA Scientific Panels and Units for applying the Benchmark Dose (BMD) Approach in Risk Assessment, European Food Safety Authority, Parma, EN-113.
EFSA, 2012. Guidance on selected default values to be used by the EFSA Scientific Committee, Scientific Panels and Units in the absence of actual measured data. EFSA Journal 10, 2579. [32 pp.].
Eiben, R., 2001. Frequency and time trends of spontaneous tumors found in B6C3F1 mice oncogenicity studies over 10 years. Exp. Toxicol. Pathol. 53, 399–408.
Ekman, L., Hansson, E., Havu, N., Carlsson, E., Lundberg, C., 1985. Toxicological studies on omeprazole. Scand. J. Gastroenterol. (Suppl. 108), 53–69.
Elmore, L.W., Sirica, A.E., 1993. “Intestinal-type” of adenocarcinoma preferentially induced in right/caudate liver lobes of rats treated with furan. Cancer Res. 53, 254–259.
Elwell, M.R., Dunnick, J.K., Hailey, J.R., Haseman, J.K., 1996. Chemicals associated with decreases in the incidence of mononuclear cell leukemia in the Fischer rat. Toxicol. Pathol. 24, 238–245.
EMEA, 2006. ICH Topic S1B Carcinogenicity: Testing for Carcinogenicity of Pharmaceuticals. Note for Guidance on Carcinogenicity CPMP/ICH/299/95.
EPA, 2005. Guidelines for Carcinogen Risk Assessment. In: Risk Assessment Forum, US Environmental Protection Agency.
EPA, 2012a. About the MSW Time to Tumor Model. US Environmental Protection Agency.
EPA, 2012b. Benchmark Dose Technical Guidance.
EPRI, 2009. Treatment of Parameter and Model Uncertainty for Probabilistic Risk Assessments, EPRI.
Farber, E., 1963. Ethionine carcinogenesis. Adv. Cancer Res. 7, 383–474.
Fath Jr., R.B., Deschner, E.E., Winawer, S.J., Dworkin, B.M., 1984. Degraded carrageenan-induced colitis in CF1 mice: a clinical, histopathological and kinetic analysis. Digestion 29, 197–203.
FDA, 2008. Guidance for Industry: S1C(R2) Dose Selection for Carcinogenicity Studies, US Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research.
Fisher, J.L., Schwartzbaum, J.A., Wrensch, M., Wiemels, J.L., 2007. Epidemiology of brain tumors. Neurol. Clin. 25, pp. 867–890, vii.
Fossmark, R., Qvigstad, G., Waldum, H.L., 2008. Gastric cancer: animal studies on the risk of hypoacidity and hypergastrinemia. World J. Gastroenterol. 14, 1646–1651.
French, J.E., 1989. NTP Technical Report on the Toxicology and Carcinogenesis Studies of Nitrofurantoin in F344/N Rats and B6C3F Mice (Feed Studies). National Institute of Health, (CAS no. 67-20-9).
Friis, S., McLaughlin, J.K., Mellemkjaer, L., Kjoller, K.H., Blot, W.J., Boice Jr., J.D., Fraumeni Jr., J.F., Olsen, J.H., 1997. Breast implants and cancer risk in Denmark. Int. J. Cancer 71, 956–958.
Gardner, W., Dougherty, T., Williams, W.L., 1944. Lymphoid tumors in mice receiving steroid hormones. Cancer Res. 4, 73–87.
Gart, J.J., Krewski, D., Lee, P.N., Tarone, R.E., Wahrendorf, J., 1986. Statistical methods in cancer research. Volume III–The design and analysis of long-term animal experiments. IARC Sci. Publ. 79, 1–219.
Gartner, K., Pfaff, J., 1979. The forestomach in rats and mice, a food store without bacterial protein digestion. Zentralbl. Veterinarmed. A 26, 530–541.
Gaylor, D.W., 2005. Are tumor incidence rates from chronic bioassays telling us what we need to know about carcinogens? Regul. Toxicol. Pharmacol. 41, 128–133.
Geyer, R.P., Bleisch, V.R., Bryant, J.E., Robbins, A.N., Saslaw, I.M., Stare, F.J., 1951. Tumor production in rats injected intravenously with oil emulsions containing 9,10-dimethyl-1,2-benzanthracene. Cancer Res. 11, 474–478.
Ghoshal, A.K., Farber, E., 1984. The induction of liver cancer by dietary deficiency of choline and methionine without added carcinogens. Carcinogenesis 5, 1367–1370.
Gold, L.S., Slone, T.H., Ames, B.N., 1998. What do animal cancer tests tell us about human cancer risk?: Overview of analyses of the carcinogenic potency database. Drug Metab. Rev. 30, 359–404.
Gold, L.S., Manley, N.B., Slone, T.H., Ward, J.M., 2001. Compendium of chemical carcinogens by target organ: results of chronic bioassays in rats, mice, hamsters, dogs, and monkeys. Toxicol. Pathol. 29, 639–652.
Gold, L.S., Slone, T.H., Manley, N.B., Bernstein, L., 1991. Target organs in chronic bioassays of 533 chemical carcinogens. Environ. Health Perspect. 93, 233–246.
Gold, L.S., Slone, T.H., Stern, B.R., Bernstein, L., 1993. Comparison of target organs of carcinogenicity for mutagenic and non-mutagenic chemicals. Mutat. Res. 286, 75–100.


Gopinath, C., Gibson, W.A., 1987. Mesovarian leiomyomas in the rat. Environ. Health Perspect. 73, 107–113.
Grasso, P., 1987. Experimental liver tumours in animals. Baillieres Clin. Gastroenterol. 1, 183–205.
Grasso, P., Golberg, L., 1966. Early changes at the site of repeated subcutaneous injection of food colourings. Food Cosmet. Toxicol. 4, 269–282.
Greaves, P., 2012. Lymphoreticular neoplasms, Histopathology of Preclinical Toxicity Studies: Interpretation and Relevance in Drug Safety Evaluation. Academic Press/Elsevier, Amsterdam, pp. 128–141.
Greaves, P., 1996. The evaluation of potential human carcinogens: a histopathologist’s point of view. Exp. Toxicol. Pathol. 48, 169–174.
Greaves, P., Barsoum, N., 1990. Pathology of tumours in laboratory animals: tumours of the rat, tumours of soft tissues. IARC Sci. Publ., 597–623.
Greaves, P., Irisarri, E., Monro, A.M., 1986. Hepatic foci of cellular and enzymatic alteration and nodules in rats treated with clofibrate or diethylnitrosamine followed by phenobarbital: their rate of onset and their reversibility. J. Natl. Cancer Inst. 76, 475–484.
Gregory, M., Monro, A., Quinton, M., Woolhouse, N., 1983. The acute toxicity of oxamniquine in rats; a sex-dependent hepatotoxicity. Arch. Toxicol. 54, 247–255.
Greim, H., Hartwig, A., Reuter, U., Richter-Reichhelm, H.B., Thielmann, H.W., 2009. Chemically induced pheochromocytomas in rats: mechanisms and relevance for human risk assessment. Crit. Rev. Toxicol. 39, 695–718.
Gumbmann, M.R., Dugan, G.M., Spangler, W.L., Baker, E.C., Rackis, J.J., 1989. Pancreatic response in rats and mice to trypsin inhibitors from soy and potato after short- and long-term dietary exposure. J. Nutr. 119, 1598–1609.
Hahn, F.F., Gigliotti, A., Hutt, J.A., 2007. Comparative oncology of lung tumors. Toxicol. Pathol. 35, 130–135.
Hakanson, R., Oscarson, J., Sundler, F., 1986. Gastrin and the trophic control of gastric mucosa. Scand. J. Gastroenterol. (Suppl. 118), 18–30.
Hard, G.C., 1998. Mechanisms of chemically induced renal carcinogenesis in the laboratory rodent. Toxicol. Pathol. 26, 104–112.
Hard, G.C., Bruner, R.H., Cohen, S.M., Pletcher, J.M., Regan, K.S., 2011. Renal histopathology in toxicity and carcinogenicity studies with tert-butyl alcohol administered in drinking water to F344 rats: a pathology working group review and re-evaluation. Regul. Toxicol. Pharmacol. 59, 430–436.
Hard, G.C., Seely, J.C., Kissling, G.E., Betz, L.J., 2008. Spontaneous occurrence of a distinctive renal tubule tumor phenotype in rat carcinogenicity studies conducted by the national toxicology program. Toxicol. Pathol. 36, 388–396.
Hart, A., Gosling, G., Boobis, A.R., Coggon, D., Craig, P., Jones, D., 2010. Development of a Framework for Evaluation and Expression of Uncertainties in Hazard and Risk Assessment. Research Report to Food Standards Agency, Food and Environment Research Agency, Project No. T01056.
Haseman, J.K., Hailey, J.R., Morris, R.W., 1998. Spontaneous neoplasm incidences in Fischer 344 rats and B6C3F1 mice in two-year carcinogenicity studies: a National Toxicology Program update. Toxicol. Pathol. 26, 428–441.
Haseman, J.K., Lockhart, A.M., 1993. Correlations between chemically related site-specific carcinogenic effects in long-term studies in rats and mice. Environ. Health Perspect. 101, 50–54.
Havu, N., Mattsson, H., Ekman, L., Carlsson, E., 1990. Enterochromaffin-like cell carcinoids in the rat gastric mucosa following long-term administration of ranitidine. Digestion 45, 189–195.
Heydens, W.F., Wilson, A.G., Kier, L.D., Lau, H., Thake, D.C., Martens, M.A., 1999. An evaluation of the carcinogenic potential of the herbicide alachlor to man. Hum. Exp. Toxicol. 18, 363–391.
Hoenerhoff, M.J., Hong, H.H., Ton, T.V., Lahousse, S.A., Sills, R.C., 2009. A review of the molecular mechanisms of chemically induced neoplasia in rat and mouse models in National Toxicology Program bioassays and their relevance to human cancer. Toxicol. Pathol. 37, 835–848.
Holsapple, M.P., Pitot, H.C., Cohen, S.M., Boobis, A.R., Klaunig, J.E., Pastoor, T., Dellarco, V.L., Dragan, Y.P., 2006. Mode of action in relevance of rodent liver tumors to human cancer risk. Toxicol. Sci. 89, 51–56.
Hooson, J., Grasso, P., Gangolli, S.D., 1973. Injection site tumours and preceding pathological changes in rats treated subcutaneously with surfactants and carcinogens. Br. J. Cancer 27, 230–244.
IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 2003. Summary Report. Gastric Neuroendocrine Tumors. IARC, Lyon.
IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, 1999. Acrylonitrile. IARC Monogr. Eval. Carcinog. Risks Hum. 71 Pt 1, 43–108.
IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, (1987, Lyon, F.), 1987. Overall evaluations of carcinogenicity: an updating of IARC monographs volumes 1 to 42: this publication represents the views and expert opinions of an IARC ad-hoc Working Group on the Evaluation of Carcinogenic Risks to Humans, which met in Lyon [sic]. 7.
IARC Working Group on the Evaluation of Carcinogenic Risks to Humans, World Health Organization, International Agency for Research on Cancer, 2007. Combined estrogen-progestogen contraceptives and combined estrogen-progestogen menopausal therapy, v. 91 ed. International Agency for Research on Cancer, Lyon, France.
IPCC, 2005. Guidance Notes for Lead Authors of the IPCC Fourth Assessment Report on addressing uncertainties.
Ishioka, T., Kuwabara, N., Fukuda, Y., 1985. Induction of colorectal adenocarcinoma in rats by amylopectin sulfate. Cancer Lett. 26, 277–282.
Ishioka, T., Kuwabara, N., Oohashi, Y., Wakabayashi, K., 1987. Induction of colorectal tumors in rats by sulfated polysaccharides. Crit. Rev. Toxicol. 17, 215–244.
Ito, A., Watanabe, H., Naito, M., Naito, Y., 1981. Induction of duodenal tumors in mice by oral administration of hydrogen peroxide. Gann 72, 174–175.
Jack, D., Poynter, D., Spurling, N.W., 1983. Beta-adrenoceptor stimulants and mesovarian leiomyomas in the rat. Toxicology 27, 315–320.
Jamison, K.C., Larson, J.L., Butterworth, B.E., Harden, R., Skinner, B.L., Wolf, D.C., 1996. A non-bile duct origin for intestinal crypt-like ducts with periductular fibrosis induced in livers of F344 rats by chloroform inhalation. Carcinogenesis 17, 675–682.
JECFA. Summary and conclusions of the 64th meeting of the Joint FAO/WHO Expert Committee on Food Additives.
Jiang, J., Xu, Y., Klaunig, J.E., 1998. Induction of oxidative stress in rat brain by acrylonitrile (ACN). Toxicol. Sci. 46, 333–341.
Jones, H.B., Orton, T.C., Lake, B.G., 2009. Effect of chronic phenobarbitone administration on liver tumour formation in the C57BL/10J mouse. Food Chem. Toxicol. 47, 1333–1340.
Kakiuchi-Kiyota, S., Vetro, J.A., Suzuki, S., Varney, M.L., Han, H.Y., Nascimento, M., Pennington, K.L., Arnold, L.L., Singh, R.K., Cohen, S.M., 2009. Effects of the PPARgamma agonist troglitazone on endothelial cells in vivo and in vitro: differences between human and mouse. Toxicol. Appl. Pharmacol. 237, 83–90.
Kimbrough, R.D., Linder, R.E., Burse, V.W., Jennings, R.W., 1973. Adenofibrosis in the rat liver, with persistence of polychlorinated biphenyls in adipose tissue. Arch. Environ. Health 27, 390–395.
King-Herbert, A., Thayer, K., 2006. NTP workshop: animal models for the NTP rodent cancer bioassay: stocks and strains–should we switch? Toxicol. Pathol. 34, 802–805.
King-Herbert, A.P., Sills, R.C., Bucher, J.R., 2010. Commentary: update on animal models for NTP studies. Toxicol. Pathol. 38, 180–181.
Kirkpatrick, C.J., Alves, A., Kohler, H., Kriegsmann, J., Bittinger, F., Otto, M., Williams, D.F., Eloy, R., 2000. Biomaterial-induced sarcoma: a novel model to study preneoplastic change. Am. J. Pathol. 156, 1455–1467.
Klimisch, H.-J., Andreae, M., Tillmann, U., 1997. A systematic approach for evaluating the quality of experimental toxicological data. Regul. Toxicol. Pharmacol. 25, 1–5.
Koestner, A., 1986. The brain-tumour issue in long-term toxicity studies in rats. Food Chem. Toxicol. 24, 139–143.
Kolenda-Roberts, H.M., Harris, N., Singletary, E., Hardisty, J.F., 2013. Immunohistochemical characterization of spontaneous and acrylonitrile-induced brain tumors in the rat. Toxicol. Pathol. 41, 98–108.
Kovacs, K., Horvath, E., Ilse, R.G., Ezrin, C., Ilse, D., 1977. Spontaneous pituitary adenomas in aging rats: a light microscopic, immunocytological and fine structural study. Beitr. Pathol. 161, 1–16.
Krinke, G., Fix, A., Jacobs, M., Render, J., Weisse, J., 2001. Eye and harderian gland. In: Mohr, U. (Ed.), International Classification of Rodent Tumors. Springer-Verlag, Heidelberg, pp. 39–162.
Krueger, G.R., Malmgren, R.A., Berard, C.W., 1971. Malignant lymphomas and plasmacytosis in mice under prolonged immunosuppression and persistent antigenic stimulation. Transplantation 11, 138–144.
Lamberts, R., Brunner, G., Solcia, E., 2001. Effects of very long (up to 10 years) proton pump blockade on human gastric mucosa. Digestion 64, 205–213.
Lewis, P., 1996. General Aspects of Endocrine Pathology, The Endocrine System. Churchill Livingstone, New York.
Lloyd, R.V., 1983. Estrogen-induced hyperplasia and neoplasia in the rat anterior pituitary gland: an immunohistochemical study. Am. J. Pathol. 113, 198–206.
Lock, E.A., Hard, G.C., 2004. Chemically induced renal tubule tumors in the laboratory rat and mouse: review of the NCI/NTP database and categorization of renal carcinogens based on mechanistic information. Crit. Rev. Toxicol. 34, 211–299.
Lumb, G., Mitchell, L., de, l.I., 1985. Regression of pathologic changes induced by the long-term administration of contraceptive steroids to rodents. Toxicol. Pathol. 13, 283–295.
Malarkey, D.E., Devereux, T.R., Dinse, G.E., Mann, P.C., Maronpot, R.R., 1995. Hepatocarcinogenicity of chlordane in B6C3F1 and B6D2F1 male mice: evidence for regression in B6C3F1 mice and carcinogenesis independent of ras proto-oncogene activation. Carcinogenesis 16, 2617–2625.
Marcus, R., Watt, J., 1971. Colonic ulceration in young rats fed degraded carrageenan. Lancet 2, 765–766.
Maronpot, R.R., Giles, H.D., Dykes, D.J., Irwin, R.D., 1991. Furan-induced hepatic cholangiocarcinomas in Fischer 344 rats. Toxicol. Pathol. 19, 561–570.
Maronpot, R.R., Zeiger, E., McConnell, E.E., Kolenda-Roberts, H., Wall, H., Friedman, M.A., 2009. Induction of tunica vaginalis mesotheliomas in rats by xenobiotics. Crit. Rev. Toxicol. 39, 512–537.
Mastrandrea, L.D., Field, C., Stocker, T., Edenhofer, O., Ebi, K., Frame, D., Held, H., Kriegler, E., Mach, K., Matschoss, P., Plattner, G., Yohe, G., Zwiers, F., 2010. Guidance Note for Lead Authors of the IPCC Fifth Assessment Report on Consistent Treatment of Uncertainties. Intergovernmental Panel on Climate Change, Geneva.
McClain, R.M., 1995. Mechanistic considerations for the relevance of animal data on thyroid neoplasia to human risk assessment. Mutat. Res. 333, 131–142.
McComb, D.J., Kovacs, K., Beri, J., Zak, F., 1984. Pituitary adenomas in old Sprague-Dawley rats: a histologic, ultrastructural, and immunocytochemical study. J. Natl. Cancer Inst. 73, 1143–1166.
McConnell, E.E., Solleveld, H.A., Swenberg, J.A., Boorman, G.A., 1986. Guidelines for combining neoplasms for evaluation of rodent carcinogenesis studies. J. Natl. Cancer Inst. 76, 283–289.


McGuire, E.J., DiFonzo, C.J., Martin, R.A., de, l.I., 1986. Evaluation of chronic toxicity and carcinogenesis in rodents with the synthetic analgesic, tilidine fumarate. Toxicology 39, 149–163.
Meek, M.E., Bucher, J.R., Cohen, S.M., Dellarco, V., Hill, R.N., Lehman-McKeeman, L.D., Longfellow, D.G., Pastoor, T., Seed, J., Patton, D.E., 2003. A framework for human relevance analysis of information on carcinogenic modes of action. Crit. Rev. Toxicol. 33, 591–653.
Mikol, Y.B., Hoover, K.L., Creasia, D., Poirier, L.A., 1983. Hepatocarcinogenesis in rats fed methyl-deficient, amino acid-defined diets. Carcinogenesis 4, 1619–1629.
Misbin, R., 2009. Cycloset (bromocriptine), Center for Drug Evaluation and Research, Food and Drug Administration, Rockville, MD, NDA Submission 20-860.
Molon-Noblot, S., Laroque, P., Coleman, J.B., Hoe, C.M., Keenan, K.P., 2003. The effects of ad libitum overfeeding and moderate and marked dietary restriction on age-related spontaneous pituitary gland pathology in Sprague-Dawley rats. Toxicol. Pathol. 31, 310–320.
Nagai, M.K., Farber, E., 1999. The slow induction of resistant hepatocytes during initiation of hepatocarcinogenesis by the nongenotoxic carcinogen clofibrate. Exp. Mol. Pathol. 67, 144–149.
Nelson, L.W., Kelly, W.A., 1971. Mesovarial leiomyomas in rats in a chronic toxicity study of soterenol hydrochloride. Vet. Pathol., 452–457.
Nelson, L.W., Kelly, W.A., Weikel Jr., J.H., 1972. Mesovarial leiomyomas in rats in a chronic toxicity study of mesuprine hydrochloride. Toxicol. Appl. Pharmacol. 23, 731–737.
Nikitin, A.Y., Alcaraz, A., Anver, M.R., Bronson, R.T., Cardiff, R.D., Dixon, D., Fraire, A.E., Gabrielson, E.W., Gunning, W.T., Haines, D.C., Kaufman, M.H., Linnoila, R.I., Maronpot, R.R., Rabson, A.S., Reddick, R.L., Rehm, S., Rozengurt, N., Schuller, H.M., Shmidt, E.N., Travis, W.D., Ward, J.M., Jacks, T., 2004. Classification of proliferative pulmonary lesions of the mouse: recommendations of the mouse models of human cancers consortium. Cancer Res. 64, 2307–2316.
Nilsson, A., Bierke, P., 1997. Neoplasia and preneoplasia of the mouse pituitary gland. In: Bannasch, P., Goessner, W. (Eds.), Pathology of Neoplasia and Preneoplasia in Rodents: EULEP Color Atlas. Schattauer, Stuttgart, pp. 95–113.
Nyska, A., Haseman, J.K., Hailey, J.R., Smetana, S., Maronpot, R.R., 1999. The association between severe nephropathy and pheochromocytoma in the male F344 rat – the National Toxicology Program experience. Toxicol. Pathol. 27, 456–462.
OECD, 2008. Draft OECD Guideline for the Testing of Chemicals Test Guideline 451.
OECD, 2010. OECD Draft Guidance Document No. 116 on the Design and Conduct of Chronic Toxicity and Carcinogenicity Studies, Supporting TG 451, 452, 453.
Ohnishi, T., Arnold, L.L., Clark, N.M., Wisecarver, J.L., Cohen, S.M., 2007. Comparison of endothelial cell proliferation in normal liver and adipose tissue in B6C3F1 mice, F344 rats, and humans. Toxicol. Pathol. 35, 904–909.
Oppenheimer, B., Oppenheimer, E.T., Stout, A.P., 1953. Carcinogenic effect of imbedding various plastic films in rats and mice. Surg. Forum 4, 672–676.
Peto, R., Gray, R., Brantom, P., Grasso, P., 1991. Effects on 4080 rats of chronic ingestion of N-nitrosodiethylamine or N-nitrosodimethylamine: a detailed dose-response study. Cancer Res. 51, 6415–6451.
Peto, R., Gray, R., Brantom, P., Grasso, P., 1984. Nitrosamine carcinogenesis in 5120 rodents: chronic administration of sixteen different concentrations of NDEA, NDMA, NPYR and NPIP in the water of 4440 inbred rats, with parallel studies on NDEA alone of the effect of age of starting (3, 6 or 20 weeks) and of species (rats, mice or hamsters). IARC Sci. Publ., 627–665.
Portier, C.J., Hedges, J.C., Hoel, D.G., 1986. Age-specific models of mortality and tumor onset for historical control animals in the National Toxicology Program’s carcinogenicity experiments. Cancer Res. 46, 4372–4378.
Poteracki, J., Walsh, K.M., 1998. Spontaneous neoplasms in control Wistar rats: a comparison of reviews. Toxicol. Sci. 45, 1–8.
Poynter, D., Harris, D.M., Jack, D., 1978. Salbutamol: lack of evidence of tumour induction in man. Br. Med. J. 1, 46–47.
Poynter, D., Pick, C.R., Harcourt, R.A., Selway, S.A., Ainge, G., Harman, I.W., Spurling, N.W., Fluck, P.A., Cook, J.L., 1985. Association of long lasting unsurmountable histamine H2 blockade and gastric carcinoid tumours in the rat. Gut 26, 1284–1295.
Poynter, D., Selway, S.A., Papworth, S.A., Riches, S.R., 1986. Changes in the gastric mucosa of the mouse associated with long lasting unsurmountable histamine H2 blockade. Gut 27, 1338–1346.
Prentice, D.E., Meikle, A.W., 1995. A review of drug-induced Leydig cell hyperplasia and neoplasia in the rat and some comparisons with man. Hum. Exp. Toxicol. 14, 562–572.
Preston, R.J., Williams, G.M., 2005. DNA-reactive carcinogens: mode of action and human cancer hazard. Crit. Rev. Toxicol. 35, 673–683.
Reddy, J.K., Qureshi, S.A., 1979. Tumorigenicity of the hypolipidaemic peroxisome proliferator ethyl-alpha-p-chlorophenoxyisobutyrate (clofibrate) in rats. Br. J. Cancer 40, 476–482.
Rhomberg, L.R., Baetcke, K., Blancato, J., Bus, J., Cohen, S., Conolly, R., Dixit, R., Doe, J., Ekelman, K., Fenner-Crisp, P., Harvey, P., Hattis, D., Jacobs, A., Jacobson-Kram, D., Lewandowski, T., Liteplo, R., Pelkonen, O., Rice, J., Somers, D., Turturro, A., West, W., Olin, S., 2007. Issues in the design and interpretation of chronic toxicity and carcinogenicity studies in rodents: approaches to dose selection. Crit. Rev. Toxicol. 37, 729–837.
Richardson, B., Turkalj, I., Fluckiger, E., 1984. Bromocriptine. In: Laurence, D., McLean, A., Weatherall, M. (Eds.), Safety Testing of New Drugs, Laboratory Predictions and Clinical Performance. Academic Press, London, pp. 19–63.
Rieth, J.P., Starr, T.B., 1989. Experimental design constraints on carcinogenic potency estimates. J. Toxicol. Environ. Health 27, 287–296.
Rogers, A.B., Boutin, S.R., Whary, M.T., Sundina, N., Ge, Z., Cormier, K., Fox, J.G., 2004. Progression of chronic hepatitis and preneoplasia in Helicobacter hepaticus-infected A/JCr mice. Toxicol. Pathol. 32, 668–677.
Russfield, A., 2013. Pituitary tumors. In: Sommers, S. (Ed.), Endocrine Pathology Decennial, 1966–1975. Appleton-Century Crofts, New York, pp. 41–79.
Sanner, T., Dybing, E., Willems, M.I., Kroese, E.D., 2001. A simple method for quantitative risk assessment of non-threshold carcinogens based on the dose descriptor T25. Pharmacol. Toxicol. 88, 331–341.
Satagopan, J.M., Venkatraman, E.S., Begg, C.B., 2004. Two-stage designs for gene-disease association studies with sample size constraints. Biometrics 60, 589–597.
Schneider, K., Schwarz, M., Burkholder, I., Kopp-Schneider, A., Edler, L., Kinsner-Ovaskainen, A., Hartung, T., Hoffmann, S., 2009. "ToxRTool", a new tool to assess the reliability of toxicological data. Toxicol. Lett. 189, 138–144.
Schütz, H., Spangenberg, A., Wiedemann, P.M., 2008. Evidence maps – a tool for summarizing and communicating evidence in risk assessment. In: Wiedemann, P., Schütz, H. (Eds.), The Role of Evidence in Risk Characterization: Making Sense of Conflicting Data. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sharratt, M., Grasso, P., Carpanini, F., Gangolli, S.D., 1970. Carrageenan ulceration as a model for human ulcerative colitis. Lancet 2, 932.
Sirica, A.E., 1996. Biliary proliferation and adaptation in furan-induced rat liver injury and carcinogenesis. Toxicol. Pathol. 24, 90–99.
Solcia, E., Rindi, G., Larosa, S., Capella, C., 2000. Morphological, molecular, and prognostic aspects of gastric endocrine tumors. Microsc. Res. Tech. 48, 339–348.
Son, W.C., Gopinath, C., 2004. Early occurrence of spontaneous tumors in CD-1 mice and Sprague-Dawley rats. Toxicol. Pathol. 32, 371–374.
Speroff, L., 1999. Postmenopausal hormone therapy and the risk of breast cancer. Maturitas 32, 123–129.
Stitzel, K.A., McConnell, R.F., Dierckman, T.A., 1989. Effects of nitrofurantoin on the primary and secondary reproductive organs of female B6C3F1 mice. Toxicol. Pathol. 17, 774–781.
Sunderman Jr., F.W., 1989. Carcinogenicity of metal alloys in orthopedic prostheses: clinical and experimental studies. Fundam. Appl. Toxicol. 13, 205–216.
Suter, G.W., Cormier, S.M., 2011. Why and how to combine evidence in environmental assessments: weighing evidence and building cases. Sci. Total Environ. 409, 1406–1417.
Svoboda, D.J., Azarnoff, D.L., 1979. Tumors in male rats fed ethyl chlorophenoxyisobutyrate, a hypolipidemic drug. Cancer Res. 39, 3419–3428.
Takahashi, M., Dinse, G.E., Foley, J.F., Hardisty, J.F., Maronpot, R.R., 2002. Comparative prevalence, multiplicity, and progression of spontaneous and vinyl carbamate-induced liver lesions in five strains of male mice. Toxicol. Pathol. 30, 599–605.
Theil, M., 2002. The role of translations of verbal into numerical probability expressions in risk management: a meta-analysis. J. Risk Res. 5, 177–186.
Thomas, J., Haseman, J.K., Goodman, J.I., Ward, J.M., Loughran Jr., T.P., Spencer, P.J., 2007. A review of large granular lymphocytic leukemia in Fischer 344 rats as an initial step toward evaluating the implication of the endpoint to human cancer risk assessment. Toxicol. Sci. 99, 3–19.
Thorgeirsson, S.S., 1996. Hepatic stem cells in liver regeneration. FASEB J. 10, 1249–1256.
Tischler, A.S., DeLellis, R., 1988. The rat adrenal medulla. II. Proliferative lesions. J. Am. Coll. Toxicol. 7, 23–41.
Tischler, A.S., Powers, J.F., Downing, J.C., Riseberg, J.C., Shahsavari, M., Ziar, J., McClain, R.M., 1996. Vitamin D3, lactose, and xylitol stimulate chromaffin cell proliferation in the rat adrenal medulla. Toxicol. Appl. Pharmacol. 140, 115–123.
Tokiwa, H., Otofuji, T., Horikawa, K., Sera, N., Nakagawa, R., Maeda, T., Sano, N., Izumi, K., Otsuka, H., 1987. Induction of subcutaneous tumors in rats by 3,7- and 3,9-dinitrofluoranthene. Carcinogenesis 8, 1919–1922.
Travlos, G.S., Hard, G.C., Betz, L.J., Kissling, G.E., 2011. Chronic progressive nephropathy in male F344 rats in 90-day toxicity studies: its occurrence and association with renal tubule tumors in subsequent 2-year bioassays. Toxicol. Pathol. 39, 381–389.
Tryphonas, L., Arnold, D.L., Zawidzka, Z., Mes, J., Charbonneau, S., Wong, J., 1986. A pilot study in adult rhesus monkeys (M. mulatta) treated with Aroclor 1254 for two years. Toxicol. Pathol. 14, 1–10.
Tucker, M., 1997. The Nervous System, Diseases of the Wistar Rat. Taylor and Francis, London, pp. 217–236.
UK Committee on Mutagenicity of Chemicals in Food, Consumer Products and the Environment, 2000. Guidance on a Strategy for Testing of Chemicals for Mutagenicity.
Vial, T., Descotes, J., 2003. Immunosuppressive drugs and cancer. Toxicology 185, 229–240.
Weinbren, K., Washington, S.L., 1976. Hyperplastic nodules after portacaval anastomosis in rats. Nature 264, 440–442.
Welsch, C., 1985. Host factors affecting the growth of carcinogen-induced rat mammary carcinomas: a review and tribute to Charles Brenton Huggins. Cancer Res. 45, 3415–3443.
Wen, P.Y., Kesari, S., 2008. Malignant gliomas in adults. N. Engl. J. Med. 359, 492–507.
Westwood, F.R., Longstaff, E., Butler, W.H., 1979. Cellular progression of neoplasia in the subcutis of mice after implantation of 3,4-benzpyrene. Br. J. Cancer 39, 761–772.
Wheeler, M.W., Bailer, A.J., 2007. Properties of model-averaged BMDLs: a study of model averaging in dichotomous response risk estimation. Risk Anal. 27, 659–770.


Wheeler, M., Bailer, A., 2008. Model averaging software for dichotomous dose response risk estimation. J. Stat. Softw. 26, 1–15.
Whysner, J., Steward III, R.E., Chen, D., Conaway, C.C., Verna, L.K., Richie Jr., J.P., Ali, N., Williams, G.M., 1998. Formation of 8-oxodeoxyguanosine in brain DNA of rats exposed to acrylonitrile. Arch. Toxicol. 72, 429–438.
Williams, G.M., 1997. Chemicals with carcinogenic activity in the rodent liver; mechanistic evaluation of human risk. Cancer Lett. 117, 175–188.
Williams, G.M., Iatropoulos, M.J., 2002. Alteration of liver cell function and proliferation: differentiation between adaptation and toxicity. Toxicol. Pathol. 30, 41–53.
Williams, G., Iatropoulos, M.J., Enzmann, H., 2008. Principles of testing for carcinogenic activity. In: Wallace Hayes, A. (Ed.), Principles and Methods of Toxicology. Taylor and Francis, Philadelphia, PA, pp. 1265–1316.
Wright, J.A., Marsden, A.M., Willets, J.M., Orton, T.C., 1991. Hepatocarcinogenic effect of vinyl carbamate in the C57Bl/10J strain mouse. Toxicol. Pathol. 19, 258–265.
Yager, J.D., Davidson, N.E., 2006. Estrogen carcinogenesis in breast cancer. N. Engl. J. Med. 354, 270–282.
Young, S.S., Gries, C.L., 1984. Exploration of the negative correlation between proliferative hepatocellular lesions and lymphoma in rats and mice – establishment and implications. Fundam. Appl. Toxicol. 4, 632–640.
