Structure of the Guidance for Deriving DNELs and/or DMELs from Human Data

Working group on HD DNEL/DMEL derivation

Working document for the Expert Review Panel

Adapted in response to comments received

To be finalised by June 2008

by

Gerard Swaen & Chris Money (ECETOC) and Sandra Bausch & Dinant Kroese (TNO)

Version 2 – .. May 2008 1 Working group on HD DNEL/DMEL derivation

Introductory note

Background

Although human data (HD) are routinely used in many standard-setting processes, e.g. for the derivation of EU Indicative Limit Values, the process by which HD are assessed and subsequently used in these processes has not been explicitly described. Therefore, in order to provide the necessary clarity in the REACH TGD, TNO and ECETOC have worked to identify sources of information that can provide suitable guidance to support each of the steps.

Following the ECETOC/TNO workshop in November 2007 and the Competent Authorities' discussions on RIPs 3.2 and 3.3 (culminating in the adoption by the CAs of the consolidated 3.2/3.3 TGD in March 2008), TNO and ECETOC have worked to incorporate the views expressed by participants at the November workshop into the general structure and contents of the guidance to be adopted (especially those parts relevant for setting DNELs and DMELs). The results are contained in the present document. This document has not yet been thoroughly checked for spelling, internal consistency or overlap, and has not yet been provided with cross-references.

Views Sought

With these considerations in mind, your comments are sought on the following issues captured within the guidance:

- Evaluation of the quality of human data independently of animal studies, but in a manner that parallels the criteria used to categorise animal data;
- Comparison of the human and animal data to determine which data source is likely to be most suitable for establishing the DNEL (or DMEL), not by using strict weight-of-evidence criteria but rather through a transparent process building on the IPCS human relevance framework;
- Differentiation only on the basis of short-term and/or chronic effects, without addressing separate human endpoints;
- Proposal of a set of assessment factors that are judged to be appropriate for human data, based upon experience from related areas of limit setting, e.g. workplace exposures, food standards and air quality values;
- Derivation of DMELs for non-threshold carcinogens using HD.

With regard to the scheme depicted on page 4 of this document, which has not yet been incorporated into the guidance and for which a narrative has yet to be developed (if requested): is the scheme understandable, does it support the processes described in Appendix R.8-15, and does it thus present added value?

Proposed process

We would welcome all your comments and suggestions by May 15th at the latest. We will then evaluate your comments and will probably arrange a conference call to discuss and settle the most important issues identified.

Depending on the time available before delivery to ECHA, we will then circulate a second version for your review.

Part I

The text below is taken from the Reference TGD prepared for CA endorsement, scheduled for the end of March 2008. It consists of two parts: A. General introduction (from R.4.1 and R.4.3); B. Endpoint-specific guidance (from subsections 3, 4 and 5 of R.7.2 to R.7.7).

Blue text is newly inserted compared with the text presented at the 2007 Brussels workshop.

A. General introduction

4.1 Relevance of information

Human data are, in principle, the most relevant source of information on human toxicity. Since there may be limitations with regard to the reliability of these studies, they are normally considered together with animal, in vitro and other information in order to reach a conclusion about the relevance of the effects to humans.

4.3.3 Adequacy of information

The evaluation and use of information derived from studies in humans usually requires a more elaborate and in-depth critical assessment of reliability than animal data do (WHO, 1983). Four major types of human data may be submitted: (1) analytical epidemiology studies on exposed populations, (2) descriptive or correlation epidemiology studies, (3) case reports and (4), in very rare and justified cases, controlled studies in human volunteers. Analytical epidemiology studies (1) are useful for identifying a relationship between human exposure and effects such as biological effect markers, early signs of chronic effects, disease occurrence or mortality, and may provide the best data for risk assessment. Study designs include:
- case-control (case-referent) studies, where a group of individuals with (cases) and without (controls/referents) a particular effect are identified and compared to determine differences in exposure in the recent or more distant past;
- cohort studies, where groups of variously exposed and non-exposed individuals are identified and differences between the groups in effect occurrence over time are studied;
- cross-sectional studies, where a population (e.g. a workforce) is studied so that morbidity at a given point in time can be assessed in relation to concurrent exposure.

The strength of the epidemiological evidence for specific health effects depends, among other things, on the type of analyses and on the magnitude and specificity of the response. Confidence in the findings is increased when comparable results are obtained in several independent studies on populations exposed to the same agent under different conditions. In general, cohort studies provide stronger evidence than case-control studies, because exposure is assessed independently of the health status or outcome of the subjects in the study. Other characteristics that support a causal association are the presence of a dose-response relationship, a consistent relationship in time and (biological) plausibility. Criteria for assessing the adequacy of epidemiology studies include the proper selection and characterisation of the case and control groups (in case-control studies), adequate characterisation of exposure, sufficient length of follow-up for disease occurrence (in cohort studies), valid ascertainment of effect, and proper consideration of biases and confounding factors. The adequacy of the studies should be assessed by trained epidemiologists. Because of both the uncertainties in epidemiological studies and the true variability in the association between exposure and health outcomes within and among human populations, the available body of epidemiological evidence should be systematically reviewed and, if possible, combined. A weight-of-evidence approach is essential for risk assessment based on epidemiological data in order to (a) assess (sources of) heterogeneity across the studies and (b) increase the statistical stability of the risk estimates. The best option for combining and summarising epidemiological data is a pooled analysis of the original data sets of the contributing studies. A meta-analysis based on published study results is a good, but somewhat more restricted, alternative.
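As an illustration of the meta-analysis option, the sketch below pools relative risks reported in published studies using the fixed-effect inverse-variance method on the log scale, and reports Cochran's Q and I² as simple measures of heterogeneity across studies. This is one common technique, not a method prescribed by the guidance (random-effects models are often preferred when heterogeneity is substantial); the function name and the input format (a relative risk with its 95% confidence limits per study) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pool_fixed_effect(studies: List[Tuple[float, float, float]]) -> Dict[str, float]:
    """Hypothetical helper: pool (rr, lower95, upper95) tuples from published studies."""
    logs, weights = [], []
    for rr, lower, upper in studies:
        # Recover the standard error of log(RR) from the reported 95% CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        logs.append(math.log(rr))
        weights.append(1.0 / se ** 2)  # inverse-variance weight
    total_w = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / total_w
    se_pooled = math.sqrt(1.0 / total_w)
    # Cochran's Q and I^2 summarise heterogeneity across the studies.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, logs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "rr": math.exp(pooled_log),
        "lower95": math.exp(pooled_log - Z95 * se_pooled),
        "upper95": math.exp(pooled_log + Z95 * se_pooled),
        "q": q,
        "i2": i2,
    }
```

Pooling two identical studies returns the same relative risk with a narrower confidence interval, which reflects the gain in statistical stability mentioned above; a pooled analysis of the original data sets would remain the preferred option where available.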
Comprehensive guidance on both the evaluation and the use of epidemiological evidence for risk assessment purposes is provided by Krzyzanowski et al. (WHO, 2000).

Descriptive epidemiology studies (2) examine differences in disease rates among human populations in relation to age, gender, race, and differences in temporal or environmental conditions. These studies are useful for identifying areas for further research but are of limited use for risk assessment. Typically, these studies can only identify patterns or trends in disease occurrence over time or in different geographical locations; they cannot ascertain the causal agent or the degree of human exposure.

Case reports (3) describe a particular health condition in an individual or a group of individuals who were exposed to a substance. They may be particularly relevant when they demonstrate effects that cannot be observed in experimental animal studies. In many such reports, information is lacking on critical aspects such as substance identity and purity, exposure, the health status of the persons exposed and even the symptoms reported; a thorough assessment of the reliability and relevance of case reports is therefore necessary. Case reports may also trigger analytical studies.

When they are already available, well-conducted controlled human exposure studies (4) in volunteers, including low-exposure toxicokinetic studies, can also be used in risk assessment. However, few human experimental toxicity studies are available, owing to the practical and ethical considerations involved in the deliberate exposure of individuals. Such studies, e.g. studies carried out for the authorisation of a medicinal product, have to be conducted in line with the World Medical Association Declaration of Helsinki, which describes the general ethical principles for medical research involving human subjects (World Medical Association, 2000).

Criteria for a well-designed experimental study include a double-blind design, a randomised control group, sufficient duration of exposure and an adequate number of subjects to detect an effect. A meta-analysis of available similar studies, even small ones, is a good option. It is emphasised that testing with human volunteers is strongly discouraged, but when good-quality data are already available they should be used as appropriate, in well-justified cases.

B. Endpoint-specific guidance

Skin and eye irritation/corrosion and respiratory irritation (section 7.2)

Information & its sources

Existing human data include historical data that should be taken into account when evaluating the intrinsic hazards of chemicals. New testing in humans for hazard identification purposes is not acceptable for ethical reasons. Existing data can be obtained from case reports, poison information centres, medical clinics, occupational experience or epidemiological studies. Their quality and relevance for hazard assessment should be critically reviewed. In general, however, human data can be used to determine the corrosive or irritant potential of a substance. Good-quality, relevant human data take precedence over other data. However, a lack of positive findings in humans does not necessarily overrule good-quality animal data that are positive. Specifically with regard to respiratory irritation, there is a view in the occupational health literature that sensory irritation may be a more sensitive effect than overt tissue-damaging irritation, given that its biological function is to serve as an immediate warning against inhaled substances that could damage the airways during a short period of exposure, and that it triggers physiological reflexes that limit inhalation volumes and protect the airways. However, there is a lack of documented evidence to indicate that this is a generic position that would necessarily apply to all inhaled irritants.

Evaluation of available information

Well-documented existing human data from different sources can often provide very useful information on skin and/or respiratory irritation, sometimes for a range of exposure levels. Often the only useful information on respiratory irritation is obtained from human experience (occupational settings). The usefulness of all human data on irritation will depend on the extent to which the effect, and its magnitude, can be reliably attributed to the substance of interest. Experience has shown that it is difficult to obtain useful data on substance-induced eye irritation, but data may be available on human ocular responses to certain types of preparations (e.g. Freeberg et al., 1986a). The quality and relevance of existing human data for hazard assessment should be critically reviewed. For example, in occupational studies with mixed exposure it is important that the substance causing the irritation or corrosion has been accurately identified. There may also be a significant level of uncertainty in human data due to poor reporting and a lack of specific information on exposure. Examples of how existing human data can be used in hazard classification for irritancy are provided in a recent ECETOC monograph (ECETOC, 2002).

Human data on local skin effects may be obtained from existing data on single or repeated exposure. The exposure could be accidental or prolonged, for example in occupational settings, and is usually difficult to quantify. With regard to the effects, corrosivity is characterised by destruction of skin tissue, namely visible necrosis through the epidermis and into the dermis. Corrosive reactions are typified by ulcers, bleeding and bloody scabs. After recovery the skin will be discoloured due to blanching, and there will be complete areas of alopecia and scars (see Chapter 3.2 of the Guidance on Classification, Packaging and Labelling related to the future GHS system for further information), i.e. corrosivity causes irreversible damage. With this characterisation it should be possible to discern corrosive properties in humans. However, distinguishing between R35 ("Causes severe burns") and R34 ("Causes burns"), which correspond to 3 minutes' and 4 hours' exposure in rabbits, respectively, may not be straightforward in practice. A clear case for R35 classification would be an accidental splash that gave rise to necrosis of the skin. In cases where it is obvious that prolonged exposure is needed before necrosis occurs (not to be confused with delayed effects), R34 seems more reasonable. If the distinction between R35 and R34 is not clearly apparent, the more stringent classification should be chosen. Discrimination between corrosives and skin irritants in rabbits is based on the effects caused after 4 hours' exposure. Skin irritants cause significant inflammation that is reversible.

Severe eye irritants (R41) cause more severe corneal opacity and iritis than eye irritants (R36). R41 compounds induce considerable tissue damage which can result in serious physical decay of vision. The effects normally do not reverse within 21 days (in animals); see Chapter 3.3 of the GHS. In contrast, the effects of R36 compounds are reversible within 21 days. In humans, an eye examination by a physician would reveal a decay of vision; if the decay is not transient but persistent, classification with R41 is implied. If the discrimination between R41 and R36 is not obvious, R41 should be chosen.
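The tie-breaking rules above (choose R35 over R34, and R41 over R36, whenever the distinction is not clear) can be captured in a small decision function. This is a hypothetical sketch: the function names and the True/False/None inputs are illustrative simplifications of what remains an expert judgement, and in practice R-phrase assignment considers the full body of evidence.

```python
from typing import Optional


def skin_corrosion_phrase(necrosis: bool, brief_exposure: Optional[bool]) -> Optional[str]:
    """Hypothetical helper for human skin observations.

    necrosis: visible necrosis through the epidermis into the dermis was reported.
    brief_exposure: True for e.g. a short accidental splash, False when prolonged
    exposure was clearly needed before necrosis, None when this cannot be judged.
    """
    if not necrosis:
        return None  # no evidence of corrosivity; irritation is assessed separately
    if brief_exposure is False:
        return "R34"  # "Causes burns"
    # Clearly brief exposure, or an unclear distinction: choose the more stringent phrase.
    return "R35"  # "Causes severe burns"


def eye_damage_phrase(effects_observed: bool, reversible: Optional[bool]) -> Optional[str]:
    """Hypothetical helper for human eye observations.

    reversible: True if the effects (e.g. decay of vision) proved transient,
    False if persistent, None when the discrimination is not obvious.
    """
    if not effects_observed:
        return None
    if reversible is True:
        return "R36"  # "Irritating to eyes"
    # Persistent effects, or an unclear discrimination: choose R41.
    return "R41"  # "Risk of serious damage to eyes"
```

Encoding "unclear" as a separate input value (None) makes the default to the more stringent phrase explicit rather than implicit.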

Human data for respiratory irritation

Consideration should be given to real-life human observational experience, if this is properly collected and documented (Arts et al., 2006), e.g. data from well-designed workplace surveys and worker health monitoring programmes. For substances with an array of industrial uses and abundant human evidence, the symptoms of respiratory irritation can sometimes be associated with certain concentrations of the irritants in workplace air and might thus allow the derivation of DNELs. However, the exposure details need to be well documented and due consideration should be given to possible confounding factors. Data on sensory irritation of the airways may be available from volunteer studies that include objective measurements of respiratory tract irritation, such as electrophysiological responses, data from lateralisation threshold testing, and biomarkers of inflammation in nasal or bronchoalveolar lavage fluids. Including anosmic subjects can exclude odour as a source of bias.

Concluding on dose-response assessment¹

(referred to R.8.8.9, where the following can be read:)

Identification of the dose descriptor

Skin

If human data are available, a NOAEL/C or a LOAEL/C may be identified from these data. The NOAEL/C for skin irritation/corrosion would be the highest dose/concentration that did not cause dermal irritation/corrosion in the relevant animal study or in the human cases/data. It is assumed that at higher dose levels clear signs of skin irritation were observed. Note that, since the relevant tests (i.e. acute, sub-acute and sub-chronic dermal toxicity studies) are designed to observe systemic toxicity, local effects on the skin may not always be reported in detail. If irritation or corrosion was observed in more than one study, the highest (relevant) NOAEL/C below the lowest (relevant) LOAEL/C should be selected.
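The selection rule in the last sentence (the highest relevant NOAEL/C below the lowest relevant LOAEL/C) can be sketched as follows. The data structure and field names are hypothetical, all values are assumed to be expressed in the same unit, and the `relevant` flag stands in for the outcome of the reliability and relevance review described in Section 4.3.3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StudyResult:
    name: str
    noael: Optional[float]  # highest non-irritant dose/concentration, None if not identified
    loael: Optional[float]  # lowest irritant dose/concentration, None if no irritation seen
    relevant: bool = True   # outcome of the reliability/relevance review


def select_noael(studies: List[StudyResult]) -> Optional[float]:
    """Hypothetical helper: highest relevant NOAEL/C below the lowest relevant LOAEL/C."""
    relevant = [s for s in studies if s.relevant]
    loaels = [s.loael for s in relevant if s.loael is not None]
    lowest_loael = min(loaels) if loaels else None
    candidates = [
        s.noael
        for s in relevant
        if s.noael is not None and (lowest_loael is None or s.noael < lowest_loael)
    ]
    return max(candidates) if candidates else None


studies = [
    StudyResult("study A", noael=1.0, loael=3.0),
    StudyResult("study B", noael=2.5, loael=5.0),
    StudyResult("study C", noael=4.0, loael=8.0),
    StudyResult("poorly reported case", noael=None, loael=0.5, relevant=False),
]
print(select_noael(studies))
```

Here study C's NOAEL/C is discarded because it lies above the lowest relevant LOAEL/C (study A), and the poorly reported case is excluded by the relevance flag. If no relevant study reports a LOAEL/C, the sketch simply returns the highest relevant NOAEL/C; whether that is an appropriate starting point remains a matter of judgement.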

¹ These appendices in R.8 have the following headings: Identification of the typical dose descriptor / Modification of the dose descriptor / Application of assessment factors to the correct starting point (= modified dose descriptor) …

Respiratory system

For irritant cytotoxic effects on the respiratory tract it may be possible to derive from the available data (either in humans or in animals) a dose descriptor, i.e. a non-irritant concentration (NOAEC) or the lowest irritant concentration (LOAEC), expressed either in ppm or in mg/m³, for setting the DNEL. Where data are available both from well-documented human experience and from adequately reported animal studies, due account shall be taken of the human data and all other relevant information in deriving the DNEL. In particular, where it is known that experimental animals differ from humans in their sensitivity to the corrosive/irritant effects of a substance, the data from humans shall take precedence in deriving the DNEL. If only animal data are available, a DNEL for irritation/corrosion of the respiratory tract may be derived from a NOAEC or a LOAEC, which can occasionally be based on acute, sub-acute or sub-chronic inhalation studies. The NOAEC would be the highest concentration that did not cause respiratory irritation in the acute, sub-acute or sub-chronic inhalation toxicity study or in the human cases/data. It is assumed that at higher concentrations clear signs of respiratory irritation were observed. If irritation was observed in more than one study, the highest (relevant) NOAEC below the lowest (relevant) LOAEC should be selected.
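Two points in this paragraph lend themselves to a short sketch: converting an air concentration between ppm and mg/m³, and the order of precedence between human and animal dose descriptors. The conversion uses the standard relation mg/m³ = ppm × molecular weight / 24.45, which assumes a gas or vapour at 25 °C and 1 atm. The precedence function is a hypothetical simplification; its "weight_of_evidence" outcome marks the case where both data sources exist but no differing sensitivity is known, which the text leaves to expert judgement.

```python
from typing import Optional, Tuple

MOLAR_VOLUME_25C = 24.45  # litres per mole of ideal gas at 25 degC and 1 atm


def ppm_to_mg_per_m3(ppm: float, molecular_weight: float) -> float:
    """Convert a gas/vapour concentration from ppm (v/v) to mg/m3 (not valid for aerosols)."""
    return ppm * molecular_weight / MOLAR_VOLUME_25C


def mg_per_m3_to_ppm(mg_m3: float, molecular_weight: float) -> float:
    """Inverse of ppm_to_mg_per_m3 under the same temperature/pressure assumption."""
    return mg_m3 * MOLAR_VOLUME_25C / molecular_weight


def respiratory_starting_point(
    human_noaec: Optional[float],
    animal_noaec: Optional[float],
    sensitivity_differs: bool,
) -> Tuple[str, Optional[float]]:
    """Hypothetical precedence rule; NOAECs assumed to be in the same unit (e.g. mg/m3)."""
    if human_noaec is not None and animal_noaec is not None:
        if sensitivity_differs:
            return ("human", human_noaec)  # human data take precedence
        return ("weight_of_evidence", None)  # expert judgement on all relevant data
    if human_noaec is not None:
        return ("human", human_noaec)
    if animal_noaec is not None:
        return ("animal", animal_noaec)
    return ("none", None)
```

Converting all dose descriptors to one unit before comparing them avoids mixing ppm and mg/m³ values when human and animal data are considered together.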

Eye

… Also, if human data are available, a NOAEC or a LOAEC may be identified from these data.

Human data

Human data may sometimes be used for setting the irritant/corrosive concentration for the skin or eye, or the irritant concentration in the respiratory tract. Concentration-response relationships or threshold concentrations in humans have been observed for some chemicals for which there is sufficient evidence from case reports/clinical cases, e.g. from occupational exposures. For some groups of chemicals, such as common solvents, peroxides and acids, the irritant concentration in a liquid or in air has been characterised on the basis of human evidence. Information provided by poison information centres can also be useful. The open literature and the company-based occupational health surveillance of the relevant industries should be searched to find out whether data are available that enable characterisation of these effects in quantitative terms (e.g. Medical Surveillance reviews of the U.S. Department of Labor (4) and 2666 Chemical Hazards review UK Health Protection Agency (5)).

Modification of the dose descriptor

Skin

If the dose descriptor is the highest concentration (based on, e.g., human data) that does not irritate human skin or eye, the exposure estimation should address the concentration of the substance in the relevant use, and no modification is needed.

Respiratory system

If the dose descriptor is the highest air concentration that does not cause respiratory irritation (NOAEC) or the lowest air concentration that causes respiratory irritation (LOAEC), from either an animal inhalation toxicity study or human data, the exposure estimation should address the air concentration in workplaces and in consumer uses, and no modification is needed.

Eye

Usually, a quantitative assessment of eye irritation/corrosion will not be possible, because only qualitative data from the relevant in vitro or in vivo studies are available. Only occasionally may signs of eye irritation/corrosion be observed in animal inhalation toxicity studies or in humans. If a NOAEC or a LOAEC can be identified, it does not need to be modified.

Skin and respiratory sensitisation (section 7.3)

Skin sensitisation

Information & its sources

Human data on cutaneous reactions (allergic contact dermatitis and urticarial reactions) may come from a variety of sources:
- consumer experience and comments, preferably followed up by professionals (e.g. diagnostic patch tests);
- diagnostic clinical studies (e.g. patch tests, repeated open application tests);
- records of workers' experience, accidents, and exposure studies including medical surveillance;
- case reports in the general scientific and medical literature;
- consumer tests (monitoring by questionnaire and/or medical surveillance);
- epidemiological studies;
- human experimental studies such as the human repeat insult patch test (Stotts, 1980) and the human maximisation test (Kligman, 1966), although it should be noted that new experimental testing for hazard identification in humans, including the HRIPT and HMT, is not acceptable for ethical reasons.

Evaluation of available information

When reliable and relevant human data are available, they can be useful for hazard identification and may even be preferable to animal data. However, a lack of positive findings in humans does not necessarily overrule positive, good-quality animal data. Well-conducted human studies can provide very valuable information on skin sensitisation. However, in some instances (due to a lack of information on exposure, a small number of subjects, concomitant exposure to other substances, local or regional differences in patient referral, etc.) there may be a significant level of uncertainty associated with human data. Moreover, diagnostic tests are carried out to determine whether an individual is sensitised to a specific agent, not whether the agent can cause sensitisation. For evaluation purposes, existing human experience data for skin sensitisation should contain sufficient information about:
- the test protocol used (study design, controls);
- the substance or preparation studied (this should be the main, and ideally the only, substance or preparation present that may possess the hazard under investigation);
- the extent of exposure (magnitude, frequency and duration);
- the frequency of effects (versus the number of persons exposed);
- the persistence or absence of health effects (objective description and evaluation);
- the presence of confounding factors (e.g. pre-existing dermal health effects, medication, presence of other skin sensitisers);
- the relevance with respect to group size, statistics and documentation;
- the healthy worker effect.

Evidence of skin sensitising activity derived from diagnostic testing may reflect the induction of skin sensitisation to that substance or a cross-reaction with a chemically very similar substance. In both situations, the normal conclusion would be that this provides positive evidence of the skin sensitising activity of the chemical used in the diagnostic test. Human experimental studies on skin sensitisation are not normally conducted and are generally discouraged. Where such human data are available, quality criteria and ethical considerations are presented in ECETOC Monograph No. 32. Ultimately, where a very large number of individuals (e.g. 10⁵) have frequent (daily) skin exposure for at least two years, there is an active system in place to pick up complaints and adverse reaction reports (including via dermatology clinics), and no or only a very few isolated cases of allergic contact dermatitis are observed, the substance is unlikely to be a significant skin sensitiser. However, information from other sources should also be considered in making a judgement on the substance's ability to induce skin sensitisation. It is emphasised that testing with human volunteers is strongly discouraged, but when good-quality data are already available they should be used as appropriate, in well-justified cases.
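The criteria in the "Ultimately, where a very large number of individuals …" sentence can be expressed as a simple screening check. This is a hypothetical sketch: the population and duration defaults follow the example given in the text, but `max_isolated_cases` is an assumed placeholder, because the guidance only speaks of "no or only a very few isolated cases". A True result is one input to the overall judgement, not a conclusion on its own.

```python
from dataclasses import dataclass


@dataclass
class ExposureExperience:
    exposed_individuals: int
    daily_skin_exposure: bool
    exposure_years: float
    active_reporting_system: bool  # complaints/adverse reactions, incl. dermatology clinics
    acd_cases: int                 # isolated allergic contact dermatitis cases observed


def low_sensitisation_concern(
    exp: ExposureExperience,
    min_individuals: int = 100_000,  # "e.g. 10^5" in the text
    min_years: float = 2.0,          # "at least two years"
    max_isolated_cases: int = 3,     # hypothetical placeholder for "a very few"
) -> bool:
    """Hypothetical screen: all criteria must hold for the reassurance to apply."""
    return (
        exp.exposed_individuals >= min_individuals
        and exp.daily_skin_exposure
        and exp.exposure_years >= min_years
        and exp.active_reporting_system
        and exp.acd_cases <= max_isolated_cases
    )
```

Requiring every criterion to hold reflects the text's point that absence of reported cases is only reassuring when a functioning reporting system could have detected them.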

Respiratory sensitisation

Information & its sources

Human data on respiratory reactions (asthma, rhinitis, alveolitis) may come from a variety of sources:
- consumer experience and comments, preferably followed up by professionals (e.g. bronchial provocation tests, skin prick tests and measurements of specific IgE serum levels);
- records of workers' experience, accidents, and exposure studies including medical surveillance;
- case reports in the general scientific and medical literature;
- consumer tests (monitoring by questionnaire and/or medical surveillance);
- epidemiological studies.

Evaluation of available information

Although human studies may provide some information on respiratory hypersensitivity, the data are frequently limited and subject to the same constraints as human skin sensitisation data. For evaluation purposes, existing human experience data for respiratory sensitisation should contain sufficient information about:
- the test protocol used (study design, controls);
- the substance or preparation studied (this should be the main, and ideally the only, substance or preparation present that may possess the hazard under investigation);
- the extent of exposure (magnitude, frequency and duration);
- the frequency of effects (versus the number of persons exposed);
- the persistence or absence of health effects (objective description and evaluation);
- the presence of confounding factors (e.g. pre-existing respiratory health effects, medication, presence of other respiratory sensitisers);
- the relevance with respect to group size, statistics and documentation;
- the healthy worker effect.

Evidence of respiratory sensitising activity derived from diagnostic testing may reflect the induction of respiratory sensitisation to that substance or a cross-reaction with a chemically very similar substance. In both situations, the normal conclusion would be that this provides positive evidence for the respiratory sensitising activity of the chemical used in the diagnostic test. For respiratory sensitisation, no clinical test protocols for experimental studies exist, but tests may have been conducted for diagnostic purposes, e.g. the bronchial provocation test. Such a test should meet the general criteria above, e.g. be conducted according to a relevant design including appropriate controls, and address confounding factors such as medication, smoking or exposure to other substances. Furthermore, differentiating between the symptoms of respiratory irritancy and those of allergy can be very difficult. Thus, expert judgement is required to determine the usefulness of such data for the evaluation on a case-by-case basis. Although predictive models are under validation, there is as yet no internationally recognised animal method for the identification of respiratory sensitisation. Thus, human data are usually the main evidence for hazard identification. Where there is evidence that significant occupational inhalation exposure to a chemical has not resulted in the development of respiratory allergy or related symptoms, it may be possible to conclude that the chemical lacks the potential to sensitise the respiratory tract. For instance, where there is evidence that a large cohort of subjects has had the opportunity for regular inhalation exposure to a chemical over a sustained period in the absence of respiratory symptoms or related health complaints, this will provide reassurance regarding the absence of a respiratory sensitisation hazard.
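The information items listed for respiratory sensitisation (which largely mirror those listed for skin sensitisation) can serve as a completeness check before human data are evaluated further. The sketch below is hypothetical: the field names are illustrative labels for the listed items, and an absent or empty entry is simply reported as a gap; the check says nothing about the quality of what is reported, which still requires expert judgement.

```python
from typing import Dict, List

# Hypothetical field names, one per information item in the evaluation list.
REQUIRED_ITEMS = (
    "test_protocol",          # study design, controls
    "substance_studied",      # main, ideally only, substance with the hazard
    "exposure_extent",        # magnitude, frequency and duration
    "effect_frequency",       # versus number of persons exposed
    "effect_persistence",     # objective description and evaluation
    "confounding_factors",    # e.g. pre-existing effects, medication, other sensitisers
    "group_size_statistics",  # group size, statistics, documentation
    "healthy_worker_effect",  # whether and how it was considered
)


def missing_items(record: Dict[str, str]) -> List[str]:
    """Return the required items that are absent or empty in a study record."""
    return [item for item in REQUIRED_ITEMS if not record.get(item)]


record = {"test_protocol": "cohort with referents", "substance_studied": "substance X"}
print(missing_items(record))
```

Listing the gaps per study makes it transparent why a given human data set was, or was not, given weight in the evaluation.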

Concluding on dose-response assessment

(no further useful information in this section; see R.8.8.10, where the following can be read:)

Potency categorisation and identification of the typical dose descriptor

Skin sensitisation

Human data: Human data will normally take preference over animal data, although the reliability of such data should be carefully assessed, particularly if derived from old studies, and they may be rejected in favour of animal data. A lack of positive findings in humans should not normally overrule positive, good-quality animal data. Human data on induction thresholds are normally not available, and testing for induction of sensitisation in humans is no longer conducted on ethical grounds. However, there may be data available from historical predictive testing that inform on potency, a threshold/NOAEL or a LOAEL, and these should be considered on a case-by-case basis. Because the exposure conditions are standardised, data from historical predictive tests (e.g. the human repeat insult patch test (HRIPT) and the human maximisation test (HMT)) may provide information on potency for induction. Thresholds from reliable historical human predictive tests can be used in combination with LLNA data in a weight-of-evidence (WoE) approach to set a NOAEL/LOAEL for induction of sensitisation (12, 13, 14, 15, 19, 20). The NOAEL/LOAEL from human predictive tests should be calculated from the concentration of the substance tested, the patch size and the application volume, and should be expressed as dose per unit skin area. The NOAEL from such tests should be the dose at which no sensitisation occurred in the exposed people, while the LOAEL has sometimes been proposed to be the dose at which 5% (or < 8%) of the exposed people were sensitised.¹⁴
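The calculation described above (from test concentration, application volume and patch size to a dose per unit skin area) and the proposed NOAEL/LOAEL definitions can be sketched as follows. All of the following are assumptions for illustration: the concentration is given in % w/w, the application volume in µL, the vehicle density defaults to 1 g/mL, and the LOAEL criterion takes the 5% proportion mentioned in the text as an adjustable parameter.

```python
from typing import List, Optional, Tuple


def dose_per_area_ug_cm2(
    conc_percent_w_w: float,
    volume_ul: float,
    patch_area_cm2: float,
    density_g_ml: float = 1.0,  # assumed vehicle density
) -> float:
    """Dose of test substance per unit skin area, in ug/cm2."""
    applied_mg = volume_ul * density_g_ml  # 1 uL at 1 g/mL weighs 1 mg
    substance_ug = applied_mg * (conc_percent_w_w / 100.0) * 1000.0
    return substance_ug / patch_area_cm2


def noael_loael(
    results: List[Tuple[float, int, int]],
    loael_fraction: float = 0.05,  # one proposal in the text; < 8% has also been used
) -> Tuple[Optional[float], Optional[float]]:
    """results: (dose_ug_cm2, n_tested, n_sensitised) per dose group.

    NOAEL: highest dose with no sensitised subjects.
    LOAEL: lowest dose at which at least loael_fraction of subjects were sensitised.
    """
    noael = max((d for d, n, k in results if k == 0), default=None)
    loael = min(
        (d for d, n, k in results if n > 0 and k / n >= loael_fraction), default=None
    )
    return noael, loael


# Hypothetical example: 1% w/w, 100 uL on a 10 cm2 patch.
print(dose_per_area_ug_cm2(1.0, 100.0, 10.0))
```

Expressing every test as a dose per unit area allows results from historical tests with different patch sizes and application volumes to be compared on a common scale.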

As already mentioned above, new experimental testing for hazard identification in humans, including the HRIPT and HMT, is not acceptable for ethical reasons; therefore, historical information from this type of study will be available for only a limited number of chemicals. Furthermore, the quality/reliability of the results from these studies should be carefully checked, in particular in relation to the number of people tested (22).¹⁵

Potency information, if available, can be used in qualitative assessment and for recommending appropriate RMMs (see also Section E.3.4).

Testing humans with pre-existing contact allergy to determine sensitisation to a particular substance is done extensively as part of clinical examinations. Evidence of skin sensitising activity derived from such tests demonstrates the (previous) induction of skin sensitisation to that substance or a cross-reaction with a chemically very similar substance. Moreover, clinical examinations are usually not designed to determine elicitation thresholds, as the dose used is the one giving a response in the majority of sensitised subjects. However, clinical data should be used for qualitative assessment. The potency of a chemical could be evaluated by comparing the incidence of skin sensitisation in the human population with the exposure situation, if known. For example, if a high incidence of contact allergy is observed in an exposed population in relation to a relatively low degree of exposure, this could be considered an indication that the substance is a strong sensitiser, whereas a low incidence of contact allergy among exposed individuals in relation to a high degree of exposure would indicate that the substance is a weaker sensitiser (21). However, exposure is normally not sufficiently well defined, and positive findings in epidemiological studies (population-based studies, data from contact dermatitis patients, studies/data from occupational groups and outbreak population studies) can provide evidence sufficient only for hazard assessment. Potency of induction cannot be derived directly from human elicitation threshold data from diagnostic clinical studies (e.g. patch test dose-response data, Repeated Open Application Test (ROAT)); however, a low elicitation threshold could indicate high potency and vice versa (21).

¹⁴ These "human LOAEL threshold values" are taken from two different publications (7, 8).
¹⁵ For the HRIPT, a large number of people is required in each test in order to narrow the 95% confidence interval of the test result (22). Different publications give different acceptable numbers of people to be tested in the HRIPT (e.g. 100 (19) or 150-200 (22)).
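To illustrate footnote 15, the sketch below computes the exact (Clopper-Pearson) one-sided 95% upper confidence limit on the proportion of people who could be sensitised when none of n tested subjects reacted. For zero reactions this limit reduces to 1 - 0.05^(1/n), close to the familiar "rule of three" (3/n). The one-sided form and the restriction to the zero-reaction case are choices made for this illustration only.

```python
def upper_limit_zero_reactions(n_tested: int, alpha: float = 0.05) -> float:
    """One-sided (1 - alpha) exact upper limit on the reaction rate when 0 of n reacted.

    Solves (1 - p) ** n = alpha for p.
    """
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return 1.0 - alpha ** (1.0 / n_tested)


for n in (50, 100, 150, 200):
    print(n, round(upper_limit_zero_reactions(n), 4))
```

With 100 subjects and no reactions, the data remain compatible with a sensitisation rate of up to about 3%; doubling the panel to 200 roughly halves this to about 1.5%, which is why larger panels give a more informative negative result.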

Application of assessment factors. See the notes on page 126 of R.8 on AFs for matrix effects and for different exposure durations.

Respiratory sensitisation Currently available methods do not allow the determination of a threshold or the establishment of a DNEL. Therefore, for substances classified as respiratory sensitisers, only a qualitative assessment as described in Section E.3.4 can be performed.

Acute toxicity (section 7.4)

Information & its sources Acute toxicity data on humans may be available from:
- Epidemiological data identifying hazardous properties and dose-response relationships;
- Routine data collection, poisons data, adverse event notification schemes, coroners' reports;
- Biological monitoring/personal sampling;
- Human kinetic studies (observational clinical studies);
- Published and unpublished industry studies;
- National poisoning centres.
The main obstacles to the use of human data are their limited availability and the often limited information on levels of exposure (ECETOC, 2004).

Evaluation of available information When available, epidemiological studies, case reports, and information from medical surveillance or volunteer studies may be crucial for acute toxicity and can provide evidence of effects that are undetectable in animal studies (e.g. symptoms such as nausea or headache). Nevertheless, the conduct of human studies is not recommended. Such data could also be useful to identify particularly sensitive sub-populations such as new-borns, children and patients with diseases (in particular chronic respiratory diseases, e.g. asthma or COPD). Additional guidance is needed on the reliability and relevance of human studies, because there are no standardised guidelines for such studies (except for odour threshold determination) and they are not usually conducted according to GLP. Such guidance is provided in Section 4.3.3.

Concluding on Dose response assessment If human data on acute toxicity are available, it is unlikely that they will be derived from carefully controlled studies or from a significant number of individuals. In this situation, it may not be appropriate to determine a DNEL from these data alone, but the information should certainly be considered in the WoE and may be used to confirm the validity of animal data. In addition, human data should be used in the risk assessment process to determine DNELs for particularly sensitive sub-populations such as new-borns, children or those in poor health (patients). For more extensive guidance on setting DNELs for acute toxicity, see Section R.8, Appendix 8-8.

Section R.8, Appendix 8-8: Identification of the typical dose descriptor Human evidence, such as epidemiological studies, case reports of poisoning or episodes of acute toxicity at work, or information from medical surveillance, can be very important for the assessment of acute toxicity and can provide evidence of effects that are undetectable in animal studies, for instance induction of symptoms such as headaches and nausea. There may be case-reports of human poisoning incidents, which are usually single-exposure events, either deliberate ingestion or during incidents/accidents. The reliability of exposure assessments in such reports needs careful consideration as there is often substantial uncertainty, but these data may give valuable information on the acute toxicity in humans, allow the identification of human NOAEC(L) or LOAEC(L) values and give some indication of the relative sensitivity between humans and animals.

In addition to acute systemic effects, some substances may cause local effects on the respiratory tract following a single exposure via the inhalation route. Acute local effects on the respiratory tract could be due to either or both of two different toxicological phenomena: sensory irritation or cytotoxicity/tissue damage. Only the derivation of a DNEL for acute cytotoxicity on the respiratory tract is dealt with under this endpoint; the derivation of a DNEL for sensory irritation is dealt with under the endpoint of respiratory tract irritation. For acute cytotoxicity on the respiratory tract, the severity of the local effects is usually proportional to the concentration/dose level; in such a situation it may therefore be possible to identify a NOAEC or LOAEC for these effects from pathology or clinical observations in either animal studies or human data.

Modification of the dose descriptor It should be noted that, as the reference period of the acute toxicity DNEL (e.g. 15 minutes) is likely to differ from the exposure duration in the experimental (animal or human) study from which the N(L)OAEC or LC(D)50 was identified, the derivation of such a DNEL, particularly for the inhalation route of exposure, might involve time scaling. Before correcting the starting point for time extrapolation, it should be judged case by case whether this is appropriate. For example, if the relevant effect is deemed to be more concentration-dependent than dose-dependent (which is not always the default position to take for local cytotoxic effects), the duration of exposure is likely to be of little consequence, and time extrapolation would therefore be inappropriate.

If time extrapolation is considered valid, the most appropriate approach is to use the modified Haber's law, $C^{n} \times t = k$, where $C$ is the concentration, $n$ is a regression coefficient, $t$ is the exposure time and $k$ is a constant, according to which the relationship between exposure concentration and exposure duration for a specific effect is a power function. To estimate the value of the exponent $n$, empirical exposure concentration–exposure duration relationships for the relevant effect need to be established, which requires good-quality studies with several exposure durations. In the absence of suitable data for deriving $n$, a default value of $n = 1$ should be used for extrapolating from shorter to longer exposure durations and a default value of $n = 3$ for extrapolating from longer to shorter exposure durations, as these values lead to the most conservative estimates. These defaults are consistent with those laid out in US guidance on setting emergency standards for major accident hazards (US NRC, 2001) and are based on the observation, from an analysis of approximately 20 structurally diverse chemicals with established concentration-time relationships for lethality, that $n$ lies in a range of 1 to 3 (ten Berge et al., 1986).
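The time-scaling step can be sketched in Python. Writing the modified Haber's law for the study duration and for the target duration and equating the constants gives $C_{target} = C_{exp} (t_{exp}/t_{target})^{1/n}$. The function name and the example concentration and durations below are hypothetical; the default choice of n follows the rule stated above (n = 1 when extrapolating to a longer duration, n = 3 when extrapolating to a shorter one).

```python
from typing import Optional


def haber_time_scale(c_exp: float, t_exp: float, t_target: float,
                     n: Optional[float] = None) -> float:
    """Scale a concentration from duration t_exp to t_target using C**n * t = k.

    Hypothetical helper. If n is not given, the conservative defaults are used:
    n = 1 when extrapolating to a longer duration, n = 3 when extrapolating
    to a shorter (or equal) one.
    """
    if c_exp <= 0 or t_exp <= 0 or t_target <= 0:
        raise ValueError("concentration and durations must be positive")
    if n is None:
        n = 1.0 if t_target > t_exp else 3.0
    # C_target**n * t_target = C_exp**n * t_exp  =>  C_target = C_exp * (t_exp / t_target)**(1/n)
    return c_exp * (t_exp / t_target) ** (1.0 / n)


# Hypothetical example: a human NOAEC of 100 mg/m3 for a 4-hour (240 min) exposure.
print(round(haber_time_scale(100, 240, 15), 1))   # to a 15-min reference period (n = 3): 252.0
print(round(haber_time_scale(100, 240, 480), 1))  # to an 8-hour shift (n = 1): 50.0
```

With n = 3 the 15-minute value is only about 2.5-fold higher than the 4-hour value, whereas n = 1 would make it 16-fold higher; this is why the larger exponent is the conservative choice when extrapolating to shorter durations.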

Comment to a Figure (R.8-5) in this appendix: Box 2 In some cases there may be human data (e.g. occupational experience of CNS depression, epidemiology, case studies, or reports from poison centres) on the toxicity of the substance that allow a DNEL to be set. These data are generally associated with high uncertainty, for instance because of unclear exposure situations or co-exposure to other chemicals. However, after a case-by-case evaluation, they can sometimes still be used. In a first step, the dose descriptor may need to be time-scaled (using the modified Haber's law). In a second step, an AF for intra-species variation is usually needed when setting the DNELacute.
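The two steps in Box 2 can be chained in a short Python sketch. It assumes a human N(L)OAEC as the dose descriptor and repeats the modified Haber's law scaling so that the block runs on its own. The function name, the example values and the intra-species AF are hypothetical placeholders; the AF to be applied in practice should be taken from the R.8 guidance for the population concerned.

```python
from typing import Optional


def dnel_acute_from_human(noaec: float, t_study_min: float, intraspecies_af: float,
                          t_reference_min: float = 15.0,
                          n: Optional[float] = None) -> float:
    """Hypothetical two-step DNELacute: (1) Haber time scaling, (2) intra-species AF."""
    if intraspecies_af < 1:
        raise ValueError("an assessment factor should not be below 1")
    if n is None:
        # Conservative defaults: n = 1 towards longer, n = 3 towards shorter durations.
        n = 1.0 if t_reference_min > t_study_min else 3.0
    # Step 1: modified Haber's law, C**n * t = k.
    scaled = noaec * (t_study_min / t_reference_min) ** (1.0 / n)
    # Step 2: divide by the assessment factor for intra-species variation.
    return scaled / intraspecies_af


# Hypothetical: 1-hour human NOAEC of 50 mg/m3, 15-min reference period, illustrative AF of 5.
print(round(dnel_acute_from_human(50, 60, intraspecies_af=5), 1))  # 15.9
```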

Repeated dose toxicity (section 7.5)

Information & its sources Human data adequate to serve as the sole basis for the hazard and dose-response assessment are rare. When available, reliable and relevant human data are preferable to animal data and can contribute to the overall Weight of Evidence. However, human volunteer studies are not recommended, owing to the practical and ethical considerations involved in deliberately exposing individuals to chemicals. The following types of human data may nevertheless already be available:
- Analytical epidemiology studies on exposed populations. These data may be useful for identifying a relationship between human exposure and effects such as biological effect markers, early signs of chronic effects, disease occurrence, or long-term specific mortality risks. Study designs include case-control studies, cohort studies and cross-sectional studies.
- Descriptive or correlation epidemiology studies. These examine differences in disease rates among human populations in relation to age, gender, race, and differences in temporal or environmental conditions. They may be useful for identifying priority areas for further research, but not for dose-response information.
- Case reports. These describe a particular effect in an individual or a group of individuals exposed to a substance. Case reports are generally of limited value for hazard identification, especially if the exposure represents single exposures, abuse or misuse of certain substances.
- Controlled studies in human volunteers. These studies, including low-exposure toxicokinetic studies, might also be of use in risk assessment.
- Meta-analysis. In this type of study, data from multiple studies are combined and analysed in one overall assessment of the relative risk or dose-response curve.

Evaluation of available information Human data in the form of epidemiological studies or case reports can contribute to the hazard identification process as well as to the risk assessment process itself. Criteria for assessing the adequacy of epidemiology studies include an adequate research design, the proper selection and characterisation of the exposed and control groups, adequate characterisation of exposure, a follow-up long enough for the disease to develop as an effect of the exposure, valid ascertainment of effect, proper consideration of bias and confounding factors, proper statistical analysis and reasonable statistical power to detect an effect. These criteria have been described in more detail by Swaen (2006) and can also be derived from epidemiology textbooks (Checkoway et al., 1989; Hernberg, 1991; Rothman, 1998). The results from human experimental studies are often limited by a number of factors, such as a relatively small number of subjects, short duration of exposure, and low dose levels, resulting in poor sensitivity in detecting effects. In relation to hazard identification, the relative lack of sensitivity of human data may cause particular difficulty. Therefore, negative human data cannot be used to override positive findings in animals, unless it has been demonstrated that the mode of action of a certain toxic response observed in animals is not relevant for humans. In such a case a full justification is required. It is emphasised that testing with human volunteers is strongly discouraged, but when good-quality data are already available they can be used in the overall Weight of Evidence.
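How study size limits sensitivity can be made concrete with an approximate power calculation. The Python sketch below assumes the simplest possible design: the disease frequency in an exposed group is compared with a known background frequency using a two-sided one-sample test on a proportion with a normal approximation. Real epidemiological designs (a cohort with an internal comparison group, a case-control study) need design-specific formulas, and the background frequency and relative risk in the example are illustrative only.

```python
from statistics import NormalDist


def approx_power(n_exposed: int, background_risk: float, relative_risk: float,
                 alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample test of a proportion.

    H0: risk = background_risk; H1: risk = relative_risk * background_risk.
    Normal approximation; the (negligible) opposite rejection tail is ignored.
    """
    p0 = background_risk
    p1 = relative_risk * background_risk
    if not 0 < p0 < 1 or not 0 < p1 < 1:
        raise ValueError("risks must lie strictly between 0 and 1")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    numerator = abs(p1 - p0) * n_exposed ** 0.5 - z_crit * (p0 * (1 - p0)) ** 0.5
    return NormalDist().cdf(numerator / (p1 * (1 - p1)) ** 0.5)


# Illustrative: background risk of 1%, relative risk of 2, increasing group size.
for n in (100, 300, 1000, 3000):
    print(n, round(approx_power(n, 0.01, 2.0), 2))
```

Under these assumptions a doubling of a 1% background risk is likely to be missed by a study of a few hundred exposed subjects, which is the practical reason why negative human data rarely outweigh positive animal findings.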

Concluding on Dose response assessment There is no concrete guidance here for this endpoint, nor is there a specific appendix for it in chapter R.8.

Reproductive and developmental toxicity (section 7.6)

Information & its sources Epidemiological studies, conducted in the general population or in occupational cohorts, may provide information on possible associations between exposure to a chemical and adverse effects on reproduction. Clinical data and case reports (e.g. biomonitoring after accidental substance release) may also be available.

Evaluation of available information Epidemiological data require a detailed critical appraisal that includes an assessment of the adequacy of controls, the quality of the health effects and exposure assessments, and the influence of bias and confounding factors. Epidemiological studies, case reports and clinical data may provide sufficient hazard and dose-response evidence for classification¹ of chemicals as reproductive toxicants in Category 1 and for risk assessment, including the identification of a N(L)OAEL. In such cases, there will normally be no need to test the chemical. However, convincing human evidence of reproductive toxicity for a specific chemical is rarely available, because it is often impossible to identify a population suitable for study that is exposed only to the chemical of interest. Human data may provide limited evidence of reproductive toxicity that indicates a need for further studies of the chemical; the test method selected should be based on the potential effect suspected. When evidence of a reproductive hazard has been derived from animal studies, it is unlikely that the absence of evidence of this hazard in an exposed human population will negate the concerns raised by the animal model, because there will usually be methodological and statistical limitations to the human data. For example, statistical power calculations indicate that a prospective study with well-defined exposure during the first trimester with 300 pregnancies could identify only those developmental toxicants that caused at least a 10-fold increase in the overall frequency of malformations; a study with around 1000 pregnancies would have the power to identify only those developmental toxicants that caused at least a 2-fold increase (EMEA/CHMP Guideline, 2006). Extensive, high-quality and preferably prospective data are necessary to support a conclusion that there is no risk from exposure to the chemical.

Concluding on Dose response assessment There is no concrete guidance here on HD for this endpoint, nor is there a specific appendix for this endpoint in chapter R.8.

1 The current system will be replaced with Guidance on Classification, Packaging and Labelling based on the Globally Harmonized System of Classification and Labelling of Chemicals (GHS); see Section R.7.1.

Mutagenicity (section 7.7)

Information & its sources Occasionally, studies of genotoxic effects in humans exposed by, for example, accident, occupation or participation in clinical studies (e.g. from case reports or epidemiological studies) may be available. Generally, cells circulating in blood are investigated for the occurrence of various types of genetic alterations.

Evaluation of available information Human data have to be assessed carefully on a case-by-case basis. The interpretation of such data requires considerable expertise. Attention should be paid especially to the adequacy of the exposure information, confounding factors, co-exposures and to sources of bias in the study design or incident. The statistical power of the test may also be considered.

Concluding on Dose response assessment Dose-response assessment is less of an issue for this endpoint. For quantitative assessment of mutagenic risks, the reader is referred to the chapter on carcinogenicity.

Carcinogenicity (section 7.7)

Information & its sources Human data may provide direct information on the potential carcinogenicity of the substance. Relevant human data of sufficient quality, if available, are preferable to animal data, as no extrapolation between species or from high to low doses is necessary. Epidemiological data will not normally be available for new substances but may well be available for substances that have been in use for many decades. For substances in common use before the implementation of modern occupational hygiene measures, human exposures to some carcinogens were intense enough to produce highly significant, dose-dependent increases in cancer incidence. Basic epidemiological study designs include cohort, case-control and registry-based correlational (e.g. ecological) studies. The most definitive epidemiological studies on chemical carcinogenesis are generally cohort studies of occupationally exposed populations and, less frequently, of the general population. Cohort studies evaluate groups of initially healthy individuals with known exposure to a given substance and follow cancer incidence or mortality over time. With adequate information on the intensity of exposure experienced by individuals, dose-dependent relationships with cancer incidence or mortality in the overall cohort can be established. Case-control studies retrospectively investigate individuals who develop a certain type of cancer and compare their chemical exposure with that of individuals who did not develop the disease. Case-control studies are frequently nested within cohort studies and can help increase the precision with which excess cancer can be associated with a given substance.
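How a cohort with individual exposure information yields a dose-response relationship can be sketched in Python: cases and person-years are tabulated by cumulative exposure category, and the incidence rate in each category is compared with that of the lowest category. The exposure categories, case counts and person-years below are invented for illustration, and the crude rate ratios ignore the age adjustment and confounder control that a real analysis would require.

```python
# Hypothetical cohort tabulation: (cumulative exposure category, cases, person-years).
cohort = [
    ("<1 ppm-years", 12, 40_000),
    ("1-5 ppm-years", 15, 30_000),
    ("5-20 ppm-years", 14, 15_000),
    (">20 ppm-years", 9, 5_000),
]


def crude_rate_ratios(rows):
    """Rates per 100,000 person-years and crude rate ratios versus the first (lowest) category."""
    ref_rate = rows[0][1] / rows[0][2]
    results = []
    for label, cases, person_years in rows:
        rate = cases / person_years
        results.append((label, rate * 1e5, rate / ref_rate))
    return results


for label, rate_per_100k, rr in crude_rate_ratios(cohort):
    print(f"{label:>15}: {rate_per_100k:6.1f} per 100,000 PY, rate ratio {rr:.2f}")
```

A rate ratio that rises steadily across exposure categories, as in this invented example, is the kind of dose-dependent pattern referred to above.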
Correlational or ecological studies evaluate cancer incidence/mortality in groups of individuals presumed to have exposure to a given substance, but are generally less precise since measures of the exposure experienced by individuals are not available. Observations of cancer clusters and case reports of rare tumours may also provide useful supporting information in some instances, but are more often the impetus for more formal and rigorous cohort studies. Besides identifying carcinogens, epidemiological studies may also provide information on actual exposures in representative (or historical) workplaces and/or the environment and the associated dose-response for cancer induction. Such information can be of much value for risk characterisation. Although instrumental in the identification of known human carcinogens, epidemiology studies are often limited in their sensitivity by a number of technical factors. The extent and/or quality of the information available on exposure history (e.g. measurements of individual exposure) or other determinants of health status within a cohort is often limited. Given the long latency between exposure to a carcinogen and the onset of clinical disease, robust estimates of carcinogenic potency can be difficult to generate. Similarly, occupationally and environmentally exposed cohorts often have co-exposures to carcinogenic substances that have not been documented (or are incompletely documented). This can be particularly problematic in the study of long-established industry sectors (e.g. base metal production) now known to entail co-exposures to known carcinogens (e.g. arsenic) present as trace contaminants in the raw materials being processed. Retrospective hygiene and exposure analyses for such sectors are often capable of estimating exposure to the principal materials being produced, but data documenting critical co-exposures to trace contaminants may not be available.
Increased cancer risk may be observed in such settings, but the source of the increased risk can be difficult to determine. Finally, a variety of lifestyle confounders (smoking and drinking habits, dietary patterns and ethnicity) influence the incidence of cancer but are often inadequately documented for purposes of adequate confounder control. Thus, modest increases in cancer at tissue sites known to be affected by confounders (e.g. lung and stomach) can be difficult to interpret. Techniques for biomonitoring and molecular epidemiology are developing rapidly. These newly developed tools promise to provide information on biomarkers of individual susceptibility, critical target organ exposures and whether effects occur at low exposure levels. Such ancillary information may begin to assist in the interpretation of epidemiology study outcomes and the definition of dose-response relationships. For example, monitoring the formation of chemical adducts in haemoglobin molecules (Birner et al., 1990; Albertini et al., 2006), the urinary excretion of damaged DNA bases (Chen and Chiu, 2005), and the induction of genotoxicity biomarkers (micronuclei or chromosome aberrations; Boffetta et al., 2007) are presently being evaluated and/or validated for use in conjunction with classical epidemiological study designs. Such data are usually restricted in their application to specific chemical substances, but these techniques may ultimately become more widely used, particularly when combined with animal data that define potential mechanisms of action and associated biomarkers that may be indicative of carcinogenic risk. Monitoring of the molecular events that underlie the carcinogenic process may also facilitate the refinement of dose-response relationships and may ultimately serve as early indicators of potential cancer risk. However, as a generalisation, such biomonitoring tools have yet to demonstrate the sensitivity required for routine use.

Evaluation of available information Epidemiological data may potentially be used for hazard identification, exposure estimation, dose-response analysis and risk assessment. The degree of reliability of each study on the carcinogenic potential of a substance should be evaluated using accepted causality criteria, such as those of Hill (1965). Particular attention should be given to the exposure data in a study and to the choice of the control population. Often a significant level of uncertainty exists in identifying a substance unequivocally as carcinogenic because of inadequate reporting of exposure data; chance, bias and confounding factors frequently cannot be ruled out. A clear identification of the substance, the presence or absence of concurrent exposures to other substances and the methods used for assessing the relevant dose levels should be explicitly documented. A series of studies revealing similar excesses of the same tumour type, even if not individually statistically significant, may suggest a positive association, and an appropriate joint evaluation (meta-analysis) may be used to increase sensitivity, provided the studies are sufficiently similar for such an evaluation. When the results of different studies are inconsistent, possible explanations should be sought and the various studies judged on the basis of the methods employed. Interpretation of epidemiology studies must be undertaken with care and include an assessment of the adequacy of exposure classification, the size of the study cohort relative to the expected frequency of tumours at tissue sites of special concern, and whether basic elements of study design are appropriate (e.g. a mortality study will have limited sensitivity if the cancer induced has a high rate of successful treatment).
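A joint evaluation of several similar studies is commonly carried out as an inverse-variance weighted (fixed-effect) meta-analysis of log relative risks. The Python sketch below assumes that each study reports a relative risk with a 95% confidence interval, from which the standard error of log(RR) is back-calculated. The study values are hypothetical, and where studies are heterogeneous a random-effects model would be more appropriate than the fixed-effect model shown.

```python
import math


def pooled_rr_fixed_effect(studies, z=1.96):
    """Fixed-effect (inverse-variance) pooling of relative risks.

    studies: iterable of (rr, lower_95, upper_95) tuples.
    Returns (pooled RR, lower 95% limit, upper 95% limit).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lower, upper in studies:
        log_rr = math.log(rr)
        # Back-calculate the standard error of log(RR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se**2
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))


# Three hypothetical studies, each with a non-significant excess of the same tumour type.
studies = [(1.4, 0.9, 2.2), (1.3, 0.8, 2.1), (1.5, 0.95, 2.4)]
rr, lo, hi = pooled_rr_fixed_effect(studies)
print(f"pooled RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

In this invented example none of the three intervals excludes 1 on its own, whereas the pooled interval does; that is the gain in sensitivity referred to above, and it is only meaningful if the studies are sufficiently similar to be pooled.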
A number of such factors can limit the sensitivity of a given study. Unequivocal demonstration that a substance is not a human carcinogen is difficult and requires detailed and exact measurements of exposure, an appropriate cohort size, adequate intensity and duration of exposure, sufficient follow-up time and sound procedures for the detection and diagnosis of cancers of potential concern. Conversely, an excess cancer risk in a given study can also be difficult to interpret if relevant co-exposures and confounders have not been adequately documented. Efforts are ongoing to improve the sensitivity and specificity of traditional epidemiological methods by combining cancer endpoints with data on established pre-neoplastic lesions or molecular indicators (biomarkers) of cancer risk. Once a substance has been identified as carcinogenic on the basis of human data, well-performed epidemiology studies may be valuable for providing information on the relative sensitivity of humans compared with animals, and/or may be useful in demonstrating an upper bound on the human cancer risk. Identification of the underlying mode(s) of action, which is needed for the subsequent risk assessment, quite often depends critically on available testing and/or non-testing information.
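An upper bound of this kind is typically an upper confidence limit on the risk ratio from a study that found little or no excess. A minimal Python sketch, assuming the study reports an observed number of cases and the number expected from reference rates (i.e. an SMR or SIR), finds the exact one-sided Poisson upper limit by bisection. The function names and example figures are hypothetical, and the simple CDF used here is only suitable for small observed counts.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for a Poisson variable with mean mu (math.exp underflows for mu above ~700)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def smr_upper_bound(observed: int, expected: float, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper confidence limit on the SMR (observed/expected)."""
    lo, hi = 0.0, max(10.0, 10.0 * (observed + 1))
    # Find mu such that P(X <= observed | mu) = alpha; the CDF decreases as mu increases.
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(observed, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi / expected


# Hypothetical cohort: 4 cases observed where 5.0 were expected from reference rates.
print(f"SMR = {4 / 5.0:.2f}, one-sided 95% upper bound = {smr_upper_bound(4, 5.0):.2f}")
```

A study with an SMR below 1 can therefore still be compatible with a substantial excess; the upper bound, not the point estimate, expresses what the study is able to rule out.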

Concluding on Dose response assessment

(Only text referring to HD:) Although dose descriptors are mainly derived from animal data, epidemiological data may occasionally also provide dose descriptors that allow derivation of a DNEL or DMEL, e.g. a Relative Risk (RR) or Odds Ratio (OR).
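Both descriptors can be calculated from a 2×2 table of exposure against disease. The Python sketch below uses the usual log-scale (Katz and Woolf) standard errors for approximate 95% confidence intervals; the function name and counts are hypothetical. The RR requires risks, and hence a cohort design, whereas in a case-control study only the OR can be estimated.

```python
import math


def rr_and_or(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Relative risk and odds ratio with approximate 95% CIs from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases (all counts must be > 0).
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # Katz
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)              # Woolf

    def ci(estimate, se):
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return {"RR": (rr, *ci(rr, se_log_rr)),
            "OR": (odds_ratio, *ci(odds_ratio, se_log_or))}


# Hypothetical cohort: disease in 30 of 1000 exposed and 15 of 1000 unexposed workers.
result = rr_and_or(30, 970, 15, 985)
print(result)
```

When the disease is rare, as here, the OR closely approximates the RR.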

Part II

The next pages give guidance on applying HD for DNEL and DMEL derivation, with the aim of inclusion in the RefTGD as endorsed at the March 2008 CA meeting.

It consists of two parts:

Part A describes the present contents of R.8 (of the RefTGD) and the proposed contents with the HD paragraphs (and appendices) inserted.

Part B gives the proposed guidance for the inserted red paragraphs and appendices.

Part A Overview of contents of R.8 (of RefTGD) with intended locations of proposed Guidance text for HD

GUIDANCE ON INFORMATION REQUIREMENTS AND CHEMICAL SAFETY ASSESSMENT

Chapter R.8: Characterisation of dose/concentration-response for human health

PRESENT CONTENTS
R.8.1 Introduction
R.8.1.1 Overview of legislative requirements
R.8.1.2 Overview of aspects to be considered in derivation of DNEL(s) / DMEL(s)
R.8.1.3 Overview of DNEL/DMEL-derivation and selection of the critical DNEL(s)/DMEL and/or other measures of potency
R.8.2 Step 1: Gather typical dose descriptors (e.g. N(L)OAEL, BMD, LD50, LC50, OR, RR, T25, BMD(L)10, ...) from all available studies on the different human health endpoints
R.8.2.1 Derivation of typical dose descriptor for acute toxicity, irritation/corrosion, skin sensitisation, and reproductive toxicity
R.8.3 Step 2: Decide on mode of action (threshold or non-threshold) and which next step(s) to choose
R.8.4 Step 3-1: Derive DNEL(s) for threshold endpoints
R.8.4.1 a) Select the relevant dose-descriptor(s) for the endpoint concerned, i.e. LD50, LC50, N(L)OAEL, BMD, …
R.8.4.2 b) modify, when necessary, the relevant dose descriptor(s) per endpoint to the correct starting point
R.8.4.3 c) apply, when necessary, assessment factors to the correct starting point to obtain endpoint-specific DNEL(s) for the relevant exposure pattern (duration, frequency, route and exposed human population)
R.8.5 Step 3-2: If possible, derive DMEL(s) for non-threshold endpoints
R.8.5.1 Deriving a DMEL for a non-threshold carcinogen, with adequate human cancer data
R.8.5.2 Deriving a DMEL for a non-threshold carcinogen, with adequate animal cancer data
R.8.5.3 Deriving a DMEL for a non-threshold carcinogen/mutagen, without adequate substance-specific cancer data
R.8.6 Step 3-3: Follow qualitative approach when no dose descriptor is available for an endpoint
R.8.7 Step 4: Select the leading health effect(s) and the corresponding DNEL/DMEL and/or other qualitative/semi-quantitative description
R.8.7.1 Selection of the critical DN(M)EL
R.8.7.2 Endpoints for which no DNEL/DMEL can be derived
R.8.7.3 Using DN(M)EL for human exposure scenarios

APPENDICES
Appendix R.8-1
Appendix R.8-2 Bioavailability, route-to-route extrapolation and allometric scaling
Appendix R.8-3 Assessment factors suggested from different research groups and regulatory bodies
Appendix R.8-4 PBPK Modelling and the derivation of DNELs/DMELs
Appendix R.8-6 Animal dose descriptors for non-threshold carcinogenic responses
Appendix R.8-7 Derivation of a DMEL for Non-Threshold Carcinogens: Comparison of the "linearised" and the "large assessment factor" approach
Appendix R.8-8 Specific guidance for ACUTE TOXICITY
Appendix R.8-9 Skin and eye irritation/corrosion and respiratory irritation
Appendix R.8-10 Skin sensitisation
Appendix R.8-11 Respiratory sensitisation
Appendix R.8-12 Reproductive toxicity
Appendix R.8-13 What to do when deriving DNELs, when a community or a national occupational exposure limit (OEL) is available
Appendix R.8-14 Evaluating carcinogenicity risk levels; a review of decision points that are used or have been discussed in some different countries, organizations, and committees

PROPOSED CONTENTS
R.8.1 Introduction
R.8.1.1 Overview of legislative requirements
R.8.1.2 Overview of aspects to be considered in derivation of DNEL(s) / DMEL(s)
R.8.1.2.8 Human Data as source for derivation of DNEL and/or DMEL (or in R.8.2.2)
R.8.1.3 Overview of DNEL/DMEL-derivation and selection of the critical DNEL(s)/DMEL and/or other measures of potency
R.8.2 Step 1: Gather typical dose descriptors (e.g. N(L)OAEL, BMD, LD50, LC50, OR, RR, T25, BMD(L)10, ...) from all available studies on the different human health endpoints
R.8.2.1 Derivation of typical dose descriptor for acute toxicity, irritation/corrosion, skin sensitisation, and reproductive toxicity
R.8.2.2 Human Data as source for derivation of DNEL and/or DMEL (or in R.8.1.2.8)
R.8.3 Step 2: Decide on mode of action (threshold or non-threshold) and which next step(s) to choose
R.8.4 Step 3-1: Derive DNEL(s) for threshold endpoints
R.8.4.1 Derive DNEL(s) for threshold endpoints, with adequate Human Data (note: both long-term and acute!)
R.8.4.1.1 a) Select the relevant dose-descriptor(s) for the endpoint concerned, i.e. LD50, LC50, N(L)OAEL, BMD, …
R.8.4.1.2 b) modify, when necessary, the relevant dose descriptor(s) per endpoint to the correct starting point
R.8.4.1.3 c) apply, when necessary, assessment factors to the correct starting point to obtain endpoint-specific DNEL(s) for the relevant exposure pattern (duration, frequency, route and exposed human population)
R.8.4.2 Derive DNEL(s) for threshold endpoints, with adequate Animal Data: a), b), and c) etc.
R.8.4.3 Derive DNEL(s) for threshold endpoints, based on Human and Animal Data
R.8.5 Step 3-2: If possible, derive DMEL(s) for non-threshold endpoints
R.8.5.1 Deriving a DMEL for a non-threshold carcinogen, with adequate human cancer data
R.8.5.2 Deriving a DMEL for a non-threshold carcinogen, with adequate animal cancer data
R.8.5.3 Deriving a DMEL for a non-threshold carcinogen, based on Human and animal cancer data
R.8.5.4 Deriving a DMEL for a non-threshold carcinogen/mutagen, without adequate substance-specific cancer data
R.8.6 Step 3-3: Follow qualitative approach when no dose descriptor is available for an endpoint
R.8.7 Step 4: Select the leading health effect(s) and the corresponding DNEL/DMEL and/or other qualitative/semi-quantitative description
R.8.7.1 Selection of the critical DN(M)EL
R.8.7.2 Endpoints for which no DNEL/DMEL can be derived
R.8.7.3 Using DN(M)EL for human exposure scenarios

APPENDICES
Appendix R.8-1
Appendix R.8-2 Bioavailability, route-to-route extrapolation and allometric scaling
Appendix R.8-3 Assessment factors suggested from different research groups and regulatory bodies
Appendix R.8-4 PBPK Modelling and the derivation of DNELs/DMELs
Appendix R.8-6 Animal dose descriptors for non-threshold carcinogenic responses
Appendix R.8-7 Derivation of a DMEL for Non-Threshold Carcinogens: Comparison of the "linearised" and the "large assessment factor" approach
Appendix R.8-8 Specific guidance for ACUTE TOXICITY
Appendix R.8-9 Skin and eye irritation/corrosion and respiratory irritation
Appendix R.8-10 Skin sensitisation
Appendix R.8-11 Respiratory sensitisation
Appendix R.8-12 Reproductive toxicity
Appendix R.8-13 What to do when deriving DNELs, when a community or a national occupational exposure limit (OEL) is available
Appendix R.8-14 Evaluating carcinogenicity risk levels; a review of decision points that are used or have been discussed in some different countries, organizations, and committees
Appendix R.8-15 How dose descriptors for DNEL and DMEL derivation can be obtained from human data
Appendix R.8-16 An integrative approach to combining Human Data and Animal data for DNEL and DMEL derivation (adaptation from ECETOC report)

Part B Proposed Guidance for indicated paragraph & appendices

R.8.1.2.8 or 8.2.2

Human data as source for derivation of DNEL and/or DMEL

Since DNELs and DMELs apply to humans, the most appropriate basis for their derivation is human data. Human data differ from animal data in that they are mostly derived from observational (non-experimental) studies, in contrast to strictly controlled experimental animal studies. This implies that the process of arriving at a dose descriptor is different for human and animal data, although the steps are broadly the same; Appendix R.8-15 outlines the steps needed to arrive at dose descriptors from human data. Although the guidance acknowledges the difference between a DNELacute and a DNELlong-term based on human data, derivation of the respective dose descriptors hardly differs (apart from the exposure-duration criterion). The term "dose descriptor" is used to designate the exposure level (dose) that corresponds to a quantified health effect, or a quantified level of risk of a health effect, in a specific study or a combination of data from multiple studies. In animal studies, common dose descriptors for threshold chemicals are the NOAEL (No Observed Adverse Effect Level) or LOAEL (Lowest Observed Adverse Effect Level), while examples of dose descriptors for non-threshold chemicals are TD25 (), ED10 () and BMD10 (). For epidemiological data, the dose descriptors for threshold chemicals are likewise the NOAEL or LOAEL (or NOAEC, etc.), but dose descriptors for non-threshold chemicals are usually a Relative Risk (RR) or a comparable measure such as the Standardised Mortality Ratio (SMR) or Standardised Incidence Ratio (SIR) for a given exposure contrast; the relationships between these risk ratios are described in Appendix R.8-15, paragraph 5.
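As an illustration of one of these descriptors, an SMR is the ratio of observed to expected deaths, where the expected number is obtained by applying reference (e.g. national, age- and calendar-period-specific) mortality rates to the cohort's person-years. The Python sketch below assumes the expected count has already been calculated and uses Byar's approximation for the 95% confidence interval; the function name and counts are hypothetical.

```python
import math


def smr_with_ci(observed: int, expected: float, z: float = 1.96):
    """SMR (observed/expected) with Byar's approximate confidence interval."""
    if observed < 0 or expected <= 0:
        raise ValueError("observed must be >= 0 and expected > 0")
    smr = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed)
                            - z / (3 * math.sqrt(observed))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return smr, lower, upper


# Hypothetical cohort: 42 deaths from the cancer of interest where 30.0 were expected.
smr, lo, hi = smr_with_ci(42, 30.0)
print(f"SMR = {smr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An SIR is calculated in the same way, using incident cases and reference incidence rates instead of deaths and mortality rates.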

The process leading to the identification of the key health effect and the associated exposures (dose descriptors) is described in Appendix R.8-15.

For many chemicals, both human and animal data are available and an integrated approach is required for DNEL derivation. In this approach the critical criterion is the quality of the available data: the data source providing the best-quality information should form the basis for DNEL derivation. ECETOC has developed a framework in which human data and animal data are integrated and which builds on quality criteria for human and animal data (see Appendix R.8-15, paragraph 3). Where human data are of similar quality to animal data, human data take precedence by default.

R.8.4.1 Derive DNEL(s) for threshold endpoints, with adequate human data

Human data differ from animal data in that they are mostly derived from observational (non-experimental) studies rather than from strictly controlled experimental animal studies. This implies that the process of arriving at a dose descriptor differs between human and animal data, although the steps are broadly the same; Appendix R.8-15 outlines the steps needed to arrive at dose descriptors for human data. The second crucial difference between human and animal data is that human data are intrinsically relevant to humans.

a) Select the relevant endpoints and dose descriptors

Provided there are sufficient human data on the health endpoints of a chemical, establishing either an acute or a long-term health effect, the hazard identification process (steps 1 to 7, see Appendix R.8-15) will result in a description of the reported endpoints, each combined with a dose descriptor.

A chemical may have multiple reported endpoints in combination with multiple dose descriptors. These are best summarised in a table containing the dose descriptors for each of the reported health effects. The table will include dose descriptors for long-term as well as acute endpoints. A hypothetical example is provided in Appendix R.8-15, paragraph 4.

b) Modify, when necessary, the most relevant dose descriptors to the correct starting point

In a few situations, the effects assessment condition is not directly comparable to the exposure assessment condition in terms of exposure route, units and/or dimensions. In these situations, it is necessary to convert the dose descriptor into a correct starting point (e.g. a corrected NOAEL). This applies to the following situations:

1. If the epidemiological data are derived from an exposure route other than the one to which the risk assessment has to be applied, a route-to-route extrapolation is necessary.

2. If the exposure conditions differ between the source population and the target population (e.g. differences in respiratory volumes, or intermittent versus continuous exposure), a correction for these differences is necessary.

It should be noted that modification is not appropriate in cases where human exposure is evaluated on the basis of biological monitoring data. Where valid biomonitoring data are available, the DNEL can be calculated straightforwardly, provided there are human studies that relate the effect directly or indirectly to the biomonitoring metric.

Ad 1.

If no adequate effect data are available for the relevant route of exposure for the population under consideration, route-to-route extrapolation may be an alternative, but only for systemic effects and not for local effects (e.g. irritation of the lungs following inhalation of a substance).

Even for systemic effects, route-to-route extrapolation is considered appropriate only under certain conditions (e.g. no first-pass effects). Guidance on route-to-route extrapolation of toxicity data when assessing health risks of chemicals has, for example, been produced by IGHRC (2006). When route-to-route extrapolation is considered appropriate, corrections should be made for differences in kinetics and metabolism. In general, it is difficult to quantify differences in metabolism, excretion and distribution, so in practice only the differences between routes in the percentage absorbed into the systemic circulation can be accounted for.

It is to be noted that route-to-route extrapolation is associated with a high degree of uncertainty and should be conducted with caution relying on expert judgment.

Default absorption values have been proposed for the different routes of exposure (see Section R.7.12. on toxicokinetics), but substance-specific data on absorption via the different routes are to be preferred. Such information may for instance be generated based on considerations of the chemical structure.

In the absence of these data for both the starting route and the end route (the route to which the extrapolation is being made), worst-case assumptions have to be made. The worst case is obtained by assuming limited absorption for the starting route, leading to a low (conservative) internal NOAEL. To secure a conservative external NOAEL, maximum absorption should thereafter be assumed for the end route, leading to a low external NOAEL. It is therefore proposed, in the absence of route-specific information on the starting route, to include a default factor of 2 (i.e. the absorption percentage for the starting route is half that of the end route) in the case of oral-to-inhalation extrapolation.

No default factor should be introduced (i.e. factor 1) in the case of inhalation-to-oral extrapolation, because there is no empirical justification for assuming that oral absorption is twice as high as absorption by inhalation.

On the assumption that, in general, dermal absorption will not be higher than oral absorption, no default factor (i.e. factor 1) should be introduced when performing oral-to-dermal extrapolation.

The other possible, but less usual, situations of route-to-route extrapolation (i.e. inhalation-to-dermal and vice versa) should be handled on a case-by-case basis.
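The default logic above can be sketched as follows. This is a hedged illustration, not a normative procedure: the function and dictionary names are hypothetical, both doses are assumed to be in the same units (e.g. mg/kg bw/day), and converting an inhalation dose into an air concentration (breathing volume, body weight) is a separate step not shown here.

```python
# Default ratios of absorbed fractions (starting route / end route) used when
# route-specific absorption data are lacking, as described in the text.
DEFAULT_ABSORPTION_RATIO = {
    ("oral", "inhalation"): 0.5,  # default factor 2
    ("inhalation", "oral"): 1.0,  # factor 1
    ("oral", "dermal"): 1.0,      # factor 1
}


def extrapolate_route(noael_start, start, end, abs_start=None, abs_end=None):
    """Return the external NOAEL for the end route.

    Internal NOAEL = external NOAEL (starting route) x fraction absorbed;
    external NOAEL (end route) = internal NOAEL / fraction absorbed (end route).
    """
    if abs_start is not None and abs_end is not None:
        ratio = abs_start / abs_end  # substance-specific data are preferred
    elif (start, end) in DEFAULT_ABSORPTION_RATIO:
        ratio = DEFAULT_ABSORPTION_RATIO[(start, end)]
    else:
        # e.g. inhalation-to-dermal and vice versa: case-by-case assessment
        raise ValueError(f"no default for {start}-to-{end}; assess case by case")
    return noael_start * ratio


# Hypothetical oral NOAEL of 10 mg/kg bw/day extrapolated to inhalation
print(extrapolate_route(10.0, "oral", "inhalation"))  # 5.0
```

Keeping the defaults in a lookup table makes explicit which route pairs have a default and forces the unusual pairs into a case-by-case decision.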

Ad 2.

The exposure conditions for the source population may differ from those of the target population. For example, exposure of workers (assumed 8 hours per day) differs from that of humans exposed via the environment (assumed 24 hours per day) and of consumers (assumed 1-24 hours per day, depending on the exposure scenario). If the toxic effect is driven by the total (accumulated) dose, or depends on both the total dose and the exposure concentration, concentration–time corrections (i.e. time scaling) have to be applied. Time scaling is not appropriate when the toxic effect is mainly driven by the exposure concentration (as for irritation). A useful tool for time scaling is the modified Haber’s law (C^n × t = k, where C is the concentration, n is a regression coefficient, t is the exposure time and k is a constant) (see Section R.7.4 and Appendix R.8-8 for further explanation). However, it should also be considered that an exposure pattern of 8 hours per day also includes 16 hours of daily recovery, whereas there is no recovery during continuous 24-hour exposure; corrections may be needed for continuous exposure.

c) Apply, when necessary, assessment factors to the correct starting point to obtain endpoint-specific DNEL(s) for the relevant exposure pattern (duration, frequency, route and exposed human population)

Assessment factors

Several aspects are involved in the use of human data for DNEL derivation, in particular factors associated with intraspecies variation and differences in exposure conditions. Unlike the case with experimental animal findings, there is no need to consider interspecies variation when using human data for DNEL development. Furthermore, in many instances there will be no need to determine the relevance of the findings, as the available human experience data will be considered directly applicable and relevant to the wider population group to which they are being applied. Where human data are considered to be the most suitable starting point for the derivation of a DNEL, a similar set of considerations applies to those applied to experimental data (section R8.4.3.1 .. check). These aspects are discussed under the following headings:

• intraspecies differences;

• differences in duration of exposure;

• issues related to dose-response;

• quality of the available human database.

Intraspecies Differences

Humans differ in sensitivity to toxic insult due to a multitude of biological factors, such as genetic polymorphisms affecting e.g. toxicokinetics/metabolism, age, gender, health status and nutritional status. These differences can be the result of genetic and/or environmental influences. Where suitable human study data that adequately address such considerations are available, and the sample size is sufficiently large, there is no need to apply an assessment factor for intraspecies differences, since these differences are intrinsically part of the human study database.

It is recognised that protecting the most sensitive individual exposed to any chemical could require a large default assessment factor. Where representative worker data are available (i.e. a study of sufficient sample size based on a suitably heterogeneous population group), it is usually assumed that a default assessment factor of 3 is sufficient to protect the general population, including e.g. children and the elderly. For threshold effects, this factor of 3 is the standard default procedure when establishing exposure guidelines for the general population. If the study is based on the general population, no additional assessment factor is required.

For some endpoints, such as respiratory irritation (or eye irritation and similar acute effects), biotransformation is thought not to be relevant and hence does not need to be accounted for when evaluating intraspecies differences. The same is likely to apply to some acute neurological (CNS) effects; for example, benzene has the same acute neurological effects as other aromatics, because metabolism is not a relevant factor in the mechanism.

Duration of exposure

For human studies, there is generally no need to introduce a factor to account for differences in the duration of exposure for the population and scenario under consideration, unless there is evidence that cumulative exposure is a more biologically relevant exposure metric than exposure concentration. Provided human studies are properly conducted over a sufficient timescale (of the same order of magnitude as the actual exposure), the nature of human observational studies already accounts for the possibility that the NOAEL for the effect of interest decreases with increasing exposure time, and that other, more serious adverse effects appear with increasing exposure time.

If a reliable NOAEL for a chronic endpoint is available, this is the preferred starting point for a DNELlong-term, and no assessment factor for duration extrapolation is needed. A NOAEL for an acute endpoint (i.e. a NOAEL following short-term exposure only) should not be used as the basis for deriving a DNELlong-term. If the study design is not sufficient to adequately address any latency of the observed effect, these data should not be used for deriving a DNEL.

Dose-response relationship

Many human studies will provide (either by themselves or when seen in the context of other studies) an indication of the shape of the dose-response curve for the endpoint of interest. Unlike experimental animal data, however, the response is not observed at discrete exposure concentrations; instead, the study characterises population responses within exposure categories.

The fact that effects are examined within exposure ranges affects how a LOEL or NOEL can be determined from epidemiological studies. The size of any assessment factor should take into account the shape and slope of the dose-response curve (assuming this can be derived from the data source) and the extent and severity of the effect seen at the LOEL. An assessment factor of 2 is considered an appropriate default to account for the uncertainties that may be associated with determining the 'true' LOEL/NOEL. Where data indicate that a steep dose-response curve exists, a larger AF should be considered, because any uncertainty in identifying the likely magnitude of the LOAEL/NOAEL then has a greater impact. It must be kept in mind that, in a dose-response analysis by means of exposure ranges, the LOAEL and NOAEL are located within the lowest range in which an adverse effect, in the form of a higher disease incidence than background, is observed. The lower boundary of the exposure range in which the first effect is observed should therefore be considered to lie below both the LOAEL and the NOAEL.

When the starting point for the DNEL derivation is a NOAEL, the default assessment factor, as a standard procedure, is 1 (although it should be noted that in human studies the NOAEL is not observed per se, but rather is determined from an exposure range in which the first effect is seen). When the starting point for the DNEL calculation is a LOEL (rather than the NOEL), it is suggested to use an assessment factor of 2, although a higher AF will be necessary if the study design is insufficient to identify the lowest LOEL for a given effect.
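The reasoning on exposure ranges can be illustrated with a small sketch that locates the lowest exposure category showing an increased incidence. It is a hedged illustration only: the criterion used for "effect observed" (lower 95% confidence limit of the RR above 1), the class and function names and the category values are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExposureCategory:
    lower: float        # lower boundary of the exposure range (e.g. mg/m3)
    upper: float        # upper boundary of the exposure range
    rr: float           # relative risk versus the non-exposed reference group
    rr_ci_lower: float  # lower 95% confidence limit of the RR


def first_effect_range(categories):
    """Find the lowest exposure category with an increased incidence.

    Assumption for this sketch: an effect counts as observed when the lower
    95% confidence limit of the RR exceeds 1. Returns None if no category
    shows an effect.
    """
    ordered = sorted(categories, key=lambda c: c.lower)
    for index, cat in enumerate(ordered):
        if cat.rr_ci_lower > 1.0:
            return {
                # Per the text, this boundary lies below both LOAEL and NOAEL.
                "lower_boundary": cat.lower,
                "range": (cat.lower, cat.upper),
                # An effect already in the lowest category studied suggests the
                # design may not identify the lowest LOAEL (higher AF than 2).
                "lowest_category_affected": index == 0,
            }
    return None


# Hypothetical exposure categories (mg/m3)
cats = [
    ExposureCategory(0.1, 1.0, 1.05, 0.80),
    ExposureCategory(1.0, 5.0, 1.30, 0.95),
    ExposureCategory(5.0, 20.0, 1.90, 1.30),
]
result = first_effect_range(cats)
# result["lower_boundary"] is 5.0; result["lowest_category_affected"] is False
```

Flagging whether the effect already appears in the lowest category studied reflects the point above that a higher AF is needed when the design cannot identify the lowest LOEL.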

Quality of the Database

An assessment factor on the quality of the whole database should, if justified, be applied to compensate for the potential remaining uncertainties in the derived DNEL.

Firstly, the evaluation should include an assessment of whether the available human information is sufficient to address the endpoints consistent with the tonnage-driven data requirements necessary to fulfil the REACH obligations, or whether the knowledge provided by the human information still presents significant data 'gaps' (when compared to the expected breadth of understanding implied by Annexes VII - X of the Regulation). Where no significant gaps are present (e.g. the human data adequately address chronic and/or acute effects), no additional factor is necessary. Where the human data set is incomplete or of too poor quality, however, either a DNEL should be established based upon other data sources and/or a larger AF should be applied.

If information from an occupational study is being applied to an occupational setting, then no additional assessment factor for intraspecies variation is required unless the study size is small with low discriminatory power, when a higher AF (e.g. 3) should be considered.

When there are deficiencies in the human studies considered crucial to provide useful information for establishing the dose descriptor, extra caution should be taken to address this scientific uncertainty in deriving the DNEL. Further, the assessor should consider the nature of the effect occurring in particular organ systems, as well as at different life stages. Where human data are inconsistent and not in concordance, then the data should not be used for DNEL derivation.

Secondly, the hazard data should be assessed for reliability and consistency across different studies (including available animal data) and endpoints, taking into account the quality of the study protocol/methodology, the size and power of the study design, biological plausibility, dose-response relationships and statistical association (adequacy of the database). Assuming that the human data are robust and of good quality, no additional assessment factor should be applied. However, where the human data set is incomplete, either a DNEL should be established based upon other data sources and/or a larger AF should be applied.

Nature of assessment factor¹                                    Default value
Interspecies                                                    Not applicable
Intraspecies²
  - worker to worker³                                           1
  - worker to general population⁴                               3
  - general population to general population⁵                  1
Duration of exposure
  - sub/semi-chronic to chronic                                 2
  - chronic to lifetime⁶                                        1
Dose-response (issues related to reliability of the dose-response)
  - LOAEL/NOAEL extrapolation⁷                                  2
  - shape of the dose-response curve⁸                           2
Quality of whole database⁹
  - issues related to completeness of the available data¹⁰      1
  - issues related to the consistency of the available data¹¹   1
  - issues related to reliability of any alternative data¹²     1

Table R.8.? : Default Assessment Factors for HD based DNELs

1 The Table should not be viewed as the basis for the automatic application of assessment factors. Rather, it should be seen as an aid to help structure how HD can contribute to the process of DNEL development through a data-driven process.
2 For certain direct biological effects, such as respiratory irritation, there is no need to consider the pharmacokinetic component of the intraspecies uncertainty factor.
3 If the HD study size is small with low discriminatory power, a higher AF might be considered.
4 Where effects are manifest over shorter time periods, and where it is comparatively straightforward to ascertain the relationship between exposure and the observed effect (e.g. allergy, nausea, irritation), a reduced AF may be appropriate.
5 A higher AF may be warranted if the HD study is small or homogeneous when compared with the wider population.
6 A higher AF would be appropriate if the study design may not be sufficient to adequately address any latency of the observed effect.
7 A higher AF will be necessary if the study design is insufficient to identify the lowest LOAEL for a given effect.
8 Where data indicate that a steep dose-response curve exists, a higher AF should be considered to account for any uncertainties in identifying the likely magnitude of the LOAEL/NOAEL.
9 The starting assumption is that only good-quality human data will be used for the determination of DNELs. Further guidance on how such data should be evaluated can be found in ECETOC Report ???.
10 Where the human data set is incomplete, either a DNEL should be established based upon other data sources and/or a higher AF should be applied.
11 Where HD are inconsistent and not in concordance, a higher AF may need to be applied to account for any inherent uncertainties.
12 Relates to the biological plausibility of the identified effect, i.e. if other (animal or human) data indicate that the effect may occur at lower exposure levels, a higher AF may be justified.

Overall assessment factor and its application to the correct starting point

The overall assessment factor is obtained by simple multiplication of the individual assessment factors discussed in the previous paragraphs. Care should be taken to avoid double counting, i.e. accounting for the same aspect in more than one individual factor.

Table R.8-? presents an overview of the individual default assessment factors described above, which should be used. These apply both to the derivation of a DNELlong-term and to that of a DNELacute (assuming that a substance exhibits both types of effect).
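The multiplication step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the example (a LOAEL of 12 mg/m3 from a worker study, used to derive a DNEL for the general population with the defaults for worker-to-general-population extrapolation and LOAEL-to-NOAEL extrapolation) are hypothetical.

```python
from math import prod


def overall_assessment_factor(factors):
    """Multiply the individual assessment factors. Use one entry per aspect
    so that no aspect is counted twice."""
    return prod(factors.values())


def derive_dnel(starting_point, factors):
    """DNEL = corrected starting point / overall assessment factor."""
    return starting_point / overall_assessment_factor(factors)


# Hypothetical example: worker-study LOAEL, DNEL for the general population
factors = {
    "intraspecies: worker to general population": 3,
    "dose-response: LOAEL to NOAEL": 2,
}
dnel = derive_dnel(12.0, factors)  # 12 / (3 x 2) = 2.0 mg/m3
```

Naming each factor by the aspect it covers is one simple way to make double counting visible during review.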

R.8.4.3 Derive DNEL(s) for threshold endpoints, based on human and animal data

For many chemicals, both human and animal data are available and an integrated approach is required for DNEL derivation. In this approach the critical criterion is the quality of the available data: the data source providing the best-quality information should form the basis for DNEL derivation. ECETOC has developed a framework in which human data and animal data are integrated and which builds on quality criteria for human and animal data (see Appendix R.8-15, paragraph 3).

The quality of animal data can be assessed by applying the Klimisch criteria, which results in allocation into one of four categories, the so-called Codes of Reliability: ‘Reliable without restrictions’, ‘Reliable with restrictions’, ‘Not reliable’ and ‘Not assignable’ (Klimisch et al., 1997). Although descriptions of how to assess the quality of human data are available (WHO 2000 and others), the ECETOC document is the only report proposing a set of four quality categories for human data. Applying the Klimisch criteria to animal data and the ECETOC categorisation to human data allows both to be categorised by relative quality, and the most suitable data to be identified for deriving a dose descriptor from which an acute or long-term DNEL can be derived. Thus, the overall framework involves three steps that consider the human risk question:

1. Assessment of the collective weight of evidence of the human data, resulting in a weight-of-evidence score for the quality of the human data on a five-point scale (I-IV plus X);
2. Assessment of the collective weight of evidence of the animal data, resulting in a weight-of-evidence score for the quality and relevance of the animal data, also on a five-point scale (I-IV plus X); and
3. Integration of the available evidence from human and animal sources.

This approach is described in detail in Appendix R.8-16.

Where human data are of similar quality to animal data, human data take precedence by default, because the relevance of human data for DNEL derivation is evident, whereas this is not necessarily the case for animal data.
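The tie-breaking rule can be expressed as a small decision function. This is a hedged sketch only: it assumes that category I denotes the highest quality and IV the lowest, and that X denotes data not usable for a dose descriptor. These assumptions, and the names used, are illustrative; the framework in Appendix R.8-16 relies on expert judgement that a lookup of this kind cannot capture.

```python
from typing import Optional

# Assumed ordering for this sketch: I = highest quality ... IV = lowest;
# X (or any unknown code) is treated as not usable for a dose descriptor.
QUALITY_RANK = {"I": 1, "II": 2, "III": 3, "IV": 4}


def preferred_source(human_score: str, animal_score: str) -> Optional[str]:
    """Return 'human' or 'animal' (the better-quality source), or None."""
    human = QUALITY_RANK.get(human_score)
    animal = QUALITY_RANK.get(animal_score)
    if human is None and animal is None:
        return None
    if animal is None:
        return "human"
    if human is None:
        return "animal"
    # Lower rank means better quality; on a tie, human data take precedence.
    return "human" if human <= animal else "animal"


print(preferred_source("II", "II"))  # human
```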

Even after a DNEL has been derived from the most appropriate dose descriptor, whether based on an animal or a human dataset, it should be verified that the proposed DNEL also protects against every other endpoint included in the dose descriptor table (see Appendix R.8-15, paragraph 4).
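One way to perform this check is sketched below, under the assumption (not stated in the text) that "protects against" means the proposed DNEL does not exceed any other endpoint's dose descriptor divided by that endpoint's overall assessment factor. The row structure, function name and table values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DoseDescriptorRow:
    endpoint: str
    duration: str            # "acute" or "long-term"
    dose_descriptor: float   # e.g. NOAEL or LOAEL, same units as the DNEL
    overall_af: float        # overall assessment factor for this endpoint


def unprotected_endpoints(proposed_dnel, table, duration):
    """Endpoints of the given duration whose own descriptor / AF is below the
    proposed DNEL, i.e. endpoints the proposed DNEL may not protect against."""
    return [
        row.endpoint
        for row in table
        if row.duration == duration
        and row.dose_descriptor / row.overall_af < proposed_dnel
    ]


# Hypothetical dose descriptor table (values are illustrative only)
table = [
    DoseDescriptorRow("respiratory irritation", "acute", 20.0, 1.0),
    DoseDescriptorRow("liver effects", "long-term", 12.0, 6.0),
    DoseDescriptorRow("neurobehavioural effects", "long-term", 9.0, 6.0),
]
print(unprotected_endpoints(2.0, table, "long-term"))  # ['neurobehavioural effects']
```

A non-empty result signals that the DNEL should be reconsidered on the basis of the flagged endpoint.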

R.8.5.1 Deriving a DMEL for a non-threshold carcinogen, with adequate human cancer data (Note: previous version once reviewed by SEG III, March 07)

1. Introduction

Before a DMEL can be derived from a dose descriptor established for human data, the major part of the evaluation process of HD should already have been conducted. This first part is described in Appendix R.8-15.

The starting point for DMEL derivation is the dose descriptor for the most critical effect. The dose descriptor represents a Relative Risk (RR) or a comparable measure (SMR or OR) at a quantified exposure contrast. In its simplest form, the dose descriptor (RR) represents the risk observed in an exposed population (with a specified average exposure level) compared with an unexposed one. Ideally, it represents the slope of the exposure-response function, derived by modelling the pooled data from all available adequate studies over the whole range of exposure levels observed.

Two quantitative risk assessment formats can be followed to derive DMELs: the ‘Linearised’ approach or the ‘Large Assessment Factor’ approach. Both formats are based on the same principal elements of risk extrapolation or risk evaluation, and both use a risk estimate (an RR or a comparable measure such as an OR or an SMR) as the dose descriptor. Because of different perceptions of the uncertainties involved in quantitative risk assessment and risk evaluation, and different approaches to risk communication, one of these formats may be preferred over the other.

2 The ‘Linearised’ approach

Many regulatory agencies including US EPA and the Dutch Health Council basically follow this approach (US EPA, Dutch Health Council, 1989; see also Goldbohm et al., 2006).

a) Select the most relevant dose-descriptor

Epidemiological studies usually yield RRs (Relative Risks) (or, depending on the study design, the comparable measures ORs or SMRs; see Appendix R.8-15, paragraph 5) for specified exposure levels (either indicated in the study or estimated with hygiene input). In its simplest form, a dose descriptor designates the RR or SMR observed in a study in which an exposed population (with an estimated or measured average exposure level) is compared with a non-exposed population. The non-exposed population may be an “internal” reference population (e.g. non-exposed workers from the same plant) or an “external” population, such as the general population for which nationwide mortality or incidence statistics are available. In the latter situation a Standardised Mortality (or Incidence) Ratio (SMR or SIR) is used instead of an RR. If possible, the dose descriptor or risk estimate should be derived from a linear relative risk model fitted to the data of the study, or to the data from a pooled or meta-analysis, and chosen at an exposure level within the observed range of the data (see Appendix R.8-15, paragraph 5). In this way, a single RR per unit of exposure (i.e. a slope factor) is obtained for a substance. Occasionally, a dose descriptor or risk estimate may be derived by fitting a non-linear relative risk model to the observed data points; in that case, the choice of dose-response model should be clearly justified.
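To show what a single slope factor means, the sketch below fits the linear relative risk model RR = 1 + β·x to category-level results by weighted least squares constrained through RR = 1 at zero exposure. This is a simplified stand-in: in practice the model is usually fitted by maximum likelihood (e.g. Poisson regression on person-time data) with confidence intervals, which this sketch does not do. The function name, the use of mean category exposures and the optional weights (e.g. inverse variances) are assumptions.

```python
def linear_rr_slope(exposures, relative_risks, weights=None):
    """Slope beta (excess relative risk per unit exposure) of RR = 1 + beta * x,
    fitted by weighted least squares through RR = 1 at zero exposure."""
    if weights is None:
        weights = [1.0] * len(exposures)
    numerator = sum(
        w * x * (rr - 1.0) for w, x, rr in zip(weights, exposures, relative_risks)
    )
    denominator = sum(w * x * x for w, x in zip(weights, exposures))
    if denominator == 0:
        raise ValueError("at least one non-zero exposure level is required")
    return numerator / denominator


# Hypothetical mean cumulative exposures (ppm-years) and category RRs
beta = linear_rr_slope([10, 20, 40], [1.2, 1.4, 1.8])  # about 0.02 per ppm-year
```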

b) Modify, when necessary, the most relevant dose descriptors to the correct starting point

In a few situations, the effects assessment condition is not directly comparable to the exposure assessment condition in terms of exposure route, units and/or dimensions. In these situations, it is necessary to convert the dose descriptor into a correct starting point (i.e. corrected RR). This applies to the following situations:

1. If the epidemiological data are derived from an exposure route other than the one to which the risk assessment has to be applied, a route-to-route extrapolation is necessary.

2. If the exposure conditions differ between the source population and the target population (e.g. differences in respiratory volumes, or intermittent versus continuous exposure), a correction for these differences is necessary.

Basically, the corrections for situations 1 and 2 are performed in the same way as those described in R.8.4.1 for the derivation of a DNEL.

As the exposure metric used in the analysis of the epidemiologic data most often is a cumulative exposure value including years of exposure, e.g. ‘ppm-years’, a correction for duration of exposure is not needed.

It must be noted that, in most instances, the most accurate epidemiological data on long-term cancer risks from chemicals are derived from studies of occupationally exposed cohorts. These risks need to be converted to continuous exposure of the general population (24 hours per day, 365 days per year, for 75 years).
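This conversion is a form of time scaling. The sketch below, offered as a hedged illustration, shows (i) the modified Haber’s law (C^n × t = k) introduced in R.8.4.1, and (ii) the number of occupational-equivalent exposure units (e.g. ppm-years) delivered by continuous lifetime exposure when dose is assumed to be proportional to concentration × time (n = 1). The 8 hours per day follows the text; the 240 working days per year and the helper names are hypothetical assumptions, and differences in respiratory volume are not included.

```python
def haber_scale(c1, t1, t2, n=1.0):
    """Modified Haber's law, C**n * t = k: return the concentration that gives
    the same k over duration t2 as concentration c1 over duration t1."""
    return c1 * (t1 / t2) ** (1.0 / n)


# Worker exposure pattern: 8 h/day as assumed in the text; the number of
# working days per year is a hypothetical assumption for this sketch.
WORK_HOURS_PER_DAY = 8
WORK_DAYS_PER_YEAR = 240


def occupational_equivalent(c_continuous, years=75):
    """Occupational-equivalent cumulative exposure (e.g. ppm-years) for
    continuous exposure (24 h/day, 365 days/year) at c_continuous over
    `years`, assuming dose is proportional to concentration x time (n = 1)."""
    continuous_hours = 24 * 365 * years
    hours_per_occupational_year = WORK_HOURS_PER_DAY * WORK_DAYS_PER_YEAR
    return c_continuous * continuous_hours / hours_per_occupational_year


# 30 ppm over 8 h corresponds to about 10 ppm over 24 h when n = 1
c24 = haber_scale(30.0, 8.0, 24.0)
# Continuous lifetime exposure at 1 ppm is about 342 occupational ppm-years
cum = occupational_equivalent(1.0)
```

With n > 1 the scaled concentration falls less steeply with duration, which is why n should be substance-specific where data allow.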

c) Apply assessment factors to the correct starting point, and perform high to low dose extrapolation to obtain an adequate DMEL

c1. Application of assessment factors

The next step in the calculation of a DMEL is to address variability and uncertainty in the differences between the effect assessment dataset and the real human exposure situation.

Clearly, the use of epidemiological data has advantages over the use of animal data, namely there is no need for interspecies extrapolation, and extrapolation from high to low exposure levels is usually much less extreme.

Nevertheless, for DMEL derivation based on epidemiological studies, the following assessment factors still need to be considered:

1. Quality of the database (amount and quality of available information)

2. Intraspecies differences

Ad 1. Quality of the database

• The amount of available data, i.e. the size of the database with adequate data, largely determines the amount of random error in the estimated dose descriptor. This uncertainty is usually represented by the confidence intervals that are routinely derived for such estimates. A pooled or meta-analysis based on a substantially large database has relatively narrow confidence intervals. An assessment factor may be applied if the selected dose descriptor has wide confidence intervals.

• Another source of uncertainty arises from uncontrolled biases (e.g. confounding bias or the healthy worker effect) in the data. Guidance on the different types of bias, and on whether they would under-estimate or over-estimate the risk, is available in Appendix R.8-15, paragraph 3, in WHO (2000) and in Goldbohm et al. (2006). Evidently, data likely to be subject to serious bias should not be used for quantitative risk assessment at all. However, in less serious cases, the impact of a possible bias on the dose descriptor may be estimated¹ and compensated for by an assessment factor.

• If there is reason to assume that the quantitative exposure-response relationship based on the epidemiological data probably underestimates or overestimates the true association, an appropriate assessment factor should be applied. An example of such a situation is when quantitative exposure estimates are lacking from a study and the exposure level(s) were estimated from other sources to obtain a dose descriptor. It is also important to consider whether the available epidemiological data are relevant to the EU situation. In some cases, the available data could come from countries outside the EU where the exposure conditions are dramatically different from those in the EU, or the increased risk detected may be due to specific genetic characteristics of that population.

1 A practical approach to assess the effect of possible uncontrolled biases on the risk estimate can be to apply sensitivity analyses postulating different levels of bias. A more sophisticated and reliable approach is to use probabilistic simulations to estimate bias (e.g. Steenland and Greenland, 2004).

Ad 2. Intraspecies differences

There may be differences in age distribution, race, physical condition, self-selection of healthy individuals, etc. between the population studied for the effect assessment (worker populations, patient groups etc.) and the population for which the risk assessment has to be performed. To address these differences, an intraspecies adjustment factor should be applied. For example, in risk assessments based on animal studies, a default intraspecies factor of 5 is generally considered sufficient for workers, whereas a default factor of 10 is used for the general population (see DNEL derivation, section R.8.4.2). It follows that, when the risk assessment is based on epidemiological data and there are no data on differences between the source population (mostly workers) and the target population (general public), a default intraspecies factor of 2 (= 10/5) could be considered to address the differences between these populations and their exposure conditions.

c2. High to low dose extrapolation

The RR (whether or not corrected in the preceding steps) must be projected onto the target population (workers or general population) to derive an Excess Lifetime Risk (ELR) at a given level of exposure. There are two options here:

i) a simple direct method as described by van Wijngaarden and Hertz-Picciotto (2004) or the Dutch Health Council (1989), and

ii) a more sophisticated method including the use of a life table approach as described by e.g. Steenland et al., 1998.

The direct method results in some overestimation of the lifetime risk, in particular if the background risk in the target population is high. The life-table method gives a more accurate estimate and can incorporate specific requirements, such as changing exposure patterns over a lifetime, competing risks due to effects of exposure on other endpoints, etc. The life-table method may be used if there is a need to calculate the risk more accurately, as a higher tier in a tiered strategy.
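To illustrate the difference between the two options, the sketch below contrasts a simple direct calculation, assumed here to take the form ELR = (RR - 1) × background lifetime risk, with a basic life-table calculation in which exposure multiplies the cause-specific rate and thereby also raises all-cause mortality (a competing risk). The rates, the one-year intervals and the function names are hypothetical; a real life-table analysis would use age-specific national rates and usually an exposure lag, as in the references cited above.

```python
import math


def lifetime_risk(cause_rates, all_cause_rates, rr_by_interval=None):
    """Probability of dying from the cause of interest over the table.

    Uses one-year intervals with constant rates per person-year within each
    interval; all rates must be positive. Exposure (via the RR) multiplies
    the cause-specific rate and therefore also raises all-cause mortality.
    """
    survival = 1.0
    risk = 0.0
    for i, (h, m) in enumerate(zip(cause_rates, all_cause_rates)):
        rr = 1.0 if rr_by_interval is None else rr_by_interval[i]
        h_exp = h * rr                 # cause-specific rate under exposure
        m_exp = m + h_exp - h          # all-cause rate under exposure
        q = 1.0 - math.exp(-m_exp)     # probability of dying in this interval
        risk += survival * (h_exp / m_exp) * q
        survival *= 1.0 - q
    return risk


def elr_direct(rr, background_lifetime_risk):
    """Simple direct approximation: ELR = (RR - 1) x background lifetime risk."""
    return (rr - 1.0) * background_lifetime_risk


# Hypothetical constant rates over 75 one-year intervals, RR = 1.1 throughout
years = 75
cause = [0.001] * years
all_cause = [0.01] * years
background = lifetime_risk(cause, all_cause)
elr_table = lifetime_risk(cause, all_cause, [1.1] * years) - background
elr_simple = elr_direct(1.1, background)
# Here elr_table is slightly smaller than elr_simple (competing risks).
```

Because `rr_by_interval` can vary by age interval, the same function can represent an exposure pattern that changes over a lifetime.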

If the RR was calculated from a linear relative risk model, the derived ELR for a given exposure can directly be converted to a DMEL, i.e. an exposure at a given risk level considered from a societal point of view to be of very low concern (e.g. 10⁻⁵ or 10⁻⁶).
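Under a linear relative risk model, RR(x) = 1 + β·x, so with the direct method the ELR at exposure x is approximately background lifetime risk × β·x, and the DMEL is the exposure at which this equals the chosen risk level. A minimal sketch, assuming β is expressed per unit of the same exposure metric as the DMEL (any conversion, e.g. from occupational ppm-years to a continuous lifetime concentration, already applied); the function names and values are hypothetical.

```python
def elr_at(exposure, beta, background_lifetime_risk):
    """Excess lifetime risk under RR = 1 + beta * x (direct method)."""
    return background_lifetime_risk * beta * exposure


def dmel_linear(beta, background_lifetime_risk, target_risk=1e-5):
    """Exposure at which the excess lifetime risk equals the target risk."""
    if beta <= 0 or background_lifetime_risk <= 0:
        raise ValueError("beta and background risk must be positive")
    return target_risk / (background_lifetime_risk * beta)


# Hypothetical slope 0.02 per unit exposure, background lifetime risk 5%
dmel_1e5 = dmel_linear(0.02, 0.05)        # about 0.01 exposure units
dmel_1e6 = dmel_linear(0.02, 0.05, 1e-6)  # about 0.001 exposure units
```

Because the relationship is linear, lowering the target risk by a factor of 10 lowers the DMEL by the same factor.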

If the RR value was based on a model other than the linear model, low dose extrapolation should be performed according to one of the following two options. If there is additional evidence (e.g. based upon available experimental data of good quality) that the dose response outside the observable range is non-linear, a non-linear model may also be used to assess the risks associated with these lower exposure levels. Otherwise, if there is no information on the shape of the dose-response in the low dose range, linear extrapolation should be applied as a default. The application of a non-linear model to low dose extrapolation should be performed on a case-by-case basis, and should be extensively documented and justified.
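When the fitted model is non-linear and nothing is known about the low-dose shape, the default can be sketched as a straight line from a point of departure inside the observed range down to zero excess risk. In this sketch the log-linear model RR = exp(β·x), the coefficient β, the point of departure and the target excess relative risk are all hypothetical; the resulting excess relative risk would still have to be translated into an excess lifetime risk as described above.

```python
import math


def rr_loglinear(x: float, beta: float) -> float:
    """Hypothetical non-linear model fitted within the observed exposure range."""
    return math.exp(beta * x)


def err_with_linear_default(x: float, x_pod: float, beta: float) -> float:
    """Excess relative risk: fitted model at or above the point of departure (x_pod),
    straight line through the origin below it (default linear extrapolation)."""
    if x >= x_pod:
        return rr_loglinear(x, beta) - 1.0
    err_pod = rr_loglinear(x_pod, beta) - 1.0
    return err_pod * x / x_pod


def exposure_for_err(target_err: float, x_pod: float, beta: float) -> float:
    """Invert the linear segment to find the exposure giving target_err."""
    err_pod = rr_loglinear(x_pod, beta) - 1.0
    x = target_err * x_pod / err_pod
    if x > x_pod:
        raise ValueError("target lies above the point of departure; use the fitted model")
    return x


beta, x_pod = 0.05, 2.0  # hypothetical: slope per mg/m3-year, lowest well-supported exposure
x_low = exposure_for_err(1e-3, x_pod, beta)
print(x_low, err_with_linear_default(x_low, x_pod, beta))
```

For a convex model such as this one, the straight line lies above the fitted curve below the point of departure, so the default yields higher risk estimates there than the model itself would.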

3 The ‘Large assessment factor’ approach

a) Select the most relevant dose-descriptors or risk estimates, i.e. RR, OR, or SMR

Epidemiological studies usually yield RRs (Relative Risks), or the comparable measures ORs or SMRs (see Appendix IX A), for specified exposure levels (either indicated in the study or estimated with hygiene input). These dose descriptors or risk estimates may be derived from a linear relative risk model fitted to the data of the study or to the data from a pooled or meta-analysis, and chosen at an exposure level within the observed range of the data. Occasionally, these dose descriptors may be derived by fitting a non-linear relative risk model to the observed data points. When this is the case, the selection of the dose-response model should be clearly justified. From these data, the exposure level that corresponds to an excess relative risk of 10% (i.e. an exposure at which the cancer risk is increased by 10% relative to the background risk, RR = 1.1) should be calculated.
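As an illustration of this last step, the sketch fits a linear relative risk model RR(x) = 1 + β·x to category-specific RRs by weighted least squares, forcing RR = 1 at zero exposure and deriving the weights from the reported 95% confidence intervals, and then solves β·x = 0.10. This is a simplified stand-in for the regression methods normally used (e.g. Poisson regression on the original data); the function names and example data are hypothetical.

```python
import math
from typing import Sequence


def fit_linear_err_slope(exposures: Sequence[float], rrs: Sequence[float],
                         ci_lows: Sequence[float], ci_highs: Sequence[float],
                         z: float = 1.96) -> float:
    """Weighted least-squares slope beta for RR = 1 + beta * x (line forced through RR = 1 at x = 0)."""
    num = den = 0.0
    for x, rr, lo, hi in zip(exposures, rrs, ci_lows, ci_highs):
        se_log_rr = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of ln(RR) recovered from the CI
        var_rr = (rr * se_log_rr) ** 2                       # delta-method variance of RR
        weight = 1.0 / var_rr
        num += weight * x * (rr - 1.0)
        den += weight * x * x
    return num / den


def exposure_at_err(beta: float, err: float = 0.10) -> float:
    """Exposure at which the excess relative risk (RR - 1) equals err."""
    return err / beta


# Hypothetical cumulative exposure categories (mg/m3-years) with RR and 95% CI.
beta = fit_linear_err_slope([5, 10, 20], [1.1, 1.2, 1.4], [0.9, 1.0, 1.1], [1.35, 1.45, 1.8])
print(beta, exposure_at_err(beta))
```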

b) Same as for the ‘Linearised’ approach.

c) Apply large assessment factors to the correct starting point to obtain an adequate DMEL

The next step in the calculation of a DMEL is to address variability and uncertainty in the differences between the effect assessment dataset and the real human exposure situation (see 3.8.3 for a more detailed description). Clearly, the use of epidemiological data has advantages over the use of animal data, namely there is no need for interspecies extrapolation and extrapolation from high to low exposure levels is usually much less extreme.

Similar to the overall assessment factor applied to derive a DMEL on the basis of animal data, assessment factors for the following uncertainties should be considered when deriving a DMEL on the basis of human data:

1. quality of whole database

2. intraspecies differences

3. the nature of the carcinogenic process (uncertainties about differences in cell cycle control and DNA repair)

4. the starting point on the dose-response curve is not equivalent to a NOAEL and the dose-response relationship below the starting point is not known

Ad 1. Quality of whole database

Ad 2. Intraspecies differences

Ad 3. The nature of the carcinogenic process

The mode of action for substances that are both genotoxic and carcinogenic includes irreversible steps, such as the fixation of DNA lesions into permanent and inheritable mutations. The consequences of irreversible steps are amplified by clonal expansion of a single mutated cell, accumulation of genetic changes and progression of the mutated cells into cancer.

Genetic factors modulate the individual risk of cancer associated with environmental exposures (Shields and Harris, 2000). The probability of genetic alterations at critical targets following exposure to exogenous or endogenous genotoxic substances may depend on the efficiency of DNA damage repair and cell cycle control. Candidate genes which may influence individual cancer risk by counteracting the fixation of DNA lesions into mutations include DNA repair genes, immune function genes, and genes controlling the cell cycle and apoptosis (Brennan, 2002). For further details see Appendix IX B.

In the absence of any relevant information on this uncertainty, a default assessment factor of 10 is proposed.

Ad 4. The starting point on the dose-response curve is not equivalent to a NOAEL and the dose-response relationship below the starting point is not known

The starting point on the dose-response curve (an excess relative risk of 10%) relates to a small but measurable response and so cannot be regarded as a surrogate for a threshold in the case of a substance that is both genotoxic and carcinogenic. In addition, the dose-response relationship below the starting point, and the dose level below which cancer incidence is not increased, are unknown, representing additional uncertainties.

In the absence of any relevant information on this uncertainty, a default assessment factor of 10 is proposed.

Overall, therefore, in the absence of any relevant information on the uncertainties involved, a DMEL for the general population is derived by this approach by applying an overall default AF of 1000 to the corrected dose descriptor (corrected excess relative risk of 10%) when the starting point has been determined in a worker population:

$$\mathrm{DMEL}_{\text{general public}} = \frac{\mathrm{ERR}_{10\%,\,\text{corr}}}{AF_1 \times AF_2 \times \dots \times AF_n} = \frac{\mathrm{ERR}_{10\%,\,\text{corr}}}{1000}$$

A DMEL for workers is derived by this approach by applying an overall default AF of 500 to the corrected dose descriptor (corrected excess relative risk of 10%) when the starting point has been determined from an occupational cohort:

$$\mathrm{DMEL}_{\text{workers}} = \frac{\mathrm{ERR}_{10\%,\,\text{corr}}}{AF_1 \times AF_2 \times \dots \times AF_n} = \frac{\mathrm{ERR}_{10\%,\,\text{corr}}}{500}$$

where ERR10%,corr (Excess Relative Risk of 10%corr) is the exposure level corresponding to an excess relative risk of 10%, possibly corrected in step b.
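The arithmetic of the two formulas can be written as a small helper. The overall default factors of 1000 and 500 are those given in the text for a worker-derived starting point; the option to pass individual factors mirrors the AF1 × AF2 × … × AFn form and is an illustrative convenience, as are the function name and the example starting point of 5 mg/m3.

```python
import math
from typing import Iterable, Optional

# Overall default AFs when the starting point comes from a worker population.
DEFAULT_OVERALL_AF = {"general_public": 1000, "workers": 500}


def dmel_large_af(err10_corr: float, target_population: str,
                  factors: Optional[Iterable[float]] = None) -> float:
    """DMEL = (exposure at an excess relative risk of 10%, corrected) / overall AF.

    If individual factors are given, the overall AF is their product;
    otherwise the default overall AF for the target population is used.
    """
    overall_af = math.prod(factors) if factors is not None else DEFAULT_OVERALL_AF[target_population]
    return err10_corr / overall_af


start = 5.0  # hypothetical corrected starting point, mg/m3
print(dmel_large_af(start, "general_public"), dmel_large_af(start, "workers"))
```

The ratio of the two defaults (1000/500 = 2) matches the default intraspecies factor of 2 discussed earlier for moving from a worker population to the general public.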

R.8.5.3 Deriving a DMEL for a non-threshold carcinogen, based on human and animal cancer data

For deriving a DMEL for a non-threshold carcinogen, based on human and animal cancer data, the reader is referred to section R.8.4.3, as the guidance presented there applies to deriving a DMEL on this basis as well.

Appendix R.8-15 How dose descriptors for DNEL and DMEL derivation can be obtained from human data

1 Introduction

In risk assessment, the term “dose descriptor” is used to designate the exposure level (dose) that corresponds to a quantified health effect, or to a quantified level of risk of a health effect, in a specific study or in a combination of data from multiple studies. The dose descriptor is the starting point for the derivation of a DNEL (for chemicals causing health effects only if exposure exceeds a threshold) or a DMEL (for chemicals, such as genotoxic carcinogens, presumed to cause health effects even at the lowest exposure levels). In animal studies, common dose descriptors for threshold chemicals are the NOAEL (No Observed Adverse Effect Level) and the LOAEL (Lowest Observed Adverse Effect Level), while examples of dose descriptors for non-threshold chemicals are the TD25, ED10 and BMD10. For epidemiological data, the dose descriptors for threshold chemicals are likewise the NOAEL or LOAEL, while the dose descriptor for non-threshold chemicals is usually a Relative Risk (RR) for a given exposure contrast. In principle, a relevant dose descriptor should be based on all available human (often epidemiological) data, which should be systematically reviewed. Human data differ from animal data in that they are mostly derived from observational (non-experimental) studies rather than from strictly controlled animal experiments. This implies that the process to arrive at a dose descriptor differs between human and animal data, although the steps are broadly the same. The second crucial difference is that human data are intrinsically relevant for humans. Therefore, the “mode of action” and “human relevance” issues, so important for animal data, play only a minor role for human data.

The required steps can be summarised as follows:

1. Collect the available human data from all relevant data sources;

2. Evaluate the quality of the available human data;

3. Determine causality;

4. Identify the key (critical) effect;

5. Determine possible quantitative use of the available human data;

6. Identify the most appropriate data source(s);

7. Extract the appropriate dose descriptor for the critical effect.

These steps are described in more detail in paragraph 2, while step 2 (evaluating the quality of human data) is described in detail in paragraph 3, because this quality assessment underlies a quality categorisation that allows human data to be combined with animal data (see Appendix R.8-16).

2 Derivation of a dose descriptor from human data

The steps for deriving a dose descriptor are described in more detail below. The approach applies to deriving a dose descriptor for acute and for long-term effects (both threshold effects) as well as for non-threshold effects, i.e. for genotoxic carcinogens.

Step 1 Collect the available human data from all relevant data sources

Human data available in the open literature can be collected using established literature search strategies. These searches can be supplemented with data from publicly available reports and from Poison Centres and Disease Registries. Sources of human data that are not publicly available should also be included, where these are made available.

Step 2 Evaluate the quality of the available human data

The quality evaluation should focus on the exposure data, the effect data, and the appropriateness of the study design and statistical analysis. Criteria for assessing the adequacy of epidemiological studies include the proper selection and characterisation of the case and control groups (in case-control studies), adequate characterisation of exposure, sufficient length of follow-up for disease occurrence (in cohort studies), valid ascertainment of effect, and proper consideration (in study design and data analysis) of biases and confounding factors. Poor quality sources of human data should be given a lower weight, if any, than high quality data sources. The adequacy and quality of the human data should be assessed by trained epidemiologists. Specific guidance on how to assess the quality of human data is available from several sources. Comprehensive guidance on both the evaluation and the use of epidemiological evidence for risk assessment purposes is provided by Krzyzanowski et al (WHO 2000). STROBE, a recent initiative to strengthen the reporting of observational studies in epidemiology, can be used to assess quality (Ref2). Glasziou et al (3) stress that different types of study objectives require different types of evidence. A quality assessment tool is described extensively in a recent ECETOC document (ref….), and a major part of it is included in this guidance as well (see next paragraph).

Step 3 Determine causality

An assessment of the likelihood of a causal association must be made for all endpoints or health effects identified in steps 1 and 2. The best available guidance on causal inference is provided by the criteria described by Bradford Hill (4). These criteria should not be applied stringently, in the sense that all of them must be met: too stringent an approach to causal inference will lead to false negative conclusions, while too loose an application of the Bradford Hill criteria will lead to false positives. Steps 1 to 3 constitute the hazard identification process.

Step 4 Identify the key (critical) effect

The key effect is the health effect that is observed at the lowest exposure level, expressed as an exposure concentration or a cumulative exposure metric. A table listing all reported potential adverse health effects, with an estimate of the NOAEL or other relevant dose descriptor for each, can facilitate this step in the derivation process. An example of how such a table is prepared and applied can be found in paragraph 4 of this appendix.
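A minimal sketch of such a table follows, assuming that all entries are expressed in the same unit and exposure metric (comparing a concentration with a cumulative metric would not be meaningful). The class, the effects and the values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EffectEntry:
    effect: str
    descriptor_type: str  # e.g. "NOAEL"
    value: float          # all rows must use the same unit and exposure metric


def key_effect(table: List[EffectEntry]) -> EffectEntry:
    """Return the effect with the lowest dose descriptor."""
    if not table:
        raise ValueError("empty effect table")
    return min(table, key=lambda entry: entry.value)


# Hypothetical table (ppm, 8-h time-weighted average).
table = [
    EffectEntry("liver enzyme changes", "NOAEL", 10.0),
    EffectEntry("nasal irritation", "NOAEL", 2.0),
    EffectEntry("headache", "NOAEL", 5.0),
]
print(key_effect(table).effect)
```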

Step 5 Determine possible quantitative use of the available human data

Human data of sufficient quality can be used for DNEL or DMEL derivation if the study provides quantitative information on both the exposure level and the critical health effect. If quantitative exposure information is not available, one has to evaluate whether other databases containing quantitative exposure information can be used. Such an external source has to be linked to the qualitative exposure categories of the epidemiological study. The variable common to both the external exposure database and the epidemiological study that enables this link between exposure levels and health effects is, for instance, job title or the type of tasks performed by the study subjects. The appropriateness of this linkage clearly affects the validity of the exposure-response modelling. The impact of choices made in linking databases can be explored by means of sensitivity analyses using different sets of assumptions, which may show whether, and to what extent, risk estimates diverge under these assumptions.
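The linkage and its sensitivity analysis can be sketched as follows, assuming a hypothetical external measurement database keyed by job title and two alternative assumptions for summarising the measurements (arithmetic versus geometric mean). Each alternative exposure assignment would then be carried through the same exposure-response analysis to see how far the risk estimates diverge.

```python
from statistics import fmean, geometric_mean
from typing import Callable, Dict, List, Sequence

# Hypothetical external measurement database: job title -> personal air samples (mg/m3).
MEASUREMENTS: Dict[str, List[float]] = {
    "operator": [0.5, 1.2, 3.0],
    "maintenance": [2.0, 8.0],
    "office": [0.05, 0.05],
}


def link_exposure(job_titles: Sequence[str],
                  summary: Callable[[List[float]], float]) -> List[float]:
    """Assign each study subject the summary exposure of their job title."""
    return [summary(MEASUREMENTS[job]) for job in job_titles]


def sensitivity(job_titles: Sequence[str],
                assumptions: Dict[str, Callable[[List[float]], float]]) -> Dict[str, List[float]]:
    """Repeat the linkage under each assumption so downstream risk estimates can be compared."""
    return {name: link_exposure(job_titles, fn) for name, fn in assumptions.items()}


subjects = ["operator", "maintenance", "office"]
result = sensitivity(subjects, {"arithmetic mean": fmean, "geometric mean": geometric_mean})
for name, exposures in result.items():
    print(name, [round(e, 2) for e in exposures])
```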

Step 6 Identify the most appropriate data source(s)

A Weight of Evidence approach is essential for risk assessment based on epidemiological data to (a) assess (sources of) heterogeneity across the studies and (b) increase the statistical stability of the risk estimates. Ideally, a meta-analysis of published studies or a pooled analysis of original raw data provides the best basis for deriving an overall dose descriptor. Meta- and pooled analyses can also take into account small studies which, on their own, are not suitable for deriving dose descriptors because of statistical instability. If a good summary of all evidence is not available, using relatively large studies may be an acceptable, but less accurate, alternative. For some substances, a dose descriptor on the dose-response curve may be derived from a single good quality epidemiological study, if this is the only adequate study.
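One common way to combine published results is an inverse-variance weighted (fixed-effect) meta-analysis of log relative risks, with Cochran's Q and I² as heterogeneity indicators. The sketch recovers each study's standard error from its 95% confidence interval. It is an illustration only: where heterogeneity is substantial, a random-effects model or a pooled analysis of raw data would usually be preferred. The function name and study values are hypothetical.

```python
import math
from typing import Dict, Sequence


def fixed_effect_meta(rrs: Sequence[float], ci_lows: Sequence[float],
                      ci_highs: Sequence[float], z: float = 1.96) -> Dict[str, object]:
    """Inverse-variance fixed-effect pooling of ln(RR), with Cochran's Q and I^2 for heterogeneity."""
    log_rrs, weights = [], []
    for rr, lo, hi in zip(rrs, ci_lows, ci_highs):
        se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of ln(RR) from a log-symmetric CI
        log_rrs.append(math.log(rr))
        weights.append(1.0 / se ** 2)
    w_total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / w_total
    se_pooled = math.sqrt(1.0 / w_total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variability beyond chance
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "q": q,
        "i2": i2,
    }


# Hypothetical small studies, each too imprecise on its own.
print(fixed_effect_meta([1.4, 1.6, 1.3], [0.8, 0.9, 0.7], [2.45, 2.84, 2.41]))
```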

Step 7 Extract the appropriate dose descriptor for the critical effect

The appropriate type of dose descriptor depends on the nature of the critical health effect, i.e.:

Step 7A Threshold effects

For threshold chemicals, i.e. chemicals that induce a health effect only above a certain exposure level, the aim is to find a NOAEL or LOAEL, broadly analogous to the procedure using animal data. The dose-response relationship is usually reported as the occurrence of health effects in several exposure categories. The NOAEL is considered to lie within the lowest exposure category in which a response is seen. The shape of the whole dose-response curve may indicate where within that exposure category the NOAEL lies. This procedure applies to chemicals causing acute as well as long-term health effects (i.e. effects with a longer latency period).
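A minimal sketch of this rule follows, under an explicitly assumed criterion for "a response is seen" (lower 95% confidence limit of the RR above 1); other criteria, such as a trend test or clinical judgement, may be equally appropriate. The class, function and category values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ExposureCategory:
    lower: float      # category bounds, e.g. ppm
    upper: float
    rr: float
    rr_ci_low: float  # lower 95% confidence limit of the RR


def noael_category(categories: Sequence[ExposureCategory]) -> Optional[ExposureCategory]:
    """Lowest exposure category in which a response is seen.

    'Response seen' is taken here, as an assumption, to mean a lower
    confidence limit of the RR above 1.
    """
    for category in sorted(categories, key=lambda c: c.lower):
        if category.rr_ci_low > 1.0:
            return category
    return None  # no category shows a response


categories = [
    ExposureCategory(0.0, 1.0, 1.05, 0.80),
    ExposureCategory(1.0, 5.0, 1.60, 1.20),
    ExposureCategory(5.0, 20.0, 2.40, 1.70),
]
found = noael_category(categories)
if found is not None:
    print(f"NOAEL considered to lie between {found.lower} and {found.upper}")
```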

Step 7B Non-threshold effects

For non-threshold chemicals, notably genotoxic carcinogens, the dose descriptor is usually derived from cohort or case-control studies reporting Relative Risks (RR) or comparable measures of dose-response association. The RR is the risk of the health effect in the exposed population divided by the risk in the unexposed population. Comparable measures are standardised ratios, such as the standardised mortality ratio (SMR) or the standardised incidence ratio (SIR), which are conventionally used in cohort studies when the unexposed reference group is the general population. The odds ratio (OR), which is derived from case-control studies, is also a measure of relative risk. See paragraph 5 for a further explanation. The dose descriptor of interest for the derivation of a DMEL is an RR (or comparable measure) for a specified exposure contrast. In its simplest form, the dose descriptor (RR) represents the risk observed in an exposed population compared with an unexposed population. Ideally, it represents the slope of the exposure-response function, derived by modelling over the whole range of exposure levels observed in the studies and based on the pooled data from all available adequate studies. More details of the procedures are given in paragraph 5. Background and further explanation can be found in Goldbohm et al (2006).
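The three measures can be computed from basic counts as sketched below. These are the standard point estimates only (no confidence intervals); the function names and counts are hypothetical.

```python
from typing import Sequence


def risk_ratio(cases_exposed: int, n_exposed: int,
               cases_unexposed: int, n_unexposed: int) -> float:
    """RR: risk in the exposed divided by risk in the unexposed (cohort data)."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def odds_ratio(exposed_cases: int, exposed_controls: int,
               unexposed_cases: int, unexposed_controls: int) -> float:
    """OR from a case-control 2x2 table: (a*d) / (b*c)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def smr(observed: int, person_years: Sequence[float], reference_rates: Sequence[float]) -> float:
    """SMR: observed deaths divided by deaths expected from general-population rates,
    summed over strata (e.g. age and calendar period)."""
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed / expected


print(risk_ratio(30, 1000, 10, 1000))          # cohort with an internal comparison group
print(odds_ratio(40, 60, 20, 80))              # case-control study
print(smr(12, [1000, 2000], [0.002, 0.003]))   # cohort compared with the general population
```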

3 Evaluation and categorisation of human data quality (extension to Step 2; adapted from ECETOC, 2008)

3.1 Human data quality evaluation

Apart from their form, human data can vary in quality. The selected study design and the methodology applied to collect information on the risk factors and the disease entities can vary in reliability, quality and validity. The statistical analysis technique applied can be more or less valid and more or less appropriate for the data, and finally the conclusions drawn from the study can be more or less appropriate. All these factors affect the study quality and ultimately its weight of evidence. The overall quality of human data depends on the quality of the following components, which are detailed further in the next sections of this paragraph:

• The quality of the study design.

• The quality of the exposure information.

• The quality of the health outcome data.

• The quality and generalisability of the conclusions.

3.1.1 The quality of the study design

The quality of the study design depends on its appropriateness to study the hypothesis under investigation, the appropriateness of the comparison group, the longitudinal component, the adjustment for other risk factors, the statistical power of the study and the appropriateness of the statistical analysis. The appropriateness of the study design depends on the type of hypothesis to be tested. In general terms the more specific the hypothesis, the more focussed the study can be with respect to the type of data that need to be collected. The hypothesis should specify which dependent and independent variable(s) will be investigated. The study should be designed in such a way that adequate data will be collected to test the hypothesis.

A typical characteristic of epidemiological research is the use of comparisons. Disease incidence or prevalence in an exposed population is compared with that in a non-exposed population, or the frequency of past exposure in a group of cases is compared with that in a control group free of the disease. The comparison group is used to estimate the occurrence of disease that would have been observed in the exposed population had there been no exposure to the risk factor under investigation. Particularly in studies of multi-causal diseases, it is crucial that the disease occurrence in the comparison group properly reflects the background disease incidence in the population from which the exposed population has been sampled. If certain other risk factors are thought to be very important, a matching procedure can be used to ensure comparability between the exposed and non-exposed populations, or adjustment for these confounding factors can be made in the statistical analysis. The representativeness of the comparison group can be compromised if the response rates in the study population are low, particularly if there is differential non-response between the study groups.

An important aspect is the temporal relationship between exposure and health effect. Many types of effects are known not to occur immediately after exposure. Tissue damage may need to accumulate over time before it is expressed as clinically observable disease. Other long-term effects, such as cancer, are thought to have a latency period in which an effect initiated at the cellular level needs to go through a promotion phase in order to produce cancer; cancer cells then need to multiply to form tumours and infiltrative disease. Cross-sectional studies are therefore regarded as suited to study acute effects, or effects that do not lead to serious overt disease that would cause affected subjects to leave the exposure environment. For instance, it has been reported that ex-smokers have reduced pulmonary function. This is of course associated with their past smoking habits, but it also results from smokers who develop respiratory complaints being more likely to quit smoking. In this case a cross-sectional study could misleadingly indicate that ex-smokers have poorer respiratory function than current smokers.

Furthermore, if other risk factors are known to be relevant for a specific disease, adjustments can be made in the analysis to take their effects into account. Once the data have been collected according to the procedures described in the study protocol, the statistical analysis is performed. The first aim of the statistical analysis is to describe the data in terms of meaningful entities such as means, medians, standard deviations and risk metrics. Secondly, differences between group means, or risk parameters such as odds ratios, SMRs or RRs, are tested for statistical significance to assess how likely it is that the finding is due to chance. A third objective of the statistical analysis is to adjust for confounders, resulting in adjusted risk metrics.

3.1.2 The quality of exposure information

Information on exposure can take many forms. It can range from categorical information indicating the likelihood of exposure (for instance having successfully taken a course in the safe handling of pesticides, a positive response to the question “have you ever used or been in contact with substance X”, or ever having been employed at a company in which a certain chemical was handled) to very extensive individual exposure data based on the systematic collection of air samples. If the exposure data are mainly qualitative, the study results can still be relevant for hazard identification, but less so for quantitative risk assessment. In many longitudinal epidemiological studies the construction of a job-exposure matrix has proven to be a valuable means of using exposure information. It is only useful to construct a job-exposure matrix if (semi-)quantitative exposure information is available. The job-exposure matrix is based on homogeneous exposure groups, consisting of those jobs that are thought to be characterised by comparable exposure conditions. For each homogeneous exposure group the exposure intensity is estimated. Historical changes in the production process or work practices, resulting in changes in exposure, are taken into account and form a dimension of the matrix. The job-exposure matrix allows cumulative exposure to be calculated, but can also serve to stratify the study groups into subgroups with certain exposure characteristics, such as ever having been exposed above a certain concentration. Exposure measurement error can lead to misclassification and can have a significant effect on the results of epidemiological studies. It has been argued that exposure misclassification intrinsically biases relative risk estimates towards 1. However, it has also been argued that this is not always the case. Therefore it cannot be concluded that misclassification of exposure always leads to an underestimation of the risk (Jurek et al, 2006).
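A job-exposure matrix and the two uses mentioned above (cumulative exposure and stratification) can be sketched as a lookup table keyed by homogeneous exposure group and calendar period. The annual resolution, the period boundary, the group names and all intensities are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

# Hypothetical job-exposure matrix: (homogeneous exposure group, period) -> mean intensity (mg/m3).
JEM: Dict[Tuple[str, str], float] = {
    ("reactor operator", "before 1985"): 5.0,
    ("reactor operator", "1985 onwards"): 1.0,
    ("packer", "before 1985"): 0.5,
    ("packer", "1985 onwards"): 0.2,
}


def period_of(year: int) -> str:
    """Calendar-period dimension of the matrix (hypothetical cut-off year)."""
    return "before 1985" if year < 1985 else "1985 onwards"


def cumulative_exposure(history: List[Tuple[str, int, int]]) -> float:
    """Sum intensity x years over a work history of (job, start_year, end_year), end exclusive."""
    return sum(JEM[(job, period_of(year))]
               for job, start, end in history
               for year in range(start, end))


def ever_above(history: List[Tuple[str, int, int]], threshold: float) -> bool:
    """Stratification variable: ever exposed above a given intensity."""
    return any(JEM[(job, period_of(year))] > threshold
               for job, start, end in history
               for year in range(start, end))


worker = [("packer", 1980, 1983), ("reactor operator", 1983, 1987)]
print(cumulative_exposure(worker), "mg/m3-years; ever above 4 mg/m3:", ever_above(worker, 4.0))
```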

Exposure circumstances can vary substantially. The degree of variability of the exposure conditions strongly determines how extensive the exposure information must be to describe the exposure conditions adequately. If exposure is stable, with no variation over the workday, the season or between time periods, a few sample points can be adequate to characterise the exposure situation. In reality, however, exposures vary from place to place and from task to task. They may also change over time because of differences in the production process, exposure reduction measures and the use of personal protective equipment.

The type of exposure information can also vary. Sometimes the only exposure parameter is that a person has been employed in a particular industry. More specific information would be the type of job the person has held in that industry and over what time period. A more specific exposure characterisation can only be made if industrial hygiene measurements have been taken. Industrial hygiene measurements can be made for various purposes. First, they can be made to identify sources or tasks with high exposure; in that case the results overestimate the general exposure at the workplace. Second, they can be conducted to provide a reliable picture of the exposure conditions at a specific workplace.

Exposure measurements collected by means of a systematic approach are more valuable. It must be clear how, why and where samples were taken.

Exposure patterns can be characterised according to their temporal variability, spatial variability and variability due to individual behaviour. Work rosters and task/activity schedules can have a large impact on exposure.

The precision of exposure measurements in estimating true exposure is not only determined by the number of measurements but also by the variability of exposure. Two aspects of the exposure data are important for the final interpretation of the findings. First, the internal validity should be satisfactory, meaning that the exposure data adequately describe the actual exposure situation. Internal validity depends on the sampling strategy and sampling frequency. Second, external validity should also be satisfactory. It relates to the comparability between the exposure conditions under investigation and the exposure conditions in other situations.
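The interplay between the number of measurements and exposure variability can be illustrated under the common assumption that workplace exposures are approximately lognormally distributed, summarised by a geometric mean (GM) and geometric standard deviation (GSD). The sketch uses a normal approximation for the log-scale mean (a t-quantile would widen the interval for small n); the function names and GSD values are hypothetical.

```python
import math


def gm_ci_factor(gsd: float, n: int, z: float = 1.96) -> float:
    """Multiplicative half-width of an approximate 95% CI for the geometric mean
    of n lognormally distributed measurements with geometric standard deviation gsd."""
    return math.exp(z * math.log(gsd) / math.sqrt(n))


def samples_needed(gsd: float, target_factor: float, z: float = 1.96) -> int:
    """Smallest n for which gm_ci_factor(gsd, n) does not exceed target_factor."""
    return math.ceil((z * math.log(gsd) / math.log(target_factor)) ** 2)


for gsd in (1.5, 2.5):
    for n in (4, 16):
        print(f"GSD {gsd}, n = {n}: GM known within a factor of {gm_ci_factor(gsd, n):.2f}")
print("samples for a factor of 1.5 at GSD 2.5:", samples_needed(2.5, 1.5))
```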

Money and Margary (2002) described a number of core principles to derive reliable and robust exposure assessments. They essentially describe three types of exposure data: actual data, analogous data and personal exposure data collected in a systematic manner. All three types of data can vary in quality and reliability.

3.1.3 The quality of health effect data

The types of health effects for which human data exist may range from acute effects to chronic long-term effects. The occurrence of health effects can be determined in various ways. Self-administered questionnaires can be used. Existing databases, for instance cause-of-death databases or cancer registries, can be queried. Occasionally, studies are performed in which specific diagnostic procedures are used to establish disease status. Two aspects determine the overall reliability of health effect data, namely their quality and their completeness. In many epidemiological studies the occurrence of disease is expressed as a relative measure: the incidence of the disease in an exposed population is divided by the incidence in a non-exposed population, resulting in a relative risk metric (van den Brandt et al, 2002). The occurrence of disease as such is not regarded as informative; the advantage of the relative risk metric is that it reflects the strength of the association between exposure and effect.

The quality of the health effect data depends on the data collection methods used. Have standardised and validated data collection or diagnostic techniques with satisfactory sensitivity and specificity been used? It is crucial that the health effect data collection techniques are equally reliable for the exposed and the non-exposed. If not, bias can be introduced, which can invalidate the study.

Even if a reliable diagnostic procedure is used, the health effect data may not be reliable because of differences in completeness. Cases may be missed because of poor tracking, or may not be diagnosed as such. Ideally, completeness would be 100% for both exposed and non-exposed groups. This is hardly ever achieved in reality, but even incomplete data can be sufficient for a reliable study. Risk metrics such as odds ratios and relative risks are relative metrics, and even incomplete health effect data will result in correct risk estimates, provided the completeness in the exposed is the same as in the non-exposed.
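The last point can be checked with simple arithmetic. The sketch applies an ascertainment fraction (completeness) to the true risk in each group; the function name, risks and completeness values are hypothetical.

```python
def observed_rr(risk_exposed: float, risk_unexposed: float,
                completeness_exposed: float, completeness_unexposed: float) -> float:
    """RR computed from the cases actually ascertained in each group."""
    return (risk_exposed * completeness_exposed) / (risk_unexposed * completeness_unexposed)


true_rr = observed_rr(0.02, 0.01, 1.0, 1.0)
print(true_rr)                              # complete ascertainment in both groups
print(observed_rr(0.02, 0.01, 0.8, 0.8))    # equally incomplete: RR unchanged
print(observed_rr(0.02, 0.01, 0.9, 0.6))    # differential completeness: RR biased upwards
```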

If human data do not completely meet these criteria they can still be important for risk assessment. For instance, it is clear that case reports most likely do not represent all occurring cases of that disease. However, they can still be very important in the hazard assessment process and if the health effect is specific this information can still be important in risk assessment.

3.1.4 The generalisability of the conclusions

Conclusions from a study can only be drawn if they are substantiated by the data and the statistical analysis. In this respect statistical significance plays a major role, but so does internal consistency. For instance, do all the analyses support an association? Was there a dose-response relationship? Was the association found in all subgroups? Were known confounding factors taken into account? Generalisability indicates the extent to which the results of the study may also be applicable to larger populations. In descriptive studies such as surveys, the study population needs to be representative of the larger base population. In analytical studies producing relative risk metrics, however, the study population does not necessarily need to be representative of the general population (Rothman and Greenland, 1998).

To a certain extent, the quality requirements for a study depend on the type of association under investigation. If a health effect is specific, i.e. can only be caused by the exposure under investigation, no adjustments need to be made for potential confounding factors. On the other hand, if other factors are known to play an important role in the aetiology of a non-specific health effect (such as smoking and pulmonary disease), the results of a study are not considered reliable if no adjustment for the other risk factor is made. Similarly, acute health effects can be investigated by means of cross-sectional studies and do not require a longitudinal study design. Quality requirements for long-term health effects are therefore more stringent in terms of study design than for acute effects. Case reports on specific health effects can provide sufficient and reliable information for risk assessment, whereas non-specific long-term health effects require complex research designs, including adjustment for potential confounding factors.

3.2 Scheme for scoring human data with regard to its quality

3.2.1 Introduction

In this section, a simple scoring scheme is described that rates human data on the basis of their quality and of the nature and specificity of the effect. The aim is to characterise whether the human data are a reliable source for the dose-response assessment phase (or another phase) of a risk assessment, using a small number of categories so that human data can be characterised together with animal or in vitro data. This provides a transparent way to justify the basis of a risk assessment when both human and animal data exist. The choice is driven by the inherent strengths and weaknesses of each data source, rather than by the results (be they positive or null) of a particular body of data. If human data covering the whole range of quality are available for a given effect (e.g. from case reports to experimental studies), the scheme below would categorise the highest quality human data. The whole scheme is shown in the Figure below.

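Figure: Components for scoring the quality of human data.

Pre-requisites: • Exposure occurs • Health effect occurs. If these are met (yes), the quality of the data is assessed and combined with the nature of the effect:

Quality | Sub-chronic and chronic, non-specific | Sub-chronic and chronic, specific | Acute, non-specific | Acute, specific
Highest | I | I | I | I
Good | II | I | I | I
Compromised | III | II | II | I
Poor | IV | III | III | II
No info. | X | X | X | X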

The scheme involves the following steps. There are certain pre-requisites which have to be met by the human data. If these are met, then the intrinsic quality of the data is assessed in detail and assigned to a category. Then the nature of the health effect is considered, and combined with the category to produce a quality score on a scale of I (strongest) to IV (weakest), or X (no information). In certain circumstances adjustments may be made to the quality score based on whether the findings are positive or null.
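The score matrix in the Figure can be expressed as a small lookup table, which makes explicit that, once the pre-requisites are met, the score depends only on the intrinsic quality category, the duration of the effect and whether the effect is specific. The placement of the two middle columns follows the Figure layout as best it can be read; both carry identical scores, so their order does not affect any result. The possible adjustment for positive or null findings is not modelled, and all names are hypothetical.

```python
from typing import Dict, Tuple

# Column order: (duration of effect, specific effect?)
COLUMNS: Tuple[Tuple[str, bool], ...] = (
    ("sub-chronic/chronic", False),
    ("sub-chronic/chronic", True),
    ("acute", False),
    ("acute", True),
)

# Score matrix from the Figure: rows = intrinsic data quality category.
SCORES: Dict[str, Tuple[str, str, str, str]] = {
    "highest":     ("I",   "I",   "I",   "I"),
    "good":        ("II",  "I",   "I",   "I"),
    "compromised": ("III", "II",  "II",  "I"),
    "poor":        ("IV",  "III", "III", "II"),
    "no info":     ("X",   "X",   "X",   "X"),
}


def quality_score(quality: str, duration: str, specific: bool) -> str:
    """Combine intrinsic quality with the nature of the effect (pre-requisites assumed met)."""
    return SCORES[quality][COLUMNS.index((duration, specific))]


print(quality_score("compromised", "sub-chronic/chronic", False))
print(quality_score("compromised", "acute", True))
```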

3.2.2 Pre-requisites

There are at least two basic criteria for human data that must usually be satisfied across all scenarios and situations. These minimum criteria are mostly obvious, but need to be stated because the criteria for the specific scenarios considered below build upon them. These are:

1. Exposure to the substance in question should be present. Ideally, at least the presence of exposure (not necessarily the degree) should be documented or inferred with near certainty based on the scenario in which exposure occurred. If the study population contains a mix of exposed and unexposed subjects, the unexposed group should not be so large (e.g. > 2/3 of the population) as to mask any effects of exposure.
2. The health effect should be based on objective criteria. This could include records that are expected to document disease occurrence (e.g. hospital records, incident reports, death certificates). Self-reports of disease should usually be validated by observation, often by qualified medical personnel or records systems.

When a disease is chronic and not specific to the chemical being evaluated, a third criterion is:

3. Potentially intervening factors that have a large influence on disease occurrence should be accounted for. For example, exposure to a potential dermal irritant could be established reliably and squamous cell skin cancer could be ascertained through a registry, but if the exposed population lives at low latitudes while the unexposed population lives at high latitudes, the large effect of differential sunlight exposure on squamous cell skin cancer must be accounted for.

These three basic quality criteria can be summarised as: (a) exposure must have occurred, (b) the health effect must be measured adequately, and (c) for chronic effects, major factors known to have very strong effects (e.g. RR > 2) other than exposure must be accounted for.
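The three pre-requisites can be expressed as a simple screening check. The sketch below is illustrative only: the `HumanDataSet` fields and the `meets_prerequisites` function are hypothetical names, and the 2/3 limit on the unexposed share is taken from the example given in criterion 1 rather than being a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class HumanDataSet:
    # Hypothetical summary fields; the names are illustrative, not from the guidance.
    exposure_documented: bool              # criterion 1: exposure documented or inferred with near certainty
    unexposed_fraction: float              # share of unexposed subjects in the study population (0-1)
    outcome_objectively_ascertained: bool  # criterion 2: records, registries or validated self-reports
    chronic: bool                          # sub-chronic or chronic effect?
    specific: bool                         # effect specific to the substance?
    strong_confounders_controlled: bool    # criterion 3: factors with very strong effects (e.g. RR > 2) accounted for


def meets_prerequisites(d: HumanDataSet, max_unexposed: float = 2 / 3) -> tuple[bool, list[str]]:
    """Return (passed, reasons for failure) for the basic pre-requisites."""
    reasons = []
    if not d.exposure_documented:
        reasons.append("exposure not documented")
    if d.unexposed_fraction > max_unexposed:
        reasons.append("unexposed group large enough to mask effects")
    if not d.outcome_objectively_ascertained:
        reasons.append("health effect not based on objective criteria")
    # Criterion 3 applies only to chronic effects that are not specific to the chemical.
    if d.chronic and not d.specific and not d.strong_confounders_controlled:
        reasons.append("strong intervening factors not accounted for")
    return (not reasons, reasons)
```

A failed check returns the list of unmet criteria, so the reason for excluding a data set is documented rather than silently dropped.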

3.2.3 Assessing the intrinsic quality of the data

Categories that encompass different degrees of intrinsic quality for human data are proposed as follows:

Highest quality. Several well-conducted studies, primarily of cohort, case-control and/or experimental design. A large majority (2/3 or greater) show results that are consistent with one another and with biological evidence. The majority of the studies have: (a) quantified exposure data linked to individuals, (b) well-validated health outcome measurements, and (c) control of potentially large intervening factors (confounders). Relevant selection bias, which may affect the validity of case-control studies in particular, needs to be excluded. For positive data, a monotonic dose-response gradient exists either for the body of data as a whole (supported by pooled or meta-analyses) or in at least 2/3 of the studies. For null data, no monotonic dose-response gradient is found in at least 2/3 of the studies. In some cases, especially for experimental designs, one particularly strong, large study may suffice, especially if there are no reasonably strong conflicting human data. Confidence intervals for the highest quality human data should generally span less than one order of magnitude (i.e. the upper limit is less than ten times the lower limit).

Good quality. The above situation is weakened by no more than three of the following limitations: (a) the consistency between studies is not high, yet still suggestive (>50% of the studies are concordant); (b) exposure data are not always quantifiable or linked to individuals; (c) there is no good biological understanding underpinning the results; (d) not all strong intervening factors can be ruled out in the majority of the studies; (e) health outcome measurements are not well-validated; (f) confidence intervals span more than an order of magnitude; and (g) a monotonic dose-response relationship (or lack thereof, for null data) does not exist for all studies, but still does for the majority of them.

Compromised quality. This category of data is either of sufficient design but affected by more than three of the above limitations, or of a compromised design (e.g. cross-sectional studies, time series, case reports for chronic, multi-causal endpoints, possible selection bias) but possessing at least three of the following items: consistency, strong exposure data, biological gradient, biological plausibility and/or adequate control of intervening variables.

Poor quality. This is the weakest category for suggesting a causal association or safe vs. harmful concentrations. Case reports for chronic disease usually fall into this category because there is no study design. Reports from regularly collected data (mortality trends, poisoning centres) may first fall into this category, before subsequent corroborating information is found. This category is above the situation where there is no data, and is also above a situation in which there is a large body of discordant data.

No information. No valid data, or a large body of clearly discordant data.
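The intrinsic quality categories can be approximated by a rule-based sketch. This is a simplification under stated assumptions, not part of the guidance: the field and label names are hypothetical, limitations (a) to (g) are assumed to have already been judged by the assessor, concordance below 2/3 is treated as limitation (a), and the exception that one particularly strong study may suffice is not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical labels for limitations (a)-(g) listed under "Good quality".
LIMITATIONS = {"a_consistency", "b_exposure", "c_biology", "d_confounding",
               "e_outcome", "f_wide_ci", "g_dose_response"}


def ci_within_order_of_magnitude(lower: float, upper: float) -> bool:
    """True if a confidence interval spans less than one order of magnitude."""
    return upper / lower < 10


@dataclass
class BodyOfEvidence:
    n_studies: int
    strong_design: bool          # mainly cohort, case-control or experimental studies
    concordant_fraction: float   # share of studies with mutually consistent results
    limitations: set = field(default_factory=set)  # assessor-judged subset of LIMITATIONS
    compromised_design_items: int = 0  # count of: consistency, strong exposure data, gradient,
                                       # plausibility, control of intervening variables
    clearly_discordant: bool = False


def intrinsic_quality(e: BodyOfEvidence) -> str:
    if e.n_studies == 0 or e.clearly_discordant:
        return "No information"
    if e.strong_design:
        lims = set(e.limitations)
        unknown = lims - LIMITATIONS
        if unknown:
            raise ValueError(f"unknown limitation labels: {sorted(unknown)}")
        if e.concordant_fraction < 2 / 3:
            lims.add("a_consistency")  # assumption: < 2/3 concordance counts as limitation (a)
        if not lims:
            return "Highest"
        if len(lims) <= 3 and e.concordant_fraction > 0.5:
            return "Good"
        return "Compromised"  # sufficient design, but too many limitations
    # Compromised designs (e.g. cross-sectional, time series, case reports for chronic endpoints)
    if e.compromised_design_items >= 3:
        return "Compromised"
    return "Poor"
```

The category boundaries in the text depend on expert judgement; a sketch like this is mainly useful for recording which limitations drove the assignment.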

3.2.4 Considering the nature of the health effect and deriving a score for data quality

The nature of the health effect has a large influence on quality requirements. Effects that are chronic and have a long latency period require stronger and more complex human data than acute effects (Swaen, 2006). Effects that are systemic and chronic in nature (e.g. pulmonary fibrosis, bladder cancer, renal failure) may take years to develop. During the development of disease, attendant lifestyle factors and time-related confounders may also play a significant role and may vary over time. This makes it difficult to observe causal associations without well designed and conducted studies. Some chronic health effects are exceptions because they are specifically caused by the chemical in question, such as angiosarcoma caused by vinyl chloride.

For local effects that are acute, sophisticated epidemiologic designs are not necessary. A person can experience coughing and wheezing in a dusty environment and immediately be aware of what has caused these symptoms. Effects that are very rare and/or specific can also be associated with exposures without a formal study design (e.g. relatively few cases of Pneumocystis carinii pneumonia reported through government surveillance systems were sufficient to signal the start of the AIDS epidemic in the U.S.).

The table below accounts for the nature of the effect as well as intrinsic data quality in assigning an overall quality score to human data.

Table: Combining intrinsic data quality and the nature of the health effect to produce a human data quality score

Intrinsic human data quality | Long term, non-specific (e.g. diesel exhaust/lung cancer) | Long term, specific (e.g. vinyl chloride/angiosarcoma) | Acute or short term, non-specific (e.g. solvents/headache) | Acute or short term, specific (e.g. reactive dye/respiratory sensitisation)
Highest | I | I | I | I
Good | II | I | I | I
Compromised | III | II | II | I
Poor | IV | III | III | II
No information | X | X | X | X

Thus, for very specific effects or acute effects, quality categories I, II and III are possible often without sophisticated epidemiologic designs. It is also expected that, in these cases, there will not be a large body of data that will need to be evaluated with respect to its consistency, since hazardous concentrations will be more readily detectable.
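Because the combination step is a fixed look-up, it can be written directly as a table. The sketch below transcribes the table above; the function name `human_quality_score` is hypothetical, and the optional adjustment for positive versus null findings mentioned in section 3.2.1 is not included.

```python
# Columns as (long term?, specific?), in the order of the table above.
COLUMNS = [(True, False), (True, True), (False, False), (False, True)]

SCORE_TABLE = {
    "Highest":        ["I",   "I",   "I",   "I"],
    "Good":           ["II",  "I",   "I",   "I"],
    "Compromised":    ["III", "II",  "II",  "I"],
    "Poor":           ["IV",  "III", "III", "II"],
    "No information": ["X",   "X",   "X",   "X"],
}


def human_quality_score(intrinsic: str, long_term: bool, specific: bool) -> str:
    """Combine intrinsic data quality with the nature of the health effect."""
    return SCORE_TABLE[intrinsic][COLUMNS.index((long_term, specific))]


# Compromised data on a non-specific long term effect (e.g. diesel exhaust/lung cancer) score III,
# whereas compromised data on a specific acute effect (e.g. respiratory sensitisation) score I.
```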

4 Identify the key (critical) effect – an example (extension to Step 4)

Provided there are sufficient human data on the health endpoints of a chemical, the hazard identification process described earlier (Steps 1 through 6) will result in a description of the reported endpoints together with a dose descriptor. For a chemical there can be multiple reported endpoints in combination with multiple dose descriptors. These are best summarised in a table containing the dose descriptors for each of the reported health effects. The table will include dose descriptors for long term as well as acute endpoints. A hypothetical example of a dose descriptor table is given below:

Table.. Dose descriptors for the endpoints associated with chemical X. Only data sources of sufficient quality and with sufficient evidence for causality are included.

Endpoint | Dose descriptor | A/L(1) | Qual. cat.(2)
Lethality | In a railway accident two workers died from overexposure. Peak exposures of 10000 ppm were estimated. | A |
Headaches | During a spill in a production plant 4 out of 20 exposed workers reported acute headaches. Peak exposures of 500-800 ppm were measured. No other endpoints noted. Peak exposure measurements under normal conditions range between 0 and 100 ppm. | A |
Decline of FEV1 | A study reported marginally reduced respiratory function in workers with current exposure ranging from 20 to 80 ppm. | L |
Decline of FEV1 | A study reported normal respiratory function in workers exposed between 0.1 and 5 ppm. | L |
Liver function | A study on liver function in workers exposed to concentrations up to 5 ppm showed no effect on liver function. | L |

(1) Acute or long term effect. (2) Quality category assessment, based on the approach described in Appendix ..

The dose descriptor table should include all endpoints for which there are human data of sufficient quality. From the dose descriptor table the key endpoint can be identified. In this example, the dose descriptor for acute endpoints indicates that headaches can be anticipated at peak exposures between 500 and 800 ppm. The key dose descriptor for long term endpoints indicates that a decline in respiratory function can be expected in the 20-80 ppm range.

b) Modification, when necessary, of relevant dose descriptors to correct the starting point

In certain instances the human data may be in a form that requires modification. For instance, past exposure may have been higher. In the dose descriptor table above, the workers who experienced decreased respiratory function may have experienced higher exposure in the past, which is likely to be the actual cause of the effects. In this case the high past exposure needs to be taken into account. The available human data do not allow the NOAEL to be identified, even though the study shows an effect. The NOAEL must lie between 5 and 80 ppm. Since the effects in the 20 to 80 ppm group were marginal, it can be concluded that the NOAEL is likely to lie between 20 and 80 ppm. Furthermore, the dose descriptor in the human study may be based on a short exposure duration (for instance 2 years). If there is evidence that longer exposure will result in a lower effect threshold, the dose descriptor must be modified accordingly.
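Selecting the key dose descriptors from the example table, and bracketing the NOAEL as in the FEV1 example, can be sketched as follows. The data are the hypothetical chemical X values from the table; the class and function names are illustrative, the rule "lowest exposure range at which an effect was observed" is a simplification, and corrections for higher past exposure or short exposure duration still require expert judgement and are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoseDescriptor:
    endpoint: str
    duration: str         # "A" (acute) or "L" (long term), as in the A/L column
    low_ppm: float        # lower end of the reported exposure range
    high_ppm: float       # upper end of the reported exposure range
    effect_observed: bool


# Hypothetical chemical X, transcribed from the example table.
TABLE = [
    DoseDescriptor("Lethality", "A", 10000, 10000, True),
    DoseDescriptor("Headaches", "A", 500, 800, True),
    DoseDescriptor("Decline of FEV1", "L", 20, 80, True),
    DoseDescriptor("Decline of FEV1", "L", 0.1, 5, False),
    DoseDescriptor("Liver function", "L", 0, 5, False),
]


def key_effect(table: list[DoseDescriptor], duration: str) -> Optional[DoseDescriptor]:
    """Effect row with the lowest reported exposure range for the given duration."""
    effects = [d for d in table if d.duration == duration and d.effect_observed]
    return min(effects, key=lambda d: d.low_ppm, default=None)


def noael_bracket(table: list[DoseDescriptor], endpoint: str) -> tuple[float, float]:
    """Following the example: the NOAEL lies between the highest exposure without
    an effect and the highest exposure with an effect for this endpoint."""
    rows = [d for d in table if d.endpoint == endpoint]
    no_effect = max(d.high_ppm for d in rows if not d.effect_observed)
    effect = max(d.high_ppm for d in rows if d.effect_observed)
    return no_effect, effect
```

For this table the acute key effect is headaches (500-800 ppm), the long term key effect is the decline of FEV1 (20-80 ppm), and the NOAEL bracket for FEV1 is 5 to 80 ppm, matching the reasoning in the text.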

5 Issues for derivation of a dose descriptor for non-threshold effects (extension to Step 7B)

5.1 Relationships between RR, OR, and SMR

For non-threshold chemicals, i.e. notably genotoxic carcinogens, the dose descriptor is usually derived from cohort or case-control studies reporting relative risks (RR) or comparable measures to describe a dose-response association. The RR is the risk of the health effect in the exposed population divided by the risk in the unexposed population. Comparable measures are the standardised ratios, such as the standardised mortality ratio (SMR) or standardised incidence ratio (SIR), which are conventionally used in cohort studies if the unexposed reference group is the general population. The odds ratio (OR), which is derived from case-control studies, is also a measure of relative risk. For the relationships between RR, OR, and SMR see Table 1 below.

Table 1. Comparison of parameters to describe incidence and mortality and their derived measures of association

Parameter | Synonyms | Type | Unit | Range | Proxies
Risk | Absolute risk; cumulative incidence/mortality; incidence/mortality proportion | Probability, proportion | - | [0,1] | Cumulative rate
Rate | Incidence/mortality rate; incidence density; hazard rate | Number of cases / person-years | time^-1 | [0,>1] |
Relative risk (RR) | Risk ratio | Ratio of risks | - | [0,∞] | Rate ratio; incidence density ratio; standardised ratio (SIR, SMR); odds ratio (OR)
Excess risk | Risk difference; added risk; additive risk | Difference of risks | - | [-1,1] |
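The measures in Table 1 can be computed from standard 2x2 counts. The sketch below assumes the usual layout (a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases; in a case-control study b and d are the exposed and unexposed controls); the function names are illustrative. With the example counts the OR (about 3.06) is close to the RR (3.0), illustrating why the OR can serve as a proxy for the RR when the disease is rare.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR: risk in the exposed divided by risk in the unexposed (cohort data)."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR: cross-product ratio ad/bc (cohort or case-control data)."""
    return (a * d) / (b * c)


def excess_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk difference: risk in the exposed minus risk in the unexposed."""
    return a / (a + b) - c / (c + d)


def smr(observed: int, person_years: list[float], reference_rates: list[float]) -> float:
    """SMR/SIR: observed cases divided by the number expected from general-population
    rates applied to the cohort's person-years, stratum by stratum (e.g. age groups)."""
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed / expected


# Example: 30 of 1000 exposed and 10 of 1000 unexposed workers develop the disease.
print(risk_ratio(30, 970, 10, 990))   # about 3.0
print(odds_ratio(30, 970, 10, 990))   # about 3.06
print(smr(12, [1000, 2000], [0.002, 0.003]))  # 12 observed / 8 expected
```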

5.2 Use of mathematical modelling to find the best dose descriptor

The purpose of modelling is to relate the exposure to the rate of disease in the population studied, i.e. to establish an exposure-response association. Depending on the assumed mode of action, the exposure may be expressed as e.g. average exposure to a substance, peak exposure, or cumulative exposure for each unit of person-time of observation (e.g. ppm·year). For genotoxic carcinogens, cumulative exposure in ppm·years is usually the available exposure metric. The simplest form of this association is based on two categories of exposure, i.e. exposed versus unexposed. However, more than one exposure category is strongly preferable because it allows the consistency and shape of the association to be assessed. Although multiple exposure categories can be used as such for most purposes of risk estimation, the data are used most efficiently when a mathematical model is fitted to them, provided the model fits the data well. Such models can take a variety of forms, ranging from linear to more complex mathematical functions. The linear relative risk model, fitted to the available data, is the most appropriate default model for deriving a dose descriptor [Goldbohm et al., 2006], as it is acknowledged that:
a. the fits of mathematical models representing different shapes of dose-response curves (including curves with a threshold) to a given dataset often do not differ sufficiently to decide which curve fits best;
b. simplicity, comprehensibility and alignment with the paradigm for risk assessment based on animal data (where possible and justified) are important;
c. robustness and precaution are requirements of the risk assessment process.
Only if the human data (preferably supported by experimental data of good quality) clearly indicate a deviation from this model should another model suitably describing the dose-response curve be applied.
An additional advantage of the linear model is that the procedures for extrapolation (to higher and to lower exposures) are essentially the same for data with only one data point and for data with a range of data points.
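A minimal sketch of the linear relative risk model RR(X) = 1 + βX, with X as cumulative exposure (e.g. ppm·years). Here the slope is estimated by weighted least squares constrained to pass through RR = 1 at zero exposure; this estimator, the optional weights (e.g. inverse variances, if available) and the function names are assumptions for illustration and are not necessarily the fitting procedure of Goldbohm et al. (2006). With one data point the estimate reduces to β = (RR - 1)/X, which shows why extrapolation works the same way for one data point and for several.

```python
def fit_linear_rr(exposures, rrs, weights=None) -> float:
    """Slope beta of RR(X) = 1 + beta * X, fitted by weighted least squares
    with the intercept fixed at RR = 1 for zero exposure."""
    if weights is None:
        weights = [1.0] * len(exposures)
    num = sum(w * x * (rr - 1.0) for x, rr, w in zip(exposures, rrs, weights))
    den = sum(w * x * x for x, w in zip(exposures, weights))
    return num / den


def rr_at(beta: float, exposure: float) -> float:
    """Predicted relative risk at a given cumulative exposure."""
    return 1.0 + beta * exposure


def exposure_for_rr(beta: float, target_rr: float) -> float:
    """Cumulative exposure at which the model predicts a given relative risk."""
    return (target_rr - 1.0) / beta


# One data point: RR = 2.0 at 40 ppm-years gives beta = (2.0 - 1) / 40 = 0.025.
beta_single = fit_linear_rr([40.0], [2.0])
# Several exposure categories (hypothetical values) use exactly the same procedure.
beta_multi = fit_linear_rr([10.0, 20.0, 40.0], [1.2, 1.5, 2.0])
```

Extrapolation to lower exposures then uses `rr_at` or `exposure_for_rr` with whichever β was obtained, regardless of how many data points went into it.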

Appendix R.8-16 An integrative approach to combining Human Data and Animal data for DNEL and DMEL derivation (adaptation from ECETOC report)

1. Introduction

This Appendix specifically deals with the question of how to integrate human and animal data and how to select the most appropriate data source for DNEL and DMEL derivation. It does not describe how to derive DNELs and DMELs, but provides guidance on how to select the most appropriate database. In many instances both human data and animal data on toxic effects of chemicals are available. These two data sources should not be evaluated separately but should be integrated as much as possible to form one overall assessment. Ultimately, however, one point of departure for DNEL and DMEL derivation must be selected. The decision between using animal data or human data will rely strongly on their quality and, for animal data, on their relevance for human exposure. The evaluation of human data and their classification into four categories based on their quality has been described earlier (see Appendix R.8-15). Below, the classification of animal data is described first, followed by the classification of human data. Finally, a scheme is presented that integrates animal data and human data on the basis of their quality.

2 Classification of animal data

For animal studies it is not only the quality of the study per se, but also the quality of the information available to the reviewer of the study, that affects the quality assessment. As a result, many good studies are let down by the poor quality of the information available. Criteria for systematically reviewing the reliability of reported animal studies are routinely used by some assessment authorities. Examples of such criteria commonly in use are those published by Klimisch et al (1997). These criteria may be used, for example, to determine whether existing data are of sufficient quality or whether further testing may be needed in the context of the OECD's Existing Chemicals Programme, which works to ensure that all High Production Volume chemicals have data of sufficient quality for a set of minimum toxicity endpoints. The OECD has described three terms used by Klimisch when referring to data quality: reliability, relevance and adequacy (OECD, 2007). Reliability refers to the quality of a test report or publication and takes into account whether standardised methodologies were used to generate the report, as well as the way the experimental procedure and results are described. Relevance refers to the extent to which data and tests are appropriate for a particular hazard, and adequacy refers to the usefulness of the data for hazard and/or risk assessment purposes.

Criteria for assigning animal studies to four reliability categories have been described by Klimisch et al (1997). The reliability criteria and categories provide a practical, systematic approach for the quality evaluation of experimental toxicological data, considering such factors as the use of a standardised test method and the availability of a description of the relevant details of the study design. The following criteria are proposed to classify animal data (see table, adapted from Klimisch et al, 1997).

Code of reliability (CoR) | Category of reliability | Criteria
1 | Reliable without restriction | ‘Good laboratory practice’ guideline study (OECD, EC, EPA, FDA, etc.) or comparable to guideline study; test procedure in accordance with national standard methods (AFNOR, DIN, etc.); or test procedures according to well-accepted standards described in sufficient detail
2 | Reliable with restrictions | Guideline study without detailed description; guideline study with acceptable restrictions; test procedure in accordance with national standard methods with acceptable restrictions; or a well-documented study meeting generally accepted scientific principles, acceptable for assessment
3 | Not reliable | Insufficient documentation, significant methodological deficiencies or a less suitable test system
4 | Not assignable | Abstract only, secondary literature, original reference not yet available, or documentation insufficient for assessment

Irrespective of the quality of the animal studies themselves, extrapolating their results to man is a very uncertain process. Many animal studies use a very high dose level (maximum tolerated dose) to evaluate potential adverse effects; however, findings at such doses may be irrelevant if the health of the animals is compromised, and may contribute to hazard identification rather than to dose-response assessment.

The Mode of Action (MoA) is defined as “a plausible hypothesis, supported by observations and experimental data, regarding events leading to a toxic endpoint” (Meek et al, 2003), and its importance in risk assessment is discussed in ECETOC (2006a). It is valuable to take the MoA analysis to a more formal level. This involves the identification of a chain of key cellular and biochemical events that result in the observed toxicity. The evidence for these key events from animal studies can then be reviewed against an adapted form of the Bradford Hill criteria (Seed et al, 2005; Boobis et al, 2006). The default position is that animal findings are considered relevant to man unless it can be demonstrated otherwise. Knowledge of the dosimetry associated with the critical effect can be an important component of this consideration. In the absence of human data, it is difficult to discount animal findings for their relevance to man. A formal procedure termed the IPCS Human Relevance Framework has been established to make judgements about the relevance to man of findings in animal studies for both cancer endpoints (Boobis et al, 2006) and non-cancer endpoints (Seed et al, 2005). This procedure involves describing the key events leading to the observed toxicity and establishing the MoA in animals. Each key event in animals is then evaluated for its plausibility in man. The IPCS Human Relevance Framework can be extended by adding criteria for data availability and quality. It has also been modified so that working through the framework results in the collective body of animal data being allocated to one of five categories in terms of the weight of evidence provided for human risk assessment (Figure below).

Figure: Completed scheme for categorising the relevance of animal data for use in human risk assessment(a)

1. Are relevant studies available? No: no relevant data, Category X. Yes: go to 2.
2. Are Klimisch category 1 or 2 studies available? No: data unreliable, Category IV. Yes: go to 3.
3. Is the MoA established in animals? No: assume relevant to man, Category III. Yes: go to 4.
4. Are the key events plausible in man? No: not relevant to man, Category X. Yes: go to 5.
5. Taking into account kinetic and dynamic factors, is the animal MoA plausible in man? No: not relevant to man, Category X. Yes, directly: relevant to man, Category I. Yes, with a sensitivity difference: relevant to man, Category II. Maybe: assume relevant to man, Category III.

(a) Steps 3 to 5 (shaded in the original figure) represent the IPCS human relevance framework.

These 5 categories are:

I. Reliable animal findings exist that are directly relevant to man.
II. Reliable animal findings exist, which are relevant to man with the inclusion of a correction for a sensitivity difference between animals and man.
III. Reliable animal findings exist. Their relevance to man is uncertain, but they should be assumed to be relevant.
IV. Animal findings are unreliable.
X. No relevant animal data exist, or the animal findings are not relevant to man.
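The decision tree in the Figure above can be written as a small function. This is an illustrative sketch: the answer to each question is a judgement supplied by the assessor, and the function and enum names are hypothetical.

```python
from enum import Enum


class MoAPlausibility(Enum):
    NO = "no"
    DIRECTLY = "yes, directly"
    WITH_SENSITIVITY_DIFFERENCE = "yes, with a sensitivity difference"
    MAYBE = "maybe"


def animal_data_category(relevant_studies: bool, klimisch_1_or_2: bool,
                         moa_established: bool, key_events_plausible: bool,
                         moa_plausible_in_man: MoAPlausibility) -> str:
    """Walk the decision tree and return category I, II, III, IV or X."""
    if not relevant_studies:
        return "X"    # no relevant data
    if not klimisch_1_or_2:
        return "IV"   # data unreliable
    if not moa_established:
        return "III"  # assume relevant to man
    if not key_events_plausible:
        return "X"    # not relevant to man
    return {
        MoAPlausibility.NO: "X",                            # not relevant to man
        MoAPlausibility.DIRECTLY: "I",                      # directly relevant
        MoAPlausibility.WITH_SENSITIVITY_DIFFERENCE: "II",  # relevant, with correction
        MoAPlausibility.MAYBE: "III",                       # assume relevant to man
    }[moa_plausible_in_man]
```

The default position of the text is visible in the code: when the MoA is not established, the data fall into Category III (assumed relevant) rather than being discounted.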

3 Classification of human data

A simple scoring scheme for categorising human data based on their quality is described in Appendix R.8-15.

4 Combining the quality classifications of human data and animal data

The framework for the use of human and animal data builds upon the previous text and uses all available human and animal data. The overall framework involves three steps, which are shown schematically in Figure .. and consist of the following:
1. The assessment of the collective weight of evidence of the human data with respect to the human risk question being considered. This results in a weight of evidence score for the quality of human data, based on a five-point scale (I-IV plus X) as described earlier.
2. The assessment of the collective weight of evidence of the animal data with respect to the human risk question being considered. This results in a weight of evidence score for the quality and relevance of animal data, on a five-point scale (I-IV plus X) as described earlier.
3. Integration of the available evidence from human and animal sources. This is undertaken using the matrix shown in more detail in the Figure below.

Figure: Framework for the use of human and animal data in chemical risk assessment

[Figure: the scheme for scoring the quality of human data (I-IV or X) and the scheme for categorising the quality and relevance of animal data (I-IV or X) feed into a matrix of quality of human data against quality and relevance of animal data, with one region labelled "Animal data takes precedence" and the other "Human data takes precedence".]

Interpretation framework for human data

The figure shows the components of the matrix that allow for the comparison and integration of human and animal data. The intention behind the framework is not to generate a prescriptive process but, rather, to serve as a consistent basis for the comparison of available findings. In this respect, two basic principles are adopted:
• The stronger evidence should be used to form the basis of any decision, whether that stronger evidence comes from human or animal data, and
• Where human data of the highest quality are available, these take precedence in all circumstances.

Figure: Matrix for integrating human and animal data

[Matrix: quality of human data (I, II, III, IV, X) against quality and relevance of animal data (I, II, III, IV, X); one region is labelled "Animal data take precedence" and the other "Human data take precedence".]

Positive data take precedence (be it animal or human). If data are not concordant, the data with a steeper slope or lower safe level should be used, but should be moderated by the upper risk level of the “less positive” data (see text).

When discordant human and animal data achieve identical quality scores, the Framework applies a protective approach, consistent with normal practice in regulatory risk assessment. The exception is Category I human data, which because of their quality are given precedence regardless. The protective approach operates as follows:

a) For hazard assessment: where the human and animal data scores are the same (viz. II/II, III/III or IV/IV), the data which suggest a hazard should generally take precedence over those which do not.
b) For dose-response assessment: here, the animal data may suggest a lower safe level (read this as ‘more positive’) than the human data, or vice versa, while the scores are again the same (i.e. II/II, III/III or IV/IV). In this case, the data resulting in the ‘most positive’ (i.e. lower) safe concentration should take precedence. However, the other data source should be used as the upper bound. In this way, one is not inherently ‘ignoring’ one body of data, but rather searching for a range that is consistent with both, while allowing the most positive data to be used in a protective manner.
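The two basic principles and the tie-breaking rules a) and b) can be summarised in code. This sketch is an interpretation rather than a transcription of the matrix: it assumes that the lower (stronger) score wins, that Category I human data always win, and that identical scores trigger the protective rules. The function names are hypothetical, and the guidance explicitly allows deviation from such an outcome where a well-argued scientific case exists.

```python
ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4, "X": 5}  # lower number = stronger evidence


def precedence(human: str, animal: str) -> str:
    """Return 'human', 'animal', 'tie' or 'none' for a pair of quality scores."""
    if human == "X" and animal == "X":
        return "none"      # no usable data from either source
    if human == "I":
        return "human"     # highest quality human data take precedence in all circumstances
    if ORDER[human] < ORDER[animal]:
        return "human"
    if ORDER[animal] < ORDER[human]:
        return "animal"
    return "tie"           # II/II, III/III or IV/IV: apply rules a) and b)


def tied_hazard_conclusion(human_shows_hazard: bool, animal_shows_hazard: bool) -> bool:
    """Rule a): with equal scores, data suggesting a hazard take precedence."""
    return human_shows_hazard or animal_shows_hazard


def tied_safe_level_range(human_safe_level: float, animal_safe_level: float) -> tuple[float, float]:
    """Rule b): the lower ('most positive') safe level is used; the other is the upper bound."""
    return (min(human_safe_level, animal_safe_level),
            max(human_safe_level, animal_safe_level))
```

For example, Category III human data against Category II animal data give precedence to the animal data, while a III/III tie with safe levels of 50 and 10 (in the same units) yields 10 as the basis and 50 as the upper bound.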

It is impossible to construct a framework which gives appropriate outcomes in all circumstances. Therefore there should always be scope to deviate from the procedures outlined above, as long as there is a soundly based and well-argued scientific case for doing so. If neither the human nor the animal data are of better quality than Category III, hazard and risk assessments generally need to be treated with great caution, in particular if the data are not concordant.

References


Dutch Health Council, developed jointly by the members of the committee on the evaluation of carcinogenic substances (1989) Carcinogenic risk assessment of benzene in outdoor air. Regul. Toxicol Pharmacol 9: 175-185

Goldbohm RA, Tielemans EL, Heederik D, Rubingh CM, Dekkers S, Willems MI, Kroese ED (2006) Risk estimation for carcinogens based on epidemiological data: A structured approach, illustrated by an example on chromium. Regul Toxicol Pharmacol 44: 294-310

Rothman KJ, Greenland S (1998) Modern Epidemiology. Lippincott-Raven Publishers, Philadelphia, USA. ISBN: 0-316-75780-2

Steenland K, Greenland S (2004) Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol 160: 384-392

Steenland K, Spaeth S, Cassinelli R, Laber P, Chang L, Koch K (1998) NIOSH life table program for personal computers. Am J Ind Med 34: 517-518

van Wijngaarden E, Hertz-Picciotto I (2004) A simple approach to performing quantitative cancer risk assessment using published results from occupational epidemiology studies. Sci Total Environ 332: 81-87

U.S. Environmental Protection Agency (2005) Guidelines for Carcinogen Risk Assessment. Risk Assessment Forum, Washington, DC. EPA/630/P-03/001F, March 2005

WHO Working Group (2000) Evaluation and use of epidemiological evidence for environmental health risk assessment: WHO guideline document. Environ Health Perspect 108: 997-1002

EFSA references:

Brennan, 2002; Cloos et al., 1999; Collins, 2003; Goode et al., 2002; Gu et al., 1999; Hu et al., 2002; Mohrenweiser and Jones, 1998; Mohrenweiser et al., 2003; Mohrenweiser, 2004; Palli et al., 2003; Powell et al., 2002; Qiuling et al., 2003; Shield and Harris, 2000; Tedeschi et al., 2004; Wang et al., 2002; Wei et al., 2003

Appendix R.8-15

Appendix R.8-16
