
Revue d’Épidémiologie et de Santé Publique 66 (2018) 217–225

Point of view

Randomized clinical trials and observational studies in the assessment of drug safety


J. Sawchik a,*, J. Hamdani a, M. Vanhaeverbeek b

a Federal Agency for Medicines and Health Products, place Victor-Horta-40/40, 1060 Brussels, Belgium

b Groupe d’épistémologie appliquée et de clinique rationnelle, hôpital Vésale, CHU de Charleroi, 6110 Montigny-le-Tilleul, Belgium

Received 25 October 2016; accepted 13 March 2018

Available online 20 April 2018

Abstract

Randomized clinical trials are considered the preferred design for assessing potential causal relationships between drugs or other medical interventions and their intended effects. For this reason, randomized clinical trials are generally the basis of development programmes in the life cycle of drugs and the cornerstone of evidence-based medicine. However, randomized clinical trials are not the design of choice for the detection and assessment of rare, delayed and/or unexpected effects related to drug safety. Moreover, the highly homogeneous populations resulting from restrictive eligibility criteria make randomized clinical trials unsuitable for describing the safety profile of drugs comprehensively. In that context, observational studies have key added value in evaluating the benefit-risk balance of drugs. However, observational studies are more prone to bias than randomized clinical trials, and they have to be designed, conducted and reported judiciously. In this article, we discuss the strengths and limitations of randomized clinical trials and of observational studies, particularly regarding their contribution to knowledge of the safety profile of medicines. In addition, we present general recommendations for the sensible use of observational data.

© 2018 Elsevier Masson SAS. All rights reserved.

Keywords: Epidemiological methods; Evidence-based medicine; Observational studies; Randomized controlled trials; Risk-benefit assessment

Summary (translated from the French résumé)

Randomized clinical trials are considered the preferred design for assessing potential causal relationships between drugs or other medical interventions and their expected effects. For this reason, randomized clinical trials are generally the basis of development programmes in the life cycle of drugs and the cornerstone of evidence-based medicine. However, randomized clinical trials are not the design of choice for detecting and assessing unexpected effects related to drug safety. Moreover, the very homogeneous populations resulting from restrictive eligibility criteria make randomized clinical trials unsuitable for describing the safety profile of drugs exhaustively. In this context, observational studies bring major added value to the evaluation of the benefit-risk balance of drugs. However, observational studies are more prone to bias than randomized clinical trials and must be designed, conducted and reported very judiciously. In this article, we discuss the strengths and limitations of randomized clinical trials and of observational studies, particularly with regard to their contribution to knowledge of the safety profile of drugs. In addition, we set out recommendations for the rational use of observational data.

© 2018 Elsevier Masson SAS. All rights reserved.

Keywords: Epidemiological methods; Evidence-based medicine; Observational studies; Randomized clinical trials; Benefit-risk

* Corresponding author.

E-mail address: [email protected] (J. Sawchik).

https://doi.org/10.1016/j.respe.2018.03.133

0398-7620/© 2018 Elsevier Masson SAS. All rights reserved.


1. Introduction

In the modern era, the slow rise towards a rational clinical medicine has been built around two pillars: the progressive integration of the basic sciences into clinical practice and the rise of clinical epidemiology [1,2]. At the beginning of the 20th century, the Flexner report [3] recommended the integration of basic sciences into the core curriculum of medical schools. Eighty years later, a remarkable series of articles presenting a new vision for teaching clinical medicine enabled the rise of clinical epidemiology. The first of these papers [4] launched the evidence-based medicine era in clinical medicine, which emphasized the importance of the randomized controlled trial in the assessment of causal evidence.

Around the end of the 19th century and the beginning of the 20th, the pharmaceutical industry, sometimes in close contact with academic researchers [5], developed powerful “magic bullets”: antiserum against diphtheria toxin, arsphenamine (the first partially effective treatment for syphilis) [6] and insulin [7]. Physicians and researchers, preoccupied with assessing the efficacy of their treatments, developed progressively more sophisticated methods [8], eventually arriving at the current “gold standard”, the randomized controlled trial (RCT).

Although it seems quite natural today to say that rational prescribing in the context of contemporary medical sciences presupposes both the identification of a biological pathway between the drug and its observed effects and the confirmation of clinical efficacy through RCTs, this was not always so evident [9,10].

Throughout the history of drugs, and from a drug safety perspective, it was only after a series of disasters that pharmacovigilance came into being [11]. For instance, the St Louis incident (diphtheria antitoxin serum contaminated by tetanus) [12] and the 1937 disaster caused by the use of diethylene glycol as a solvent for sulfanilamide [13] did not result in any change in industry practice. Change eventually occurred only after the notorious case of thalidomide, which shocked the international community in 1961 [14].

In response to the thalidomide disaster, the World Health Organization (WHO) established its Programme for International Drug Monitoring, thereby promoting pharmacovigilance. The WHO defines pharmacovigilance as “the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem”.

Two recent examples highlight the need for continued advances in pharmacovigilance. In 2004, five years after its introduction on the US market and after more than 20 million people had used it, rofecoxib, a COX-2 inhibitor, was shown to double the risk of myocardial infarction and stroke [15]. Again, in 2014, France suffered from the benfluorex tragedy [16]. Although these drugs had followed strict RCT guidelines, both cases point to the need to strengthen post-marketing surveillance and to monitor prevailing patterns of drug use in real-life settings [17].

The adoption of a new Directive and Regulation by the European Parliament and the Council of Ministers in December 2010 brought significant changes to the safety monitoring of medicines across the European Union. By enabling and promoting the conduct of post-authorisation safety studies (PASS), the European Union legislation strengthened the post-authorisation monitoring of medicines in Europe [18]. The main aims of conducting PASS are:

• the evaluation of safety concerns associated with marketed medicinal products;
• the description of patterns of drug use, which may affect the safety profile of the drug; and
• the evaluation of the effectiveness of risk management measures.

Specific objectives related to the evaluation of safety include the quantification of important risks, the assessment of risks associated with long-term use, the investigation of potential risks in populations for which safety information is limited or missing (e.g. children, the elderly, pregnant women) and the confirmation of the absence of particular risks of concern.

PASS are usually designed as observational studies (OS), also referred to as non-randomized or non-interventional studies [19]. However, OS play different roles and are conducted at different stages of drug development. For example, studies are needed to describe the epidemiology and to assess the burden of the disease of interest. These studies can be conducted before a new medication is introduced onto the market, or even before the start of the clinical development programme. They may be important for estimating background risks in the population of interest, allowing sensible comparisons with the risks observed after new drugs are introduced onto the market [20]. In addition, OS designed for signal refinement can be implemented in active surveillance programmes during the post-authorisation phase. OS may also be needed to assess the effectiveness of marketed drugs in real-practice settings. In this paper, the main focus is on OS performed in the post-authorisation phase for the assessment of potential adverse drug reactions (ADRs) of interest. Other usual pharmacovigilance activities, such as individual case reviews and disproportionality analyses of spontaneous reports, are not discussed.

This opinion paper aims to review the limitations of RCTs and to contribute to the discussion about the role of OS in the assessment of the safety profile of drugs.

2. RCTs: strengths and weaknesses

When the strength of evidence is considered through the hierarchy of evidence, RCTs are more highly regarded than OS because of the methods RCTs usually employ to handle potential biases [21–23]. Randomization and allocation concealment avoid the possibility of selection bias and theoretically ensure a balanced distribution of confounders between the compared groups [24,25]. Blinding, when applicable, makes it possible to control for several potential sources of information bias (at the level of the patient, the investigator, the care provider and/or the
analyst). At the analysis level, the intention-to-treat (ITT) principle is generally followed to minimize bias due to potential post-randomization exclusions (related to dropouts, protocol deviations and missing data). This principle dictates that the analysis should respect the original randomized allocation. However, this principle is far from being uniformly applied, and a modified ITT approach, not always clearly defined, is frequently used [26].

It is also common to see meta-analyses of RCTs placed at the top of this hierarchy [27,28]. However, meta-analyses are per se observational in nature and are particularly sensitive to industry sponsorship bias and other forms of dissemination bias [29,30]. Thus, the results of meta-analyses should be interpreted with caution [28].

In the framework of efficacy assessment, RCTs certainly represent the unquestionable gold standard [27]. However, RCTs may still suffer from biases because of differential attrition, lack of adherence, subverted randomization and imperfect masking [25,31,32]. Moreover, conclusions drawn from RCTs are usually limited by their small sample size, short duration, strictly standardized protocols and homogeneously selected populations [25,33]. With respect to the latter point, stringent criteria are usually applied to restrict the study population (e.g., excluding subjects at high risk of developing ADRs), which may therefore not accurately reflect the target population. These characteristics result in low power (inability to detect rare events), inability to detect long-term effects and low transportability (poor reflection of real-world clinical settings). Also, many RCTs rely on surrogate endpoints as the primary criterion, and clinical relevance is not always guaranteed [20,29,34]. At times RCTs may even be impractical, unethical or unfeasible for assessing a drug’s effectiveness and safety [25].

Some authors have suggested that the classical hierarchy of evidence may not be appropriate for every situation [22,35]. Currently, the ranking assumes a hierarchy of evidence based on a drug’s intended effects [36]. However, if the ranking takes into consideration a more holistic assessment of a drug’s effects, a different hierarchy arises. Under the current paradigm, efficacy is assessed with approaches considered strong and objective, while several safety concerns are addressed only by “soft” methodologies. We then have to come to terms with the fact that, as a result, some important safety-related questions will not be answered with the same robustness. Although we must admit that some questions do not allow an approach that provides the same level of confidence as RCTs, these questions should be addressed using the best methodology available to generate valid findings, particularly in the regulatory setting. For these reasons, the assessment of the benefit-risk balance should be based on the whole body of evidence, including RCTs, OS and other pharmacovigilance activities [27].

3. Observational studies: strengths and weaknesses

It is broadly recognized that inferring causal relationships from OS is a sensitive and complex task because they are prone to confounding as well as to various forms of selection and information bias [37]. For example, indication bias (i.e., confounding by indication or by severity), induced by selective prescribing, can lead to comparisons between populations that are essentially non-comparable because of different levels of disease severity or potential susceptibility to ADRs. In other words, in OS, where treatment is not randomly allocated across patients, the clinical indication and its level of severity can affect both the selected treatment and the outcome of interest, channeling different kinds of patients to different treatments [20,29,38,39]. Restriction by indication and new-user designs are some of the approaches that can be used to alleviate the impact of confounding by indication [40,41].

Confounding resulting from an imbalance in prognostic variables that are also associated with the exposure of interest has a pervasive influence on OS [42]. A plethora of potential confounders may distort the observed associations. The selection and measurement of confounders is a complex task. The widespread practice of adjusting for an extensive list of potential confounders in a regression model does not guarantee unbiased association measures [43]. In fact, this strategy can lead to the inclusion of intermediate variables and colliders (i.e., common effects) and thereby introduce bias [42]. The common approach of covariate selection based on statistical significance (e.g., using stepwise algorithms) does not really adjust for confounding and usually gives unstable results [43,44]. An alternative and more sensible approach is one based on subject-matter knowledge and a systematic literature search for the preliminary selection of potential confounders, assisted by causal graphs (directed acyclic graphs, or DAGs) [45–47]. DAGs are used to formalize a set of a priori hypotheses regarding the causal relationships among a set of variables, allowing structured exploration and visualization of the relationships among these variables. The graphs should be constructed on the basis of the most complete and up-to-date information available from previous studies, literature review and expert opinion. An illustrative example is given by Staplin et al. [48], where DAGs are used to identify and evaluate the potential impact of adjusting for a mediator (urinary albumin-to-creatinine ratio) on the observed association between smoking and progression to end-stage renal disease. We acknowledge that DAGs are not free of limitations. In particular, the value of a DAG depends on the background knowledge of the subject of interest. For some topics, knowledge may be very limited and a comprehensive set of confounders cannot be predefined. Moreover, the selection of the variables to be included in the graph, as well as their roles, depends on the investigators’ choices and is therefore susceptible to bias. However, DAGs remain a valuable tool for clearly formalizing the assumed relationships between the variables of interest, identifying proper sets of confounders, exploring potential sources of bias and limiting the effects of “fishing expeditions” for statistical significance. Additionally, to face the challenges linked to bias and lack of knowledge on some topics, DAGs may be used in combination with change-in-estimate statistics to refine the selection and explore alternative sets of confounders [49].
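The warning about adjusting for colliders can be illustrated with a small simulation. The sketch below is not taken from the studies cited; it assumes a hypothetical exposure x and a hypothetical outcome y that are generated independently of each other, and a hypothetical collider c that is a common effect of both. The function name `simulate`, the linear model and all numerical settings are illustrative assumptions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, seed=1):
    """Return (crude, adjusted) slopes of y on x.

    x and y are independent by construction, so the true effect of x on y
    is zero. c is a collider: it is caused by both x and y.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical exposure
    y = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical outcome
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider

    sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
    sxy, scy = cov(x, y), cov(c, y)

    # Simple linear regression of y on x alone.
    crude = sxy / sxx
    # Coefficient of x in the two-predictor regression of y on x and c
    # (closed-form solution of the normal equations).
    adjusted = (sxy * scc - sxc * scy) / (sxx * scc - sxc ** 2)
    return crude, adjusted


if __name__ == "__main__":
    crude, adjusted = simulate()
    print(f"crude slope of y on x:          {crude:.3f}")
    print(f"slope after adjusting for c:    {adjusted:.3f}")
```

Under these assumptions the crude slope is close to 0, while the slope adjusted for the collider is close to -0.5, even though x has no effect on y by construction. The spurious association is created purely by the adjustment.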

In any case, we can only adjust for known and measured confounders, and the quality of the adjustment depends on the proper measurement of the variables as well as on the correct specification of the functional form assumed in the statistical models [50].

In spite of the aforementioned limitations, OS may be the only realistic option for answering some questions that cannot be addressed by RCTs. In particular, OS may provide valuable data for the assessment of rare and delayed adverse reactions. Hence, it does not seem sensible to automatically downgrade the evidence provided by OS, thereby denying the possibility that they can address some important questions. We think that well-designed and well-conducted OS should be able to properly fill relevant gaps in the overall benefit-risk evaluation. Concato showed that well-designed OS do not necessarily produce biased estimates of effects when compared with the results of RCTs [21]. Furthermore, the pooled results from OS can even show a lower level of heterogeneity than those obtained by pooling RCTs. This observation could be due to the fact that RCTs are performed with strict eligibility criteria, which can differ across trials, while OS target broader conditions of use and populations [21,51]. The strengths and weaknesses of RCTs and OS have been widely discussed in the literature (e.g., [36,51,52]). Table 1 shows a short comparison of these approaches.

Table 1
Comparison between randomized clinical trials and observational studies.

Randomized clinical trials
Characteristics:
• Randomization and allocation concealment
• Blinding (when feasible)
• Focus on efficacy (in general)
• Primary analysis based on the ITT principle
Strengths:
• Good internal validity
  – Exposure of interest defined by protocol
  – Standardized measures for the outcomes (minimizes misclassification)
  – Strict experimental settings (avoids bias)
  – Control (theoretical) for measured and unmeasured confounders
Limitations:
• Limited external validity
  – Patients and practitioners not representative of real clinical setting
  – Subpopulations not represented or under-represented (elderly, children, pregnant women, patients with comorbidities)
• Potential bias due to post-randomization events (protocol deviations, dropouts)
• Frequent use of surrogate variables (restricts clinical interpretability)
• Limited sample size (restricts detection of rare effects)
• Limited follow-up (restricts detection of delayed effects)

Observational studies
Characteristics:
• Exposure of interest not randomly selected
• Focus on effectiveness or safety
• Several methodological approaches (cohort, case-control, cross-sectional, case-only design)
Strengths:
• Good external validity
  – Data observed in naturalistic settings
• Ability to study rare events
• Ability to study events with long latency
• Potential use of multiple sources of existing data
Limitations:
• Limited internal validity
  – Prone to selection and information bias
  – Vulnerable to confounding by indication and unmeasured/residual confounding

4. Observational data: methodological framework

A rigorous methodological framework allows an OS to generate more reliable answers. Depending on the study design, the data sources and the characteristics of the variables of interest, different OS are more or less affected by different potential biases and can give different results [53,54]. There is a large diversity of available designs for OS; some important study designs used for analytic studies are shown in Table 2. These designs have various underlying assumptions, are more or less vulnerable to different biases and can target different estimands (e.g., transient effect vs. cumulative effect). Specific designs that minimize the impact of the main potential biases of concern should be used for specific topics. For example, self-controlled designs are particularly interesting for studying the potential effect of transient, well-defined exposures on acute-onset outcomes [55]. For this reason, the most appropriate design should be chosen carefully, taking into account the multiple facets of each specific study. The interpretability of the findings of OS is to some extent hindered by their observational nature. In this context, replication of key results using different designs (and/or different data sources, populations or settings) is generally needed to reinforce the weight of evidence. However, performing multiple studies with the same limitations does not add significant value and is more likely to be a waste of resources and a source of workload burden. Conversely, relevant sensitivity analyses should generally be implemented in order to check the robustness of the primary results against possible violations of the underlying assumptions of the primary analysis [56–58].

The potential impact of indication bias should be carefully taken into account at the time of protocol development [25,59,60]. Depending on the study design and the quality of the data, other potential sources of systematic error include protopathic bias, recall bias, differential misclassification bias, non-participation bias in surveys and selection bias in case-control studies, among others. These different biases can distort the interpretation of OS and result in misleading findings [61–66]. Note also that non-differential misclassification does
not always result in bias towards the null, as is sometimes assumed [67,68].

Table 2
Comparison between different study design types for observational studies (a).

Cohort
Advantages:
• Temporal relationship between variables well established
• Better definition/description of exposure status and baseline characteristics (if prospective)
• Allows study of multiple outcomes
Disadvantages:
• Long follow-up
• Prone to attrition and other forms of selection bias
• Expensive
• Time-consuming

Case-control (b)
Advantages:
• Relatively quick and inexpensive
• Appropriate for rare and long-latency outcomes
• Allows study of multiple exposures
Disadvantages:
• Prone to recall and other forms of misclassification bias
• Prone to selection bias (for the selection of controls)
• Less control of potential confounders
• Incidences cannot be estimated

Self-controlled (case-only) (c)
Advantages:
• Intrinsic control for time-constant confounders and selection factors
• Appropriate for transient/intermittent exposures
• Appropriate for acute-onset outcomes
Disadvantages:
• Does not control for time-varying confounders
• Not appropriate for cumulative/sustained effects and effects with insidious onset
• Exposure status (including the definition of the risk window) should be well defined
• May be vulnerable to time trends in exposure

(a) Note that the list is not exhaustive (e.g., descriptive designs like cross-sectional were not included).
(b) Refers to traditional case-control studies (variants like the nested case-control design have some particularities).
(c) Some differences exist between different subtypes (e.g., self-controlled vs. case-crossover).

An important observation is that indication and selection bias would theoretically have limited impact when evaluating the risk of unintended or unanticipated effects such as unexpected ADRs, because treatment assignment is not expected to be affected by these factors [19,69,70]. However, this observation should be nuanced, because frail patients may be more prone to ADRs and may be treated differently according to their risk profile as perceived by prescribers.

There are different analytical approaches for handling bias and confounding [25,42]. For example, high-dimensional propensity scores and marginal structural models can be considered for controlling, respectively, for a very large number of confounders and for time-dependent confounders [71,72]. However, some biases may be very difficult to adjust for after data collection if the study was not appropriately designed [56]. Therefore, proper implementation of the statistical analyses depends heavily on the design of the study and on the capacity to pre-specify these analyses.

The use of quantitative bias analysis (QBA), which comprises a broad range of methods, from relatively simple sensitivity analyses to complex, multiple-bias probabilistic analyses, can be a significant improvement when assessing the impact of the different kinds of bias. QBA provides clear figures on the impact these biases can have on the magnitude and direction of the observed association measures [73]. Hence, the level of uncertainty linked to potential systematic errors and the robustness of the results can be assessed. For example, different plausible values can be assigned to some bias parameters. These values can be selected using different sources of information (e.g., external validation studies, expert opinion, literature review). Then, a range of bias-corrected estimates compatible with the data and the assumptions made can be calculated [72]. To be properly performed, QBA requires the availability of reasonable bias parameters. However, it should be noted that even with limited knowledge about these parameters, an educated guess would be more informative than simply stating that results should be interpreted with caution (without giving more details about the possible direction and magnitude of bias) [74]. If QBA is not deemed necessary or feasible, a thorough qualitative bias analysis should nonetheless be conducted.

The issue of potential confounding should be thoroughly investigated before starting the study. As previously stated, literature review and subject-matter expert opinion are needed to define a preliminary list of important potential confounders [47]. The selection of the data sources should take into account the possibility of properly measuring the key potential confounders. The basic strategies for the control of confounders should be defined in the protocol. Stepwise selection and similar strategies based purely on statistical testing are not recommended for model selection. A sensible approach is the combination of change-in-estimate methods and causal graphs [49]. Care should be taken to prevent the inclusion of mediators or colliders in the model as confounders. If some important confounders cannot be properly measured, the impact of residual confounding may be explored by QBA [72]. Moreover, the potential effect of unmeasured confounding should also be discussed. If deemed feasible, adjustment using external validation data may also be considered [75,76].

Data dredging may also be problematic. In some studies, many variables are defined as secondary or even tertiary outcomes. Multiplicity of tests is accentuated by subgroup and sensitivity analyses. In purely descriptive studies, a huge number of potential associations are explored in order to find potential signals. In these cases, the results should be interpreted with extreme caution because the probability of finding false positives (i.e. chance findings) increases with the degree of multiplicity [77–79]. Furthermore, the multiplicity of
OS of low quality can produce contradictory and confusing results, impeding a proper evaluation. Exploratory studies are important for hypothesis generation (i.e. signal detection), but data mining exercises should be avoided for signal evaluation because they can produce false alarms, add confusion and erode the credibility of epidemiologic studies. Data mining can be used as a tool for signal detection but, owing to several important limitations, it cannot be used for assessing causality. Hence, the signals originating from these techniques always require further in-depth evaluation. In this context, the development of rigorous protocols with specific questions, clear objectives, well-defined exposures and outcomes and pre-specified statistical methodology is an essential step in generating scientifically valid conclusions [18,19,61]. It is also crucial to be cautious about possible over-interpretation and reporting biases that could be linked with conflicts of interest or reflect cognitive interpretive biases [80]. Hence, transparent and comprehensive reporting is critically important [81]. The STROBE Statement gives recommendations that can be considered good reporting practices for OS [82,83].

When neither classical RCTs nor OS are sensible approaches for a specific topic (e.g., comparing the effectiveness of drugs for the same indication in a clinical equipoise scenario), pragmatic trials can be considered [84–86].

5. Use of observational data in regulatory actions: an illustrative example

Domperidone is a peripherally acting dopamine-2 receptor antagonist marketed in Europe since 1978 for the relief of symptoms of nausea and vomiting. In the 1980s, case reports revealed an association between the intravenous (IV) formulation of domperidone and QTc prolongation [87]. As a result, the IV formulation was withdrawn from the market in 1986. However, the risk of serious cardiac events associated with the oral formulation was unknown. Subsequently, some population-based case-control studies suggested a potential association between domperidone and serious cardiac events [88–90]. These findings prompted the Pharmacovigilance Working Party to request a thorough QT/QTc (TQT) study and a PASS to provide important additional evidence about the observed association. The results of the TQT study conducted in 2012 were reassuring because they did not show an association between the risk of QT prolongation and the use of domperidone at the recommended doses [91]. It should be noted that a supra-therapeutic dose, as typically defined in TQT studies (i.e., a 5-fold increase over the recommended dose), was not included in this study. The inclusion of a supra-therapeutic dose is recommended in order to explore the effect at higher exposure levels that can potentially be attained in real practice [92].

The main goal of the PASS conducted in 2013 was to assess the potential association between the risk of out-of-hospital

The control group was matched for age, sex and practice, and the analysis was adjusted for several covariates (including relevant medical history and co-medication). A case-crossover analysis was performed as a sensitivity analysis and confirmed the positive association between SCD and domperidone (OR = 3.33; 95% CI: 1.87–5.92). Subgroup analyses suggested that the risk was higher for the highest dose category (> 30 mg/day).

Following the aforementioned findings and taking into consideration the whole body of evidence, the European Medicines Agency made recommendations concerning the use of domperidone, in particular with respect to the maximum dose, the duration of use and use in high-risk groups [94].

Additionally, a recent study conducted in Taiwan using a case-crossover approach reported a statistically significant association between domperidone use and ventricular arrhythmia [95]. The observed effect was stronger in the highest daily dose category (> 30 mg/day).

6. General recommendations

We provide below a summary of points that should be taken into account to optimize the use of evidence generated by observational data, in particular concerning the assessment of the safety profile of drugs.

• it is acknowledged that RCTs cannot provide all the necessary information, for logistical or ethical reasons. In particular, RCTs have some inherent limitations and are not optimal for assessing rare, long-latency or unexpected ADRs [19]. Therefore, evidence generated by observational data should be considered and carefully integrated in the assessment. OS similarly present some typical weaknesses, so their limitations should be explored and clearly stated, and the findings should be interpreted cautiously [19]. However, the lack of a “perfect” approach does not justify a lack of action. Decisions should be taken based on the whole of the available evidence, while acknowledging that a certain level of uncertainty is unavoidable. Obviously, different sources of evidence have different levels of quality and should be weighted accordingly. The GRADE approach for rating the quality of evidence provides a useful framework in this area [96]. The consideration of different dimensions (risk of bias, imprecision, inconsistency, indirectness, publication bias) makes it possible to upgrade or downgrade the quality of a body of evidence, whether observational or experimental [97];
• the potential bias linked to the lack of concealed randomization is expected to be of lower concern when studying unintended effects. Hence, OS are a better option for assessing potential risks that were not a priori suspected to be linked to the drug of interest [19];
• there are a vast number of study designs and analytical

sudden cardiac death (SCD) and the current use of domperidone methods available for use with regards to observational data.

[93]. The primary results, using a nested case-control design, Some approaches can be particularly useful for attenuating

showed an increase in the risk of sudden cardiac death with the the impact of confounding and selection bias (e.g., self-

current use of domperidone (OR = 2.09; 95% CI: 1.16–3.74). controlled designs, new user designs, propensity scores). It is

´

J. Sawchik et al. / Revue d’Epide´miologie et de Sante´ Publique 66 (2018) 217–225 223

important to develop standards and guidelines (based on theoretical development and empirical evaluations) to enable researchers to apply the best approaches in different scenarios (depending on the setting, outcome, exposure and sources). Indeed, simulations can be performed to explore the operational characteristics of the different scenarios. The widespread availability of electronic data sources (electronic medical records, claims databases, registries) represents an opportunity to further investigate these methodological topics [98];

• different designs may be more or less vulnerable to certain biases in different situations. If multiple studies are performed, it is important not to replicate results with the same sources of bias. Doing so can lead to an excessive burden in the assessment process and to misleading overconfidence in the findings. For example, meta-analyses of similar OS will probably tend to produce estimates with falsely increased precision. It is more informative to hypothesize about reasonable alternative explanations and explore them using different approaches (study design, data source and analytical method);

• some methodological approaches have great potential for use in pharmacoepidemiological research but are still underused (e.g., DAG and QBA). QBA is an interesting alternative to overused sentences like “residual confounding cannot be excluded” or “case-control studies are prone to selection bias”. These statements, even if irrefutable, have low informative value if they are not accompanied by an analysis of their potential effects. Unfortunately, the discussion of the limitations is sometimes relatively simplistic and uninformative. We think that a deep assessment of the potential effect of bias should be systematically done in order to bring relevant added value in the framework. QBA provides a formal framework that boosts critical thinking about the potential impact of the biases and offers a range of plausible values for this impact [72]. Recently, some efforts have been made to develop and stimulate the use of QBA tools in regulatory settings [99];

• DAGs, developed in the framework of causal inference, can be extremely valuable in the selection of potential confounders to be included in the models as well as for the exploration of potential sources of bias [46]. This approach allows an explicit and reasoned selection of an optimal minimal set of confounders, based on theoretical rules and subject-matter specific knowledge. In contrast, when automatic selection algorithms are followed, confounders not identified statistically are not kept in the models, while mediators and colliders may be included. It can be argued that the elaboration of DAGs (as well as of QBA) requires some subjective decisions. We think this is to some extent unavoidable, but at least it represents an important step forward in formalizing the recognized limitations of the findings. The use of clear and well-justified methodological approaches, of systematic literature reviews and of clearly stated educated guesses all contribute to improving the transparency and the credibility of the results;

• given the potentially high vulnerability to multiple sources of bias, it is extremely important to promote the highest possible level of transparency across all the steps of the research [100]. We consider that a protocol should be developed as soon as possible and submitted, allowing proper assessment of its feasibility and appropriateness. Protocols should also be registered (e.g., in the European Union electronic Register of PASS) to mitigate dissemination bias. Operational definitions of the variables of interest, data sources and methods for measuring the variables should be clearly and comprehensively defined. A pre-specified statistical analysis plan (SAP) should also be included. If a more detailed SAP is needed, it should be developed before the start of data analysis and database lock. It is acknowledged that some modifications will probably be needed; all protocol amendments and changes in the SAP should be tracked and justified. These good practices will help to increase confidence in the findings, reducing the risk of “cherry-picking” exercises.

7. Conclusion

Systematically downgrading the results from OS is not an option. We need to learn from the history of pharmacovigilance and pharmaceutical development, which underlines the important contribution made by OS to our knowledge of drug safety.

The new European pharmacovigilance legislation, launched in July 2012, provides the regulatory agencies and the pharmaceutical industry with the possibility to conduct PASS. When properly conducted, these studies can help to address specific safety questions such as long-term safety and can provide a better picture of the actual use of the drug through Drug Utilization Studies. This opportunity should be thoroughly exploited and encouraged, without prejudice to the quality of RCTs. Both tools have to be viewed as complementing each other.

However, in order to enhance the contribution of OS to regulatory actions, it is important to develop proper guidelines and quality standards regarding their characteristics (design, conduct, analysis, reporting). Such rules would allow the proper assessment of OS and of their contribution to the hierarchy of evidence.

Disclosure of interest

The authors declare that they have no competing interest.

Acknowledgements

We gratefully acknowledge Zineb Belghiti for reading the manuscript and for the suggestions given.

References

[1] Bynum WF. Science and the practice of medicine in the nineteenth century; 1994 [Cambridge University Press].

[2] Whelton PK, Gordis L. Epidemiology of clinical medicine. Epidemiol Rev 2000;22:140–4.

[3] Flexner A. Medical education in the United States and Canada: a report to the Carnegie Foundation for the advancement of teaching, 4. Boston: Updike, The Merrymount Press; 1910.

[4] Evidence-Based Medicine Working Group. Evidence-based medicine. A new approach to teaching the practice of medicine. JAMA 1992;268:2420–5.

[5] Liebenau J. Paul Ehrlich as a commercial scientist and research administrator. Med Hist 1990;34:65–78.

[6] Schwartz RS. Paul Ehrlich’s magic bullets. N Engl J Med 2004;350:1079–80.

[7] Goldfine ID, Youngren JF. Contributions of the American Journal of Physiology to the discovery of insulin. Am J Physiol 1998;274:E207–9.

[8] Opinel A, Tröhler U, Gluud C, Gachelin G, Smith GD, Podolsky SH, et al. Commentary: the evolution of methods to assess the effects of treatments, illustrated by the development of treatments for diphtheria, 1825–1918. Int J Epidemiol 2013;42:662–76.

[9] Kennedy HL. The importance of randomised clinical trials and evidence based medicine: a clinician’s perspective. Clin Cardiol 1999;22:6–12.

[10] Marks H. The Progress of . Science and therapeutic reform in the United States, 1900-1990. 1st ed, 2000 [Cambridge University Press].

[11] Avorn J. Two centuries of assessing drug risks. N Engl J Med 2012;367:193–7.

[12] Dehovitz RE. The 1901 St Louis incident: the first modern medical disaster. Pediatrics 2014;133:964–5.

[13] Routledge P. 150 years of pharmacovigilance. Lancet 1998;351:1200–1.

[14] Diggle GE. Thalidomide: forty years ago. Int J Clin Pract 2001;55:627–31.

[15] Avorn J. Dangerous deception – Hiding the evidence of adverse drug effects. N Engl J Med 2006;355:2169–71.

[16] Szymanski C, Andréjak M, Peltier M, Maréchaux S, Tribouilloy C. Adverse effects of benfluorex on heart valves and pulmonary circulation. Pharmacoepidemiol Drug Saf 2014;23:679–86.

[17] Borg JJ. Strengthening and rationalizing pharmacovigilance in the EU: where is Europe heading to? A review of the new EU legislation on pharmacovigilance. Drug Saf 2011;34:187–97.

[18] Giezen TJ, Mantel-Teeuwisse AK, Straus SMJM, Egberts TCG, Blackburn S, Persson I, et al. Evaluation of post-authorization safety studies in the first cohort of EU risk management plans at time of regulatory approval. Drug Saf 2009;32:1175–87.

[19] Kiri VA. A pathway to improved prospective observational post-authorization safety studies. Drug Saf 2012;35:711–24.

[20] Evans SJW. An agenda for UK clinical pharmacology: pharmacoepidemiology. BJCP 2012;73:973–8.

[21] Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 2000;342:1887–92.

[22] Concato J. Observational versus experimental studies: what’s the evidence for a hierarchy? NeuroRx 2004;1:341–7.

[23] Hoppe DJ, Schemitsch EH, Morshed S, Tornetta P, Bhandari M. Hierarchy of evidence: where observational studies fit in and why we need them. J Bone Joint Surg Am 2009;91(3):2–9.

[24] Kunz R, Oxman AD. The unpredictability paradox: review of empirical comparisons of randomised and non-randomised clinical trials. BMJ 1998;317:1185–90.

[25] Lu CY. Observational studies: a review of study designs, challenges and strategies to reduce confounding. Int J Clin Pract 2009;63:691–7.

[26] Abraha I, Montedori A. Modified intention to treat reporting in randomised controlled trials: . BMJ 2010;340:c2697. http://dx.doi.org/10.1136/bmj.c2697.

[27] Grootendorst DC, Jager J, Zoccali C, Dekker W. Observational studies are complementary to randomized controlled trials. Nephron Clin Pract 2010;114:c173–7.

[28] Haute Autorité de Santé. Élaboration de recommandations de bonne pratique. In: Méthode. « Recommandations pour la pratique clinique ». Guide méthodologique; 2010 [HAS http://www.has-sante.fr/portail/upload/docs/application/pdf/2011-01/guide_methodologique_recommandations_pour_la_pratique_clinique.pdf].

[29] Gibbons RD, Amatya AK. Statistical methods for drug safety. 1st ed, 2015 [Chapman and Hall/CRC].

[30] Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, et al. Dissemination and publication of research findings: an updated review of related biases. Health Technol Assess 2010;14:1–193.

[31] Hammad TA, Pinheiro SP, Neyarapally GA. Secondary use of randomized controlled trials to evaluate drug safety: a review of methodological considerations. Clin Trials 2011;8:559–70.

[32] Marx RE. The deception and fallacies of sponsored randomized prospective double-blinded clinical trials: the bisphosphonate research example. Int J Oral Maxillofac Implants 2014;29:e37–44.

[33] Avorn J. In defense of pharmacoepidemiology – embracing the yin and yang of drug research. N Engl J Med 2007;357:2219–21.

[34] Bejan-Angoulvant T, Cornu C, Archambault P, Tudrej B, Audier P, Brabant Y, et al. Is HbA1c a valid surrogate for macrovascular and microvascular complications in type 2 diabetes? Diabetes Metab 2015;41:195–201.

[35] Lapeyre-Mestre M, Sapède C, Moore N, Bilbault P, Blin P, Chopy D, et al. Pharmacoepidemiology studies: what levels of evidence and how can they be reached? Therapie 2013;68:241–52.

[36] Vandenbroucke JP. Observational research, randomised trials, and two views of medical science. PLoS Med 2008;5(3):e67.

[37] Carlson MDA, Morrison RS. Study design, precision, and validity in observational studies. J Palliat Med 2009;12:77–82.

[38] MacDonald TM, Morant SV, Goldstein JL, Burke TA, Pettitt D. Channelling bias and the incidence of gastrointestinal haemorrhage in users of meloxicam, coxibs, and older, non-specific non-steroidal anti-inflammatory drugs. Gut 2003;52:1265–70.

[39] Hajian Tilaki K. Methodological issues of confounding in analytical epidemiologic studies. Caspian J Intern Med 2012;3:488–95.

[40] Joseph KS, Mehrabadi A, Lisonkova S. Confounding by indication and related concepts. Curr Epidemiol Rep 2014;1:1–8.

[41] Yoshida K, Solomon D, Kim SC. Active-comparator design and new-user design in observational studies. Nat Rev Rheumatol 2015;11:437–41.

[42] Kamangar F. Confounding variables in epidemiologic studies: basics and beyond. Arch Iran Med 2012;15:508–16.

[43] Greenland S, Daniel R, Pearce N. Outcome modelling strategies in epidemiology: traditional methods and basic alternatives. Int J Epidemiol 2016;45:565–75.

[44] Walter S, Tiemeier H. Variable selection: current practice in epidemiological research. Eur J Epidemiol 2009;24:733–6.

[45] Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37–48.

[46] Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Med Res Methodol 2008;8:70. http://dx.doi.org/10.1186/1471-2288-8-70.

[47] Uddin MJ, Groenwold RH, Ali MS, de Boer A, Roes KC, Chowdhury MA, et al. Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int J Pharm 2016;38:714–23.

[48] Staplin N, Herrington W, Judge P, Reith C, Haynes R, Landray M, et al. Use of causal diagrams to inform the design and interpretation of observational studies: an example from the study of heart and renal protection (SHARP). Clin J Am Soc Nephrol 2016. http://dx.doi.org/10.2215/CJN.02430316CJASN.

[49] Evans D, Chaix B, Lobbedez T, Verger C, Flahault A. Combining directed acyclic graphs and the change-in-estimate procedure as a novel approach to adjustment-variable selection in epidemiology. BMC Med Res Methodol 2012;12:156. http://dx.doi.org/10.1186/1471-2288-12-156.

[50] Fewell Z, Davey Smith G, Sterne JA. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am J Epidemiol 2007;166:646–55.

[51] Jane-wit D, Horwitz RI, Concato J. Variation in results from randomized, controlled trials: stochastic or systematic? J Clin Epidemiol 2010;63:56–63.

[52] Booth CM, Tannock IF. Randomised controlled trials and population-based observational research: partners in the evolution of medical evidence. Br J Cancer 2014;110:551–5.

[53] Reynolds RF, Kurz X, de Groot MC, Schlienger RG, Grimaldi-Bensouda L, Tcherny-Lessenot S, et al. The IMI PROTECT project: purpose, organizational structure, and procedures. Pharmacoepidemiol Drug Saf 2016;25(1):5–10.

[54] Klungel OH, Kurz X, de Groot MC, Schlienger RG, Tcherny-Lessenot S, Grimaldi L, et al. Multi-centre, multi-database studies with common protocols: lessons learnt from the IMI PROTECT project. Pharmacoepidemiol Drug Saf 2016;25(1):156–65.

[55] Hallas J, Pottegård A. Use of self-controlled designs in pharmacoepidemiology. J Intern Med 2014;275:581–9.

[56] Geneletti S, Mason A, Best N. Adjusting for selection bias: why sensitivity analysis is the only “solution”. Epidemiology 2011;22:36–9.

[57] Morris TP, Kahan BC, White IR. Choosing sensitivity analyses for randomised trials: principles. BMC Med Res Methodol 2014;14:11.

[58] Thabane L, Mbuagbaw M, Zhang S, Samaan Z, Marcucci M, Ye C, et al. A tutorial on sensitivity analyses in clinical trials: the what, why, when and how. BMC Med Res Methodol 2013;13:92. http://dx.doi.org/10.1186/1471-2288-13-92.

[59] Jepsen P, Johnsen SP, Gillman MW, Sørensen HT. Interpretation of observational studies. Heart 2004;90:956–60.

[60] Joffe MM. Confounding by indication: the case of calcium channel blockers. Pharmacoepidemiol Drug Saf 2000;9:37–41.

[61] Auzerie J. Études observationnelles en pharmaco-épidémiologie: retour d’expériences et propositions pour une approche pragmatique de la conception, de la conduite et de l’analyse des études. La lettre du pharmacologue, 23. 2009;p. 20–8.

[62] Baena A, Garcés-Palacio IC, Grisales H. The effect of misclassification error on risk estimation in case-control studies. Rev Bras Epidemiol 2015;18:341–56.

[63] Coughlin SS. Recall bias in epidemiologic studies. J Clin Epidemiol 1990;43:87–91.

[64] Delgado-Rodríguez M, Llorca J. Bias. J Epidemiol Community Health 2004;58:635–41.

[65] Horwitz RI, Feinstein AR. The problem of “protopathic bias” in case-control studies. Am J Med 1980;68:255–8.

[66] Jurek AM, Maldonado G, Greenland S, Church TR. Exposure-measurement error is frequently ignored when interpreting epidemiologic study results. Eur J Epidemiol 2006;21:871–6.

[67] Pekkanen J, Sunyer J, Chinn S. Nondifferential disease misclassification may bias incidence risk ratios away from the null. J Clin Epidemiol 2006;59:281–9.

[68] Strom BL. Pharmacoepidemiology. 5th edition, 2012 [Wiley-Blackwell].

[69] McMahon AD, MacDonald TM. Design issues for drug epidemiology. Br J Clin Pharmacol 2000;50:419–25.

[70] Vandenbroucke JP. When are observational studies as credible as randomised trials? Lancet 2004;363:1728–31.

[71] Rassen JA, Schneeweiss S. Using high-dimensional propensity scores to automate confounding control in a distributed medical product safety surveillance system. Pharmacoepidemiol Drug Saf 2012;21(1):41–9.

[72] Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.

[73] Lash TL, Fox MP, Fink AK. Applying quantitative bias analysis to epidemiologic data. 1st ed, New York: Springer; 2011.

[74] Phillips CV. Quantifying and reporting uncertainty from systematic errors. Epidemiology 2003;14:459–66.

[75] Stürmer T, Glynn RJ, Rothman KJ, Avorn J, Schneeweiss S. Adjustments for unmeasured confounders in pharmacoepidemiologic database studies using external information. Med Care 2007;45:S158–65.

[76] Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf 2006;15:291–303.

[77] Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2(8):e124.

[78] Ioannidis JP. Why most discovered true associations are inflated. Epidemiology 2008;19:640–8.

[79] Smith GD, Ebrahim S. Data dredging, bias, or confounding. BMJ 2002;325:1437–8.

[80] Seshia SS, Makhinson M, Young GB. Cognitive biases plus: covert subverters of healthcare evidence. Evid Based Med 2016;21:41–5.

[81] Hemkens LG, Benchimol EI, Langan SM, Briel M, Kasenda B, Januel JM, et al. The reporting of studies using routinely collected health data was often insufficient. J Clin Epidemiol 2016. http://dx.doi.org/10.1016/j.jclinepi.2016.06.005.

[82] von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. Strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. Int J Surg 2014;12:1495–9.

[83] Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the reporting of observational studies in epidemiology (STROBE): explanation and elaboration. Int J Surg 2014;12:1500–24.

[84] Roehr B. The appeal of large simple trials. BMJ 2013;346:f1317.

[85] Ford I, Norrie J. Pragmatic trials. N Engl J Med 2016;375:454–63.

[86] Schwartz D, Lellouch J. Explanatory and pragmatic attitudes in therapeutical trials. J Chronic Dis 1967;20:637–48.

[87] Rossi M, Giorgi G. Domperidone and long QT syndrome. Curr Drug Saf 2010;5:257–62.

[88] Van Noord C, Dieleman JP, van Herpen G, Verhamme K, Sturkenboom MC. Domperidone and ventricular arrhythmia or sudden cardiac death: a population-based case-control study in the Netherlands. Drug Saf 2010;33:1003–14.

[89] De Bruin ML, Langendijk PN, Koopmans RP, Wilde AA, Leufkens HG, Hoes AW. In-hospital cardiac arrest is associated with use of non-antiarrhythmic QTc-prolonging drugs. Br J Clin Pharmacol 2007;63:216–23.

[90] Johannes CB, Varas-Lorenzo C, McQuay LJ, Midkiff KD, Fife D. Risk of serious ventricular arrhythmia and sudden cardiac death in a cohort of users of domperidone: a nested case-control study. Pharmacoepidemiol Drug Saf 2010;19:881–8.

[91] Biewenga J, Keung C, Solanki B, Natarajan J, Leitz G, Deleu S, et al. Absence of QTc prolongation with domperidone: a randomized, double-blind, placebo- and positive-controlled thorough QT/QTc study in healthy volunteers. Clin Pharmacol Drug Dev 2015;4:41–8.

[92] Dmitrienko A, Beasley C, Mitchell M. Design and analysis of thorough QT studies; 2008 [Biopharmaceutical Network http://www.biopharmnet.com/cardiac.html].

[93] Navarro AA. Post-authorisation safety study: risk of out-of-hospital sudden cardiac death in users of domperidone, users of proton pump inhibitors and users of metoclopramide; 2014 [Version 1.0. ENCEPP/SDPP/3590].

[94] E.M.A. Pharmacovigilance Risk Assessment Committee. Domperidone assessment report; 2014 [EMA/152501/2014].

[95] Chen HL, Hsiao FY. Domperidone, cytochrome P450 3A4 isoenzyme inhibitors and ventricular arrhythmia: a nationwide case-crossover study. Pharmacoepidemiol Drug Saf 2015;24:841–8.

[96] Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.

[97] Schneeweiss S. Learning from big health care data. N Engl J Med 2014;370:2161–3.

[98] Lash TL, Fox MP, Cooney D, Lu Y, Forshee RA. Quantitative bias analysis in regulatory settings. Am J Public Health 2016;106:1227–30.

[99] Rotelli MD. Ethical considerations for increased transparency and reproducibility in the retrospective analysis of health care data. Ther Innov Regul Sci 2015;49:342–7.