
Epidemiology and Econometrics: Two Sides of the Same Coin or Different?

Marcella Vigneri1, Edoardo Masset1, Josephine Exley2, Howard White3
1CEDIL, 2London School of Hygiene and Tropical Medicine, 3Campbell Collaboration

Centre of Excellence in Impact Evaluation and Learning (CEDIL) Inception Paper

Draft version: 19 January 2018

Abstract

This paper discusses differences observed in the practice, conduct and reporting of evaluation studies conducted by economists and clinical epidemiologists. There has been much informal debate around the reasons for these differences and the missed opportunities that they lead to, but we are not aware of any formal review of the key areas where these disciplines diverge and why. This paper fills this gap, drawing on a workshop organized by the Centre of Excellence in Impact Evaluation and Learning (CEDIL) with evaluation specialists from both disciplines, published papers and a systematic approach to illustrate idiosyncrasies from each discipline. The paper also includes an epi-econ evaluation glossary that speaks to both disciplines to facilitate cross learning and synergies between the evaluations that they do.

1. Introduction

Economics and epidemiology are disciplines that share a research interest in investigating and measuring the impact of interventions in international development. In a 2016 contribution to the American Journal of Public Health, Spiegelman (2016) claimed that similarities in the conduct of impact evaluations across disciplines like epidemiology and economics overwhelm the differences, and that the distinctions that have been made confuse the underlying common ground. She broadly defined impact evaluation as a method for assessing the efficacy and effectiveness of an intervention in terms of intended and unintended health, social, and economic outcomes, and involving the explicit statement of a counterfactual.

While the principles of Spiegelman's definition are clear, it is also important to recognize and understand the boundaries that separate how economists and epidemiologists conduct evaluations. Although the principles of their analytical methods are similar (in terms of research questions, type of intervention, and intended goal of the intervention), their practices for conducting and reporting evaluations are often different. This paper addresses these boundaries by reviewing the evaluation practices of economists and epidemiologists. For the purposes of this paper, we count the work done by health economists within epidemiological evaluations (for instance to estimate the cost effectiveness of an intervention) as "epidemiology" rather than "economics".

The paper was initially motivated by two debates around differences in these practices. The first debate, led by CEDIL Intellectual Leadership Team members, considered how the growth of research on global health and international development by scholars in economics (who typically have greater influence than clinical epidemiologists on the use of evidence by LMIC1 government policymakers) has opened a critical gap in research methods and evaluation practice between the impact evaluations economists do and those conducted within clinical epidemiology. This paper builds on this debate to compare the two approaches, highlighting similarities and differences, and areas for cross learning. The second debate comes from a research paper written by economists and epidemiologists based at the London School of Hygiene and Tropical Medicine (LSHTM) (Powell-Jackson et al., forthcoming 2018) on the practice of randomised trials by the two disciplines. Those authors suggest that evidence-based public health can be strengthened by a better understanding of the differences in the use, practice and reporting of results from trials, and that in so doing cross-disciplinary learning can be promoted.

In response to these debates, CEDIL organized a workshop in November 2017 gathering economists and epidemiologists specialising in evaluation research to discuss areas of overlap and of divergence. In preparation for the workshop we collected evidence of differences in approaches between economists and epidemiologists in two ways. First, we selected a number of systematic reviews of interventions in early child development/nutrition-specific programmes and in Conditional Cash Transfer (CCT) programmes, and used the discussion and results of the reviews to reflect on the approaches and differences between primary researchers and reviewers in the two disciplines. Second, we identified a few publications, covering project types evaluated by both economists and epidemiologists, to identify the main methodological differences in the conduct of impact evaluations.

1 Low and Middle-Income Countries


This paper is the result of a critical elaboration of what emerged in the workshop. Section 2 identifies areas where economists and epidemiologists differ in the practice of evaluation studies. After a brief description of the issues relevant in each area, examples from peer-reviewed papers in each discipline are used to illustrate differences in practice. The common themes selected for this exercise are conditional cash transfers (CCT), bednets, WASH (Water, Sanitation and Hygiene), anti-retroviral treatment (ART) for the reduction of HIV incidence, and gender interventions to reduce the risk of Intimate Partner Violence (IPV). Section 3 draws reflections from comparing systematic reviews of health insurance, school feeding, and conditional and unconditional cash transfer programmes to illustrate further differences in the evaluation methods used in the two disciplines. Section 4 explains the rationale for collating a unified "epi-econ" evaluation glossary (included in the annex) to compare terms used by the two disciplines and to reduce the terminological clashes which often cloud cross-discipline understanding of evaluation studies. This glossary will help evaluation researchers to identify a common lexicon across their different approaches. Section 5 concludes the paper, reflecting on learning opportunities across the two approaches.

2. Comparing economic and epidemiological approaches to evaluation

How do economists and epidemiologists conduct evaluation studies? In general, economists are interested in measuring the success of a program, where success can be broadly or narrowly defined. Their goal is to differentiate less effective interventions from successful ones, and to help improve existing programs. The primary purpose of impact evaluation for economists is to determine whether a program has an impact (on a set of key outcomes), and more specifically, to quantify how large that impact is (J-PAL, 2018). Epidemiologists operating in public health services, on the other hand, conduct evaluations as a process for determining, as systematically and objectively as possible, the likely future relevance, effectiveness, efficiency, and impact of activities with respect to established health outcomes (Centers for Disease Control and Prevention, 2018).

One could say that although evaluations conducted in both disciplines share the common goal of identifying the presence or absence of impact, their 'journey' to an estimate of impact differs in many ways. In the November 2017 workshop, we identified eight key areas where differences in evaluation practices are prominent:

1. The use of theory to underpin the evaluation problem;
2. The generalizability and transferability of findings from the specifics of an individual case study;
3. The definition of research questions;
4. Study design, and endpoint/outcome measure of impact;
5. Power calculations and powered studies;
6. Publication of protocols and pre-specified outcomes;
7. Reporting of findings, and timing of publications; and
8. Replicability.

We discuss each of these below to highlight differences. A word of caution is warranted in reading the discussion that follows: while the paper reviews issues that often contrast how economists and epidemiologists conduct evaluation studies, by no means does it intend to claim universal truths. There are, of course, evaluations done by economists that more closely resemble those carried out by epidemiologists than those completed by other economists, and vice versa.

2.1 Using Theory to underpin the evaluation problem

In 'best practice' economics, it is not unusual to find that impact evaluations are the empirical translation of some theory of economic behaviour. Economists use theory both to model behaviour, and to identify ex-ante predictions about changes in behaviour that can be attributed to an intervention. While the use of theory is not universal among economists, it is generally seen as giving stronger grounding to impact evaluations; theory informs the design of the study, and guides the empirical testing. For instance, economists sometimes use structural models to investigate the mechanisms behind the impacts of a given intervention and to guide the interpretation of the results after estimation.

The practice of using theory is much less common in epidemiology, where modelling is employed instead to look at transmission mechanisms (process evaluation) that capture the dynamic nature and spread of, for example, diseases from infectious to susceptible individuals, and to incorporate positive and negative feedback characteristics of infectious processes. Epidemiology also uses theory to inform clinical decision analysis (for example by using decision trees, Markov models to assess the probability of transitions from one state to another, or Bayes modelling in network meta-analyses) or for modelling from surrogate outcomes (e.g. blood tests such as viral load in HIV) to humanly meaningful outcomes.
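To make the kind of state-transition modelling referred to above concrete, the following minimal sketch (in Python, with purely hypothetical transition probabilities that are not drawn from any cited study) shows a simple Markov calculation: given a transition matrix between health states, the model tracks how a cohort distributes across states over successive cycles.

```python
import numpy as np

# Hypothetical three-state Markov model: healthy, ill, dead.
# Rows are current states, columns are next states; each row sums to 1.
P = np.array([
    [0.90, 0.08, 0.02],  # healthy -> healthy / ill / dead
    [0.20, 0.70, 0.10],  # ill     -> healthy / ill / dead
    [0.00, 0.00, 1.00],  # dead is an absorbing state
])

state = np.array([1.0, 0.0, 0.0])  # the whole cohort starts healthy
for cycle in range(1, 6):          # five model cycles (e.g. years)
    state = state @ P
    print(f"cycle {cycle}: healthy={state[0]:.3f}, ill={state[1]:.3f}, dead={state[2]:.3f}")
```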

Reflecting on these issues, it was perceived that much more time is typically spent by economists in thinking through the entire setting of an impact evaluation, because of their use of theory both to test their assumptions and to inform what they are doing. Moreover, it was noted that although epidemiologists frequently use 'theories of change' in their studies (a practice not often found in peer-reviewed economics papers), these are arguably not comparable to the mathematical models used by economists to identify hypotheses of behavioural change, and are instead considered a representation of hypothesised causal chains.

We illustrate this point with two papers evaluating the impact of Conditional Cash Transfer (CCT) programs on children's health. Attanasio and co-authors (2012) discuss the benefits of combining randomised experiments with structural economic models when looking at the effect of the PROGRESA CCT (a major randomised social experiment in Mexico to improve human capital accumulation in the poorest communities). The intervention sought to increase school enrolment of poor children, and the paper used a structural model of education choices to frame the behavioural change expected from the randomised experiment. We can contrast this with an epidemiology study of the impact of Bolsa Família (a CCT program running in Brazil since 2003) on improving children's health in a slum community situated in a large urban centre through the higher use of healthcare services, which would determine a fall in illness rates and an improvement in the overall physical and psychosocial health of the population of interest (Shei et al. 2014). The authors discussed CCT programs, and explained how these can be demand-side tools to encourage poor families to make better use of existing healthcare services. However, no reference is made to any theory that would explain the expected changes in behaviour that would result from the Bolsa Família intervention. An explanation of the concept and working of the CCT programme is followed by a description of the evaluation methods used, and of the data collected to estimate the mechanisms at work behind the impact of Bolsa Família on children.


2.2 Generalizability and Transferability

The practice among economists of using theory to predict the expected outcomes of an intervention could offer epidemiologists a method to help ensure that the findings of their evaluations are generalisable and transferable to other contexts, rather than being applicable only to the specific setting of their clinical trials.

In the epidemiology paper selected under the bednets theme (Krezanoski, 2010), the authors found that the provision of incentives for the use of ITNs (insecticide treated nets) significantly increased the probability of ITN use. However, the discussion does not comment on how generalizable these results are, or how they might predict the change in the proportion of households using ITNs elsewhere. By comparison, an economics paper by Dupas (2009) looked at take-up rates of insecticide-treated bednets following their introduction as subsidized "malaria-control devices" among rural households in Kenya. Her objective was to test whether the demand for the devices was elastic, and whether it varied with how the marketing messages were framed. In her concluding remarks, she commented on the presence of significant price sensitivity to the cost of ITNs, and on the absence of variation in take-up rates due to the marketing messages used. The findings of the impact evaluation are discussed in relation to those reported in other case studies by fellow economists, and are used to explain more widely why poor households – who have limited resources – systematically underinvest in health services.

2.3 Research questions

A further difference between evaluations in economics and epidemiology is the type of research questions that each discipline seeks to answer. Economists aim to test models of human behaviour using real-life data; they try to explain behaviour. Epidemiologists are more interested in describing what will happen if a certain intervention is implemented. This approach is reflected not just in how the research questions are framed in each type of evaluation study (and in the type of outcome variables used to measure impact), but also in the contextual and supporting information provided to justify the need to answer these questions.

To illustrate this point, we looked at two studies examining similar interventions: the impact of WASH interventions on diarrhoea-related outcomes among younger siblings of school-going children (the epidemiology paper), and the impact of providing universal access within a village to hygienic latrines and in-home piped water on the incidence of treated diarrhoea episodes (the economics paper).

Dreibelbis (2014) assessed the health and educational impacts of two WASH improvement interventions carried out in Kenyan schools on the prevalence of diarrhoea. The indicator selected to measure impact was a one-week period prevalence rate of diarrhoea episodes among children of school age, measured from a sample of individuals who were interviewed twice over 26 months. The body of the paper focuses mainly on explaining the study design and the methods adopted for the trial, with the section on results explaining in detail the different odds of diarrhoea associated with the survey population in the different arms of the trial.

In a related economics paper, Duflo (2015) evaluated a village-level intervention that promotes the adoption of latrines and bathing facilities, a community water tank, and a system that supplies piped water to household taps. The author details the typical sequence of events for implementing the intervention, explaining in detail the procedures adopted by extension workers and the expected responses of village leaders, with an elaboration of the process by which villagers begin construction of household latrines and bathing facilities under the intervention, and a costing exercise to show the financial requirements for households constructing latrines and bathing facilities. A great level of detail is given in the explanation of the study design and the evaluation procedures, which takes the reader through the rationale of why and how the intervention is expected to change the behaviour of the targeted households.

The general perception which emerged from comparing these case studies was that evaluations in epidemiology remain very focused on the specifics of the intervention being assessed, with limited elaboration on the wider context of a study, possibly assuming readers' knowledge of the topic or intervention being presented.

The different framing of the research questions across disciplines is also mirrored in the type and number of outcome variables. In general, economic studies tend to analyse not just one outcome as a measure of impact, but several that are correlated with the potential impacts of the project, or with unintended (positive and negative) effects that result from the intended change. In epidemiology, this is rarely the case. Typically, a single main or primary outcome is investigated (which might also reflect what is feasible within the financial budget of the trial), and any additional or secondary measures of impact – if captured and analysed – might be published in separate papers.

Moreover, economists often look at a variety of measures, continuous but also binary, to capture the primary expected impact and additional effects expected from the intervention. For example, Duflo (2015) used recorded doctor or nurse visits for episodes of diarrhoea as the primary outcome variable, and three additional health outcome measures (number of monthly cases of diarrhoea, malaria and fever). Epidemiologists, instead, almost exclusively use relative, rather than absolute, statistics to report the results of their studies, such as odds ratios, relative risks, hazard ratios, or survival rates. Bor (2014) explains how regression discontinuity designs (which are popular among economists) can be adapted to settings that do not rely solely on standard regression models, using parametric and semiparametric regression models that specify the hazard, cumulative hazard or survivorship as a function of the assignment variable and time.
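For reference, the distinction between absolute and relative measures can be written compactly. Writing p_t and p_c for the outcome probabilities in the treatment and comparison groups, and h_t(t), h_c(t) for the corresponding hazard functions (our notation, added here for illustration only):

```latex
\text{risk difference (absolute)} = p_t - p_c, \qquad
\text{relative risk} = \frac{p_t}{p_c}, \qquad
\text{odds ratio} = \frac{p_t/(1-p_t)}{p_c/(1-p_c)}, \qquad
\text{hazard ratio} = \frac{h_t(t)}{h_c(t)}
```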

2.4 Study designs

Differences in evaluation models are also evident across disciplines. Evaluation studies conducted by economists frequently use quasi-experimental methods which combine matching algorithms with empirical models that estimate impact, although randomised trials are not uncommon. Methods used in economic studies such as instrumental variable models (IV), local average treatment effect (LATE) models, and encouragement designs are rarely seen in epidemiology evaluations, with IV models scoring low on the scale of reliable methods (Martens et al. 2006). Other quasi-experimental methods such as propensity score matching (PSM) and regression discontinuity design (RDD) are increasingly being used by epidemiologists (see Bor, 2014, for a review of RDD modelling in epidemiology trials), whereas the interrupted time-series designs used by epidemiologists are not seen in evaluations conducted by economists. Designs such as cohort studies are not used by economists and, in general, the observational studies that are an important step in characterising the evaluation problem in epidemiology, and in identifying what types of interventions need to be developed and where they should be targeted, are less accepted by economists.
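As a minimal sketch of the regression discontinuity logic mentioned above (simulated data and a simple linear specification on either side of an assumed cutoff; this is not the specification of any study cited in this paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Assignment variable (e.g. an eligibility score) with a treatment cutoff at zero.
score = rng.uniform(-1, 1, size=n)
treated = (score >= 0).astype(float)
outcome = 1.0 + 0.8 * score + 1.5 * treated + rng.normal(scale=0.5, size=n)  # true jump = 1.5

def intercept_at_cutoff(mask):
    """Fit a linear regression of outcome on score for one side and return its intercept."""
    X = np.column_stack([np.ones(mask.sum()), score[mask]])
    return np.linalg.lstsq(X, outcome[mask], rcond=None)[0][0]

# The RDD estimate is the jump between the two fitted lines at the cutoff.
rdd_estimate = intercept_at_cutoff(score >= 0) - intercept_at_cutoff(score < 0)
print(f"estimated jump at the cutoff: {rdd_estimate:.2f} (true effect is 1.5)")
```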

When randomised trials cannot be used (for example, in the evaluation of health insurance programmes offered at the national level, which are incompatible with assignment of treatment by randomisation), epidemiologists resort to alternative methods to address selection bias. These include simulations, interviews to assess the scale of bias and, more frequently, fixed effects models. These approaches would not be considered by economists to be sound ways of adequately addressing selection bias. Epidemiologists sometimes use a "per-protocol" analysis (in which those who adhere to the allocated intervention are compared to those in the control group who were not exposed to it) or an "as treated" analysis (in which those who actually receive the intervention are compared to those who do not, regardless of allocation). These approaches are not acceptable in economics, which has a long tradition of addressing selection bias within observational studies.
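The distinction between these analyses can be illustrated with a small sketch on made-up data (column names and numbers are purely illustrative, not drawn from any study discussed here):

```python
import pandas as pd

# Hypothetical trial data: 'assigned' is the randomised arm (1 = intervention),
# 'received' records whether the intervention was actually taken up,
# and 'outcome' is the endpoint of interest.
df = pd.DataFrame({
    "assigned": [1, 1, 1, 1, 0, 0, 0, 0],
    "received": [1, 1, 0, 1, 0, 0, 1, 0],
    "outcome":  [12, 10, 7, 11, 6, 5, 9, 4],
})

# Intention-to-treat: compare groups as randomised, ignoring compliance.
itt = df[df.assigned == 1].outcome.mean() - df[df.assigned == 0].outcome.mean()

# Per-protocol: compliers in the intervention arm vs unexposed controls.
pp = (df[(df.assigned == 1) & (df.received == 1)].outcome.mean()
      - df[(df.assigned == 0) & (df.received == 0)].outcome.mean())

# As-treated: compare by the treatment actually received, regardless of allocation.
at = df[df.received == 1].outcome.mean() - df[df.received == 0].outcome.mean()

print(f"ITT = {itt:.2f}, per-protocol = {pp:.2f}, as-treated = {at:.2f}")
```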

A systematic review by Acharya and co-authors (2013) summarised the literature on impact evaluation studies of subsidised health insurance schemes offered, mostly on a voluntary basis, to the informal sector in low and middle-income countries. The review shows that a substantial number of papers provided estimates of the average treatment effect for those insured. Papers were grouped in two categories: those that corrected for the problem of self-selection into insurance, and the few that used an intention-to-treat analysis to estimate the average effect. The review selected 19 papers for inclusion (mostly authored by economists and health economists, and published in journals such as Social Science and Medicine). It excluded 15 studies on methodological grounds; these papers were predominantly written by non-economists or epidemiologists. The studies accepted for inclusion used the following methods:

 PSM (11 studies, two of which in combination with DD and one in combination with IV)
 IV (4 studies)
 RDD (2 studies)
 LATE with encouragement design (1 study)
 Triple difference (1 study)

As noted above, most of these methods (possibly with the exception of the triple difference method, which is also rarely used in economics) are either not widely used (PSM and LATE) or poorly understood (IV) in epidemiology.
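For readers from epidemiology, the following sketch shows what an instrumental variable estimator involves in its simplest two-stage least squares form, using simulated data with an unmeasured confounder (an illustration of the general technique, not a re-analysis of any study in the review):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data with unmeasured confounding: u affects both take-up and outcome.
u = rng.normal(size=n)                                # unobserved confounder
z = rng.binomial(1, 0.5, size=n)                      # instrument, e.g. randomised encouragement
d = (z + u + rng.normal(size=n) > 0.5).astype(float)  # programme take-up
y = 2.0 * d + 1.5 * u + rng.normal(size=n)            # outcome; true effect of d is 2.0

# Naive OLS of y on d is biased upwards because of the confounder u.
X = np.column_stack([np.ones(n), d])
ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Two-stage least squares: first predict d from z, then regress y on the prediction.
Z = np.column_stack([np.ones(n), z])
d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
X2 = np.column_stack([np.ones(n), d_hat])
tsls = np.linalg.lstsq(X2, y, rcond=None)[0][1]

print(f"naive OLS estimate: {ols:.2f}; 2SLS estimate: {tsls:.2f} (true effect 2.0)")
```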

2.5 Power calculations and p-values versus size of impact in relation to intervention context

The statistical power of an impact evaluation design is the likelihood of finding an effect (i.e. detecting a difference between the treatment and comparison groups) when in fact one exists. Low power makes a false negative (i.e. an effect is erroneously not found when it truly exists) more likely. On the other hand, in relation to false positives, when the p-value, or calculated probability, approaches zero, this suggests a low chance of incorrectly rejecting the null hypothesis of no difference between the treatment and comparison groups (i.e. of concluding that a difference exists when it is in fact found by chance).

Power calculations are used in the design of evaluation studies to show the smallest sample size required to detect a meaningful difference (also known as the minimum detectable effect) in outcomes between the treatment and comparison groups. Most economists do not report (or conduct) power calculations, and instead use balance tests on baseline variables to detect differences in baseline characteristics between treatment and control groups.
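As an illustration of what such a calculation involves (hypothetical numbers, not tied to any of the studies discussed here), standard libraries can solve either for the required sample size or for the minimum detectable effect:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per arm needed to detect a standardised effect of 0.2
# with 80% power at a 5% two-sided significance level (roughly 394 per arm).
n_per_arm = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
print(f"required sample size per arm: {n_per_arm:.0f}")

# Conversely, the minimum detectable (standardised) effect for 200 units per arm.
mde = analysis.solve_power(nobs1=200, power=0.8, alpha=0.05)
print(f"minimum detectable effect with n=200 per arm: {mde:.2f}")
```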

Bandiera and co-authors (2017), for example, evaluate the economic impact of the Empowerment and Livelihood for Adolescents (ELA) program, which aims to improve the lives of adolescent girls through girls' clubs offering vocational and life skills. In the section discussing the study design, the authors explain that girls' clubs were randomly assigned across communities, and that participation in clubs was voluntary. They also explain that a representative sample of adolescent girls was drawn in treatment and control communities at baseline and followed up two years later, which generated a panel of 4,800 adolescent girls. The paper does not report conducting power calculations to establish the sample size above.

In contrast, the epidemiology study by Wagman (2015) assessed whether the Safe Homes and Respect for Everyone (SHARE) project reduced physical and sexual IPV and HIV incidence among individuals enrolled in the Rakai Community Cohort Study (RCCS) in Uganda. The paper explains in detail how the study was powered, on the basis of statistical calculations for the reductions in sexual and physical IPV prevalence rates expected from exposure to the programme.

Two additional reflections emerged around power calculations and the usefulness of p-values. First, in evaluation studies conducted by economists, power calculations are important only up to a point: what really matters is to 'power' the outcome variable – on which the quality and informative value of the evaluation crucially depends – and to pay attention to the variance of the residuals in regressions, because this determines the reliability and precision of the inference. Second, although epidemiologists do power calculations, these are probably best used in the set-up phase of an evaluation study to determine its viability: what is important is not the p-value for differences in the baseline characteristics of participants in the treatment and the control group, but the size of any such differences and their potential impact on the difference found between the groups at the end of the trial.

Related to the issue of power calculations is the statistical power of a study, which is the ability of a study to detect a difference if a difference really exists. This depends on both the sample size (number of units) and the effect size (e.g. the difference in outcomes between two groups). Many epidemiology studies conduct and report power calculations, but are frequently underpowered. However, they might also report 'less than everything', perhaps reporting only the evidence of significant impact on a sub-set of the measures used. Conversely, with a very large number of observations, a study may have good power to detect even a very small effect size, so small that it is not clinically meaningful.


Ioannidis and co-authors (2017) give a sense of how serious the neglect of statistical power is in economics in their review of 159 meta-analyses conducted in economics, which summarise results from 6,700 studies and 64,000 estimates. They subdivided the studies and meta-analyses into six categories by field of research (accounting for 27%, 23%, 18%, 17%, 10% and, for 'other', 5% of the total). They defined a study as adequately powered using the standard 80% cut-off, and classified as "powered" those studies for which the standard error was less than the ratio between the effect size and 2.8 (where 2.8 is the sum of 1.96 (for a significance level of 5%) and 0.84 (the standard normal quantile corresponding to a 20/80% split of its cumulative distribution)). The review showed that:

- The proportion of adequately powered studies across the 159 meta-analyses was 10%, while the mean proportion was 22%
- About 20% of meta-analyses were entirely composed of under-powered studies
- The median power in the 159 meta-analyses was between 10-18%, while the mean power was between 30-33%
- The average effect size across all studies was 0.17 SD

The review also showed that only 4% of papers published in the American Economic Review mentioned conducting power calculations up to 1996, and only 8% up to 2004.
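In formula terms, the adequacy criterion described above classifies an estimate with effect size beta-hat and standard error SE(beta-hat) as powered (at 80% power and 5% two-sided significance) when

```latex
\frac{|\hat{\beta}|}{SE(\hat{\beta})} \;\ge\; 1.96 + 0.84 = 2.8,
\qquad \text{equivalently} \qquad
SE(\hat{\beta}) \;\le\; \frac{|\hat{\beta}|}{2.8}
```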

2.6 Publishing protocols and pre-specified outcomes

The publication of protocols and pre-registration of the research has become the norm for randomised trials undertaken by epidemiologists, whereas economists have not widely accepted this practice, although it is becoming increasingly popular. When the evaluation method adopted for a study is not a randomised trial, the practice of publishing a protocol is less frequent in both disciplines. This is likely to be because many journals have defined guidelines requiring pre-registration of randomised trials before publication of the final report, whereas the same is not true for other study designs.

Data mining

Unlike epidemiologists, economists tend not to pre-specify their main outcomes of interest. This allows the analysis to be informed by the data gathered in the study, but has the disadvantage that it can easily lead to data mining and p-hacking, as described below.

Economists are keen to find and report statistical significance regardless of the size of the impact, and often continue to refine their analysis until a statistically significant result is found, a practice known as p-hacking. Many argue that current scientific practices create strong incentives to publish statistically significant (i.e. "positive") results, which in turn pushes researchers to selectively pursue, and selectively attempt to publish, only statistically significant research findings. The introduction of clear protocols and statistical analysis plans for randomised trials makes this practice less possible or, at least, more obvious when it is done.

P-hacking is not unique to economics, although examples in epidemiology (especially in diagnostic and prognostic research) are more difficult to find. A possible solution to the problem of p-hacking could be for funders and ethics committees to require the publication of protocols and of the full data analysis from all the studies they fund or approve.


2.7 Reporting

In evaluation studies conducted in economics, findings are typically published in a single journal article which is preceded by a series of working papers, while findings in epidemiology might be spread across one or more journal articles. The average length of journal articles reporting economic and epidemiology studies is also quite different. The former often reach 40 pages and do not follow a standardised reporting structure, whereas published papers in epidemiology rarely exceed 15 pages, due to strict word limits imposed by journals (often no more than 3,000 words), and follow a standardised reporting structure.

Structure of publications

The more standardised structure of epidemiological papers is helpful because it provides a clear template for the article, with dedicated sections describing the study design and methods, results and discussion. This facilitates the replication of methods in future studies. Epidemiology studies also tend to adhere to formal reporting guidelines, such as the Consolidated Standards of Reporting Trials (CONSORT) guidelines. These guidelines dictate 'best practice' in reporting the results of randomised trials and, although they are not followed universally, the situation is better than in economics, where inconsistency in the reporting format of impact evaluations can cause confusion, especially among non-economists who wish to learn from these studies.

Another idiosyncrasy in the structure of evaluation studies is that economists usually discuss the context of the study in the introduction, to place the design and need for the study in context; while epidemiologists tend to refer to this in the discussion section, to place the study findings in context.

The issue around the different structure and reporting format of epidemiology and economics studies can be seen by comparing the two papers selected under the WASH theme. Duflo and co-authors' evaluation of the economic impact of an integrated water and sanitation improvement program in rural India is 40 pages long, featuring a six-page opening section that describes in detail the background of the intervention and the socio-economic context of the targeted area. The paper also includes an eleven-page technical annex which explains all the robustness checks carried out and the data cleaning process, and presents the costing exercise for the program. The structure of Dreibelbis' epidemiology study (2014) looking at the impact of a WASH intervention on diarrhoea-related outcomes among younger siblings of school-going children is quite different. This seven-page paper is organized around a standard structure of four sections (objectives, methods, results, and conclusions), with a clear visual depiction of the study design and a brief section at the end outlining the limitations of the study.

Timeline to final publication

The final publication of an evaluation study in an economics peer-reviewed journal can take many years from the time that the analysis is finished. Findings, however, are publicly disseminated as they emerge through various platforms such as working papers, study reports and policy briefs. Working papers are considered to have some academic and policy value – especially given the ongoing shift towards 'open science'. Disclosure of early findings among peers and colleagues encourages critical review while generating early discussion of policy implications. However, this 'working papers' approach brings with it the problem that these papers might present findings that can change or need to be refined, so that outcomes reported and discussed in earlier versions may be absent from later versions, or might have become irrelevant (this links to the general issue of transparency).

The timeline in economics between study conduct, early findings and final publication is not compatible with the timeline for making policy. Policy makers want prompt evidence to inform their decisions, meaning that they might draw on the content of early working papers rather than waiting for the finalised results. As an illustration, the paper by Bandiera and co-authors (2017) was initially disseminated in 2012 as a working paper, before being finalised and published five years later as part of the World Bank 'Open Knowledge' series. Similarly, the first version of Attanasio and co-authors' paper on the evaluation of Mexico's PROGRESA CCT was submitted in 2001, but the final peer-reviewed publication of the study appeared in 2012. For economists, publication in top journals is potentially more valuable than actual impact on policy, because the number of publications in highly ranked peer-reviewed journals affects a scholar's career progress towards the rank of professor.

Publication of data and code

In general, in economics, peer reviewers have the right to ask for the data and code from a study whose manuscript they are reviewing and, normally, these would be made available to them. Post-publication data sharing has become a formal requirement for peer-reviewed publication in economics, with the top economics journals (see the American Economic Association guidelines) requiring it to ensure that the analyses are replicable by other scholars. Authors are expected to document their data sources, models and estimation procedures as thoroughly as possible. This allows other researchers to replicate the data analysis presented in the published paper and supports the building of a repository of data. It can also increase the number of citations of the article, enhancing the reputation of both the authors and the journal. Similar practices are rare in epidemiology. Instead, confidentiality issues might be cited as barriers to the sharing of trial data and coding procedures after publication.

One reason why this practice has been more acceptable in economics might be the aforementioned time lag between the first version of the evaluation findings (typically in working paper format) and the final version in a peer-reviewed journal. Another reason is that economists generally prepare a single paper that reports all findings. By the time the study is published, the authors will have 'exhausted' their first-user rights to analyse the data, satisfying their priority to conduct original publishable work. Conversely, in epidemiology, the primary findings of an evaluation study are usually published in a stand-alone paper, which might then be followed by a series of other publications presenting additional results, to give the original researchers the opportunity to be the first to undertake further analysis of the data and to publish the outputs.

2.8 Replicability of impact evaluations

Although it is not the norm, there are instances in epidemiology where researchers who have arrived at conflicting results about the impact of the same intervention work together to identify or resolve the source of conflict and to reconcile the conclusions of their analyses. In contrast, in economics, efforts are less likely to be made to resolve conflicting results.


Although the replicability of evaluation findings has been more common in epidemiology, in economics the analysis of behavioural mechanisms, rather than of interventions, has been strongly promoted as the correct approach over a number of years by the leading economist Angus Deaton. For example, he wrote that "the analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work" (Deaton, 2009). And, indeed, there are several papers in which the authors explicitly state that their focus is not the intervention per se (which may not be replicable) but the insight which comes from the study of its impact.

A well-known example is the study of teacher absenteeism in India by Duflo and colleagues (2012). Teacher attendance was monitored by requiring teachers to take a date- and time-stamped photo each day with their class, and pay was conditional upon attendance. This scheme was more effective in reducing absenteeism than an award scheme administered by head teachers (since head teachers gave everyone the award regardless of performance). In their conclusion, the authors explicitly recognized that using cameras more widely 'may prove difficult'. So, rather than recommending rolling out such a scheme, they concluded that the barriers to reducing absenteeism are not insurmountable and that other solutions might be found given the political will.

A second example is Ashraf and co-authors' study (2013) of the social marketing of female condoms in Lusaka. The study compared financial incentives to hairdressers versus social recognition through the public display of stars and awards. Social recognition had a larger impact on sales than financial incentives. In their conclusions, the authors do not discuss the specific intervention at all. Rather, they discuss aspects of context which they believe will affect the effectiveness of non-financial incentives, recommending that they be adopted by others: "While we implemented a specific type of non-financial rewards, the general design principles are easily replicable and adaptable to other settings" (Ashraf et al. 2013).

3. Evidence of differences between economists and epidemiologists from systematic reviews

Systematic reviews of the type performed in epidemiology are not often undertaken in economics. In economics, reviews are written by experts with little focus on the use of methods to minimise bias. Epidemiology is interested in generating 'useful' evidence, which may include the need to examine contradictory findings. The general perception that in economics there is less interest in addressing why a study contradicts previous ones probably explains why economists do not use systematic reviews to compare findings of studies that used similar designs or addressed the same question. For example, two systematic reviews of CCT interventions led by epidemiologists have been published in the Cochrane library of systematic reviews but have gone largely unnoticed in the development economics community. One systematic review of CCTs was published in the Campbell Collaboration library of systematic reviews (Baird et al., 2013), and a critical review of studies of PROGRESA was published in the Journal of Economic Literature (Parker and Todd, 2017). The review published by Campbell is unusual, and in line with the Collaboration's effort to promote systematic reviews across disciplines. The JEL review, on the other hand, is more typical of the reviews economists normally read.

The section below lists the main differences observed across these reviews.


3.1 Conditional cash transfers

 The Cochrane reviews exclusively select randomised trials and controlled before-after studies for inclusion, pointing to 'strenuous' efforts to reduce bias through the design and conduct of the study (1,2). Reviews by economists are rich in detail on the methods used for the analyses of the data: matching, RDD, IV etc. (3,4).
 The three systematic reviews include quasi-experimental designs in the following proportions: 2/10 (20%) (1), 5/16 (31%) (2), and 40/75 (53%) (3). Quasi-experimental studies appear more prominently in the economists' review, while the epidemiologists' reviews tend to focus on experimental studies and on loosely specified 'controlled before-after' studies.
 The JEL review includes a study that would not be eligible in an epidemiology review: a study estimating the impact of Oportunidades on infant mortality using municipal-level data and the percentage of rural households receiving the transfer.
 Reviews such as that in the JEL are popular in economics. They are not systematic in their identification and inclusion of studies, but tend to be organised by project, assess all impacts, and discuss other topics such as multiple testing, cost-effectiveness analysis and gender.
 A discussion of balancing tests is included in one of the Cochrane reviews, and is worth commenting on. It is common among economists to conduct statistical tests of the differences in covariates before an intervention with the goal of 'proving' the similarity of the project and control groups. This is often done even in experiments, where the practice has no meaning and the results can be misleading. In epidemiology these tests are taken as 'proving' that randomisation failed only when there are serious doubts about the validity of the randomisation process. Economists conducted a balance test of the covariates using the PROGRESA data and found significant differences for about 30% of covariates at the household level, which led the authors to conclude that the data are of poor quality (1). But the differences observed are very small and are statistically significant only because of the large sample size (27,000 households). It is well known that a sufficiently large sample will deliver a statistically significant difference between two groups even if the difference is very small and results from random noise (see the illustrative simulation below).
 The Cochrane reviews included the category 'cohort study', although it is unclear what type of study this implies (2). Fixed effects models are not considered to be a valid identification strategy in economics, i.e. they do not adequately control for selection bias.
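The large-sample point in the penultimate bullet can be illustrated with a simple simulation (hypothetical numbers, not the PROGRESA data): with roughly 27,000 observations, even a negligible difference between two groups is likely to be flagged as statistically significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two groups of 13,500 households drawn from almost identical distributions:
# the true difference in means is only 0.04 standard deviations.
control = rng.normal(loc=0.00, scale=1.0, size=13_500)
treated = rng.normal(loc=0.04, scale=1.0, size=13_500)

t_stat, p_value = stats.ttest_ind(treated, control)
print(f"difference in means = {treated.mean() - control.mean():.3f}, p-value = {p_value:.4f}")
# A difference this small is substantively negligible, yet with samples of this size
# it will usually come out as 'statistically significant' (p < 0.05).
```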

3.2 School feeding programmes

Some additional differences between economics and epidemiological systematic reviews were identified in the Cochrane review by Kristjansson and co-authors (2007) on school feeding for improving the physical and psychosocial health of disadvantaged students.

 The main objective was to determine the effectiveness of school feeding programs in improving physical and psychosocial health for disadvantaged school pupils. The review identified 30 potentially eligible studies, of which 12 were excluded for reasons other than the methods of the study (i.e. the intervention or the population were judged not to be relevant, or there was a lack of relevant outcome data). The review also excluded a large number of studies on methodological grounds, such as the use of cross-sectional comparisons of participants and non-participants and longitudinal studies with no control group, but the information in the review does not indicate whether these had been conducted by economists or epidemiologists. Of the 18 included studies, 7 are classified as randomised trials, 9 as controlled before-after studies (CBA), and 2 as interrupted time series (ITS).
 With one exception (Jacoby, 1996), all included studies appear to have been conducted by non-economists, either epidemiologists, educationalists or nutritionists. The review does not include details of the CBA studies and it is not therefore possible to comment on the methodologies of the quasi-experimental studies. The review does not discuss whether the studies were difference-in-difference analyses, or whether they were strengthened by matching methods and, if so, which ones, at a level of detail that economists would expect in order to judge the validity of the study.
 One main observation on this review is that it includes two interrupted time series studies, which have no place in economics. Economists would not consider this type of study a valid method for causal inference, despite time series being a dominant topic in econometrics and the analysis of structural time breaks being a staple of introductory econometrics courses.
 Another main observation is that the study by World Bank economist Jacoby (2002) is omitted from the review, while the study by IFPRI economist Ahmed (2004) is noted in the list of excluded studies. Jacoby conducted a natural experiment exploiting the fact that interviews with children were conducted on both school and non-school days (hence with and without the school meal), correcting for selection bias using instrumental variables. The study by Ahmed replicates the study by Jacoby by producing the variation in school/out-of-school interviews by design. It is possible that the studies were excluded because they did not meet non-methodological selection criteria such as, for example, the type of outcomes considered or the year of publication. However, this omission reminds us that natural experiments and instrumental variable approaches are normally not considered valid by epidemiologists and not included in systematic reviews of evidence.

4. Epi-Econ Evaluation Glossary

Despite the existence of a wide range of evaluation glossaries in economics and many epidemiology glossaries, we have not been able to find a platform where evaluation terms used across disciplines are defined and explained. The existence of a common space to reconcile terms used by evaluation practitioners would reduce the risk of miscommunication in the understanding of each discipline's evaluation work. To help fill this gap, and to illustrate the value of such a platform, the annex includes an annotated evaluation glossary for epidemiology and economics. The next two sections explain how the glossary was collated, and provide additional insights on the extent of overlapping terms. The unified glossary is intended to become an online tool that will be regularly updated and freely accessible to interested users.

Methods

We undertook a targeted search in Google and Google Scholar using the term 'impact evaluation glossary'. Based on the search we identified two 'impact evaluation' glossaries, both published by economic organisations: the International Initiative for Impact Evaluation (3ie)2 and the Centre for Effective Global Action (CEGA) at the University of California3 (which was based on the World Bank's Impact Evaluation in Practice glossary4). We did not identify any epidemiological glossary specifically for impact evaluations; we selected the glossary from the Cochrane Handbook.5 The complete list of terms from the three glossaries was compiled into a single table in Excel and definitions were added: 3ie and CEGA provide the 'economic definition', while Cochrane provides the 'epidemiological definition'. Terms not present in the Cochrane glossary were cross-checked in an epidemiological dictionary6 and, if present, the definition was added to the table. Following this, terms were categorised as either:

 Unique to epidemiology;
 Unique to economics;
 Common to both;
 Common to both but different definition used by the two disciplines; or
 Different term used to describe the same concept.

The terms were classified into five themes to group similar terms together: study designs and approaches (e.g. cohort study, difference-in-difference); data and methods (e.g. data, survey instrument); trial procedures (e.g. run-in period, blinding); analysis (e.g. treatment of the treated, regression); and outcome indicators (e.g. effect size, relative risk). From the complete list of terms identified we selected a subset that is presented in the annex to this paper; the complete list of terms is included as an appendix.

Findings

In total we identified 207 unique terms from the three glossaries, the expert workshop and the case studies: 42.5% (88/207) unique to epidemiology, 22% (46/207) unique to economics and 33% (69/207) common to both disciplines. For two of the included terms (participant and sample) the definition differed slightly between the two disciplines. In one case the two disciplines used a different term to describe the same concept (dummy variable and indicator variable). An additional 13 terms were identified through the workshop and case studies; 10 unique to economics and 3 unique to epidemiology. The Annex includes a table with an excerpt of the proposed glossary.

2 3ie. 3ie impact evaluation glossary [Internet]. New Delhi, India: International Initiative for Impact Evaluation; 2012 [cited 2017 Oct 23]. Available from: http://www.3ieimpact.org/media/filer_public/2012/07/11/impact_evaluation_glossary_-_july_2012_3.pdf
3 CEGA. Glossary of terms [Internet]. University of California, Centre for Effective Global Action. Available from: http://cega.berkeley.edu/tools/evaluation/statistical-glossary/
4 Gertler PJ, Martinez S, Premand P, Rawlings LB, Vermeersch CMJ. Impact Evaluation in Practice: Glossary [Internet]. World Bank; [cited 2017 Oct 24]. Available from: http://www.fapesp.br/avaliacao/manuais/impact_evaluation_2016.pdf
5 The Cochrane Reviewers' Handbook Glossary [Internet]. Cochrane Collaboration; [cited 2017 Nov 21]. Report No.: Version 4.1.5. Available from: http://www.epidemiologia.anm.edu.ar/cochrane/pdf/glossary.pdf
6 Porta M. A dictionary of epidemiology. Oxford University Press; 2008.


5. Conclusions

Epidemiologists and economists often evaluate similar projects and social policies using different tools and approaches in the design and analysis of experiments and quasi-experiments. In some cases the tools are the same but are used in different ways; in other instances, researchers in the two disciplines may employ the same tools but use a different terminology. These differences lead to misunderstandings between researchers of the two disciplines. Moreover, when the results of evaluations by epidemiologists and economists differ without being strictly comparable, readers are confused and unable to interpret the results or to reconcile the differences between the evaluations. This paper reconciles some of these differences, building on work conducted separately by mixed teams of epidemiologists and economists, and developing the thinking and discussion of the CEDIL workshop between evaluation specialists in both disciplines. We discuss areas of overlap as well as approaches and practices where the two disciplines differ, with the aim of exploring the potential for harmonising evaluation methods.

The general feeling is that economists and epidemiologists conducting evaluation studies have more in common than is commonly believed. For example, when randomisation is not possible, both disciplines face the problem of unobserved characteristics which lead to differential selection of individuals into a study sample. While the nature of sample selection differs across the two fields, the biases are similar and can be addressed with similar methods that imitate randomisation (more rarely with instrumental variables models, and increasingly with regression discontinuity designs where these are feasible). The two fields also share the less attractive feature of p-hacking: collecting data or conducting statistical analyses until non-significant results become significant. Reversing this trend, which represents a serious concern for scientific inference, would entail a shift in the implicit requirements for publishing evidence of impact. They also share the practice of conducting studies that turn out, ex-post, not to be statistically powered. While epidemiologists conduct power calculations to define the sample size of their studies, economists normally do not; but both economists and epidemiologists end up using underpowered samples in their studies.

It is, however, some of the differences in evaluation practice that offer the largest scope for cross-disciplinary learning. Economists often use theory to model human behaviour and to inform the design of the evaluation study. They are interested in the replication of the behavioural mechanism underlying an intervention, rather than in replicating the intervention per se or its findings, and they share the data of completed evaluations as part of the peer review process leading to publication. Epidemiologists offer equally important examples to economists from their evaluation practice: they follow clear guidelines when publishing the results of their studies, report power calculations to establish the appropriate sample size required to detect impact, and publish the findings of their studies within a reasonably short time, a practice which allows the timely relevance and use of their results. It is hoped that the discussion emerging from this paper, alongside the unified evaluation glossary included in the annex, will create a roadmap of initiatives leading to a more integrated approach to evaluations across the two disciplines.


References

Acharya, A., Vellakkal, S., Taylor, F., Masset, E., Satija, A., Burke, M., and Shah, E. (2013) "Impact of Health Insurance for the Informal Sector in Low and Middle Income Countries: A Systematic Review", World Bank Research Observer, 28(2): 236–266.

American Economic Association (consulted January 2018) "Data Availability Policy". https://www.aeaweb.org/journals/policies/data-availability-policy

Ashraf, N., Bandiera, O., and Kelsey, J. (2013) "No margin, no mission? Evaluating the role of incentives in the distribution of public goods in Zambia", 3ie Report #9. Available at: http://www.3ieimpact.org/media/filer_public/2014/02/20/ie_9-ashraf-no_margin_no_mission_web.pdf

Attanasio, O., Meghir, C., and Santiago, A. (2012) "Education Choices in Mexico: Using a Structural Model and a Randomized Experiment to Evaluate PROGRESA", The Review of Economic Studies, 79(1): 37–66.

Baird, S., Ferreira, F. H. G., Özler, B., and Woolcock, M. (2013) "Relative Effectiveness of Conditional and Unconditional Cash Transfers for Schooling Outcomes in Developing Countries: A Systematic Review", Campbell Systematic Reviews, 2013:8.

Bandiera, O., Buehren, N., Burgess, R., Goldstein, M., Gulesci, S., Rasul, I., and Sulaiman, M. (2017) "Women's Empowerment in Action: Evidence from a Randomized Control Trial in Africa". World Bank, Washington, DC. © World Bank. https://openknowledge.worldbank.org/handle/10986/28282 License: CC BY 3.0 IGO.

Baranov, V., Bennett, D., and Kohler, H-P. (2015) "The indirect impact of antiretroviral therapy: Mortality risk, mental health, and HIV-negative labor supply", Journal of Health Economics, 44: 195–211.

Bor, J., Moscoe, E., Mutevedzi, P., Newell, M. L., and Bärnighausen, T. (2014) "Regression Discontinuity Designs in Epidemiology: Causal Inference Without Randomized Trials", Epidemiology, 25: 729–737.

Centers for Disease Control and Prevention (consulted January 2018) "Introduction to Epidemiology". https://www.cdc.gov/OPHSS/CSELS/DSEPD/SS1978/Lesson1/Section4.html#_ref22

Center for Effective Global Action (CEGA) "Glossary of terms" [Internet]. University of California, Centre for Effective Global Action. Available from: http://cega.berkeley.edu/tools/evaluation/statistical-glossary/

Consolidated Standards of Reporting Trials (CONSORT): http://www.consort-statement.org/

Deaton, A. (2009) "Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development". The Keynes Lecture, British Academy, October 9th, 2008.

Dreibelbis, R., Freeman, M. C., Greene, L. E., Saboori, S., and Rheingans, R. (2014) "The Impact of School Water, Sanitation, and Hygiene Interventions on the Health of Younger Siblings of Pupils: A Cluster-Randomized Trial in Kenya", American Journal of Public Health, 104(1): 91–97.

Duflo, E., Hanna, R., and Ryan, S. P. (2012) "Incentives Work: Getting Teachers to Come to School", American Economic Review, 102(4): 1241–1278.

Duflo, E., et al. (2015) "Toilets Can Work: Short and Medium Run Health Impacts of Addressing Complementarities and Externalities in Water and Sanitation", NBER Working Paper No. 21521.

Dupas, P. (2009) "What Matters (and What Does Not) in Households' Decision to Invest in Malaria Prevention?", American Economic Review, 99(2): 224–230.

Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., and Vermeersch, C. M. J. "Impact Evaluation in Practice: Glossary" [Internet]. World Bank; [cited 2017 Oct 24]. Available from: http://www.fapesp.br/avaliacao/manuais/impact_evaluation_2016.pdf

Ioannidis, J. P. A., Stanley, T. D., and Doucouliagos, H. (2017) "The power of bias in economic research", The Economic Journal, 127: 236–265.

J-PAL (consulted January 2018) "Introduction to Evaluations". https://www.povertyactionlab.org/research-resources/introduction-evaluations

Journal of Development Economics (consulted January 2018) "Note about supplementary research data". https://www.journals.elsevier.com/journal-of-development-economics/news/note-about-supplementary-research-data

Krezanoski, P. J., Comfort, A. B., and Hamer, D. H. (2010) "Effect of incentives on insecticide-treated bed net use in sub-Saharan Africa: a cluster randomized trial in Madagascar", Malaria Journal, 9: 186.

Kristjansson, E. A., Robinson, V., Petticrew, M., MacDonald, B., Krasevec, J., Janzen, L., Greenhalgh, T., Wells, G., MacGowan, J., Farmer, A., Shea, B. J., Mayhew, A., and Tugwell, P. (2007) "School feeding for improving the physical and psychosocial health of disadvantaged students", Cochrane Database of Systematic Reviews, Issue 1, Art. No.: CD004676.

Lagarde, M., Haines, A., and Palmer, N. (2009) "The impact of conditional cash transfers on health outcomes and use of health services in low and middle income countries (Review)", Cochrane Database of Systematic Reviews 2009, Issue 4.

Martens, E. P., Pestman, W. R., de Boer, A., Belitser, S. V., and Klungel, O. H. (2006) "Instrumental Variables: Application and Limitations", Epidemiology, 17(3): 260–267.

Parker, S. W., and Todd, P. E. (2017) "Conditional Cash Transfers: the Case of PROGRESA/Oportunidades", Journal of Economic Literature, 55(3): 866–915.

Pega, F., Liu, S. Y., Walter, S., Pabayo, R., Saith, R., and Lhachimi, S. K. (2017) "Unconditional cash transfers for reducing poverty and vulnerabilities: effect on use of health services and health outcomes in low- and middle-income countries", Cochrane Database of Systematic Reviews 2017, Issue 11.

Porta, M. (2008) A Dictionary of Epidemiology. Oxford University Press.

Powell-Jackson, T., Davey, C., Masset, E., Krishnaratne, S., Hayes, R., Hanson, K., and Hargreaves, J. R. (forthcoming, 2018) "Trials and tribulations: cross-learning from the practices of epidemiologists and economists", Health Policy and Planning (Methodological Musings series).

Shei, A., Costa, F., Reis, M. R., and Ko, A. (2014) "The impact of Brazil's Bolsa Família conditional cash transfer program on children's health care utilization and health outcomes", BMC International Health and Human Rights, 14(10): 1–9.

Spiegelman, D. (2016) "Evaluating Public Health Interventions: 1. Examples, Definitions, and a Personal Note", American Journal of Public Health, 106: 70–73.

The Cochrane Reviewers' Handbook Glossary [Internet]. Cochrane Collaboration; [cited 2017 Nov 21]. Report No.: Version 4.1.5. Available from: http://www.epidemiologia.anm.edu.ar/cochrane/pdf/glossary.pdf

Wagman, G., et al. (2015) "Effectiveness of an integrated intimate partner violence and HIV prevention intervention in Rakai, Uganda: analysis of an intervention in an existing cluster randomised cohort", Lancet Global Health.

3ie (2012) 3ie impact evaluation glossary [Internet]. New Delhi, India: International Initiative for Impact Evaluation [cited 2017 Oct 23]. Available from: http://www.3ieimpact.org/media/filer_public/2012/07/11/impact_evaluation_glossary_-_july_2012_3.pdf

18

ANNEX 1: Econ-Epi Unified Evaluation Glossary

Unless otherwise stated, economic definitions are from the 3ie glossary and epidemiological definitions from the Cochrane glossary; other sources (CEGA, Porta) are noted where used. Terms are grouped by classification (Analysis, Design, Outcome indicator, Trial procedures) and labelled Econ, Epi or Both according to the discipline(s) in which they are mainly used.

Analysis

Confounding factors/variables (a confounder) [Both; econ: see also instrumental variable]
Economic definition: Factors (variables) other than the programme which affect the outcome of interest.
Epidemiological definition: Confounding: a situation in which a measure of the effect of an intervention or exposure is distorted because of the association of exposure with other factor(s) that influence the outcome under investigation.

Instrumental variable [Both; epi: see also confounding bias, residual confounding]
Economic definition: An instrumental variable is a variable that helps identify the causal impact of a program when participation in the program is partly determined by the potential beneficiaries. A variable must have two characteristics to qualify as a good instrumental variable: (1) it must be correlated with program participation, and (2) it may not be correlated with outcomes Y (apart from through program participation) or with unobserved variables. Source: CEGA
Epidemiological definition: Instrumental variable analysis: a method originally used in econometrics and social sciences that, under certain assumptions, allows the estimation of causal effects even in the presence of unmeasured confounding for the exposure and effect of interest. An instrumental variable, or instrument, has to meet the following conditions: (1) it is associated with the exposure, (2) it affects the outcome only through the exposure, and (3) it does not share any (uncontrolled) common cause with the outcome. See also confounding bias, residual confounding. Source: Porta

Intention to treat estimate [Both; econ: average treatment effect] (an illustrative formula follows this group of entries)
Economic definition: The average treatment effect calculated across the whole treatment group, regardless of whether they actually participated in the intervention or not. Compare to treatment of the treated.
Epidemiological definition: An intention-to-treat analysis is one in which all the participants in a trial are analysed according to the intervention to which they were allocated, whether they received it or not. Intention-to-treat analyses are favoured in assessments of effectiveness as they mirror the noncompliance and treatment changes that are likely to occur when the intervention is used in practice, and because of the risk of attrition bias when participants are excluded from the analysis.

Sensitivity analysis [Epi]
Epidemiological definition: An analysis used to determine how sensitive the results of a study or systematic review are to changes in how it was done. Sensitivity analyses are used to assess how robust the results are to uncertain decisions or assumptions about the data and the methods that were used.
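As a purely illustrative sketch (the notation Z for random assignment, D for actual participation and Y for the outcome is introduced here and is not part of the 3ie, CEGA, Cochrane or Porta definitions), the intention-to-treat contrast and the instrumental-variables (Wald) estimator referred to above can be written as:

\[ \text{ITT} = E[Y \mid Z = 1] - E[Y \mid Z = 0] \]
\[ \hat{\beta}_{IV} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} \]

The IV estimator rescales the intention-to-treat effect by the difference in take-up between the assigned groups; under the instrument conditions listed in the definitions (plus a monotonicity assumption), it is commonly interpreted as the average effect for compliers.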

Unobservable / Unobservable variables [Econ]
Economic definition: Characteristics which cannot be observed or measured. The presence of unobservables can cause selection bias in quasi-experimental designs, if these unobservables are correlated with both participation in the programme and the outcome(s) of interest.
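Purely as an illustration (the expression below is the standard omitted-variable bias formula, not text from the source glossaries), suppose the true relationship is Y = α + βD + γU + ε, where D is programme participation and U is an unobservable. A regression of Y on D alone then converges to

\[ \beta_{OLS} = \beta + \gamma \, \frac{\mathrm{Cov}(D, U)}{\mathrm{Var}(D)} \]

so the estimate is biased unless the unobservable is uncorrelated with participation, which is the selection-bias problem described in the definition.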



Design

Case-control study (synonyms: case referent study, retrospective study) [Epi]
Epidemiological definition: A study that starts with identification of people with the disease or outcome of interest (cases) and a suitable control group without the disease or outcome. The relationship of an attribute (intervention, exposure or risk factor) to the outcome of interest is examined by comparing the level of the attribute in the cases and controls. For example, to determine whether thalidomide caused birth defects, a group of children with birth defects (cases) could be compared to a group of children without birth defects (controls). The groups would then be compared with respect to the proportion exposed to thalidomide through their mothers taking the tablets. Case-control studies are sometimes described as being retrospective as they are always performed looking back in time.

Cohort study (synonyms: follow-up, incidence, longitudinal, prospective study) [Epi]
Epidemiological definition: An observational study in which a defined group of people (the cohort) is followed over time. The outcomes of people in subsets of this cohort are compared, to examine for example people who were exposed or not exposed (or exposed at different levels) to a particular intervention or other factor of interest. A cohort can be assembled in the present and followed into the future (a prospective study or "concurrent cohort study"), or the cohort could be identified from past records and followed from the time of those records to the present (a retrospective study or "historical cohort study"). Because random allocation is not used, matching or statistical adjustment at the analysis stage must be used to minimise the influence of factors other than the intervention or factor of interest.

Difference-in-difference (synonym: double difference) [Econ]
Economic definition: The difference in the change in the outcome observed in the treatment group compared to the change observed in the comparison group; or, equivalently, the change in the difference in the outcome between treatment and comparison. Double differencing removes selection bias resulting from time-invariant unobservables. Compare to single difference and triple difference.
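As an illustrative formula (not part of the 3ie definition), the double-difference estimator can be written in terms of group means before and after the intervention:

\[ \widehat{DD} = (\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}) - (\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}) \]

where T and C index the treatment and comparison groups; the estimator recovers the programme effect under the assumption that the two groups would have followed parallel trends in the absence of the intervention.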


Ex ante evaluation design [Econ]
Economic definition: An impact evaluation design prepared before the intervention takes place. Ex ante designs are stronger than ex post evaluation designs because of the possibility of considering random assignment, and the collection of baseline data from both treatment and comparison groups. Also called prospective evaluation.

Ex post evaluation design [Econ]
Economic definition: An impact evaluation design prepared once the intervention has started, and possibly been completed. Unless there was random assignment, a quasi-experimental design has to be used.

Interrupted time series analysis [Econ] (a stylised regression specification for this design and for RDD is given after this group of entries)
Economic definition: A non-experimental evaluation method in which data are collected at multiple instances over time, before and after an intervention is introduced, to detect whether the intervention has an effect significantly greater than the underlying secular trend. Source: CEGA

Matching (epi: also called paired design) [Both]
Economic definition: A method utilized to create comparison groups, in which groups or individuals are matched to those in the treatment group based on characteristics felt to be relevant to the outcome(s) of the intervention.
Epidemiological definition: Paired design: a study in which participants or groups of participants are matched (e.g. based on prognostic factors) and one member of each pair is allocated to the experimental (intervention) group and the other to the control group.

Regression discontinuity design (RDD) [Econ]
Economic definition: An impact evaluation design in which the treatment and comparison groups are identified as being those just either side of some threshold value of a variable. This variable may be a score or observed characteristic (e.g. age or land holding) used by program staff in determining the eligible population, or it may be a variable found to distinguish participants from non-participants through data analysis. RDD is an example of a quasi-experimental design.

Outcome indicator

Relative Risk (RR) (synonym: risk ratio) [Epi]
Epidemiological definition: The ratio of risk in the intervention group to the risk in the control group. The risk (proportion, probability or rate) is the ratio of people with an event in a group to the total in the group. A relative risk of one indicates no difference between comparison groups. For undesirable outcomes, an RR that is less than one indicates that the intervention was effective in reducing the risk of that outcome.
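As a stylised sketch (hypothetical notation, not drawn from the CEGA or 3ie definitions), the interrupted time series and regression discontinuity entries above are often estimated with regressions of the form

\[ \text{ITS:} \quad Y_t = \beta_0 + \beta_1 t + \beta_2 D_t + \beta_3 (t - t_0) D_t + \varepsilon_t \]
\[ \text{RDD:} \quad Y_i = \alpha_0 + \alpha_1 (X_i - c) + \tau \, \mathbf{1}[X_i \ge c] + \varepsilon_i \]

where D_t indicates periods from the start of the intervention at time t_0, and X_i is the assignment variable with eligibility threshold c. In the ITS specification, β_2 captures the change in level and β_3 the change in trend; in the RDD specification, τ is the treatment effect at the threshold, usually estimated using only observations close to c.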

Risk difference (RD) (synonym: absolute risk reduction) [Epi] (a worked example follows this group of entries)
Epidemiological definition: The absolute difference in the event rate between two comparison groups. A risk difference of zero indicates no difference between comparison groups. An RD that is less than zero indicates that the intervention was effective in reducing the risk of that outcome.

Surrogate endpoints (synonyms: intermediary outcomes, surrogate outcomes) [Epi]
Epidemiological definition: Outcome measures that are not of direct practical importance but are believed to reflect outcomes that are important; for example, blood pressure is not directly important to patients but it is often used as an outcome in clinical trials because it is a risk factor for stroke and heart attacks. Surrogate endpoints are often physiological or biochemical markers that can be relatively quickly and easily measured, and that are taken as being predictive of important clinical outcomes. They are often used when observation of clinical outcomes requires long follow-up.

Trial procedures

Blinding (epi synonym: masking) [Both]
Economic definition: A process of concealing which subjects are in the treatment group and which are in the comparison group, which is single-blinding. Blinding is generally not practical for socio-economic development interventions, thus introducing possible bias.
Epidemiological definition: Keeping secret group assignment (e.g. to treatment or control) from the study participants or investigators. Blinding is used to protect against the possibility that knowledge of assignment may affect patient response to treatment, provider behaviours (performance bias) or outcome assessment (detection bias). Blinding is not always practical (e.g. when comparing surgery to drug treatment). The importance of blinding depends on how objective the outcome measure is; blinding is more important for less objective outcome measures such as pain or quality of life. See also single blind, double blind and triple blind.

Endline [Econ]
Economic definition: The situation at the end of an intervention, in which progress can be assessed or comparisons made with baseline data. Endline data are collected after a program or policy is implemented to assess the "after" state. Source: CEGA

Protocol [Epi]
Epidemiological definition: The plan or set of steps to be followed in a study. A protocol for a systematic review should describe the rationale for the review; the objectives; and the methods that will be used to locate, select and critically appraise studies, and to collect and analyse data from the included studies.
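Returning to the relative risk and risk difference entries above, a worked numerical illustration (hypothetical figures, not taken from any of the studies cited in this paper): if 40 of 200 children in the intervention group and 60 of 200 in the control group experience the outcome, then

\[ RR = \frac{40/200}{60/200} \approx 0.67, \qquad RD = \frac{40}{200} - \frac{60}{200} = -0.10 \]

that is, roughly a one-third reduction in relative terms and an absolute reduction of 10 percentage points.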

Run-in period [Epi]
Epidemiological definition: A period before a trial is commenced when no treatment is given. The data from this stage of a trial are only occasionally of value but can serve a valuable role in screening out ineligible or non-compliant participants, in ensuring that participants are in a stable condition, and in providing baseline observations. A run-in period is sometimes called a washout period if treatments that participants were using before entering the trial are discontinued.

Sample [Both; the two disciplines use different terminology]
Economic definition: A subset of the population being studied. The sample is drawn randomly from the sampling frame. In a simple random sample all elements in the frame have an equal probability of being selected, but usually more complex sampling designs are used, requiring the use of sample weights in analysis.
Epidemiological definition: A selected subset of a population. A sample may be random or nonrandom and may be representative or non-representative. Types of sampling: area sampling; cluster sample; grab sample; probability (random) sample; simple random sample; stratified random sample; systematic sample. Source: Porta
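As a small illustration (notation introduced here, not part of the 3ie or Porta definitions), when a complex sampling design gives unit i a selection probability π_i, analysis typically uses the sampling weight w_i = 1/π_i and a weighted mean of the form

\[ \bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i} \]

so that over-sampled groups do not receive disproportionate influence in the estimates.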
