Zenda Ofir

EES Conference, Prague, Oct 2010

1 “Evaluation was conceived as an undertaking useful in …. an open society…. an experimenting society …. in which we ask serious and important questions about what kind of society we should have, and directions we should take.”

Tom Schwandt, “Educating for Intelligent Belief in Evaluation”, AEA Keynote published in AJE 29(2), 2008

2 “We …. have to be wary of the latest fads in the development field. They are frequently transformed into simplistic and extremist ideologies which often cruelly mark the life of nations….”

Jacques Lesourne, 25th Anniversary of the OECD Development Centre

“The field of development is a veritable junkyard of abandoned models, each focused on a particular aspect while ignoring the rest.” Brian Walker, former Executive Director, Oxfam

3 “The substitution of reasoned assessment for ‘spin’ - (is) the act of …. relating a story in such a way as to influence public opinion…. (it) often uses strategies such as cherry-picking and/or ….”

Tom Schwandt, “Educating for Intelligent Belief in Evaluation” AEA Keynote published in AJE 29(2), 2008

4
• ‘Rigorous’
• ‘Scientific’
• ‘Hard data’
• ‘Credible evidence’
• ‘Evidence-based’

Delegitimisation of other designs/methodologies: the notion of a hierarchy of designs, with other methodologies dismissed as ‘unscientific’, ‘not rigorous’, ‘not credible’.

5 Especially prominent in Impact Evaluation - the design ‘hierarchy’

1. Experimental
2. Quasi-experimental
3. ‘Non-experimental’ - e.g.
• Single case study design
• Comparative case studies design
• Statistical correlation (used with these designs)
(using General Elimination Methodology, Multiple Lines & Levels of Evidence, contribution analysis, process tracing, etc.)

Paul Duignan (2008):
1. True experimental design
2. Regression-discontinuity design
3. Time-series design
4. Constructed matched comparison group design
5. Exhaustive alternative causal identification and elimination design
6. Expert opinion summary judgment design
7. Key informant summary judgment design

6 “We know now that one of the central principles of development theory and practice is that those programs that are most precisely and easily measured are the least transformative, and those that are most transformative are the least measurable.”

From: “The Clash of the Counter-bureaucracy and Development”, Andrew Natsios, July 2010, CGD Essay

7 Key development intervention types stand to be neglected:

1. Interventions with insufficient numbers (n)

2. Complicated interventions with multiple partners, objectives, expected outcomes - e.g. capacity building

3. Complex interventions – risky, highly adaptive

4. Interventions with heterogeneous outcomes –

◦ cannot assess disparities

◦ cannot track important evolving outcomes/impacts over time.

8 Poor practice in, and under-funding of, those development interventions most likely to lead to transformation and sustained development

1. Towards standardized, simple projects - away from risk and innovation

2. Away from integrated, transformative solutions and critical development priorities (institution building, empowerment, governance, livelihoods, ecosystems)

3. Away from a focus on understanding systems and complexity

9 5. Away from evaluating ‘mechanisms’ and understanding the role of context

6. Away from other issues in need of serious attention in evaluation - in particular how to evaluate for sustained development, up-scaling, unintended consequences

7. Away from comprehensive synthesis of all information about development - synthetic reviews

10 1. Increasing potential for progression from ‘spin’ to policy

2. Power of ‘converts’, ‘missionaries’ and various lobby groups

3. NONIE experiences

11 “Creating a culture in which randomized evaluations are promoted, encouraged and financed has the potential to revolutionize social policy during the 21st Century, just as randomized trials revolutionized medicine during the 20th.”

Esther Duflo in editorial, “The World Bank is finally embracing Science”

(2004)

12 Factors that might accelerate decision-makers’ susceptibility to ‘spin’

Confluence of factors clustered around

• Increasing competition for resources – national, regional, global level
• Political movements to the right
• Decreasing attention spans, increasing information flood - need for soundbites, ‘easy think’, numbers
• Not enough understanding, capacities
• Powerful actors with agendas, profile - and a hold over (resources for) knowledge
• Orthodoxy spin mechanisms to influence policy

13 “Set in place rigorous procedures to evaluate the impact of policies and programs, report on results and reallocate resources accordingly, incorporate relevant evidence and analysis from other institutions, and inform the policy and budget process....

....undertake a more substantial investment of resources in monitoring and evaluation, including with a focus on rigorous and high-quality impact evaluations.”

14 [Link to site of Coalition for Evidence-based Policy]

“One of the principles motivating the President’s Budget is that, as a nation, we haven’t been making the right investments to build a new foundation for economic prosperity .....

But, in making new investments, the emphasis has to be on ‘smarter.’

….Wherever possible, we should design new initiatives to build rigorous data about what works and then act on evidence that emerges ..... and shutting down those that are failing.

…. Over time, we hope that some of those programs will move into the top tier — but, if not, we’ll redirect their funds to other, more promising efforts.”

15 “An agreed methodology would allow comparing the results reaped by very different actions… Each actor in development aid…. would have the possibility to declare the results of his development action measured according to this methodology.” [QuODA?]

An international effort would urgently need to fill the evaluation gap…. Harmonizing methodologies between institutions…. Undertaking a global rating and ranking of public and private institutions along the lines of agreed methodologies …. This would create emulation, and encourage opting out of weak players.”

16 Key actors who can shape (Impact) Evaluation [diagram of actor groupings]

US-based or -linked key actors and coalitions: IPA / CGD / 3ie / JPAL / DIME etc.; Promising Practices Network; Campbell / Cochrane Collaborations; What Works Clearinghouse; Coalition for Evidence-based Policy; etc.

Evaluation networks: NONIE; OECD DAC Evaluation Network; ECG (group of IFI / Development Bank Evaluation Offices); UNEG (UN Evaluation Group); Evaluation Associations and Networks under IOCE auspices; ALNAP (humanitarian/emergency work coalition)

Emerging actors: BRICs; Gulf States

Governments: within the executive, legislative and judicial branches; national, sectoral, topical; embracing all levels of state, para-statal and meta-statal units

Institutions: universities; foundations and their coalitions; alliances; research centres

Private sector: corporate; financial

NGOs: local; churches; INGOs

17 “Quality impact evaluations are those which use the appropriate methods and deliver policy-relevant conclusions.”

“…. The agenda should not be driven by amenability to certain impact evaluation methodologies.”

“NONIE advocates an eclectic and open approach to finding the best methods for the task of impact evaluation…”

“NONIE seeks to …use the best available methods from whatever source, and to develop new approaches that can tackle unanswered questions about development impact, particularly for more complex interventions …that have so far been under-evaluated.”

18 “A tighter focus does have some costs and risks, including that developing countries may feel that their needs have not been fully considered or that a narrow view develops and gains currency.

“...accept that advocacy is less urgent now that the profile of IE has increased and commissioning more impact evaluations internationally are being actively pursued by 3ie.

“3ie is also providing services in synthesising the existing literature and identifying enduring questions.”

19 Impact: “….net changes for a particular group of people …. that can be attributed to a specific program…. using the best methodology available ….”

Quality standards for inclusion in its IE database: “Quality evaluations employ either experimental or quasi-experimental approaches.”

From the 3ie Glossary: “Impact evaluations have either an experimental or quasi-experimental design.”

20 Cordaid’s experiences with and lessons learned on Participatory Impact Assessment

• Almost 1000 partner organisations in 40 countries.

• Programmes:
– Participation
– Emergency aid and reconstruction
– Health & Wellbeing
– Entrepreneurship & microfinance

Why Participatory Impact Assessment? (1)

Partner organisations in general have well-developed informal feedback mechanisms. But:
– Reports and project evaluations give no information on impact and little on outcome
– No use of comparison: no baseline data or triangulation
– Programme evaluations commissioned by Cordaid for accountability face a lack of data, and their quality is therefore often substandard

Why Participatory Impact Assessment? (2)

– Growing public and political pressure on development organisations to show results/outcomes
– CIDIN (a Dutch research institute) proposed that Cordaid pilot a methodology for impact assessment. CIDIN is responsible for data gathering and processing; Cordaid finances the pilot and connects with partner organisations
– The methodology is based on with/without + before/after comparison as a principle for evidence-based impact assessment (see the sketch below)
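As a quick illustration of the with/without + before/after principle, here is a minimal sketch in notation of my own (not CIDIN's): let \bar{Y} denote the mean survey outcome for the project group (T) and the comparison group (C) at baseline (0) and follow-up (1). The impact estimate is the difference between the two before/after changes:

    \hat{\tau}_{\text{DiD}} = \left(\bar{Y}^{T}_{1} - \bar{Y}^{T}_{0}\right) - \left(\bar{Y}^{C}_{1} - \bar{Y}^{C}_{0}\right)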

Objectives of Participatory Impact Assessment
a) Design and implementation of a participatory system for impact assessment
b) Generation of complementary insights from different methods, both quantitative and qualitative, in PIA
c) Strengthening the capacity of partner organisations in using impact assessment as an instrument for monitoring, learning, accountability and innovation
d) Use of the results of the impact assessment at the level of Cordaid for evaluation, learning, accountability and innovation

Methodological approach (1)

• Strong statistical design (quasi-experimental):
– Counterfactual reconstructed through control group(s)
– Interventions are not randomly assigned, but [selection bias is] minimized by difference-in-difference estimation (DiD) on a matched sample
• Mixed methods:
– Qualitative inquiries to obtain a richer picture of how interventions affect people’s lives and to reveal underlying processes and societal transformations
– Close monitoring of project progress to distinguish between design failure and implementation failure
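To make the estimation step concrete, here is a minimal, hypothetical sketch of a DiD calculation on an already matched survey sample; the column names and data layout are illustrative assumptions, not the actual CIDIN/Cordaid setup:

    # Hypothetical sketch: difference-in-difference (DiD) on a matched sample.
    # Assumed columns (illustrative only):
    #   outcome - the survey indicator of interest
    #   treated - 1 for project participants, 0 for the matched control group
    #   post    - 0 for the baseline survey, 1 for the follow-up survey
    import pandas as pd

    def did_estimate(df: pd.DataFrame) -> float:
        means = df.groupby(["treated", "post"])["outcome"].mean()
        change_with = means.loc[(1, 1)] - means.loc[(1, 0)]     # before/after, with project
        change_without = means.loc[(0, 1)] - means.loc[(0, 0)]  # before/after, without project
        return change_with - change_without                     # net change attributed to the project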

Methodological approach (2)

• Focus on attitudes / capacities required to sustain impact:
– Examples: trust, self-esteem, patience, locus of control, capacity to aspire, risk aversion, etc.
– Survey results triangulated with qualitative studies and/or field experiments
• Participatory (at the level of the partner organisation):
– Partner organisations involved in design, discussion on findings
– Ultimate beneficiaries not involved in design, but can voice their opinion on the project and are debriefed on survey results (pilot debriefing with treatment and control groups)

Methodological approach (3)

• What’s new?
– Using control groups and baselines: this is not common practice for (Dutch) NGOs like Cordaid, nor for partner organisations.
– Added dimensions of attitudinal and subjective well-being indicators are innovative in impact assessment, e.g. trust in different actors
– Participatory design: partners were involved in design and in choosing control groups; discussion of results, etc.
– Feedback: results of surveys were discussed with treatment and control groups in one area.

Process from 2007 – 2010

• 2 partners in Ghana (health care), 2 partners in India (income generation), 2 partners in Peru (water & sanitation)
• 2007: ‘Design workshops’ in each country with the two partners and local research institutes, to develop the indicators for the impact assessment
• 2008: 6 baseline surveys and some additional qualitative research

Process from 2007 – 2010 (continued)

2008: ‘Feedback workshops’ with partner organisations and research institutes to discuss findings of the baseline

2009 – 2010: follow-up surveys; feedback discussion with beneficiaries and control groups in Ghana

July 2010: conference with all participants (partner organisations, local research institutes, CIDIN, Cordaid) to discuss process, results, lessons learned and next steps.

Preliminary insights of PIA:

Illustration: Impact of Diocesan hospital services in Ghana
• High client satisfaction and trust. Insight into the accessibility of curative and preventive services. As a result of PIA, the Diocese now focuses more on education in more remote areas
• Comparison of insecticide-treated bednet (ITN) use between treatment and control communities suggests a perverse incentive of access to curative care on preventive efforts. The national health insurance (NHI) scheme reinforces the incentive problem

Preliminary insights of PIA (continued):

Illustration: Qualitative study on access to women’s self-help groups (SHGs) in Orissa, India
• In-depth interviews reveal that SHG access for young married women crucially hinges on the attitude (conservative vs. liberal) of the family-in-law, in particular the mother-in-law

Illustration: Impact of a water & sanitation project in Cusco, Peru
• Incidence of diarrhea decreased in households that received health education and water meters. There is a strong complementarity between the introduction of water meters and health education.

Possibilities of PIA (1)

• Baseline and follow-up surveys, complemented with qualitative research, gave some new insights into interventions (including unintended effects)
• Surveys give detailed information at the level of beneficiaries (surprisingly, this level of information was not always available before)
• Can be used to ‘test’ different interventions and compare them

Possibilities of PIA (2)

• “Data and Dialogue”: the information gives important input into discussions on intervention strategies, choice of beneficiaries, etc.
• Increased understanding and cooperation between ‘researchers’ and ‘development practitioners’
• (Some) partner organisations see it as a possibility to attract other donor funds (that require evidence-based assessment), especially in Peru (Cordaid is withdrawing)

Limitations of PIA (1)

• PIA ‘dictates’ what to measure: specific interventions at household level. It gives no information on the system level, power relations, etc. => Interventions of partners are much broader than what has been included in the PIA.
• Control groups are difficult to define; spillover is a real problem, including ‘indirect’ spillover, e.g. as a result of influencing local politics
• Requires quite sophisticated knowledge of statistical data analysis, which limits partner organisations’ ownership of the methodology

Limitations of PIA (2)

• Not all indicators can be measured through surveys.
• Evidence is context-specific; interventions are not necessarily effective in other contexts.
• It is not possible to generalise the evidence at project level to the effectiveness of a programme (consisting of more than 100 projects/partners); therefore its use for accountability at the level of Cordaid is very limited.

Limitations of PIA (3)

• Cracking a nut with a sledgehammer? Costs and time invested are very high (€150,000 a year for PIA against €575,000 total Cordaid investment in the 6 projects, i.e. 26%; see the arithmetic below)
• Participation of partners and communities in PIA has its limits because of the rigour required in the methodological set-up, data gathering, etc.
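For reference, the 26% figure is simply the annual PIA cost taken as a share of Cordaid’s total investment in the six pilot projects:

    \frac{\text{EUR } 150{,}000}{\text{EUR } 575{,}000} \approx 0.26 = 26\%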

Conclusion/discussion

• Pressure for more quantitative data has been translated (by researchers) into a quasi-experimental statistical design as the ‘only’ way
• The evidence base should be improved, but a quasi-experimental design is not appropriate for most of the development work Cordaid and its partners are involved in.
• Impact evaluation should follow the nature of development, not the other way around. Applying one specific method of impact assessment to different interventions is not realistic; the design has to be flexible.
• Cost-effectiveness should be taken into account, although more resources are needed to improve M&E practice. But PIA turns out to be too expensive (replication will not be cheaper, because of intensive data gathering and data analysis)

Way forward:
– Invest more in discussing theories of change (ToC), and the underlying assumptions, together with the actors involved.
– Define the data needed to investigate these ToC, including a baseline
– Make reporting more appropriate; include participatory monitoring (data & dialogue)
– Improve the quality of programme evaluations, including comparison (baseline, comparative case studies, triangulation of data)

BUT……. the Dutch Ministry (largest back donor – 60% of total) requires a baseline AND a control group/reference group as a basis for funding for the next financing period (2011 – 2015)

Our Concern

• Trends that delegitimise methods, with privileged methods determining what gets funded
• Old issues, new contexts, new ideas? The debates of the 1980s shifted to the international development arena

Our Story Line

• Four panel members:
– Zenda, ex-AfrEA president – where do the concerns come from, what’s at stake?
– Rens, evaluation coordinator at Cordaid – trying the ‘emerging orthodoxy’
– Myself, independent consultant – conference in May on rigorous evaluative practice that embraces complexity
– Rick Davies, well known from MandE News – discussant

Rethinking Rigour

Reflections from ‘Evaluation Revisited: Improving the Quality of Evaluative Practice by Embracing Complexity’

Irene Guijt, Learning by Design
EES Prague 2010

What’s the concern?

• Societal transformation in development:
– Multiple nested activities + many different players
– In for the long haul
– No guarantees, emergent

‘those development programs that are most precisely and easily measured are the least transformational, and those programs that are most transformational are the least measurable’ (Natsios)

Societal transformation (change intention in a specific context)

[Diagram: rigorous evaluative practice sits among values, complexity, quality standards, and methodological options / implicit and explicit choices]

Complexity…
• Context: more/fewer variables & their volatility
• Intention of change: more/less tangible & within one’s control
• => Nature of intervention: more/less linear, stable, predictable, multi-actor

… so what? What then is an appropriate evaluation strategy and what rigour norms apply?

Values

• Underpin evaluative practice: design, do, make sense
– whose values count (e.g. CORT, PADEV)
– sharing power – implications for rhythm and design
– ontological preference

• Surfacing & considering values during the process
– an aspect of rigour

Rigour at the heart

• Differences in judgement of ‘quality’
– Choice of what matters
– Intellectual honesty
– Applying standards consistently
• Privileging of internal validity
– attribution-driven statistical causality
– ‘zero generalizability concern’

Rigour Considerations

• Relevance of method, and then rigour in its use (Schwandt 1981)
• Principles rather than methods: systematic, empirical, critical, technically competent, inclusive, agile, self-aware
• Adaptively managing design and implementation
• Explicit about limitations and assumptions
• Rigorous and/or vigorous ...
• Credibility for whom: whose values count in value judgements, who counts for whom
• Quality of the sensemaking process (e.g. CORT)
• Quality of application, the mastery

Standards

• Current standards – utility, accuracy, propriety, feasibility – get us a long way
• How known/used are these standards?
• Are they enough?
– Standards are situational – whose standards
– Standard on ‘systemic-ness’ of evaluation
– Standards are normative, whose values count

Trade-offs

• Where do trade-offs occur?
– ownership vs controls; RCTs vs context; statistically skilled vs ownership; independence vs utility
• What are the consequences of the inevitable, value-driven choices for evaluation findings and for development?

(Re)legitimising Methods for Complexity

1. Building the case for choice
• Rigorously documented examples
• Articulating choices and trade-offs
• Meta-evaluation to enhance credibility
2. Own practice as change
• Negotiating choice
• Risk taking
• Enhancing rigour of use in ALL applications
3. Networking and linking
• Sharing what works and what doesn’t

Orthodoxies and Opportunities

Some broad questions to consider…

Improvement, learning and evolution...
• Biological evolution is NOT about the survival of the fittest, but the non-survival of the least fit.
• Picking winners may not be the way to go
• This process leaves room for some diversity amongst those that survive, and this diversity enables further evolution.
• Culling failures based on local circumstances may be more appropriate

Issues for discussion....

1. Is there a dominant/dominating evaluation orthodoxy?
• Or is there a real possibility of one emerging in the near future?
2. If there is a dominant/dominating orthodoxy…
• Is this necessarily a bad thing?
• If so, why?
3. If there is a dominant/dominating orthodoxy, and it’s a bad thing…
• What should we do about it?
• Re-think rigour?
• Emphasise other aspects of quality, not just rigour?
• Examine political interests at stake, not just issues of evaluation methodology?

[supplementaries]

• If methodology does matter, how can diversity and innovation be promoted?
• Are there alternative funding mechanisms?
• If some kinds of projects cannot be evaluated, should they be funded?
• Evaluation at what cost? Are there some upper limits and, if so, where are they?