The Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme

33.3 Special Issue 2019 / Numéro spécial 2019

Next Generation Program Theory

Editor’s Remarks / Un mot de la rédactrice v Isabelle Bourgeois

ARTICLES

The Current Landscape of Program Theorizing 287 Jane Whynot, Steve Montague, and Sebastian Lemire

Using Actor-Based Theories of Change to Conduct Robust Evaluation in Complex Settings 292 Andrew Koleros and John Mayne

Does Your Implementation Fit Your Theory of Change? 316 Steve Montague

Can’t See the Wood for the Logframe: Integrating Logframes and Theories of Change in Development Evaluation 336 Gordon Freer and Sebastian Lemire

Knitting Theory in STEM Performance Stories: Experiences in Developing a Performance Framework 354 Jane Whynot, Catherine Mavriplis, Annemieke Farenhorst, Eve Langelier, Tamara Franz-Odendaal, and Lesley Shannon

Till Time (and Poor Planning) Do Us Part: Programs as Dynamic Systems—Incorporating Planning of Sustainability into Theories of Change 375 Sanjeev Sridharan and April Nakaima

Meta-Modeling Housing First: A Theory-Based Synthesis Approach 395 Sebastian Lemire and Christina A. Christie

How We Model Matters: A Manifesto for the Next Generation of Program Theorizing 414 Sebastian Lemire, Jane Whynot, and Steve Montague

Peer Reviewers for Volume 33 and Manuscripts Submitted in 2018 / Examinateurs des manuscrits du volume 33 et des manuscrits soumis en 2018 435

Instructions to Authors 439
Instructions aux auteurs 441

Editorial Team / Équipe éditoriale

Editor-in-chief / Rédactrice en chef: Isabelle Bourgeois, Ph.D., Associate Professor, École nationale d’administration publique, Université du Québec, (819) 771–6095 ext. 2231

Associate Editor / Rédactrice associée: Jill Anne Chouinard, Ph.D., Assistant Professor, University of North Carolina at Greensboro

Associate French Language Editor / Rédactrice associée francophone: Astrid Brousselle, Director, School of Public Administration, University of Victoria

Book Review Editor / Rédactrice, Comptes rendus de livres: Jane Whynot, University of Ottawa

Editorial Coordinator / Coordonnatrice, rédaction: Emily Taylor, Ontario Tobacco Research Unit, University of Toronto

EDITORIAL BOARD / COMITÉ DE RÉDACTION

Courtney Amo, A/Director, Evaluation and Risk Directorate, Atlantic Canada Opportunities Agency (ACOA), Government of Canada
Tim Aubry, School of Psychology, University of Ottawa
Nicole Bowman, University of Wisconsin, Madison
Ayesha Boyce, University of North Carolina at Greensboro
Bernadette Campbell, Carleton University
Brad Cousins, Faculty of Education, University of Ottawa
Sarah Earl, YMCA GTA
Paul Favaro, Chief of Assessment and Accountability, Peel Board of Education
Marie Gervais, Université Laval
Steve Jacob, Université Laval
Marlène Laeubli, Consultant, Switzerland
Chris Lovato, University of British Columbia
John Mayne, Consultant, Ottawa
James McDavid, School of Public Administration, University of Victoria
Céline Mercier, Centre de réadaptation Lisette Dupras, Lachine
Anita Myers, Department of Health Studies, University of Waterloo
Michael Obrecht, Consultant, Ottawa
John Owen, Centre for Program Evaluation, University of Melbourne
Burt Perrin, Consultant, France
Cheryl Poth, University of Alberta
Hallie Preskill, Executive Director, FSG Social Impact Consultants, San Francisco
Lynda Rey, École nationale d’administration publique
Lucie Richard, Faculté des sciences infirmières, Université de Montréal
Valéry Ridde, Université de Montréal
Ray C. Rist, The World Bank, Washington, D.C.
Daniela Schröter, Western Michigan University
Robert Schwartz, University of Toronto
Mark Seasons, School of Planning, University of Waterloo
Souraya Sidani, Ryerson University
Nick L. Smith, School of Education, Syracuse University
Sanjeev Sridharan, Health Policy Management & Evaluation, University of Toronto

PRINTED AND BOUND IN CANADA
PUBLICATIONS MAIL AGREEMENT NO. 40600510
RETURN UNDELIVERABLE CANADIAN ADDRESSES TO UNIVERSITY OF TORONTO PRESS, JOURNALS DIVISION, 5201 DUFFERIN ST., TORONTO, ONTARIO, M3H 5T8
EMAIL: [email protected]

Editor’s Remarks

The guest editors for this special issue, Jane Whynot, Sebastian Lemire, and Steve Montague, bring to our attention the continued importance of program theory. As they mention in their introduction, program theory is certainly not new to the field of evaluation; however, as our collective experiences continue to expand our understanding of program theorizing, we must take stock of what is known and what we need to develop further to meet the needs of organizations and stakeholders. Through practical examples and in-depth analyses, the papers included in this collection help us move beyond descriptions of program theory and consider instead how program theorizing can be conducted successfully in increasingly complex structures. I am pleased to introduce this special issue and thank the guest editors and authors for their contributions to the CJPE.

Isabelle Bourgeois
Editor-in-Chief

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro spécial), v doi: 10.3138/cjpe.33.3.v

Un mot de la rédactrice

Jane Whynot, Sebastian Lemire et Steve Montague, les rédacteurs invités pour ce numéro spécial, nous rappellent l’importance de bien décrire les théories d’intervention sous-jacentes aux programmes que nous évaluons. Comme ils le mentionnent dans leur introduction, les théories des programmes ne sont pas nouvelles dans notre domaine ; cependant, à mesure que nos expériences collectives continuent à élargir notre compréhension des enjeux associés à l’élaboration de ces théories, nous devons identifier les meilleurs moyens de répondre aux besoins des organisations et des intervenants. À partir d’exemples pratiques et d’analyses approfondies, les articles de ce numéro spécial nous permettent de mieux comprendre comment l’élaboration de théories des programmes peut contribuer à l’évaluation de structures de plus en plus complexes. Je suis heureuse de vous présenter ce numéro spécial et je remercie les rédacteurs invités et les auteurs pour leur contribution à la RCÉP.

Isabelle Bourgeois
Rédactrice en chef

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro spécial), vi doi: 10.3138/cjpe.33.3.vi

The Current Landscape of Program Theorizing

Jane Whynot, University of Ottawa
Steve Montague, Performance Management Network
Sebastian Lemire, University of California, Los Angeles

There is a long and rich tradition of developing and using program theories in evaluation. This commitment is reflected clearly within the Canadian federal government evaluation context. Through two evaluation policy updates in the last decade, five of the six accompanying guidance documents explicitly posit theorizing as foundational to evaluation efforts, whether addressing the development of performance measurement strategies (Treasury Board Secretariat, 2010), assessing resource utilization (Treasury Board Secretariat, 2013), rapid impact assessment (Treasury Board Secretariat, 2017), evaluating horizontal initiatives (Treasury Board Secretariat, 2012a), or program theory itself (Treasury Board Secretariat, 2012b). Theory has permeated the federal government evaluation landscape, with the clear expectation that theorizing should underpin evaluation conversations. Expectations remain clouded, however, by the various ways in which program theorizing can, and should, occur.

The overarching aim of this special issue is to promote reflective practice in program theorizing: to expand and strengthen both the conceptual and technical foundations of program theories in evaluation and to establish theory development and use as a fundamental tenet of evaluative thinking more broadly. The driving motivations for this special issue stem from decades of applied evaluation experience and cumulative, consistent reflection on the part of the scholar-practitioner guest editors, who probe the paradoxes of program theorizing and its unrealized potential to achieve many of its promised benefits. Underpinning this objective is a critical, accompanying shift in thinking, from “program theories in evaluation are a good idea” to “these are good ideas for program theorizing.” To be sure, the use of program theories in evaluation is nothing new.
Early program theory inklings can be traced back to the work of Donald Kirkpatrick (1959), Edward Suchman (1967), and Daniel Stufflebeam (1967), among others. However, the blossoming of program theories came with the formalization of theory-based evaluation in the 1980s (Bickman, 1987; Chen & Rossi, 1980, 1983). In Canada, program theory was embodied in component profiles, program models, and causal models, eventually becoming known as models that covered a review of the “rationale” of a program. These models were a fundamental piece

of the Canadian government’s nascent evaluation guidance in the early 1980s (Treasury Board of Canada Comptroller General, 1981). Since then, a steady flow of articles, special issues, and books dedicated to the topic of program theories specifically, and theory-based evaluation more generally, has emerged. The aspects of program theorizing covered in these contributions range broadly, including reflections on different types of program theories (Chen & Rossi, 1980, 1983; Funnell & Rogers, 2011), what constitutes good or even just decent program theory (Mayne, 2015, 2017; Weiss, 1997), the role and purpose of program theory in evaluation (Bickman, 1987, 1990; Donaldson, 2007; Funnell & Rogers, 2011), how program theories are used in practice (Bickman, 1987; Coryn, Noakes, Westine, & Schroter, 2011), and how to test and use program theories (Bickman, 1987, 1990; Rogers, Hacsi, Petrosino, & Huebner, 2000), to name but a few. The interest in describing and understanding the underlying logic of social programs is pervasive and persistent.

This is not the place to provide a comprehensive and detailed review of the historical roots and developments of program theorizing in evaluation (see Funnell & Rogers, 2011, for an exemplary review). Suffice it to say that program theorizing continues to gain traction among evaluation scholars and practitioners, advancing our practice in new directions.
Recently, reflecting on past and present trends in theory-based evaluation, Brousselle and Buregeya (2018) argued for the rise of a new generation of theory-based evaluation, one reaching beyond “plausibility, effect and implementation analysis” to address the limitations of current program theory thinking in tackling complex social issues, with their associated complexity grounded in open systems that are in turn embedded in multiple social systems (Brousselle & Buregeya, 2018, pp. 163–164). As Brousselle and Buregeya rhetorically ask, “How to deal with uncertainty created by interdependency among numerous actors who are constantly evolving and adapting? How to adapt to non-linear and sometimes unpredictable relationships? How to assess emergent and unanticipated outcomes resulting from relationships that are sometimes non-linear?” These are but a few of the key questions around which this next generation of theory-based evaluation revolves.

Anticipating this next generation of program theorizing, the special issue offers six articles that each, in their own way, illustrate specific strategies for enhancing the conceptual development, empirical validation, and practical use of program theories in evaluation practice.

In the first article, Andrew Koleros and John Mayne propose and illustrate the use of nested actor-based theories of change. Based on an application of contribution analysis to a complex police-reform program, the authors compellingly argue that “the strength of a contribution claim is only as good as the ToC [theory of change] being used” (p. 295). Toward building stronger and evaluable theories of change, the authors illustrate the development of a nested theory of change, wherein the complexity underlying a general theory of change for a program is further unfolded in a subset of nested actor-based theories of change, each of

which provides more fine-grained details on select aspects of the overarching theory of change for the program.

In the second article, Montague argues that further systematic coding and analysis of change theories, action theories, and in particular their combinations in programs could produce useful insights for both evaluation and public-policy decision making. Motivated by the adage that, in terms of explanation, the whole may be greater than the sum of the parts, Montague cogently argues for further codification of both implementation/action theories and change theories, whereby both can be considered and empirically examined in tandem as part of theory-based evaluations.

Freer and Lemire, in the third article, continue the focus on the role and purpose of different types of theories. Writing in the context of development evaluation, the authors argue that while logframes and theories of change are complementary thinking aids, they are typically developed in isolation from one another. As a result, even though the two models may display similarities and commonalities, logframes and theories of change are perceived as serving different roles and as reporting against different aspects of a program, rather than as complementary. Informed by a real-world example, the authors propose five steps toward integrating these tools in program planning and evaluation.

Jane Whynot and the Chairs for Women in Science and Engineering, including Catherine Mavriplis, Annemieke Farenhorst, Eve Langelier, Tamara Franz-Odendaal, and Lesley Shannon, share the results of their practical and conceptual efforts integrating gender in program theory.
They embrace theory knitting: drawing on and situating gendered expertise and experiences to address measurement and evaluation efforts in developing program theory that tackles the under-representation, recruitment, retention, and promotion of girls and women in STEM.

Framing programs as “dynamic processes,” Sanjeev Sridharan and April Nakaima argue that “planning for sustainability needs to be a critical aspect of the impact chains of all theories of change” (p. 375). This involves recognizing that impact pathways may differ across participants and even change over time. Informed by an evaluation of an empowerment program for immigrant women, the authors compare and contrast a linear, mechanical view of the change process with a view that explicitly incorporates planning for sustainability, argue for the important role of such planning, and consider its implications for the practice of theory-driven evaluation.

Motivated by the surge of interest in mixed-methods and theory-based systematic reviews, Lemire and Christie promote and illustrate an application of meta-modeling, a theory-based synthesis approach. Combining meta-analysis and qualitative comparative analysis, meta-modeling offers a systematic and transparent approach to developing meta-models of programs across a broad range of existing studies. Based on a practical application of meta-modeling to Housing First, the authors call for further attention to and developments in theory-based synthesis approaches.


In the concluding article of this special issue, the guest editors—Sebastian Lemire, Jane Whynot, and Steve Montague—scale the proverbial soapbox and issue a call to action to strengthen and promote reflective practice in program theorizing. Motivated by their own successes and failures, and inspired and informed by the significant contributions comprising the present special issue, the guest editors formulate ten declarations that collectively serve as a motivating and useful manifesto for the future of program theorizing in evaluation.

REFERENCES

Bickman, L. (1987). Using program theory in evaluation. In L. Bickman (Ed.), Using program theory in evaluation: New directions for program evaluation (Vol. 33, pp. 5–17). San Francisco, CA: Jossey-Bass.

Bickman, L. (1990). Advances in program theory: New directions for program evaluation (Vol. 47). San Francisco, CA: Jossey-Bass.

Brousselle, A., & Buregeya, J. (2018). Theory-based evaluations: Framing the existence of the new theory in evaluation and the rise of the 5th generation. Evaluation, 24(2), 153–168. https://doi.org/10.1177/1356389018765487

Chen, H.-T., & Rossi, P. H. (1980). The multi-goal, theory-driven approach to evaluation: A model linking basic and applied social science. Social Forces, 59(1), 106–122. https://doi.org/10.2307/2577835

Chen, H.-T., & Rossi, P. H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review, 7(3), 283–302. https://doi.org/10.1177/0193841X8300700301

Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schroter, D. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226. https://doi.org/10.1177/1098214010389321

Donaldson, S. I. (2007). Program theory-driven evaluation science: Strategies and applications. Mahwah, NJ: Lawrence Erlbaum Associates.

Funnell, S. C., & Rogers, P. J. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco, CA: Jossey-Bass.

Kirkpatrick, D. (1959). Techniques for evaluating training programs. Journal of the American Society of Training Directors, 13(11), 21–26.

Mayne, J. (2015). Useful theory of change models. The Canadian Journal of Program Evaluation, 30(2), 119–142.

Mayne, J. (2017). Theory of change analysis: Building robust theories of change. The Canadian Journal of Program Evaluation, 32(2), 155–173. https://doi.org/10.3138/cjpe.31122

Rogers, P. J., Hacsi, T. A., Petrosino, A., & Huebner, T. A. (Eds.). (2000). Program theory in evaluation: Challenges and opportunities. New Directions for Evaluation, 87. San Francisco, CA: Jossey-Bass.

Stufflebeam, D. S. (1967). The use and abuse of evaluation in Title III. Theory into Practice, 6(3), 126–133. https://doi.org/10.1080/00405846709542071


Suchman, E. (1967). Evaluative research: Principles and practice in public service and social action programs. New York, NY: Russell Sage Foundation.

Treasury Board of Canada Comptroller General. (1981). Guide on the program evaluation function. Retrieved from https://www.tbs-sct.gc.ca/cee/pubs/guide1981-eng.asp#archived

Treasury Board Secretariat. (2010). Supporting effective evaluations: A guide to developing performance measurement strategies. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/guide-developing-performance-measurement-strategies.html

Treasury Board Secretariat. (2012a). Guidance on the governance and management of evaluations of horizontal initiatives. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/guidance-governance-management-evaluations-horizontal-initiatives.html

Treasury Board Secretariat. (2012b). Theory-based approaches to evaluation: Concepts and practices. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/theory-based-approaches-evaluation-concepts-practices.html

Treasury Board Secretariat. (2013). Assessing program resource utilization when evaluating federal programs. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/assessing-program-resource-utilization-evaluating-federal-programs.html

Treasury Board Secretariat. (2017). Guide to rapid impact evaluation. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/guide-rapid-impact-evaluation.html

Weiss, C. H. (1997). How can theory-based evaluation make greater headway? Evaluation Review, 21(4), 501–524. https://doi.org/10.1177/0193841X9702100405

Using Actor-Based Theories of Change to Conduct Robust Evaluation in Complex Settings

Andrew Koleros, Mathematica Policy Research
John Mayne

Abstract: The use of theories of change (ToCs) is a hallmark of sound evaluation practice. As interventions have become more complex, the development of ToCs that adequately unpack this complexity has become more challenging. Equally important is the development of evaluable ToCs, necessary for conducting robust theory-based evaluation approaches such as contribution analysis (CA). This article explores one approach to tackling these challenges through the use of nested actor-based ToCs, using the case of an impact evaluation of a complex police-reform program in the Democratic Republic of Congo and describing how evaluable nested actor-based ToCs were built to structure the evaluation.

Keywords: actors, complexity, pathways, theories of change

Résumé : L’utilisation d’une théorie du changement fait partie des bonnes pratiques en évaluation. Compte tenu de la complexité croissante des interventions, l’élaboration de théories du changement reflétant cette complexité devient de plus en plus difficile. Parallèlement, la capacité d’utiliser ces modèles pour les évaluations axées sur la théorie, comme l’analyse de la contribution, est tout aussi importante. Cet article explore une approche visant à relever ces défis, soit la théorie du changement imbriquée et fondée sur les acteurs, avec, comme exemple illustrant la pertinence de cette approche, le cas de l’évaluation des répercussions d’un programme complexe de réforme des corps policiers en République démocratique du Congo.

Mots clés : acteurs, complexité, cheminements, théories du changement

A key aim of this special issue is to enhance the development, validation, and use of program theories in evaluation practice. The introductory chapter provides background on the development and use of program theories in the evaluation of different types of interventions. One issue evaluators currently face in using program theories in evaluation practice is that many settings now involve complex interventions (Byrne, 2013; Copestake, 2014; Garcia & Zazueta, 2015; Gerrits & Verweij, 2015; Ramalingam, 2013; Ramalingam & Jones, 2008),

Corresponding author: Andrew Koleros, Mathematica Policy Research, 955 Massachusetts Avenue, Suite 801, Cambridge, MA 02139; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro special), 292–315 doi: 10.3138/cjpe.52946


interventions that contain some or all of the following characteristics (Befani, Barnett, & Stern, 2014; Mayne, forthcoming):

• numerous types of activities undertaken;
• a number of partners delivering different aspects of the intervention;
• involvement of several layers of government;
• many external factors at play;
• a number of different actors involved in the intervention;
• feedback loops;
• emerging outcomes and uncertainty; and
• a multi-year timeframe.

Without unpacking this complexity through program theory, evaluators struggle to fully identify what is going on in an intervention, determine what aspects of it are working (or not), and understand whether interventions are playing a positive causal role in bringing about (expected) change. A second issue that arises, especially in complex settings, is the need to ensure that program theories, or theories of change (ToCs), are “evaluable.” As many have noted (James, 2011; Stein & Valters, 2012; Valters, 2014; Vogel, 2012), theories of change in the literature are extremely varied, serving widely different purposes. What the evaluator needs is a ToC that is useful for evaluation purposes. Despite the importance of these two issues within current evaluation practice, developing evaluable program theories of change that adequately take account of this complexity within interventions remains a challenge for many evaluators (Funnell & Rogers, 2011; Rogers, 2008). In this article, we discuss the development and use of actor-based nested ToCs as one approach to addressing some of these challenges in order to articulate an evaluable ToC that both addresses complexity and allows for sound theory-based evaluation. Our aims in this article are as follows:

1. to describe how actor-based theories of change can be used to unpack complexity;
2. to discuss how to ensure that these actor-based ToCs are evaluable theories of change; and
3. to reinforce the importance of evaluable theories of change as the basis of sound evaluation practice.

We first present our definitions of ToCs and discuss the usefulness in complex settings of having a set of nested ToCs to represent the intervention in a practical and evaluable way. In particular, we highlight how the use of actor-based ToCs can facilitate this process and serve as the basis for sound theory-based evaluation practice. We then briefly outline contribution analysis, one theory-based approach in evaluation that relies heavily on solid, well-structured ToCs. These ideas and approaches are then illustrated through a discussion of key aspects of the evaluation of the Security Sector Accountability and Police Reform (SSAPR)



program (Palladium, 2016) in the Democratic Republic of Congo (DRC), which used actor-based ToCs to conduct a theory-based evaluation using contribution analysis. We conclude with reflections on strengthening the use of actor-based ToCs in evaluation practice.

Using theories of change in complex settings

Theories of change and nested theories of change

As noted in the introduction to this Special Issue, program theories (theories of change) are a key tool in theory-based evaluations. But different terms are used in the literature. We will use the following terms: impact pathways describe causal pathways showing the linkages between the steps that go from activities to impact; a theory of change adds to an impact pathway by describing the causal assumptions behind the links in the pathway—what has to happen for the causal linkages to be realized (Mayne, 2015, p. 121). Thus theories of change comprise both a pathway of results and the causal link assumptions for each step in the causal sequence, explaining how each causal link is expected to happen. A theory of change sets out the impact story of the intervention, or part thereof. An important idea to keep in mind, especially when contemplating a complex intervention, is that there need not be only one representation of an intervention’s theory of change (Mayne, 2015, 2018a, 2018b; Mayne & Johnson, 2015).

We have listed above the kinds of characteristics that can be found in complex interventions. A detailed ToC for representing such interventions would be a very complicated model. Indeed, trying to capture such a complex intervention in a single ToC is neither realistic nor helpful. The result is often a “spaghetti” ToC of little use to anyone other than those who developed it. There is a need, rather, to unpack the complexity into more manageable forms. In discussing these complexity challenges, Mayne (2015) argues for a high-level overview ToC to capture the big picture of the intervention and show the key pathways to impact but not the details, and nested ToCs to provide more detail on the specific pathways within the intervention. Mayne and Johnson (2015) provide some examples in the setting of agriculture research for development.
The actor-based theories of change discussed in this article often provide a very useful way to unpack this complexity. Complex interventions typically aim to change the behaviour of a number of actor groups involved in the intervention, with specific activities undertaken to influence these groups. In the example discussed below, the actor groups include the police, the media, parliamentarians, the public, and the justice system. Actor-based ToCs involve building a ToC for each group to represent how the intervention aims to influence the group and the results that are then expected. Actor-based ToCs often interact with each other to produce the desired overall impacts, but it is of interest to know the actor impact stories: the extent to which and the manner in which the intervention has in practice worked to influence these actor groups.



Contribution analysis: One theory-based evaluation approach

As noted, ToCs are used in many theory-based evaluation approaches, such as realist evaluation and contribution analysis. Contribution analysis (CA) (Mayne, 2001, 2009, 2011, 2012a) is increasingly being used in evaluations of interventions in order to address the perennial challenge of “dealing with causality,” particularly in complex systems where changes in outcomes are the result of a number of factors in addition to the intervention—that is, situations where several factors act together to bring about change, with each factor being a contributory cause and none being sufficient on its own to bring about the desired change (Mayne, 2012a; see also Buckley, 2016; Buregeya, Brousselle, Nour, & Loignon, 2017; Delahais & Toulemonde, 2017; Downes, Novicki, & Howard, 2018; Kane, Levine, Orians, & Reinelt, 2017; Mayne, 2012b; Noltze, Gaisbauer, Schwedersky, & Krapp, 2014; Terrapon-Pfaff, Gröne, Dienst, & Ortiz, 2018; and Ton, 2017). The aim is to make credible causal claims about the contribution an intervention is making to observed results. CA argues that if one can verify or confirm a theory of change with empirical evidence,1 then it is reasonable to conclude that the intervention in question has made a difference. This is based on the observation that a ToC is a model of the intervention as a contributory cause (Mayne, 2012a). The general steps in CA are as follows (Mayne, 2011):

Step 1: Set out the cause-effect issue to be addressed.
Step 2: Develop the postulated theory of change for the intervention.
Step 3: Gather the existing evidence on the theory of change.
Step 4: Assemble and assess the contribution claim, and the challenges to it.
Step 5: Seek out additional evidence to verify the theory of change.
Step 6: Revise and strengthen the contribution claim.

ToCs are thus a key tool in CA. It is obvious, however, that the strength of a contribution claim is only as good as the ToC being used. A weak ToC can only lead to weak contribution claims. Strong, well-structured ToCs that can be evaluated will support credible contribution claims. In the SSAPR evaluation discussed below, we used criteria developed by Davies (2013) for “evaluable” theories of change, that is, theories of change that are useful for evaluative purposes, such as conducting CA. The need for evaluable ToCs is clear, but this can present the evaluator with significant challenges when the intervention has already developed a ToC that, while useful for some purposes, is not very evaluable. The use of actor-based nested ToCs can help overcome some of these challenges.

Using actor-based nested ToCs to structure the impact evaluation of a complex program

Background on the SSAPR program

The Security Sector Accountability and Police Reform (SSAPR) program in the Democratic Republic of Congo (DRC) was a five-year program funded by the UK

doi: 10.3138/cjpe.52946 CJPE 33.3, 292–315 © 2019


Department for International Development (DFID) from January 2010 to May 2015. The stated goal of SSAPR at program design was to assist the Government of the DRC in laying the foundations for the re-establishment of the rule of law by supporting the creation of accountable and service-oriented security and justice institutions able to improve safety, security, and access to justice for Congolese citizens. SSAPR contributed to this overall goal specifically through support to the police-reform process, as well as by improving the capacity of communities to cooperate with police and demand effective policing.

The program comprised four independently managed components: Project Support to the Police, Control and Coordination of Security Sector, External Accountability, and Monitoring and Evaluation. The first three components were designed to target different Congolese actors and empower them to work together to improve the public sense of security in the program's three pilot provinces (Bas Congo, Western Kasai, and South Kivu) as well as at the national policy level. Activities involved stakeholders from all aspects of public administration, police, and civil society, including the Ministry of Interior and Security (MIS), the Congolese National Police (Police Nationale Congolaise, PNC), the General Inspection Audit (Inspection Générale de l'Administration, IGA), the Secretariat of the Poverty Reduction Strategy Paper (PRSP), Parliament, civil society, and the media, along with magistrates and Congolese researchers. The Monitoring and Evaluation component was commissioned after program design and was responsible for ongoing program monitoring throughout implementation as well as the final impact evaluation.

Introduction to the SSAPR impact evaluation

The evaluation aimed to answer the following questions for policy makers, program planners, and other stakeholders:

• Did SSAPR interventions contribute to changes in police capacity and accountability, stakeholder empowerment and ownership, and sustainability over the intervention period? How did these changes occur?
• Did these changes contribute to more effective policing? How did this occur?
• Did more effective policing contribute to improved security for Congolese citizens?
• What were the key factors that contributed to these changes?

Based on these evaluation questions, the evaluation team opted for a quasi-experimental design nested within an overall theory-based evaluation approach, using CA as the analytical approach to guide causal inference. This design was selected for a number of reasons. First, the three pilot cities were not selected at random but purposively, based on a series of programmatic criteria, particularly the security situation in these sites. Thus, by definition, a selection bias existed in that these sites were chosen


because they exhibited a relatively poor security situation compared with other sites in the country. As such, a pure experimental design was not an option. We therefore decided on a difference-in-difference (DiD) approach based on city-level matching to estimate the extent to which there was a change in population-level outcomes in program intervention sites. Each of the three intervention cities was matched with a comparison city to serve as its counterfactual. Cities were matched in 2010 based on a range of factors using available data, including relative population size, ethnicity, conflict level, region, language, recent historical trends, and geographical proximity.

The DiD approach is particularly appropriate when only one causal factor is under investigation. In the case of SSAPR, however, many possible factors beyond the performance of the police could have had an impact on an individual's perception of security (the main outcome of interest). These included larger socio-economic changes occurring within the three provinces, and the country itself, during the five-year implementation period, including infrastructure, education, and the political economy, among others. Any of these factors could be considered a contributory cause of changes in the outcomes of interest. In addition, SSAPR was not implemented as one uniform intervention in each of the three sites. Different interventions targeted different actor groups differently in each site, requiring us to further unpack what the "intervention" looked like in each program site. As such, the DiD design alone lacked the explanatory power to answer the "hows" and "whys" of program effectiveness posed in the evaluation questions. We opted, therefore, to nest the DiD design within an overall theory-based design as a more appropriate framework for causal inference.
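The DiD logic for a matched city pair reduces to simple arithmetic: the change in the intervention city minus the change in its comparison city. A minimal sketch, with invented figures purely for illustration (none of these numbers come from the SSAPR evaluation):

```python
def did_estimate(treat_baseline, treat_endline, comp_baseline, comp_endline):
    """Difference-in-differences for one matched city pair: the change in the
    intervention city net of the change in its matched comparison city."""
    return (treat_endline - treat_baseline) - (comp_endline - comp_baseline)

# Hypothetical city pair: share of respondents reporting feeling secure,
# at 2010 baseline and 2014 endline. Figures are invented for illustration.
effect = did_estimate(treat_baseline=0.40, treat_endline=0.62,
                      comp_baseline=0.41, comp_endline=0.48)
print(round(effect, 2))  # 0.15, i.e., a 15-point net change
```

The comparison city absorbs the change that would have happened anyway (the counterfactual trend), which is exactly why it cannot, by itself, account for the many other contributory causes discussed above.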
Among the different theory-based approaches available, we found CA particularly appropriate for this evaluation, as it was originally conceived as a tool for cases in which multiple factors may have influenced the outcome. CA is particularly recognized for its ability to reduce uncertainty about the contribution of a given intervention to observed results by understanding why the observed results occurred, highlighting the roles played by the intervention and other internal and external factors. Consequently, the six steps of CA were followed to design and implement the evaluation. The next section describes how the first two steps of CA were conducted to design the evaluation and develop an evaluable ToC for the program using nested actor-based ToCs. The following section outlines how the evaluation was implemented using these actor-based ToCs (Steps 3–6 of CA). The intention of these sections is not to explain in detail how we conducted each step of CA as part of this evaluation; the full SSAPR impact evaluation report, including the methodology, is referenced for the interested reader, along with a number of articles in the background section that provide a more comprehensive description of how to conduct each step of CA. Rather, these sections focus on how we integrated the use of nested actor-based ToCs into the overall CA exercise.


Developing an evaluable ToC as part of evaluation design: Steps 1 and 2 of CA

After identifying the specific cause-and-effect issues to be addressed by agreeing on answerable evaluation questions with stakeholders, it was necessary to develop the postulated theory of change for the intervention. As discussed above, developing an evaluable ToC—Step 2 of CA—is a critical but often neglected component of conducting a robust CA. This can be challenging, however, when program stakeholders have previously developed their own ToC, which can have aims other than evaluation. In these cases it is necessary to develop a more "evaluable" program ToC without reducing the ownership of program stakeholders in the process, since getting their buy-in to the ToC is a necessary first step to obtaining their buy-in for the overall evaluation. This was the situation during the early stages of the SSAPR impact evaluation.

Program stakeholders drafted their first ToC around the mid-point of program implementation, about two years before the evaluation began (Figure 1). The ToC development process therefore employed largely inductive approaches, involving extensive consultations with program stakeholders and drawing on SSAPR's implementation experience since program start-up. At the outset of the evaluation, an evaluability assessment of this ToC was conducted. Although there are a number of definitions and interpretations of what makes a ToC "evaluable," the nine criteria that Davies (2013) poses for assessing the evaluability of a program ToC served as pragmatic and useful guidance: clarity, relevance, plausibility, validity or reliability, testability, contextualized, consistent, complexity, and agreement. These criteria help one to assess the degree to which a ToC is clear and plausible and whether relevant data are available for evaluative purposes.
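An evaluability assessment of this kind can be pictured as a simple checklist over Davies' nine criteria. The sketch below is our own illustration: the scoring scale and threshold are invented for the example, whereas the criteria names come from Davies (2013) as listed above.

```python
# Minimal sketch of an evaluability assessment against Davies' (2013) nine
# criteria. The 0/1/2 scoring and the threshold are illustrative inventions.
DAVIES_CRITERIA = [
    "clarity", "relevance", "plausibility", "validity or reliability",
    "testability", "contextualized", "consistent", "complexity", "agreement",
]

def assess_evaluability(scores, threshold=2):
    """scores maps criterion -> 0 (absent), 1 (partial), 2 (met).
    Returns the criteria needing strengthening before evaluation."""
    missing = set(DAVIES_CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    return [c for c in DAVIES_CRITERIA if scores[c] < threshold]
```

Applied to the SSAPR program ToC, such a checklist would have flagged, among others, "testability" and "complexity", mirroring the assessment findings described below.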


Figure 1: Original program theory of change


The assessment found that many of the elements of an "evaluable ToC" were sufficiently covered in the program ToC, particularly concerning its clarity, relevance, and plausibility. It also highlighted a number of elements that would need to be strengthened in order to use the ToC for evaluative purposes, for instance the testability of some of the causal linkages within the logic and its integration of the more complex aspects of the program, particularly with respect to the expected multiple interactions of different actor groups within the three pilot cities. This section first describes how the original program ToC was used to develop a more evaluable ToC by adopting an actor-based approach. This is followed by a discussion of how buy-in to these actor-based ToCs (and hence the evaluation design more generally) was gained by working closely with program stakeholders.

An actor-based approach to unpacking a complex ToC

Based on the results of this assessment, it was determined that some aspects of the ToC would need to be revised to render it more "evaluable." The initial focus was on one of the key assessment findings, which suggested a need to better articulate the expected multiple interactions between program components within the program sites. As each component was designed to target different Congolese actors, the evaluation team first identified the key actor groups involved with each component at the community level. To achieve this, the core program components and activities were first mapped out. Then, for each activity, the target actor group was identified. An excerpt from the results of this mapping is presented in Table 1. The mapping exercise identified three key actor groups at the community level directly targeted by SSAPR activities: (i) the police (PNC), (ii) community members, and (iii) local authorities.
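The activity-to-actor mapping just described is, in effect, a table inversion: start from activities and their target actors, then group by actor to see which actors are key. A minimal sketch, using a hypothetical excerpt of the mapping (cf. Table 1):

```python
from collections import defaultdict

# Hypothetical excerpt of the SSAPR activity-to-actor mapping (cf. Table 1).
ACTIVITY_ACTORS = {
    "police-community forums": ["local police", "community members",
                                "local authorities"],
    "community scorecards": ["community members"],
    "training police in community engagement": ["local police"],
}

def key_actor_groups(activity_actors):
    """Invert the mapping: for each actor group, which activities target it?"""
    by_actor = defaultdict(list)
    for activity, actors in activity_actors.items():
        for actor in actors:
            by_actor[actor].append(activity)
    return dict(by_actor)
```

Actor groups touched by many activities across components (here, community members and local police) are the candidates for their own actor-based ToC.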
For each of these three main actor groups, an actor-based ToC was built to outline the pathway from SSAPR intervention to the intended program impact. The "useful ToC model" (Mayne, 2015) was adopted as an overall framework for this work. In this model, activities and results are depicted at different levels within a program impact pathway, as a sequence of results leading from program-level activities to the long-term intended impact of the program on the identified target population. These levels include the activities undertaken by the program; the goods and services produced as direct outputs of these activities; the reach of these activities into the intended target groups and the target groups' reaction; changes in the capacity (knowledge, attitudes, skills, etc.) of those reached by the program's goods and services; the behavioural changes, or changes in practice, that occur among a target group reached; the direct benefits, or improvements, in the state of individual beneficiaries; and the well-being changes, or longer-term improvements, in the overall lives of individuals.

The useful ToC model also includes external influences, events, and conditions unrelated to the intervention that could contribute to the realization of the intended result, as well as the positive or negative unintended effects that occur as a result of the program's activities and results. Lastly, and importantly, the useful


Table 1: SSAPR components and activities mapped to key actor groups

PNC capacity: The PNC has the capacity to be responsive to local security needs
  Mechanisms to address public needs
    Activities: police–community forums; training police in community engagement; work with local authorities
    Actor groups: local police; community members; local authorities
  Enabling legal and policy framework
    Activities: technical support to drafting new legislation, policies, and operational guidance; political engagement to ensure this is adopted; capacity building of MISDAC
    Actor groups: Members of Parliament; regional delegations; National Police; Ministry of Security; Ministry of Interior

Empowerment: Key stakeholders can demand improved policing and can hold the police to account
  Community engagement
    Activities: preparing the public to engage with police; community scorecards
    Actor groups: community members
  Capacity of non-state actors
    Activities: training and support to civil society, media, and research actors regarding security and accountability
    Actor groups: civil society; media actors; research actors
  Discipline and abuse of power
    Activities: IG (Inspecteur Général); police complaints mechanism
    Actor groups: National Police / IG

Accountability: The PNC is internally and externally accountable for its actions
  Building consensus around reforms
    Activities: daily engagement across state institutions; building external demand; engagement at both national and regional levels
    Actor groups: state institutions; Members of Parliament; civil society; media

ToC model includes assumptions about the causal links in the impact pathway: the salient events and conditions that have to occur for each link in the causal pathway to work as expected. Within the model, the arrows between boxes represent expected "causal links" (e.g., that changes in police knowledge and skills lead to changes in police practice), while the "causal link assumptions" explain how and why each causal link is expected to work.

In practice, the information from the original program ToC and other relevant documents was first mapped onto the useful ToC model for each of the three actor groups. For instance, the different activities targeting the police, the


Figure 2: Actor-based theory of change for local police

expected changes in capacity these activities were intended to bring about, and so on, were all mapped to the useful ToC model. Assumptions for each link in the impact pathway were then identified by arguing through the question of what salient events and conditions would need to occur for this pathway to hold true. These initial ToCs were discussed internally within the evaluation team over a series of iterative sessions, engaging both in discussion and in consulting (and re-consulting) program documents. This process was particularly useful in articulating the assumptions at each link. An example of the initial actor-based ToC for the local police is included in Figure 2.

The key assumptions included in the three actor-based ToCs largely concerned the ability of and motivation for these key actors to initially change practice, and the conditions that would need to exist to sustain these changes (e.g., changes in incentive structures for the police). Using these assumptions, a second round of actor mapping was then conducted, identifying the additional actor groups most related to these key causal link assumptions holding true. For instance, in Figure 2, displaying the police impact story, Assumption box A3 includes an assumption around "strengthened internal accountability mechanisms existing within the police" as a causal link assumption that must hold true for "improved police capacity" to lead to "changes in police practice." Establishing internal accountability mechanisms for the police required a change in practice that the program aimed to bring about through intervention with the Security


Figure 3: SSAPR actor-based theory of change

Sector Coordination Committee (CCOSS), including the National Police, the Ministry of Interior, and other actors at the national level. A number of secondary actors critical to these assumptions being met were thus identified. Additional nested actor-based ToCs were then built for each of these actors following the same process outlined above: civil society organizations, security sector coordination actors, the media, and parliamentarians within Provincial Assemblies. The different nested actor-based ToCs were then linked into an overview ToC articulating the overall impact pathway and the key assumptions required for this impact pathway to hold true (i.e., for SSAPR component activities to cause an increased public sense of security). This is presented in Figure 3. This model then served as the overall ToC used to conduct the rest of the CA exercise.

Getting buy-in from program stakeholders on the evaluable ToC

The evaluable overview ToC displayed in Figure 3 is visually quite different from the program ToC developed by program stakeholders displayed in Figure 1. For the program stakeholders to buy into our overall evaluation design using CA, it was necessary for them to first have ownership over this revised program ToC, as the actor-based ToCs effectively became the frames of the impact stories in which we presented the overall evaluation findings. This was achieved through a number of stakeholder consultations, beginning with a workshop with the evaluation commissioners and program implementers to present the overall evaluation design using CA and discuss the importance of developing an evaluable ToC as part of this approach. This was


an important framing exercise to build a shared understanding that a complex intervention such as SSAPR can be depicted through many different models, with each model serving a specific purpose. For example, the original program ToC was built around the Capability, Accountability and Responsiveness (CAR) model of understanding governance, as program stakeholders felt it was important to capture both the operational aspects of the program's interventions to strengthen police capacity and the focus on improving police governance around a democratic policing model (Moore & Teskey, 2006). The ToC also explained the program in the context of the larger security-sector reform occurring in the DRC. At the time, this was a key program tool in driving more integrated program delivery at the site level, as opposed to the fragmented and "siloed" delivery by program components that had characterized the initial years of program intervention.

This provided an entry point to discuss some of the limitations of the program ToC from an evaluability perspective in a way that did not discredit the importance of the original model for the purposes of more integrated delivery and for shifting the thinking of implementers from a focus on police capacity to a broader perspective of police governance. From this common understanding, a number of one-on-one consultations with each program component were then organized to explain the approach of developing actor-based nested ToCs from the original program ToC model in order to strengthen the causal logic within the original ToC and clarify the multiple interactions occurring at the site level. This was done by replicating the actor-based mapping exercise for each program component's activities in a collaborative session.
This was a very practical and hands-on exercise that participants found easy to engage in, and it stimulated good discussion around whom the program was targeting and how, as well as around differences in implementation across the three program sites. The overall useful ToC model was then presented as a framing concept for understanding more generally "how change happens" among actor groups. In comparison to evaluation jargon around outputs, outcomes, and impacts, program stakeholders were able to relate to and more easily grasp the steps in the useful ToC causal model. For instance, program implementers could more easily discuss what changes in practice they wished to see in the different actor groups targeted and describe in their own words how they thought their program activities would lead to these changes in practice (e.g., building capacity through skills training).

This not only led to a more meaningful conversation about the initial impact pathway for each actor group, which generated useful additions and clarifications to our initial logic, but also provided an opportunity to pose a number of questions that challenged program stakeholders' logic and thus identified additional assumptions for inclusion in the nested actor-based ToC models. It also resulted in program implementers identifying a number of relevant program documents that could then be used in evaluation implementation. The next section discusses how these actor-based ToCs were then used to implement the evaluation and generate evaluation findings.
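The chain of result levels in the useful ToC model (Mayne, 2015), as described earlier, can be sketched as a linked sequence of causal links, each carrying its own assumptions. The representation below is our illustrative choice, not a structure prescribed by Mayne:

```python
from dataclasses import dataclass, field

# Result levels of the "useful ToC model" as described in the text; the
# data structure itself is an illustrative choice for this sketch.
RESULT_LEVELS = ["activities", "outputs", "reach and reaction",
                 "capacity changes", "behaviour changes",
                 "direct benefits", "well-being changes"]

@dataclass
class CausalLink:
    from_level: str
    to_level: str
    assumptions: list = field(default_factory=list)  # must hold for link to work

def impact_pathway(levels=RESULT_LEVELS):
    """Build the chain of causal links from activities to well-being changes."""
    return [CausalLink(a, b) for a, b in zip(levels, levels[1:])]
```

Nesting actor-based ToCs then amounts to attaching one such pathway per actor group, with the assumptions on one actor's links often pointing to results in another actor's pathway, as in the Assumption box A3 example above.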


How good actor-based ToCs allow for good impact stories

The previous section described how the original program ToC, relevant program documents, and a series of stakeholder consultations were used to develop an evaluable ToC for the evaluation (Steps 1 and 2 of CA). In this section, the use of nested actor-based ToCs to construct robust impact stories (Steps 3–6 of CA) is described.

Gathering data for impact stories: Step 3 of CA

Based on this overall ToC model, a set of indicators was first identified that could be used to assess the degree to which each step in the impact pathway held true for each actor group and the degree to which causal link assumptions were met. Existing data to report against each indicator were then gathered. This included a number of data sources produced by the program components, including program monitoring data, quantitative survey data, and qualitative research studies. Many of the secondary sources were research reports produced by the Monitoring & Evaluation (M&E) component over the life of the project. The remainder of this section provides a short description of the main data sources used.2

Household surveys

A main data source for the evaluation was the results of two cross-sectional representative household sample surveys in three SSAPR pilot sites and three matched comparison sites (three "city pairs"), using a structured questionnaire in 2010 at baseline and in 2014 at endline. Sampling followed a stratified approach, which maximized the likelihood of matching and capturing comparable population samples over time. Over 8,000 individuals were sampled in each round, ensuring equal numbers of men and women. Samples were analyzed through a difference-in-difference (DiD) approach to produce quantitative estimates of change over time attributable to the program.
Program monitoring data

All available program monitoring data were also gathered from the program components, including quarterly and annual reports and other thematic reports on specific interventions. After an initial review, the program was asked to provide additional data sources highlighted in these initial reports. In total, the evaluation team reviewed and analyzed over 100 independent data sources from the program's routine monitoring system.

ToC monitoring

ToC monitoring is a longitudinal research approach we adapted from the outcome mapping methodology, which allowed the research team to consider how SSAPR and other actors may have contributed to change, as a diverse range of actors influenced each change as part of complex, long-term processes. This study was


conducted during the last year of program implementation and involved conducting retrospective community scorecards and focus-group discussions (FGDs) during repeated visits to SSAPR intervention and comparison sites.

Thematic research studies

Two thematic research studies conducted by the M&E component were also drawn on. The first was a Rapid Assessment, Response and Evaluation study, conducted in 2013, which aimed to explore the link between the security and justice sectors in the DRC from both the supply and the demand sides. The second was a study on the state of gender mainstreaming as part of the process of reform in the PNC.

Assembling and revising impact stories: Steps 4–6 of CA

Following Step 4 of the CA approach, the nested actor-based ToCs were then used as the structure for constructing impact stories. For each actor-based ToC, all relevant data from the sources gathered were systematically extracted and organized against each indicator. The evidence for each indicator was then synthesized and assessed in terms of the strength of the available evidence that each change occurred or each causal link assumption was realized, using the following rating scale:

• GREEN indicated evidence was available confirming that a change occurred or an assumption was fully realized;
• AMBER indicated mixed evidence for/against a change occurring or an assumption being realized;
• RED indicated evidence was available disproving a change or showing that an assumption was not realized; and
• BLUE indicated that no/little evidence was available, so no conclusions could be drawn.
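The decision logic behind this traffic-light scale can be made explicit in a few lines. This is a deliberately simplified sketch: in the actual evaluation the rating was a qualitative judgement over multi-source evidence, not a count, and the thresholds below are invented for illustration.

```python
def rate_evidence(supporting, disconfirming):
    """Map counts of supporting and disconfirming evidence items to the
    traffic-light scale used in the impact stories. Counting items is an
    illustrative simplification of what was a qualitative judgement."""
    if supporting == 0 and disconfirming == 0:
        return "BLUE"    # no/little evidence: no conclusion can be drawn
    if supporting > 0 and disconfirming == 0:
        return "GREEN"   # evidence confirms the change or assumption
    if supporting == 0 and disconfirming > 0:
        return "RED"     # evidence disproves the change or assumption
    return "AMBER"       # mixed evidence for and against
```

Applying such a rating at every link and assumption in an actor-based ToC yields the colour-coded pathway diagrams referred to in the findings below.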

This was further distilled into a narrative of change, or impact story, structured along the causal impact pathway and presenting the multi-source evidence supporting causal and contributory claims. A separate impact story was developed for each of the three primary actor groups. An impact story around the Overview ToC was also developed to investigate the impact of the program on perceptions of public security. The four impact stories created are displayed in Figure 4. Each actor-based impact story also included an assessment of sustainability at the assumption level. Here, evidence was analyzed and presented on the degree to which each change occurred during the program implementation period, and the degree to which emerging evidence suggested that the change would be sustained after the program ended.

Following Steps 5 and 6 of CA, the impact stories were presented to program stakeholders in a participatory workshop in order to identify data gaps that could be filled by both program monitoring data and additional primary data collection. Based on initial stakeholder consultations, the evaluation team then sought


Figure 4: Nested actor-based impact stories

to fill any evidence gaps through additional primary data-collection exercises. The major evidence gaps identified concerned the potential contribution of other external or influencing factors to study results. To fill these gaps, a number of focus-group discussions and key informant interviews were conducted with representatives of groups not previously targeted for in-depth qualitative research. This exercise also identified a number of secondary data sources that could help to fill data gaps. The data from these additional data-gathering exercises were extracted and added to the indicator-synthesis exercise, the strength-of-evidence assessment, and ultimately each impact story.

The outcome was a revised and final set of impact stories. The findings from this last step were shared with program stakeholders. Both internal and external stakeholders, including independent quality-assurance reviewers with backgrounds in evaluation methods and in security and justice programs in developing-country settings, provided a subsequent round of comments. The results of these reviews were further incorporated into the impact stories.

Example of study findings

To provide an illustrative example of what the final product of this analysis looked like, two excerpts are included below from the impact stories of two of the primary actor groups: the police and community members. The actor-based ToCs and impact stories for all actor groups are available in the full evaluation report (Palladium, 2016). Each impact story begins with a diagrammatic nested actor-based ToC outlining the main impact pathway and causal assumptions. Within these


diagrams, the colours used above to represent the strength of available evidence are also used.

Police impact story: Police officers exposed to SSAPR have positively changed their practice as a result of SSAPR intervention

The police component of the program aimed to improve police practice to better serve the public and respond to community needs by training the police and providing infrastructure and equipment support. SSAPR reached a sufficient number of police officers with police support interventions in favour of the community policing model, or police de proximité (PdP), including intensive training and coaching for over 1,500 officers. For some police officers, this was the first training they had received.

Several gender-targeted activities were also conducted, initially in support of victims of gender-based violence (GBV). Police stations were provided with infrastructure and equipment to support the police in dealing with female victims and cases of GBV. The program also incorporated training into the community policing approach on how to effectively deal with cases of GBV. Additional gender mainstreaming and gender-equity activities began relatively late in the program, including integrating gender focal points into police operations and developing strategies to mainstream gender issues across the organization.

Community members also reported improvements in police practice over time in SSAPR pilot sites, specifically among officers trained by the program in community policing. Quantitative analysis indicated that positive interactions

Figure 5: Actor-based theory of change for police


with a community police officer were highly correlated with improved perceptions of security. Community members in SSAPR pilot sites also reported seeing more frequent patrols over the life of the program. To support these changes, the program supported the media and civil society in establishing external accountability mechanisms in SSAPR pilot sites to hold the police to account for their behaviour.

The volume of evidence gathered during this evaluation suggested that these changes were credibly the result of SSAPR activities. Participation in the police training program improved the capacity of the police and their practices, although gains here may be beginning to slip, with declining police motivation and a reduction in activity within external accountability mechanisms. The evidence is more mixed on the degree to which changes in equipment and infrastructure supported by the program led to changes in police practice, although these changes seem to have had an effect on the community's perceptions of the police's capacity. Supplementary analysis also did not find evidence of any compelling external factors likely to have strongly influenced these changes, strengthening the certainty of our findings.

Community-member impact story: Communities in SSAPR pilot sites have positively changed their practice around community engagement as a result of SSAPR

The evaluation found that community-engagement activities organized through community-group actors—namely, civil society organizations (CSOs) and the media—were successful in reaching a large proportion of the population with education and awareness-raising activities. Evaluation findings indicated that changes in CSO and media practice were effective in providing opportunities

Figure 6: Actor-based theory of change for community members


for engagement with the police, through community fora (known as forums de quartier) and other community-engagement events. This led to changes in community practice regarding engagement with the police. More than one-third of respondents in household surveys in SSAPR intervention sites reported having participated in a community forum or other community-engagement event. Qualitative data gathered during this evaluation suggested that both the police and community members attended and actively participated in these fora and events.

Given the large reach of these activities, and because no other security-related education or awareness campaigns occurred in SSAPR sites during the implementation period, it is reasonable to conclude that exposure to SSAPR-initiated activities led to this change in community capacity. As community-engagement activities were ongoing and able to sustain a high degree of interest and positive engagement from pilot communities over a number of years, the evaluation team concluded that changes in community capacity and practice with regard to the police were indeed the result of exposure to SSAPR community-engagement activities. Analysis also did not find any compelling evidence of external factors that could have strongly influenced these changes, strengthening the certainty of our findings.

This sustained change in community practice, however, is predicated on other actors, such as CSOs, the media, and local authorities, continuing to provide engagement opportunities for community members to use their improved capacity to better engage with the police. Although these opportunities existed over the life of the program, they were all funded, and in some cases coordinated, by SSAPR. Emerging evidence indicates that some of these activities may be beginning to decline in the absence of funding from the intervention.

Conclusion

Strengths and weaknesses of the evaluation approach

Overall, this experience using nested actor-based ToCs to evaluate complicated and complex aspects of an intervention through CA identified a number of strengths and weaknesses. There are a number of benefits:

Unpacking complexity. The focus on actors was a useful way of understanding causality in a complex program such as SSAPR, with multiple component activities working with multiple target populations to bring about long-term and sustained change in the population at large. Identifying who was being targeted by which interventions, and how they were connected, helped to better articulate the causal logic and identify packages of interventions targeting specific stakeholder groups. Equally, linking causal-link assumptions between the different actor-based ToCs provided a robust picture of who needed to change (and how) in order to bring about the program's intended results. This was important in engaging with program teams on what could realistically be expected to change, and in what timeframe.


Engaging with stakeholders. The ability to engage program stakeholders in the actor-based approach was also a significant advantage. Stakeholders were better able to articulate and reflect on changes when these were put in the context of specific actor groups, as opposed to the more theoretical causal-impact pathways often used in theory-based approaches. This allowed for meaningful and constructive critique of initial draft impact stories and the identification of ways in which the impact stories—and thus the contribution claims within the evaluation—could be strengthened.

Constructing impact stories. The nested actor-based ToCs also provided a pragmatic framework for organizing a wealth of information within each impact story. As hundreds of primary and secondary sources were analyzed to extract data for the impact stories, the actor-based focus helped us in coding and organizing large amounts of data.

Assessing sustainability. This actor-based approach also helped to usefully build in an assessment of the likelihood of sustainability of the intervention. By focusing on how actors would need to sustain a behavioural change, it was possible to assess which factors would need to remain in place to create these favourable conditions, and to better determine which were likely to remain in the absence of program funds. Although many impact evaluations include "sustainability" as a key measure, it is often difficult in real terms to incorporate this into an evaluation design. The focus on actor-level sustainable behaviour change was a pragmatic way to address this critical component of impact evaluation.

Despite these advantages, some challenges are also useful to reflect on for future applications of nested actor-based ToC approaches:

Selecting impact stories. There were some initial difficulties in situating the focus of the impact stories.
With six research sites and four impact stories in each site, it was not feasible, due to time and financial constraints, to conduct over twenty individual impact stories or a higher-level synthesis. Therefore, it was necessary to decide early on how best to structure the impact stories. It was decided in the end to conduct one overarching impact story for each actor group and include "variations to the narrative" to report variation observed across actor groups and different sites.

Significant time commitment. The development of these iterative impact stories did require a substantial time commitment from evaluation stakeholders, particularly the program teams. This was challenging, given that the program was shutting down and program staff were consequently under immense time pressure; however, their participation in the process was considered critical to the success of the approach. In the future, a realistic assessment of time commitments should be made, and discussions held up front with program teams to manage expectations.

Significant data requirements. Last, it should be noted that developing ToCs for each actor group—both the primary and secondary actor groups—required a significant amount of data. But there is a positive correlation between the amount of data used to construct impact stories and the strength of the contributory claims that can be made within each actor-based impact story. Both the


availability of data and the time required to gather and analyze them are important factors to be considered.

Future improvements

We continue to use actor-based ToCs in our evaluation work and have built in several enhancements since conducting this evaluation. As modelling behaviour change is the basis for these ToCs, we now use a stronger and more intuitive behaviour-change model, namely the COM-B model developed by Michie, van Stralen, and West (2011), whereby behaviour (B) occurs as the result of interaction between three necessary conditions: capabilities (C), opportunities (O), and motivation (M). For instance, in constructing the actor-based ToCs for each actor group as part of this evaluation, we found the "capacity" box within the useful ToC model to be particularly difficult to fully articulate and explain for each actor group: different stakeholders had different understandings of what "capacity" was—often limited to a technocratic perspective (e.g., knowledge and skills)—and we found that it did not fully capture constructs such as "attitudes" and "motivations" to change practice. Applying the COM-B framework more clearly identifies the conditions needed to bring about behaviour change. The COM-B ToC model is discussed by Mayne (2018b).

Indeed, we have found that articulating the COM-B construct for each actor group can strengthen the causal logic within an actor-based ToC and more robustly identify causal-link assumptions. For example, by fully identifying the different conditions associated with a behaviour, it is possible to identify both an actor's current capabilities, opportunities, and motivations (which explain their current behaviour) and how these conditions would need to change to stimulate the expected behavioural change.
This allows one to better identify two sets of changes: the changes in behavioural conditions that are targeted by program intervention(s) and thus expected to change due to the program, strengthening the primary impact pathway of the actor-based ToC; and those that are not specifically targeted by the program but would nonetheless need to hold true for the behavioural change to occur, strengthening the causal-link assumptions in the actor-based ToC. This can serve as the basis for articulating an evaluable actor-based program theory useful for program design as well as evaluation. This is further discussed in Koleros, Mulkerne, Oldenbeuving, and Stein (2018).

One of the key challenges in developing ToCs is identifying the assumptions for each causal link in the ToC. This was certainly the case in the SSAPR evaluation. More recently, Mayne has developed further guidance on articulating assumptions. In Mayne (2018b), generic criteria for the COM-B model are discussed. Several of the criteria for robust ToCs involve testing the soundness and necessity of assumptions (Mayne, 2017). And in Mayne (2018a), there is a discussion of generating causal-link assumptions.

A key message of this article is that CA is only as good as the ToC used. A second improvement in the approach is making clearer use of the criteria for a robust


ToC. Starting with Davies's conditions mentioned above, Mayne (2017) has developed two sets of criteria for robust ToCs:

• criteria for a structurally sound ToC, and
• criteria for a structurally sound ToC that is plausible.

These criteria provide more structure for developing robust ToCs.

Final remarks

Our use of actor-based ToCs in this evaluation, as well as in other contexts, has led us to conclude the following:

• since complex interventions target different actors, building actor-based ToCs is a pragmatic way to unpack the complexity of an intervention;
• adopting an actor-based approach can help to strengthen the evaluability of a ToC, as it ensures that no key causal linkage between actor groups is omitted from the logic and that all actors targeted by an intervention are sufficiently included;
• applying the useful ToC model as part of an actor-based ToC helps to build buy-in for an overall evaluation by relating difficult and complex concepts in easily understood terms, allowing stakeholders to more easily engage in the evaluation process and its findings; and
• telling the story of a complex intervention is not obvious: focusing on the actors involved provides a basis for meaningful stories that stakeholders can relate to, while maintaining the causal logic of the overall ToC.

It is hoped that sharing this experience will contribute to the use of actor-based ToCs as a way to develop robust ToCs, both for effective applications of CA and for other similar approaches.

Notes
1. CA does not rely on opinions about the contribution being made, as some have suggested (Schmitt & Krisch, 2017), but rather on empirically verifying a theory of change.
2. A more detailed description of the data-collection method for each of these data sources, including the sampling strategy for the household survey mentioned above, is included in the full impact evaluation report.

References
Befani, B., Barnett, C., & Stern, E. (2014). Introduction: Rethinking impact evaluation for development. IDS Bulletin, 46(6), 1–5. https://doi.org/10.1111/1759-5436.12108
Buckley, A. P. (2016). Using contribution analysis to evaluate small & medium enterprise support policy. Evaluation, 22(2), 129–148. https://doi.org/10.1177/1356389016638625


Buregeya, J. M., Brousselle, A., Nour, K., & Loignon, C. (2017). Comment évaluer les effets des évaluations d'impact sur la santé : le potentiel de l'analyse de contribution. Canadian Journal of Program Evaluation, 32(1), 25–45. https://doi.org/10.3138/cjpe.31151
Byrne, D. (2013). Evaluating complex social interventions in a complex world. Evaluation, 19(3), 217–228. https://doi.org/10.1177/1356389013495617
Copestake, J. (2014). Credible impact evaluation in complex contexts: Confirmatory and exploratory approaches. Evaluation, 20(4), 412–427. https://doi.org/10.1177/1356389014550559
Davies, R. (2013). Planning evaluability assessments: A synthesis of the literature with recommendations. Working Paper 40. London, UK: DFID. Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/248656/wp40-planning-eval-assessments.pdf
Delahais, T., & Toulemonde, J. (2017). Making rigorous causal claims in a real-life context: Has research contributed to sustainable forest management? Evaluation, 23(4), 370–388. https://doi.org/10.1177/1356389017733211
Downes, A., Novicki, E., & Howard, J. (2018). Using the contribution analysis approach to evaluate science impact: A case study of the National Institute for Occupational Safety and Health. American Journal of Evaluation. Advance online publication. https://doi.org/10.1177/1098214018767046
Funnell, S. C., & Rogers, P. J. (2011). Purposeful program theory. San Francisco, CA: Jossey-Bass.
Garcia, J. R., & Zazueta, A. (2015). Going beyond mixed methods to mixed approaches: A systems perspective for asking the right questions. IDS Bulletin, 46(1), 30–43. https://doi.org/10.1111/1759-5436.12119
Gerrits, L., & Verweij, S. (2015). Taking stock of complexity in evaluation: A discussion of three recent publications. Evaluation, 21(4), 481–491. https://doi.org/10.1177/1356389015605204
James, C. (2011). Theory of change review: A report commissioned by Comic Relief. Comic Relief.
Retrieved from http://www.actknowledge.org/resources/documents/James_ToC.pdf
Kane, R., Levine, C., Orians, C., & Reinelt, C. (2017). Contribution analysis in policy work: Assessing advocacy's influence. Centre for Evaluation Innovation. Retrieved from http://www.evaluationinnovation.org/sites/default/files/ContributionAnalysis_0.pdf
Koleros, A., Mulkerne, S., Oldenbeuving, M., & Stein, D. (2018). The actor-based change (ABC) framework: A pragmatic approach to program theory in complex systems. American Journal of Evaluation. Advance online publication. https://doi.org/10.1177/1098214018786462
Mayne, J. (2001). Addressing attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program Evaluation, 16(1), 1–24. Retrieved from https://pdfs.semanticscholar.org/7501/501b7fb4ee9f31985540f3e1ca661f262ec6.pdf
Mayne, J. (2009). Building an evaluative culture in organizations: The key to effective evaluation and results management. Canadian Journal of Program Evaluation, 24(2), 1–30. Retrieved from https://evaluationcanada.ca/secure/24-2-001.pdf


Mayne, J. (2011). Contribution analysis: Addressing cause and effect. In R. Schwartz, K. Forss, & M. Marra (Eds.), Evaluating the complex (pp. 53–96). New Brunswick, NJ: Transaction.
Mayne, J. (2012a). Contribution analysis: Coming of age? Evaluation, 18(3), 270–280. https://doi.org/10.1177/1356389012451663
Mayne, J. (Ed.). (2012b). Special issue: Contribution analysis. Evaluation, 18(3).
Mayne, J. (2015). Useful theory of change models. Canadian Journal of Program Evaluation, 30(2), 119–142. Retrieved from https://evaluationcanada.ca/system/files/cjpe-entries/30-2-119_0.pdf
Mayne, J. (2017). Theory of change analysis: Building robust theories of change. Canadian Journal of Program Evaluation, 32(2), 155–173.
Mayne, J. (2018a). Developing and using useful theories of change. Evergreen briefing note. Retrieved from https://www.researchgate.net/publication/323868372_Developing_and_Using_Useful_ToCs
Mayne, J. (2018b). The COM-B theory of change model: Working paper. Retrieved from https://www.researchgate.net/publication/323868561_The_COMB_ToC_Model4
Mayne, J. (Forthcoming). Realistic commissioning of impact evaluations: Getting what you ask for? In A. Paulson & M. Palenberg (Eds.), Evaluation and the pursuit of impact. Taylor and Francis.
Mayne, J., & Johnson, N. (2015). Using theories of change in the CGIAR research program on agriculture for nutrition and health. Evaluation, 21(4), 407–428. https://doi.org/10.1177/1356389015605198
Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6(42), 11. https://doi.org/10.1186/1748-5908-6-42
Moore, M., & Teskey, G. (2006). The CAR framework: Capability, accountability, responsiveness. What do these mean, individually and collectively? Discussion note for DFID governance and conflict advisers.
Retrieved from http://www2.ids.ac.uk/gdr/cfs/pdfs/CARframeworkDRCweb.pdf
Noltze, M., Gaisbauer, F., Schwedersky, T., & Krapp, S. (2014). Contribution analysis as an evaluation strategy in the context of a sector-wide approach: Performance-based health financing in Rwanda. African Evaluation Journal, 2(1). https://doi.org/10.4102/aej.v2i1.81
Palladium. (2016). Independent evaluation of the Security Sector Accountability and Police Reform Programme. Retrieved from https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/563343/Eval-security-sector-accountability-police-reform-prog.pdf
Ramalingam, B. (2013). Aid on the edge of chaos: Rethinking international cooperation in a complex world. Oxford, England: Oxford University Press.
Ramalingam, B., & Jones, H. (2008). Exploring the science of complexity: Ideas and implications for development and humanitarian efforts. Working Paper 285. London, UK: Overseas Development Institute. Retrieved from https://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/833.pdf


Rogers, P. (2008). Using programme theory to evaluate complicated and complex aspects of interventions. Evaluation, 14(1), 29–48. https://doi.org/10.1177/1356389007084674
Schmitt, J., & Krisch, F. (2017). A mechanism-centered approach to evaluating complex aid interventions: The case of accompanying measures to budget support. Journal of MultiDisciplinary Evaluation, 13(28). Retrieved from http://journals.sfu.ca/jmde/index.php/jmde_1/article/view/455
Stein, D., & Valters, C. (2012). Understanding 'theory of change' in international development: A review of existing knowledge. The Asia Foundation and the Justice and Security Research Programme. Retrieved from http://www.theoryofchange.org/wp-content/uploads/toco_library/pdf/UNDERSTANDINGTHEORYOFChangeSteinValtersPN.pdf
Terrapon-Pfaff, J., Gröne, M.-C., Dienst, C., & Ortiz, W. (2018). Impact pathways of small-scale energy projects in the global south: Findings from a systematic evaluation. Renewable and Sustainable Energy Reviews, 95, 84–94. https://doi.org/10.1016/j.rser.2018.06.045
Ton, G. (2017). Contribution analysis of a Bolivian innovation grant fund: Mixing methods to verify relevance, efficiency and effectiveness. Journal of Development Effectiveness, 9(1), 120–143. https://doi.org/10.1080/19439342.2016.1231702
Valters, C. (2014). Theories of change in international development: Communication, learning, or accountability? JSRP Paper 17. The Asia Foundation. Retrieved from http://www.lse.ac.uk/internationalDevelopment/research/JSRP/downloads/JSRP17.Valters.pdf
Vogel, I. (2012). Review of the use of 'theory of change' in international development. Department for International Development (DFID). Retrieved from http://www.oxfamblogs.org/fp2p/wp-content/uploads/DFID-ToC-Review_VogelV4.pdf

Author Information

Andrew Koleros is a researcher at Mathematica Policy Research. He has over fifteen years of experience in the design and implementation of programme monitoring, evaluation, and learning systems for development programmes.

John Mayne is an independent advisor on public sector performance. Over the past 13 years he has focused largely on international development evaluation and results-based management work.

Does Your Implementation Fit Your Theory of Change?

Steve Montague Performance Management Network

Abstract: A brief review of evaluation findings in almost any given domain typically reveals that most, and sometimes all, major findings deal with the implementation of initiatives—also known as action theory. Moreover, the findings regarding implementation frequently allude to mismatches between the type or level of implementation occurring and the fundamental nature of the initiative. Case examples will illustrate that while all permutations and combinations of change and action theories cannot be summarily assessed, one can use case analysis to draw some lessons suggesting that some combinations are essentially toxic, while others provide at least a reasonable chance of success. The implication is that further systematic coding and analysis of change theories, action theories, and in particular their combinations in programs could produce useful insights for both evaluation and public-policy decision making.

Keywords: action theory, implementation, program theory, public policy, theory of change

Résumé : Une revue rapide des résultats d'évaluation dans presque n'importe quel domaine révèlerait que la majorité des études abordent l'implantation des interventions (théorie de l'action). De plus, les résultats d'analyse d'implantation indiquent fréquemment un manque de correspondance entre le type ou le niveau de mise en œuvre et la nature fondamentale de l'intervention. L'analyse de quelques cas indique que même s'il est impossible d'évaluer toutes les permutations et les combinaisons de théories de l'action et de changement, il est possible d'utiliser ces cas pour tirer certaines leçons qui suggèrent que certaines combinaisons sont essentiellement toxiques, alors que d'autres indiquent une chance raisonnable de réussir. Une codification et une analyse systématiques plus poussées des théories du changement, des théories de l'action, et de leur combinaison peuvent conduire à des observations utiles autant pour l'évaluation que pour la prise de décision en matière de politiques publiques.

Mots clés : théorie de l'action, implantation, théorie d'intervention, politiques publiques, théorie du changement

Corresponding author: Steve Montague, Performance Management Network, 1945 Lauder Drive, Ottawa, ON, K2A 1B2, Canada; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro special), 316–335 doi: 10.3138/cjpe.53008


Theories of change and program theories have been much discussed (as they are in this special edition); however, what has not been focused on lately are the key differences between what has been called the theory of change and the action or implementation theory. According to Rogers (2014), a "theory of change" essentially explains how activities are understood to produce a series of results that contribute to intended impacts. Chen, Pan, Morosanu, and Turner (2018) have recently gone on to distinguish the theory of change and the action theory quite definitively. They note that the change model describes the causal process generated by the program, and they distinguish the action model from the change model as a "systematic plan for arranging staff, resources, settings and support organizations, to reach a target group and deliver intervention services" (p. 54).

The Treasury Board of Canada Secretariat (TBS) (2010) has noted that such theories should include assumptions, risks, and external factors that describe how and why a program is intended to work. The TBS goes on to say that theory connects the program's activities with its goals; it is inherent in the program design and is often based on knowledge and experience of the program, research, evaluations, best practices, and lessons learned. I contend that the problem with the TBS's statement is that it appears to conflate "theory of change" (i.e., an explanation of how and why a certain type of intervention will make a difference) with the action or implementation theory. As noted above, action (some have called it implementation) theory has been distinguished from change theory by Chen (2005, p. 23), who states that action theory

specifies the major activities a program needs to carry out, ensuring an environment for the program that is supportive or at least not hostile, recruiting and enrolling appropriate target group members to receive an intervention, hiring and training program staff, structuring modes of service delivery, designing an organization to coordinate efforts, and so on.

As an extension of Chen's comments, see also Sager and Andereggen (2012), Mayne and Stern (2013), Montague and Porteous (2013), Renger, Bartel, and Foltysova (2013), and the six distinguishing elements of an action model (Chen et al., 2018). Indeed, Chen's program-theory models typically show reasonably elaborate action theories.

The problem is that many modern logic models, and certainly those found in Canadian studies, suffer from the conflation of these ideas, as shown in the TBS definition above. They either do not show the "action theory" at all or they "blend" in the key elements of Chen's action theory.1 What results from this conflation is a framing for analysis that may not anticipate design, delivery, or implementation weaknesses that relate to how a program is implemented—as compared to weaknesses that result from problems or gaps in the change theory. Note that, in Chen's model, more conceptual space is dedicated to the Action Model than to the Change Model, and yet in most logic models it is the other way around.


An Illustrative Example of the Issue

In order to illustrate this point, an example is in order. Consider an education or training program. The theory of change follows Kirkpatrick and Kirkpatrick's (1994) learning theory, which states that there are four levels of change, connecting to what Funnell and Rogers (2011) categorize as an individual-based theory of reasoned action:

1. Reaction: what participants thought and felt about the training;
2. Learning: the resulting increase in knowledge and/or skills, and change in attitudes;
3. Behaviour: transfer of knowledge, skills, and/or attitudes from classroom to the job (change in job behaviour due to the training program); and
4. Results (some evaluators might call these ultimate outcomes or impacts): the final results that occurred because of attendance and participation in a training program and due to the behavioural changes that ensued after the training (these are benefits and can be monetary, large-scale performance-based, etc.; they typically connect to mission goals).

(For an example of such a theory with assumptions, please see Koleros and Mayne in this special issue.) A typical logic model or theory of change description might simply show inputs, activities, and outputs leading to the above-noted expected change. The problem here is that even if a broad range of assumptions and contextual factors are considered, when mapped simply to the change-theory logic they may relate only peripherally to the soundness of the design and delivery of the training.

Many times in evaluations of training, the design and delivery of training are seen as critical to its success. The mode, medium, content, timing, and physical conditions surrounding the training are critical. How often have we seen that training offered to people has suffered from poorly designed materials, learning environments, timing, or tailoring (relevance and suitable format), or from delivery problems with those teaching, relationships between teachers and students, or linkages to supportive infrastructure, institutions, or individuals? How often have we seen a failure to reach the appropriate students in the first place? Such elements and assumptions are rarely included in theory of change depictions and logic models, yet experienced educational evaluators (and empirical evidence) suggest that delivery and design components can make a huge difference to the success of educational investments. The quality of teaching is particularly important; see, for example, Chetty, Friedman, and Rockoff (2014), whose statistical study suggests that teacher quality is associated with huge differences in student outcomes.

Lesson learned #1: For educational programs (theories of change), pay attention to the pedagogy and the quality of the teaching.


The Preponderance of the Problem

A quick perusal of recent evaluation findings shows that observations about design and delivery (action or implementation theory) elements are more common than observations regarding the theory of change. A quick review of available online evaluation reports notes that items such as delivery timeliness, collaborative support, data and information sharing, the delivery of funding, and the clarity of roles and responsibilities were prevalent. These components all relate to the manner in which a program is delivered rather than to its anticipated results. In summary, some of the learning in evaluation reports relates to how programs/initiatives are delivered, yet most logic models and frameworks are either silent on or give short shrift to the implementation (action) theory that serves the theory of change. Therefore, evaluation learning in perhaps its area of highest potential is unsystematic and almost accidental. Other articles in this issue explore "useful" models for complex settings, nested models, actor-based models, and methods of getting away from "mechanistic" approaches to depicting theories. In this article I look at simplified case examples to illustrate the value of recasting activities, outputs, and outcomes by area in order to distinguish and recognize the theory of action or implementation (these words will be considered synonymous) as separate from but related to the application of a theory of change (ToC) itself. These cases will broadly illustrate the application of this thinking at different levels and in different contexts, but they will also note "lessons learned" in order to show some immediate practical results from adopting such an approach.

Summary Cases for Consideration

Case 1: "Cash for Clunkers" vs. infrastructure support as economic stimulus

In the fall of 2008, policy analysts and economists debated what to do about that year's recession. They accepted the need for large government stimulus packages, but their debates centred on where to put the money. Two types of stimulus packages often discussed in North America were major public works (infrastructure) and tax "break" programs. Logically, the theories worked as follows:

1. Public infrastructure:
• invest in (needed) infrastructure;
• construction and "supply community" employment will be directly and indirectly created;
• the economy will be stimulated / "saved" / maintained; and
• needed public infrastructure will be put in place, also enabling supporting goods and services markets to be maintained.


2. Tax break:
• provide a tax cut to taxpayers;
• those people will spend more money in the economy; and
• the economy will be stimulated / "saved" / maintained.

While not all economists agreed, option (1) tended to be favoured over option (2), essentially because public infrastructure investments were thought to leave a legacy to help further growth (think "Marshall Plan" for war-ravaged Europe or Asia), while tax breaks could more strongly exacerbate deficits: one-time infrastructure spending creates a "temporary" deficit and builds assets, whereas tax breaks could cause more permanent structural deficits, especially since tax cuts are politically hard to reverse. Economists also noted issues with distributive effects (tax cuts tend to favour those who pay more taxes, namely the wealthy) and problems of absorption: people might tend to save the money since times were uncertain, so the stimulus would not work its way into markets as "fully" as other incentives. This latter concern appeared to come true, as American savings rates jumped to their highest levels in a decade right around this time, and the wealthiest Americans, the primary beneficiaries of the tax cuts of the early 2000s, are shown to have saved disproportionately more than lower-income earners (Board of Governors of the U.S. Federal Reserve System, 2016). In the end, governments went mostly for infrastructure investments, with some "sweetening" of social programs; for example, in Canada the employment insurance requirements were adjusted. So what happened? Tim Kiladze (2010) of the Globe and Mail noted that investors were far too optimistic about the extent to which public stimulus funding to infrastructure would boost the fortunes and bottom lines of the construction industry (and the economy). Kiladze quoted a market analyst as follows:

Very few meaningful stimulus related infrastructure projects were launched in 2009. In fact, somewhat ironically, some infrastructure spending was actually delayed ... as [provincial, state, and local] governments awaited funding from the federal stimulus coffers.

By contrast with the above, and assuming that some stimulus was needed to boost the economy, one of the few Canadian federal government programs that arguably seems to have worked well, perhaps almost too well, in terms of spending money quickly was euphemistically called "Cash for Clunkers" (Elliot, 2009). The Canadian equivalent, "Retire Your Ride," was essentially a rebate program to support people trading in their gas-guzzling, polluting older vehicles for more fuel-efficient, newer vehicles. The national program was in a way like a tax cut, but one-time, temporary, and based on a specific activity. It took off, meeting its annual targets of retiring 50,000 vehicles each year once the full program had been launched. Its evaluation noted that the program would have met its vehicle-retirement target of 200,000 in 48 months had its implementation not been delayed by 21 months (Environment Canada, 2011). A large number of


Figure 1: A simplified logic flow for infrastructure spending of a central government in a federal system.

those "retirements" were achieved in 2009, which, according to reports, provided a significant boost to automobile sales, possibly acting as a key push to reinvigorate the automobile sector (Elliot). At a minimum, sales were accelerated during a time when spending stimulus was needed. Why was this kind of program able to work while the best-laid "grand design" programs for infrastructure failed? The answer lies most likely not in the (mostly macro-)economic theory of one stimulus type versus another once in place; instead, it arguably lies in the implementation network, or pathway of reach and relationships, that these different initiatives require. Let us reconsider the infrastructure results pathway, including the reach and relationships "implementation" logic (see Figure 1). Figure 1 shows that the reach, roles, and relationships for the delivery of typical infrastructure investments are myriad, complex, and politically dynamic. This means that, for one thing, they tend to take time. The number of major capital infrastructure projects that have been completed on time and on budget (in North America at least) can practically be counted on one hand. The fact is that the structure of the roles, relationships, and authorities required to "action" the stimulus assistance, even when streamlined, works against speed in delivery. As noted by Kiladze (2010), the Government of Canada infrastructure program announced in 2008, but delivered late, likely had the perverse effect of "de-stimulating" the economy by delaying otherwise shovel-ready investments. By contrast, the key reach, relationships, and results for "Cash for Clunkers"-like programs are much more direct (see Figure 2).

Lesson learned #2: Infrastructure support programs in multi-jurisdictional settings tend to make for slower-than-expected stimulus funding.

Clearly, other levels of government, members of civil society, mass media, and citizens can still affect the relationship between the government provider and the user, but this relationship is still much more direct and “authoritative” than a program to create public infrastructure. In other words, government “A”—in this case the U.S. or Canadian government—has an unambiguous authority to offer


Figure 2: Simplified logic flow for "Cash for Clunkers" ("Retire Your Ride").

citizens an incentive (a “carrot” or “bribe”) to trade in their vehicles for new ones. In most infrastructure investments, there are several jurisdictions and dozens of competing interests in play.

Lesson learned #3: Stimulus delivery speed = F(number and complexity of authorities, number of key actors, transaction amount, relationship strength and trust, technical complexity, and other factors).

The main point is that public policy makers need to examine the reach, relationships, and roles implied by particular schemes when deciding on a given course of action. This is especially true when the initiative must be delivered within a tight time frame for stimulus spending. Infrastructure investments may be a "good" investment in terms of economic theory, but as quick stimuli they are often structurally handicapped. The implementation reality does not typically fit the theory of change in this case.

Case 2: Repayable contributions for high-technology innovation

Another way to look at implementation and change theory "fit" is to describe them along with their requisite assumptions and then review evidence. Table 1 summarizes a brief review of repayable contributions for innovation programs. The reviews used here involve more than a dozen studies, either conducted or reviewed by the author, of different types of innovation programs run over the past four decades by the Government of Canada. Some programs included repayability clauses and consortium or partnered delivery, while some did not. All related to some level of innovation commercialization as at least one intermediate outcome. Three theories are identified in Table 1: theory of need or theory of the problem, theory of change, and theory of implementation/action. For more on problem identification and analysis, see the United Nations Development Programme (2009, pp. 39–43).

Lesson learned #4: Proceed with extreme caution regarding the inclusion of repayability clauses in innovation funding agreements if you are dealing with projects that are a long way from commercialization, particularly if repayment triggers have widely interpretable definitions of success and consortium recipients have access to highly competent lawyers and accountants.


Table 1: Summary of findings from past innovation contribution program reviews

Problem theory (cause–effect): There is insufficient innovation in sector X, caused by a lack of domestic investment to bring innovations to market.
Reality observed: Investment in sector X is becoming increasingly multinational ... it is not clear that a domestic investment gap hinders innovation, but rather a gap in international investor confidence plus other policy, standards, and marketplace barriers.

Change theory (how a contribution mechanism should work): A contribution will provide needed cash to companies to reduce their burden and costs so as to encourage them (or allow them) to move innovations from discovery to commercialization.
Reality observed: Cash does appear to enable many proponents to move forward faster than they would otherwise; however, if negotiations and payments go too slowly, the lack of certainty creates risk aversion and may discourage highly innovative and less predictable investments in favour of stable and "safe" investments proposed by proponents. (See repayability below.)

Implementation (action) theory: Repayability in the contribution will address WTO concerns about unfair support. It will also ensure greater discipline in the innovations to focus on getting product to market.
Reality observed: "Repayability" clauses were found to work against the need for nimble investment, since consortia are often involved and repayment liability is handled by lawyers and accountants. (Intellectual property disputes among consortium members and the Government of Canada were also raised as barriers.) This "delay" and increased uncertainty create a negative feedback loop, which leads to a slow, hard process, which in turn leads to negative reactions, which lead to fewer good applicants and projects. In any case, when repayment has been contingent on commercial success, most funded innovation programs in Canada have had a very modest repayment rate. It seems that the payment triggers can be quite easily avoided, especially by larger and more sophisticated contribution recipients.

Implementation (action) theory: The department will partner with the sector X industry association in order to select and deliver projects, based on the premise that the industry association will know and represent the needs of the sector for innovation.
Reality observed: Associations were not always found to represent all important sector interests, causing internal political disputes and a lack of trust in the process, reducing reach and slowing processes, reinforcing a negative feedback loop, and leading to fewer good applicants, which led to less success in areas of newer development (i.e., less innovative projects may get funded because of a bias toward the more developed interests of larger players, who carry greater weight in associations and partnered program management initiatives and who create a "success to the successful"2 feedback loop).


Case 3: Administrative Monetary Penalties (AMPs)

We can also review the use of theory regarding needs or problem areas, change theory, and implementation/action theory in the area of regulatory control mechanisms. The idea of an Administrative Monetary Penalty (AMP) has a long history and connects to a deterrence pyramid "theory" stemming from decades of research on Australian coal mining regulation. An enforcement pyramid subjects regulated firms to escalating forms of regulatory intervention, typically moving from persuasion to a warning letter, civil penalty, criminal penalty, licence suspension, and finally licence revocation (Ayres & Braithwaite, 1992). Table 2 summarizes the theory of the need or problem, the theory of change, and the theory of action/implementation.

Table 2: Summary of findings from past AMPs review

Problem theory: Regulators need an ability to moderate an otherwise harsh response (i.e., too big a jump between warning and criminal penalty). Regulators have inefficient existing means available (i.e., they need to improve efficiency in regulation).
Reality observed: Not all regulators need a civil sanction, and not all civil penalties fit easily into the "scaled" pyramid.

Change theory (how a monetary penalty sanction will work): A civil commercial penalty (the Administrative Monetary Penalty, or AMP) is needed to fill out the deterrence pyramid and to allow for scaled deterrence.
Reality observed: AMPs often do not work as planned, due both to theory being misapplied to context and to implementation factors. The AMPs mechanism (as a theory of change) works where there exist
• a high level of regulated-party commitment to the basic intent of the Act (a low level of willful non-compliance);
• controlled inspection conditions;
• low complexity in terms of regulatory clauses and transactions;
• a significant proportion of commercial transaction "value" represented by the AMP (the economics of the marketplace);
• a belief on the part of the regulated party that enforcement actions will be upheld; and
• a complementary naming-and-shaming mechanism (i.e., charged names are published), which is effective for established companies with a potential for reputational loss.


Implementation (action) theory: Agencies will readily apply AMPs to gain efficiencies in sanctioning violators and will improve the cost-effectiveness of the whole system.
Reality observed: There can be significant cultural differences between and among implementing parties, which significantly affect AMPs' efficiency and effectiveness. The key stakeholders include
• associations (supplier and consumer);
• policy makers and program proponents;
• inspectors;
• enforcement officers;
• legal counsel; and
• "review" institutions.
Other key implementation factors include
• clarity of the language defining violations;
• knowledge by inspectors and investigators of what constitutes a violation;
• "commitment" to the promotion of regulatory compliance by inspectors and regulators;
• level of engagement with regulated parties and their representatives; and
• consistency of the interpretation of the legal responsibilities and authorities of all concerned parties, and burden of proof / sufficiency of evidence.

The main findings summarized here are from an evaluation study conducted for the Canadian Food Inspection Agency (CFIA) in 2011. This study undertook an in-depth review of the theory available in the literature before examining the use of AMPs at CFIA. Lesson learned #5: AMPs fit much better for some types of applications (i.e., straightforward, high-volume, inadvertent breach, with a strong supporting community) than for others. In addition, the implementation/action characteristics required for AMPs to work include a real commitment to deterrence by inspectors, information-sharing practices, and clear, easy-to-interpret authorities.

What the Evaluation Team Tried

In cases 2 and 3 above, a systematic approach was used in real evaluation studies to arrive at the findings noted in Tables 1 and 2. In summary, the study team

i) extracted the implementation theory from a conventional logic model; ii) lined it up with the change theory;


iii) drew key assumptions and enabling factors from research, experience, and analysis to examine which factors were important and how they connected; and
iv) tested the "lined-up" theories with real case evidence.

The study team, led by the author, constructed models to test key stages of significant cases for evidence of how key factors influenced performance and to test possible alternative explanations for results. Case-by-case assessment allowed the study teams to pinpoint evidence with which to validate contribution claims and then to help explain key factors for success. This altered depiction enabled the teams to identify issues regarding funding structure, governance, activity flow, and sector engagement/participation, which could be shown to directly influence the nature of the change theory, behavioural results, and impacts. More importantly, the dialogue allowed all concerned to separate issues, observations, and "learning" related to program governance and implementation from concerns related to the broader theory of change. In other words, the approach likely helped avoid a rush to judgment regarding the merits of an overall approach to regulation or innovation by noting that the characteristics of the implementation design and delivery had a profound effect on how the change theory worked. Therefore, reading between the lines of the 2011 AMPs evaluation, the study is not saying that AMPs don't work; it is saying that this instrument works better in some applications than in others, owing to some key enabling conditions. Laying out a results chain showing how implementation (action) theory conditions change theory helps to establish the true (complicated and/or complex) reality of a program. It can also help all stakeholders to understand that change occurs in and among different groups. Such an insight in turn suggests that there is merit in planning initiatives with different expected results chains for different groups (see the article by Koleros and Mayne in this special edition for one approach that illustrates this concept).

Proposed Approach

Analysts like Chen (2005) and realist evaluators like Pawson and Tilley (1997), Funnell and Rogers (2011), and most recently Brousselle and Buregeya (2018) have distinguished action/implementation theory from change theory. Brousselle and Buregeya suggest that logical analysis, contribution analysis, and realist evaluation have a grounding in critical realism and that we may be observing a fifth generation of evaluation. If so, it will require models that are intuitively accessible to a wide range of users; that is, the models can't get too complicated. It is hoped that the cases presented above show the practical value (lessons learned) of thinking about how each component works and, most importantly, how each component works with (or against) the other in terms of how implementation (design and delivery) arrangements match up with theories of change.


As noted, traditional logic models do not extensively describe theory of any kind, let alone both theory of change and theory of action/implementation. Attempts to include both can be "unwieldy," to say the least. So what does an evaluator do? Need one rely on narratives and side descriptions to adjust conventional logic models? Can a comprehensible approach be undertaken? Over the past five years, along with other evaluation colleagues, I have developed an approach to systematically and sequentially consider the action/implementation theory and the change theory. This approach is summarized in Figure 3. The first step is to recognize the action/implementation logic or theory that is involved. The process evaluation questions related to "how" a program is delivered are relevant, along with governance questions, attributes, and so on that may be drawn from program-authorizing documentation and reviews such as internal audits. The next step is to establish the theory of change that accompanies the program. There may be more than one. For example, there may be an educational component, suggesting a theory such as Kirkpatrick and Kirkpatrick's (1994) learning model, as well as a financial contribution designed to reduce cost burden and/or encourage particular behaviour (a "carrot" program). There may be some kind of potential penalty or sanction for those not ultimately complying (a "stick" program). Each of these theories should be "modelled" as applicable (see Bemelmans-Videc, Rist, & Vedung, 2011). Research can next be done on the theories to determine which factors have been important in shaping success. In addition, it will be important to look for

Figure 3: Theories of implementation (action) and change, key assumptions and factors. Montague, S. First used in 2015. Using Realistic Contribution Analysis for Process and Impact Evaluations, CES Annual Learning Event, February 25, 2015.


Figure 4: Regulatory initiative results chain for contribution analysis: Summary of AMP observations. CFIA. (2012). Evaluation of Administrative Monetary Penalties (AMPs). http://inspection.gc.ca/about-the-cfia/accountability/other-activities/audits-reviews-and-evaluations/evaluation-of-amps/amps/eng/1337024520304/1337025417391

*Deterrence mechanisms could include information, consultations, programs, commercial or criminal sanctions/prosecutions, licence revocation, or other actions intended to influence behaviour. Note: Shaded boxes show implementation elements.

studies and analyses of how the proposed implementation strategy has worked in combination with the stated change theory. Figures 4 and 5 show the results of such an analysis: one regulatory application and one innovation support program. The importance of this approach is that it can provide insights with only minimal original data collection, and it can guide data collection so as to invest gradually, in a targeted fashion and as needed, to support evaluation questions. This can potentially save thousands of dollars in data collection. If, for example, early parts of the results logic show that the implementation strategy is not reaching the target users, then one need not proceed much further with an analysis of impact on that target group. In each of the examples shown, the insights had different emphases, but they also had some common elements. In the implementation of financial stimulus via infrastructure investments (as opposed to direct cash rebates for trading in one's vehicle), the broader enabling environment would seem to play the fundamental role in determining success. Very little can be done about the federal systems that exist in the United States and Canada, and these systems are in fact structured, respectively, by design and by evolution in order to temper the will of one government


CTS Results Expectations Assumptions (External Factors)

1. NRCan determines need and defines CTS and component program objectives A. Appropriate information, understanding and analysis of problems convert into appropriate program design, investment 2. NRCan and co-delivery agents invest in program(s) B. Sufficient, appropriate and consistent funding and program assistance 3. The appropriate arrangements and (critical mass of) co-delivery agents engage with NRCan and other ‘partners/beneficiaries’ to develop the program C. Agendas remain consistent with key co-deliverers 4. Governance structures are formed and actively managed (Program Advisory Committees and OERD) D. Support climate allows for clear governance 5. Program priorities are (clearly) set and projects are solicited (appropriately) E. Economic, management and political circumstances allow for appropriate public and private sector engagement in project proposals 6. Appropriate public and private sector participation / engagement in project proposals * F. Key sector proponents have the capacity and commitment to participate 7. Appropriately targeted and realistic proposals supported (i.e., they respond to public and private in project proposals sector needs / market realities) G. Proponents have ‘will’ and ability to carry through on project 8. Projects are conducted as anticipated (appropriately addressing needs). commitments * H. Target communities attracted to participate / engage in initiatives 9. Appropriate target groups (e.g., regulatory, industry, research community, etc.) are reached by CTS dissemination activities. I. Information / technology developments are ‘attractive’ and compelling to * participants, used in making decisions 10. 
Groups reached by initiatives show positive reactions, capacity (knowledge, abilities, commitments): • public sector - willingness and commitment to using scientific evidence in decisions, key influencers have info they need • private sector – apply scientific knowledge and technology in development of new products and proceesses (vehicle fuels, systems and components) • International partners (US and others) consider Canadian transportation sector regulations, products and processes to be environmentally responsible and preferred choice J. Canadian transportation sector technologies (regulations, vehicles, fuels) are recognized as environmentally responsible, preferred choice (nationally 11. CTS objectives are met: and internationally) • Development and use of cleaner, sustainable transportation fuels and systems and regulations • Adoption of cleaner sustainable transportation vehicle fuels and systems in domestic and K. Canadian transportation fuels and systems are cost-competitive and meet international markets (sales of new technologies, fuels, transportation systems) international environmental standards 12. Reduced GHG and CAC emissions from transportation sector L. Clean transportation technologies are a competitive advantage for the Canadian transportation sector 13. Sustainable transportation sector M. Net benefit to Canadian transportation sector companies leads to net 14. Net benefit to Canada and Canadian communities benefits to Canada and Canadian communities *Key links for causal influence tests. Needs assessment, priority setting and governance Longer term impacts and mission The engagement, reaction and supportive actions factors appear to have strongly affected CTS programs achievement. of key reach groups has varied considerably.

Figure 5: CTS results chain factors and assumptions. Natural Resources Canada. (2013). Evaluation of Clean Transportation Systems Portfolio. https://www.nrcan.gc.ca/evaluation/reports/2013/14844#3_2
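The paired structure of Figure 5 (a results chain whose steps each carry a matching at-risk assumption, with a few links flagged for causal influence tests) can be sketched as a small data structure. The step and assumption texts below are abbreviated paraphrases, and the flagging is illustrative rather than the evaluation's own coding:

```python
# Illustrative sketch only: a results chain as an ordered list of links,
# each pairing a chain step with the assumption that must hold for the
# step to lead onward. Texts are abbreviated paraphrases of Figure 5.
from dataclasses import dataclass

@dataclass
class Link:
    step: str               # a result in the chain
    assumption: str         # the at-risk assumption attached to it
    key_test: bool = False  # flagged for causal influence testing

chain = [
    Link("NRCan defines need and CTS objectives",
         "problem analysis converts into appropriate design"),
    Link("public/private engagement in project proposals",
         "key proponents have capacity and commitment", key_test=True),
    Link("projects conducted as anticipated",
         "proponents carry through on commitments", key_test=True),
    Link("reduced GHG and CAC emissions",
         "technologies are cost-competitive"),
]

# An evaluation planning the causal influence tests noted in the figure
# would start with the flagged links.
flagged = [link.step for link in chain if link.key_test]
```

Making the step–assumption pairing explicit in one structure keeps each assumption attached to the exact link it puts at risk, rather than floating beside the chain as a separate list.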

over others. This structural condition of the political economy—including the vast geography of both countries—would seem to create what amounts to a fundamental barrier to conducting large-scale, centrally directed infrastructure projects in a short time frame, assuming that these liberal democracies stay in place as they are. In Canada, infrastructure in the form of a national railway system was at once a source of the nation’s emergence and the cause of its first major scandal. The interesting thing about the building of the Canadian transcontinental railway is that, according to Pierre Berton (1970, 1971), the construction element of the railway was done quickly and, despite some deplorable treatment of immigrant workers, efficiently. The main issues were political—with regional politics dominating from the outset. In this case the “implementation” theory writ large relates to the characteristics of the political system in place, its distribution of authorities, and therefore its complicated authorizing environment. (See also Mark Moore’s (1995) Strategic Triangle, in which he describes the “value proposition”—akin to the theory of change—“operational and resourcing considerations,” and “authorizing environment.”) In the second case of repayable contributions for innovation, the use of a theoretically appropriate condition (i.e., repayability), appealing to both those concerned with politics and those interested in economics, does not work in practical application because of both the behavioural effects such a clause triggers and the uncertainty it brings. In the case of one of the innovation programs I

doi: 10.3138/cjpe.53008 CJPE 33.3, 316–335 © 2019


reviewed, agreements and payments were taking many months, causing concern and indeed hardship to all but the most patient and well-capitalized investors. The conclusions of this evaluation include the following:

In an era of Federal Government deficit reduction, TPC has been asked to cover a myriad of industrial policy objectives. The program was given a wide reaching mandate to create jobs and foster innovation in three very different sectors. At the same time, TPC’s role in the innovation assistance process was limited by the focus on requirements for investment payback, as well as being subjected to intense international scrutiny. These constraints have led to several logical inconsistencies in its set-up, as compared to previous and other existing contribution programs:

• The program focuses on repayment of investments, yet it must fund (increasingly since the WTO decision of 1998) high risk technology innovation—often in emerging areas of technology.
• TPC must operate in a consistent, transparent and “fiscally responsible” manner, leading to a lengthy multi-step assistance process, yet it is by definition mandated to fund projects of high technology risk and market uncertainty—areas which require speed and flexibility.
• The program was essentially developed in many respects as a “son of DIPP.” The Defence Industries Productivity Program (DIPP) was designed to serve the needs of the mature, cold-war aerospace and defence industry of the 1970s and 1980s. The approach is likely not consistent with a program being asked to assist sectors (even that of the modern aerospace industry) facing a completely different market situation.

In conclusion, TPC has, in theory and rhetoric, been established to serve several publicly stated industrial innovation goals and sectors, while subjecting itself to the constraints of economic development assistance in the modern era. In reality it would appear that the “one-size fits all” approach that has been taken up to now may not be up to the task. (Industry Canada, 2003, pp. 5–6)

Regrettably, while innovation programming has evolved away from a “plodding” risk-averse delivery culture, it has not, at this time, moved all that far. This perhaps speaks as much to the Canadian government’s weakness in systematically accumulating evaluative evidence and knowledge as it does to a failure to distinguish the implementation factors important to program success. (Knowledge accumulation through better and more consistent theory-based evaluation is a theme expressed elsewhere in this special issue.) In the third case, a regulatory instrument—AMPs—is first situated in its theoretical place in terms of regulatory instruments (i.e., on a continuum between warnings and criminal prosecutions) and then examined for its fit with respect to the logical expectations for the theory. In essence, the theory fails in certain application areas (i.e., low-value cargo, difficult inspection conditions, multiple groups involved in processes, the lack of basic “will” to comply in the target and surrounding communities, and technical testing difficulties) and has certain implementation considerations and constraints (i.e., motivations of inspecting staff,



vaguely worded guidance in terms of legal or regulatory clauses, limited penalty levels versus cargo values, and the presence or absence of complementary deterrents or incentives, including a “sympathetic” quasi-judicial and judicial appeals system). When these and possibly some other conditions are negative, AMPs are doomed to fail. This was found for applications under the Health of Animals Act and regulations of the CFIA, but several of these conditions were also found to be true in other cases outside of food safety. The CFIA’s (2011) Evaluation of Administrative Monetary Penalties summarized the factors guiding use in terms of evidence from the research and evidence from interviews as follows:

Scholars have noted that AMPs are appropriate when the following elements are present, and their observations are supported by the interviews conducted for this evaluation:

• a large volume of cases is likely to be processed annually (that is, many transactions are being inspected);
• the regulator had stronger sanctions but the monetary penalties could be used to moderate a harsher response;
• speedy adjudication to the enforcement scheme is important;
• specialized knowledge (for example, technical expertise) and agency expertise in the resolution of disputed issues was needed;
• issues of law are rare;
• consistency of outcome was important; and
• there is a likelihood that an agency or group of agencies will establish an impartial forum in which cases can be efficiently and fairly decided.

Interviewees suggested the following key factors affecting AMP success:

• The regulated party has a high level of commitment to the basic intent of the legislation, so there is a low level of willful non-compliance. For example, associations and companies are willing to work with the CFIA to ensure compliance;
• The regulations contain clear language defining violations;
• There are controlled inspection conditions. For example, the inspection takes place in a regulated public facility vs. a remote privately run location;
• Transactions are not complex;
• Inspectors and investigators share an understanding of what constitutes a violation;
• Inspectors and all other regulator[y] program staff share a commitment to the promotion of regulatory compliance;
• There is a consistent interpretation of the legal responsibilities of all concerned parties, and of the burden of proof and of evidence;
• Significant proportion of commercial transaction “value” represented by AMP (i.e., the cost of the AMP versus the value of the shipment); and,
• Regulated parties believe that enforcement actions will be upheld.

In this case, sorting these observations after the fact into implementation and theory of change factors can be useful, because in general terms,



implementation elements (penalty levels, regulation clarity, inspection and appeals process changes, communications and internal education investments) urge one to consider how to do things right. On the other hand, the broader factors related to the target areas and the basic fit of the deterrence theory suggest how to do the right thing (e.g., if we are dealing with chronic non-compliers who see the AMP as a cost of doing business and have a business model that essentially relies on “borderline” practices, then it will likely take more than AMPs to bring them into compliance). In the latter case, complementing AMPs with the publishing of the names of AMP recipients (naming and shaming) subsequently worked for at least one chronic offender, according to direct correspondence between a CFIA official and the author (see Pawson, 2006, for an elaboration of factors allowing naming and shaming to work). So while each of the cases briefly examined here looks at different levels of policy, programming, and instrument use, has different levels of observable evidence, and appears to relate to slightly different definitions of implementing or action theory—if not both action/implementation and change theory—these cases also have something in common. All three cases suggest that sorting or synthesizing a policy, program, or initiative by its change theory and its implementation/action theory can be beneficial in understanding why and how phenomena occur. Ultimately, the practice should be helpful in accumulating knowledge and evaluative evidence.
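The sorting exercise described here can be sketched as a simple tagging of observations. The factor wordings and category assignments below are illustrative paraphrases, not the evaluation's own coding:

```python
# Hypothetical sketch: filing observed AMP success factors under the two
# theory types discussed in the text. Assignments are illustrative.
IMPLEMENTATION = "implementation/action theory"  # "doing things right"
CHANGE = "change theory"                         # "doing the right thing"

factors = {
    "penalty level relative to shipment value": IMPLEMENTATION,
    "clear regulatory language defining violations": IMPLEMENTATION,
    "consistent inspector understanding of violations": IMPLEMENTATION,
    "regulated parties committed to the intent of the legislation": CHANGE,
    "low level of willful non-compliance in the target group": CHANGE,
}

def factors_for(theory_type):
    """Return, sorted, the factors filed under one theory type."""
    return sorted(f for f, t in factors.items() if t == theory_type)
```

Filing each observation under one heading makes the two diagnostic questions separable: the implementation list asks whether the instrument was applied well, while the change-theory list asks whether the instrument fits the target group at all.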

CONCLUSIONS AND IMPLICATIONS

Canadian evaluators appear to have led a worldwide trend to develop what Brousselle and Buregeya (2018) have called a fifth generation of theory-based evaluation. This generation combines theory-based approaches to determine logical consistency and likely impact (logical analysis, contribution analysis) with explanatory features surrounding program mechanisms (realist evaluation). They see a coalescing of approaches using critical realism as a foundation to focus on “the explanatory power of contextual characteristics, implementation processes, and causal pathways to show, by identifying expected effects and impacts, how an intervention’s activities and outputs lead to outcomes” (p. 64). This paper suggests that in order to continue to foster this trend, which has evolved somewhat unconsciously up to now, it is useful to more clearly codify implementation/action theories and change theories, and furthermore to consider them together whenever one looks at a policy, program, or project intervention. Brousselle and Buregeya also suggest that we may be witnessing the evolution of a composite approach that follows the logic of a given intervention:

As the intervention unfolds, several implicit causal mechanisms result in the cumulative success or failure of the entire intervention or some of its components. Theory-based approaches to evaluation are used to shed light on these mechanisms that operate in open systems and are embedded in multiple social systems. (p. 164)



This paper briefly models just such an approach and attempts to integrate theory-based approaches not only to show logical connections and contributions but also to help explain how and why certain results occur, through specified contextual considerations applied at different parts of the results chain or impact pathway. Thus the whole may be greater than the sum of its parts in terms of explanation. There would seem to be a major opportunity in the future to continue to consistently codify both implementation/action theories and designs and change theories. This accumulated knowledge and learning could focus on what works, to what extent, for whom, and why, under specific application conditions—including the implementation characteristics, the change theory characteristics, and the combination of the two. I have suggested a few practical lessons learned from such efforts. Imagine an open-access learning system that might collectively put forward such learning for review, challenge, embellishment, and refinement. Such a practice might finally embed evaluative thinking into public management—perhaps in a kind of global Socratic forum where “learnings” such as those sprinkled throughout this paper can be discussed. At the very least, we as evaluators might systematically address Patton’s (2018) ninth principle of evaluative thinking: “Evaluative thinking looks at the connections between processes and outcomes, and that means distinguishing them and measuring both” (p. 23). I have proposed a small addendum to this principle—namely, that evaluative thinking should also systematically look at the fundamental “fit” of implementation processes with desired outcomes.

NOTES
1. This practice may have been a result of a lack of familiarity with the Chen model, but also of the somewhat complicated depictions and language used in its communications, such that many practitioners do not see the difference between the action or implementation theory and the essential change theory.
2. “Success to the successful” is a famous archetype in systems thinking. See, for example, Senge (1994); Kim (2018).

REFERENCES
Ayres, I., & Braithwaite, J. (1992). Responsive regulation: Transcending the deregulation debate. New York, NY: Oxford University Press.
Bemelmans-Videc, M., Rist, R. C., & Vedung, E. (2011). Carrots, sticks, and sermons: Policy instruments and their evaluation. New Brunswick, NJ: Transaction.
Berton, P. (1970). The national dream: Building the impossible railway. Toronto, ON: McClelland and Stewart.
Berton, P. (1971). The last spike: The great railway 1881–1885. Toronto, ON: McClelland and Stewart.
Board of Governors of the U.S. Federal Reserve System. (2016). Report on the economic well-being of U.S. households in 2015. Retrieved from https://www.federalreserve.gov/2015-report-economic-well-being-us-households-201605.pdf



Brousselle, A., & Buregeya, J.-M. (2018). Theory-based evaluations: Framing the existence of a new theory in evaluation and the rise of the 5th generation. Evaluation, 24(2), 153–168. https://doi.org/10.1177/1356389018765487
Canadian Food Inspection Agency [CFIA]. (2011). Evaluation of administrative monetary penalties. Retrieved from http://inspection.gc.ca/about-the-cfia/accountability/other-ctivities/audits-reviews-and-evaluations/evaluation-of-amps/amps/eng/1337024520304/1337025417391
Chen, H.-T. (2005). Practical program evaluation: Theory-driven evaluation and the integrated evaluation perspective. Thousand Oaks, CA: Sage.
Chen, H.-T., Pan, H.-L. W., Morosanu, L., & Turner, N. (2018). Using logic models and the action model/change model schema in planning the learning community program: A comparative case study. Canadian Journal of Program Evaluation, 33(1), 53–58. https://doi.org/10.3138/cjpe.42116
Chetty, R., Friedman, J. N., & Rockoff, J. E. (2014). Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood. American Economic Review, 104(9), 2633–2679. https://doi.org/10.3386/w19424
Elliot, H. (2009). Did Cash for Clunkers work? Forbes. Retrieved from https://www.forbes.com/2009/09/02/cash-for-clunkers-toyota-ford-lifestyle-autos-auto-sales.html#43bc9c723f62
Environment Canada. (2011). Evaluation of National Vehicle Scrappage Program. Retrieved from http://www.ec.gc.ca/doc/ae-ve/2011-2012/1447/ec-com1447-en-s4.htm#s4-3
Funnell, S., & Rogers, P. J. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco, CA: Jossey-Bass.
Industry Canada. (2003). Evaluation of the Technology Partnerships Canada Program. Retrieved from https://ito.ic.gc.ca/eic/site/ito-oti.nsf/eng/00191.html
Kiladze, T. (2010, November 23). The great infrastructure boom that wasn’t. Globe and Mail. Retrieved from https://www.theglobeandmail.com/globe-investor/investment-ideas/the-great-infrastructure-boom-that-wasnt/article1315521/
Kim, D. (2018, May). Success to the successful: Self-fulfilling prophecies. Retrieved from https://thesystemsthinker.com/success-to-the-successful-self-fulfilling-prophecies/
Kirkpatrick, D. L., & Kirkpatrick, J. D. (1994). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler.
Mayne, J., & Stern, E. (2013). Impact evaluation of natural resource management research programs: A broader view. Retrieved from http://aciar.gov.au/publication/ias084
Montague, S., & Porteous, N. L. (2013). The case for including reach as a key element of program theory. Evaluation and Program Planning, 36(1), 177–183. https://doi.org/10.1016/j.evalprogplan.2012.03.005
Moore, M. (1995). Creating public value: Strategic management in government. Cambridge, MA: Harvard University Press.
Patton, M. Q. (2018). A historical perspective on the evolution of evaluative thinking. In A. T. Vo & T. Archibald (Eds.), Evaluative thinking. New Directions for Evaluation, 158, 11–28. https://doi.org/10.1002/ev.20325
Pawson, R. (2006). Evidence-based policy: A realist perspective. London, England: Sage.



Pawson, R., & Tilley, N. (1997). Realistic evaluation. Thousand Oaks, CA: Sage.
Renger, R., Bartel, G., & Foltysova, J. (2013). The reciprocal relationship between implementation theory and program theory in assisting program design and decision-making. Canadian Journal of Program Evaluation, 28(1), 27–41.
Rogers, P. J. (2014). Theory of change. Methodological Briefs: Impact Evaluation 2. Florence, Italy: UNICEF Office of Research. Retrieved from https://www.unicef-irc.org/publications/747-theory-of-change-methodological-briefs-impact-evaluation-no-2.html
Sager, F., & Andereggen, C. (2012). Dealing with complex causality in realist synthesis: The promise of qualitative comparative analysis. American Journal of Evaluation, 33(1), 60–78. https://doi.org/10.1177/1098214011411574
Senge, P. (1994). The fifth discipline: The art and practice of the learning organization. New York, NY: Random House.
Treasury Board of Canada Secretariat [TBS]. (2010). Supporting effective evaluations: A guide to developing performance measurement strategies. Retrieved from https://www.canada.ca/en/treasury-board-secretariat/services/audit-evaluation/centre-excellence-evaluation/guide-developing-performance-measurement-strategies.html
United Nations Development Programme. (2009). UNDP handbook on planning, monitoring and evaluating for development results. Retrieved from http://web.undp.org/evaluation/handbook/documents/english/pme-handbook.pdf

AUTHOR INFORMATION
Steve Montague is a partner in Performance Management Network Inc. and an adjunct professor at Carleton University.


Can’t See the Wood for the Logframe: Integrating Logframes and Theories of Change in Development Evaluation

Gordon Freer
University of the Witwatersrand, Johannesburg

Sebastian Lemire
University of California, Los Angeles

Abstract: There are numerous ways in which to model the underlying theory of programs. In the context of international development evaluation, the most ubiquitous are likely “logframes” and to some extent “theories of change,” both of which may serve to guide program development and management, monitoring, and evaluation. While logframes and theories of change are often developed in parallel, they are rarely fully integrated in their practical application. Drawing on lessons from a recent theory-based evaluation, this article argues that fully integrating the program theory of change within the program logframe provides for a stronger and more holistic understanding of program progress.

Keywords: international development, logframe, market development programs, program management, theory-based evaluation, theory of change

Résumé : Il y a de nombreuses façons de modéliser la théorie d’intervention d’un programme. Dans le contexte de l’évaluation en développement international, on retrouve plus couramment les « cadres logiques » et, dans une certaine mesure, les « théories du changement ». Les deux peuvent servir à orienter l’évaluation, le suivi, la gestion et l’élaboration de programmes. Les cadres logiques et les théories du changement sont souvent élaborés en parallèle, et rarement complètement intégrés. À partir d’une évaluation théorique (theory-based) récente, le présent article montre qu’en intégrant pleinement la théorie du changement au cadre logique d’un programme, on obtient une représentation plus robuste et plus globale des progrès réalisés par le programme.

Mots clés : développement international, cadre logique, programmes de développement des marchés, gestion des programmes, évaluation fondée sur la théorie, théorie du changement

Corresponding author: Gordon Freer, International Relations, School of Social Sciences, University of the Witwatersrand, Johannesburg, 2050, South Africa; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro spécial), 336–353 doi: 10.3138/cjpe.53007

The logframe has been ever present in development circles since its adoption and promotion by the U.S. Agency for International Development (USAID) in the early 1970s. Informed by a review of USAID’s evaluation system, the logframe was originally developed as a tool to help conceptualize a project and analyze the assumptions behind it (Rosenberg & Posner, 1979). Since then, the logical framework approach has undergone cosmetic changes and shifts in terminology; however, its primary purpose remains intact: to demonstrate how parts of a program fit together, neatly and logically, and how a series of program activities will lead to a specific set of program objectives (however we choose to define these). The logframe approach has in many instances proven extremely valuable for project design, planning, implementation, management, monitoring, and evaluation and is now widely used by bilateral and multilateral development agencies as the de facto program-management tool (Bamberger, Rugh, & Mabry, 2012; Hummelbrunner, 2010; Prinsen & Nijhof, 2015). The sustained dominance of logframes in international development is illustrated in the UK Department for International Development guidelines for funding applicants, which require that “all newly approved projects regardless of project value must also now contain a logframe” (DFID, 2011, p. 2).

The logframe, when it is designed and used as intended, works well as a monitoring tool to assess program progress against predefined objectives. One central limitation, however, is that logframes often focus on short- and medium-term objectives, as opposed to detailing how these lead to long-term changes (Bamberger et al., 2012; Channell, 2005). By extension, another limitation is that logframes by design do not provide information on how or why program objectives were reached (or not reached).
As Eyben, Kidder, Rowlands, and Bronstein (2008) observe, the “linear cause-effect thinking” inherent to logframes fails to capture the complexity of the change processes underlying most development programs. Herein lies one of the significant assumptions of logframes that has not done evaluation any favours: it is assumed that the logic laid out in the logframe’s progressive steps holds true and that an achievement of the program objectives is proof that the program is working according to this logic. This assumption, while seemingly trivial, comes with great consequences, as it leaves the inner workings of how and in what way the program achieves the stated objectives undisclosed and unexamined.

In marked contrast, a theory of change (ToC), also known as program theory, centers exactly on the inner workings of programs (Connell & Kubisch, 1998). Emerging from the “tradition of logic planning models,” such as the logical framework approach, the purpose of the ToC is to make explicit how specific program activities lead to specific outputs, which in turn lead to a specified set of outcomes (Stein & Valters, 2012, p. 5). Moreover, ToCs—at least the better ones—often consider the external environment of the program and then place the program and its activities within this context, determining how these activities might interact with influencing factors in the broader environment or to what extent competing programs might complement or in some other way influence the program

activities and outcomes (see Mayne, 2012, for a discussion on “embedded” theories of change).

In providing a more detailed explication of the inner workings of programs, ToCs potentially remedy the limitations associated with logframes described above. For one, ToCs bring attention to the long-term impact of programs, specifying how short- and medium-term outcomes of the development program are at least intended to bring about long-term changes (Prinsen & Nijhof, 2015). This long-term line of sight may potentially serve well to counter the all-too-common pressure among project managers to focus almost exclusively on monitoring and reporting on short-term program progress (Imas & Rist, 2009; Prinsen & Nijhof). Moreover, and in direct relation to logframes, “theory of change thinking,” by explicating how specific program activities lead to specific outcomes, may serve “to bridge the ‘missing middle’ that the log-frame hides” (Vogel, 2012, p. 19).

As the preceding paragraphs illustrate, logframes and theories of change potentially complement and even enhance one another. Unfortunately, and while many development programs involve both a logframe and a theory of change as part of their performance-management framework, the integration of these potentially complementary tools is often challenging (Vogel, 2012). Accordingly, and as Prinsen and Nijhof (2015) conclude, “work remains to be done in order to find ways to optimise the combined use of the logframe and ToC in programming” (p. 244). Speaking directly to this call, this article makes the case for more purposeful integration of the design and use of logframes and ToCs. More specifically, we argue that integrating these tools in program planning and evaluation enhances both the quantitative measures that are often the ambit of the logframe and the “story of the program,” which, when it is told, often relies heavily on the ToC.
We assert that by developing and using the two tools as complementary, from project inception to final evaluation, we will be able to record and relay a more holistic view of the program—not only what it has achieved but also how this was accomplished. Toward this end, we formulate five steps for how this integration might work by way of a recent real-world case example. These steps are not intended as sure-fire recipes for success to be uncritically adopted in other settings and contexts. Indeed, this type of mechanistic application cuts against the grain of what we aim for: more thoughtful and purposeful integration of logframes and theories of change. The modest aim of the five steps is simply to motivate further interest and to support further work on this important topic.

The article is structured as follows. In the first part, we consider the promises and perils of logframes and ToCs when designed and considered individually, followed by an examination of the underlying reasons that logframes and ToCs tend not to be fully integrated in their practical application. Informed by this examination, in the second part of the article, we then consider a real-world case

example of how logframes and ToCs might be integrated in the context of international development settings and outline five steps that can be applied to further this integration process.

THE PARALLEL PROMISES AND PERILS OF LOGFRAMES AND THEORIES OF CHANGE

Logical frameworks
Since the 1970s, logical frameworks have become commonplace in international development (Rosenberg & Posner, 1979). Reflecting their wide application, logical frameworks have been referred to by many different labels, most commonly “logframe” and “logframe matrix” (Bamberger et al., 2012). In its practical application, a logframe is ideally developed during the program design and planning stages, with revisions made throughout the implementation of the program. The logframe matrix typically takes the form of a four-by-four table, with rows for program components (activities, output, outcome, and goals) and columns for measurement information (program summary, indicators, means of verification, and risks/assumptions). A generic logframe matrix is presented in Table 1. Given the wide use of logframes, as well as their adaptation to preferences within different government and donor organizations, these categories may of course vary, but the matrix in Table 1 displays the generic structure.

There are many potential benefits of using logframes. If designed and implemented well, the logframe matrix may support program planning, management, and monitoring. As a management approach, the logframe matrix may also serve as the overarching overview and work plan for the program, guiding program implementation and management (Imas & Rist, 2009). As a monitoring tool, the logframe may support logical framework analysis by establishing salient activities, outputs, and outcomes to be monitored; by connecting these with measurable indicators; and by identifying plausible risks and assumptions (Imas & Rist). These are but a few of the many benefits of logframes.

Despite its many benefits, however, concerns and challenges about logframes have been raised on multiple occasions (Gasper, 1997, 2000a; Fujita, 2010).
One common criticism is that the logframe approach promotes short-term program

Table 1: Generic logframe matrix

              Summary   Indicators   Means of verification   Risks/assumptions
Goals
Objectives
Outputs
Activities

objectives, even if these are counter to long-term impact (Channell, 2005; Perrin, 2003; Rogers, 2008). Any reader who has worked with logframes, either reporting against them as a program-management tool or trying to assess a program through the logframe lens, will likely have thought (perhaps even aloud) about this potential conflict. To be sure, and especially when used in isolation, a logframe can potentially result in perverse incentives and a focus on shorter-term achievements rather than on more distant outcomes, often obscuring contextual learning (Bamberger et al., 2012; Gasper, 2000b). As Channell observes,

Contractors fail when they do not meet various requirements for deliverables under their contracts with donor agencies. ... A contractor can be completely successful ... even if full performance has resulted in negligible benefits. (p. 17)

Moreover, and in part because logframes are often structured around contractually obligated program objectives, logframes tend to remain static and not updated, running the risk over time of becoming a restricting “lock frame” for program learning and development (Gasper, 1997, 2000b). Another distinct—yet related—limitation is that logframes by design do not provide information on how program objectives were reached (or not reached); the inner workings of how the program is intended to achieve the stated objectives are left undisclosed and unexamined (Eyben et al., 2008; Imas & Rist, 2009). The simplified “linear cause-effect thinking” inherent to logframes fails to capture the complexity of change processes underlying most development programs—what Gasper (2000b) pointedly refers to as a “lack-frame.” This issue exists in part because a logframe will describe “how much” but not “how” program outcomes are achieved. In the same way, the dashboard of a car indicates how fast the vehicle is travelling, how much fuel is left, how far you have travelled, and whether the engine is overheating; if you had a series of snapshots of the car dashboard, you would have a similar set of information—at this time, the car was travelling at this speed and had this amount of fuel. However, the dashboard would not (and could not) tell you how or why you travelled a specific route, which landmarks you passed, the reason for the direction you were travelling, or why you decided on the highway or a more scenic route. For that you need a different piece of equipment.
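Stepping back to Table 1: the generic four-by-four matrix can be sketched as a simple grid structure. This is a minimal illustration of the table's shape, not a template from any agency's guidance, and the sample cell entry is invented:

```python
# Illustrative sketch of the generic logframe matrix: rows are program
# components, columns are measurement information, and cells start empty.
ROWS = ["Goals", "Objectives", "Outputs", "Activities"]
COLUMNS = ["Summary", "Indicators", "Means of verification",
           "Risks/assumptions"]

logframe = {row: {col: None for col in COLUMNS} for row in ROWS}

# During design, each cell is filled in; e.g., a hypothetical indicator:
logframe["Outputs"]["Indicators"] = "number of training sessions delivered"

def missing_cells(frame):
    """Cells still to be completed: a crude check of logframe coverage."""
    return [(r, c) for r in ROWS for c in COLUMNS if frame[r][c] is None]
```

The grid-of-cells shape makes the monitoring role concrete: every component must eventually carry an indicator, a means of verification, and a stated risk or assumption, and an unfilled cell is visible at a glance.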

Theories of change

Theories of change speak directly to this information need. Initially introduced in the context of community-change initiatives (Connell & Kubisch, 1998), and now widely used both inside and outside international development evaluation, the theory of change approach has been, and continues to be, defined and deployed in many different ways (see James, 2011; Stein & Valters, 2012; Vogel, 2012, for comprehensive reviews of ToC practices in international development settings). If we scan across this conceptual diversity, we can see that the term usually refers to both a process and a product, focusing on making explicit how program activities

and outputs are intended to bring about a specified set of outcomes (Connell & Kubisch; Vogel). As Vogel observes,

Some people view it as a tool and methodology to map out the logical sequence of an initiative from inputs to outcomes. Other people see it as a deeper reflective process and dialogue amongst colleagues and stakeholders, reflecting on the values, worldviews and philosophies of change that make more explicit people’s underlying assumptions of how and why change might happen as an outcome of the initiative. Theory of change is at its best when it combines both approaches. (p. 3)

In both process and product, the ToC specifies how specific program components (activities and outputs) lead to a specific set of desired outcomes (Connell & Kubisch). The better ones also include contextual conditions—within which the program is embedded—that may depress or enhance the ability of the program to generate the desired outcomes (Mayne, 2012). While there is no single format for a ToC, a generic version is provided for illustrative purposes in Figure 1. As the figure illustrates, a ToC depicts how specific program activities, by way of specific program outputs, connect with specific program outcomes. The connections, depicted in the figure as arrows, are in the form of assumptions, that is, hypothesized connections to be verified empirically (Connell & Kubisch; Mayne). Considered collectively, these components comprise the underlying logic of the program. In addition, the figure also indicates that ToCs may include contextual conditions (influencing factors) that may serve to enhance or depress the program outcomes. If designed and implemented well, ToCs can be a potentially rich source of information regarding a program’s progress, intended and unintended outcomes, and causal relationships (Vogel, 2012; White, 2009). By detailing how specific

[Figure: activities (1–4) connect through outputs to outcomes (1–3), framed by a surrounding context border.]

Figure 1: Generic theory of change
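For readers who think in data structures, the generic ToC of Figure 1 can be sketched as a small directed graph whose links carry the assumptions to be verified empirically. The sketch below is purely illustrative; the names, fields, and the `unverified_assumptions` helper are our own inventions for exposition, not part of the article or of any evaluation standard.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    """An arrow in the ToC: a hypothesized connection to be verified empirically."""
    source: str       # e.g., "Activity 1"
    target: str       # e.g., "Output 1"
    assumption: str   # the assumption the arrow rests on
    verified: bool = False

@dataclass
class TheoryOfChange:
    links: list[Link] = field(default_factory=list)
    context: list[str] = field(default_factory=list)  # influencing factors

    def unverified_assumptions(self) -> list[str]:
        """Assumptions not yet checked against evidence: candidates for evaluation focus."""
        return [link.assumption for link in self.links if not link.verified]

# A two-step chain (activity -> output -> outcome) with contextual conditions.
toc = TheoryOfChange(
    links=[
        Link("Activity 1", "Output 1", "training sessions are attended"),
        Link("Output 1", "Outcome 1", "attendees apply new skills"),
    ],
    context=["partner institutional capacity", "donor reporting requirements"],
)
print(toc.unverified_assumptions())
```

Marking a link as `verified` once evidence is gathered shrinks the list of open assumptions, mirroring how an evaluation progressively tests the program's underlying logic.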

activities and outputs lead to specific outcomes, the ToC supports a more fine-grained analysis and understanding of how and why the program brings about the desired outcomes (or even fails to do so). As Bamberger et al. (2012) observe, ToCs may serve well to establish the most central program activities and outcomes, to identify the critical assumptions (“links”) on which the success of the program is contingent, and, by extension, to guide how subsequent evaluation resources should be allocated. These are all worthwhile benefits. The ToC, however, is not without limitations. In the context of international development, one commonly observed shortcoming is the initial development of vague and overly generic theories of change (James, 2011; Vogel, 2012). There might be several reasons for this issue, including the admittedly difficult task of striking the appropriate balance between the perceived need for simplicity and the real-world context of program complexity (James). Another reason may emerge from the challenging nature of making explicit and detailing the wide range of assumptions underlying complex programs (Vogel). As experienced by Prinsen and Nijhof (2015), it is not uncommon for developers of ToCs to miss steps, to omit key assumptions, or simply to fail to acknowledge salient risks or other contextual conditions. Whatever the reason, the implication of developing an overly simplified theory of change is that it reduces its ability to serve as the guiding rod for program planning and implementation, which in turn results in a lack of interest in and commitment to the ToC among implementers and decision makers—the practical value of the theory of change is depressed. Another shortcoming emerges from the common conflation of the makeup and purpose of a theory of change (Freer & Lemire, 2016).
When the underlying design and purpose of a theory of change are not well understood, or perhaps even misunderstood, the subsequent use of the ToC—by evaluators to gather data or by program staff as a tool to inform decisions—is also likely to be misapplied. This leaves decision makers, who are often searching for data regarding program progress and direction, without a nuanced, contextual understanding of the program’s current position. A disconnect emerges between the information called for by the ToC and the information called for by the program staff. We also suggest that a third shortcoming might lie in the understanding and capacity of program staff who are used to institutional reporting against quantifiable objectives, where success is measured against the achievement or overachievement of these same objectives. Alternative measures of program implementation or outcomes, even if relevant and quantifiable, are in this context less likely to be promoted, pursued, and monitored. Programs are not assessed on their rate of failure or their innovativeness in overcoming challenges, in spite of donors’ and funders’ repeated verbal acknowledgement that they want to learn from failure and challenges (Channell, 2005). In our own experience of asking program staff to unpack why a course of action might not have been taken, or why program progress has fallen short of the anticipated objectives, there is an initial defensive attitude toward the lack of progress or failure to achieve identified objectives. Moreover, it takes time and effort to understand why certain options were not

chosen over others, and these decisions are often not recorded in detail. Regular revision of a ToC might assist in capturing these decisions; however, the development and subsequent revisions of a ToC can be time-consuming and, at least from a program resource perspective, can over time become a grudge investment. These are but a few select challenges. Considered collectively, however, and in our experience, these challenges—lack of specificity, lack of clear purpose, and a too narrow focus on program objectives—often leave the full potential of theories of change more talked about than realized.

The limited interplay between logframes and theories of change

Despite the seemingly obvious complementarity between logframes and theories of change, the potential benefits of their integration are rarely realized. While programs are often required to develop both in the design phase, rarely does a program call on one of the tools to support the other. As such, programs tend to forget about their potential mutuality: one as an internal focal guide and the other as a point of reference in a wider environment. The position we hold is that the underlying reasons for the limited integration of logframes and theories of change are rooted in conceptual conflation of the tools, the parallel ways in which the tools are developed and designed, and the differing purposes in their subsequent implementation. For one, the similarities and distinctions between logframes and ToCs are not always clear (James, 2011; Vogel, 2012). As Vogel documents in her review of ToC practice in international development, “People find it difficult to separate theory of change from the familiar logframe .... This is not surprising, as ... they come from the same family of approaches, programme theory” (p. 19). The intertwined intellectual roots, combined with the shared focus on describing the underlying logic of programs, lead to conceptual conflation of logframes and ToCs. Echoing this conceptual confusion, some practitioners come to view theories of change as “glorified logframes” (James, p. 10). Another reason is to be found in the parallel ways in which program theories and logframes are often developed. In some cases, individuals who are no longer part of the program team might have developed the individual tools, perhaps as part of securing program funding, while program managers and staff are subsequently asked to deliver on and report against them (Prinsen & Nijhof, 2015).
In these situations, those implementing the program are often presented with both tools as completed items, outlining the ideal program implementation as envisioned in the early program-design stage. While there might be an evident logic that can be read into each tool, the nuances and detail that might have formed part of the initial conceptualization, including connections between the two tools, can easily be lost as the tools become parallel artefacts of historical thinking (Fujita, 2010). Moreover, even where logframes and theories of change are conceptually distinct and aligned in their development, the tools are, in their subsequent practical applications, perceived as serving different roles and

reporting against different aspects of a program. As documented by James (2011), “many use theory of change to explore their organisation or programme at a broader level—to develop an overall vision and understanding of change—and then use logframes to define specific projects” (p. 10). The tools come to serve the professed needs of specific audiences, who might use them for different purposes. Even when drafted as complementary, the tools tend to develop in different directions, with different stakeholders and different purposes, growing increasingly apart and rarely reunited (Prinsen & Nijhof, 2015). Even in the rare cases where the tools have been developed in a symbiotic process, this overlap and mutual reinforcement might not be easily understood or realized in the subsequent separation and use of each as a stand-alone tool. Rather than trying to uncover any lost mysteries, the program-implementation team is caught up in the busyness of delivering the program and, when required, simply dusts off the program’s theory of change to show that the process remains true to its original calling—a ritualistic exercise. Invariably, the tool most often used for external communication of program progress becomes the logframe, which displays or contains the aspects that are most easily explained. In an attempt to explain progress, the program resorts to the quantifiable and focuses on logframe results. Indeed, this is the “go-to” reporting point for donors of the program as well—how better to assess the success of a program than by determining how close it is to quantifiable targets? As a result, the program staff, the donors or funders, and others within the development community lose a rich layer of analysis of how the program is working in context, as well as the implementation influences and influencers (Freer & Lemire, 2016; Mayne, 2012).
By extension, the differing purposes are further accentuated in the use of these tools to bridge data and decision making within the program. In many situations, as evaluators, we limit ourselves to gathering data from the logframe, as it contains more quantifiable data, and the success or failure of the program is so often assessed on this basis. A theory of change—while intended as such—is rarely regarded as an interim reporting tool, with occasional final evaluations asking for a review of the ToC, assuming there was one at inception. Using the logframe as an interim reporting tool without regularly consulting or reviewing the ToC is comparable to thinking, “As long as I maintain this speed, I should reach my destination,” disregarding the fact that you may be heading in the wrong direction or running low on fuel, finally thinking, “How did I end up here?” In taking this step, we assume that the initial logic underpinning the logframe continues to hold true, an all-too-often flawed assumption within the dynamic and fluid implementation environments of international development. In practice, then, programs often end up with two stand-alone tools that are separate in their purpose, design, and application, whereby their mutually reinforcing integration is easily forgotten and lost. Whether because of a lack of understanding regarding the initial design or purpose of the tools, or because of a lack of appreciation of the tools’ potential complementarity in implementation, the program’s reporting and its evaluation are all the poorer for this situation,

able only to report two related program stories when a single, integrated, and more comprehensive tale could be told. This lack of integration comes with a number of consequences, of which two real-world examples will be provided. As one illustrative example, and as we experienced on a recent evaluation, the lack of integration may lead to fervent pursuit of and progress toward quantifiable targets that might not reflect the development ethos of the overall program. Those involved in a market development program that was being evaluated, committed to a specific development methodology, realized that they were falling behind in their women-related targets. The logframe specified that the program reach a specific number of women within the stated timeframe. To their credit, the program management called for assistance and developed a gender strategy specific to their needs. However, it later emerged that while these activities contributed to the overall women-specific program targets, there was often weak adherence to the required development methodology. As evaluators we raised a few concerns, especially around the methodological commitment of the program. How did this situation arise? For a few reasons: the original women-related target appeared to have little basis in the contextual reality of the program, and in pursuing quantifiable targets, the program paid less attention to the quality of the impact, as required by the methodology. A well-drafted theory of change, speaking to the logframe targets, might have given voice to this concern in the program design phase, and certainly in an early program review, when the shortfall in the women-related target became obvious. Making use of the ToC as a reporting tool might also have provided more detail on the methodology of the women-specific activities, leading to questions and discussion around their rationale.
These types of deliberations, grounded in the interplay between the logframe and the theory of change, could have served well to inform future iterations of the program, potentially making it more effective and efficient. Another recent evaluation by Freer also serves to illustrate the severe implications that may emerge from a lack of purposeful integration of logframes and theories of change. In this instance, the program was an innovative, iterative learning opportunity, funded by a range of donors, all of whom had agreed and signed off on its purpose and design. In its implementation, however, the intransigence of the logframe and the relative impotency of the theory of change were revealed. In the evaluation process it was discovered that while the innovativeness of program partners was part of the selection process, a number of partners were specifically identified for their potential to contribute substantially to program targets. The theory of change was regarded as peripheral to this point, stating broad objectives and general process steps and providing very little detail of the real challenges faced by the program. This was a concrete example of an innovative program “playing it safe”: having realized its program targets three-quarters of the way through implementation, it broadened its selection criteria and identified partners that spoke to innovation, risk, and invention rather than to scale and numerical, target-dictated achievement. While the program had moved toward and was committed to innovation and learning to this point, the realized freedom from

logframe targets allowed program implementers to wholeheartedly pursue their envisaged ideal. A collaborative use of both tools in a mutually supportive manner might have provided the program implementers with a more flexible approach to the selection of program partners, or allowed them to report on qualitative indicators or on a range of quantitative metrics in lieu of quantitative specifics. From our perspective, then, there are several noteworthy reasons why a more purposeful integration of logframes and theories of change should be pursued. Integrating these tools in program planning and evaluation enhances both the quantitative measures that are often the ambit of the logframe and the “story of the program,” which, when it is told, often relies heavily on the theory of change. Indeed, the value added by integrating logframes and ToCs more explicitly is what motivates this article. To understand what these benefits are and, perhaps more importantly, how they might be achieved, we now turn our attention to a recent theory-based evaluation in which the program logframe and the theory of change were integrated.

MAKING THE CASE FOR A STRONGER INTEGRATION OF LOGFRAMES AND THEORIES OF CHANGE: LESSONS LEARNED AND IMPLICATIONS FOR PRACTICE

Freer is currently acting as evaluator on a development program focusing on regional water infrastructure crossing a number of international borders. The infrastructure is under the control of several national and international bodies, all of whom need to cooperate to ensure efficient operations. The program is working with this wide range of parties to ensure that the water infrastructure is mutually beneficial, well maintained, and efficiently operated, and to guide these bodies to use it to achieve developmental goals, such as contributing toward gender equality and poverty alleviation. The evaluation seeks to determine how and to what extent the program manages to achieve this rather ambitious developmental goal. As part of the initial design process, the evaluation team was tasked with refocusing the suggested logframe and designing a program theory of change. This was done in collaboration with the day-to-day program staff and managers. In this process, the tools were developed in a synergistic manner, where changes in one tool influenced changes and alterations in the other. This was done deliberately, by trying to determine, as just one example, how the ToC might be able to verify information on the number of beneficiaries, a key objective in the logframe. Similarly, the evaluation team tried to determine where the logframe might record data that spoke more directly to specific causal relationships, such as changes in operational or strategic focus, reflecting possible program influence. In discussions with the program donors, very few changes to the logframe design were approved, but this still allowed Freer to record this process within the ToC—that changes were suggested but rejected—and that, as a result, the program has been obligated to follow a particular route to its outcomes.
Freer is currently reviewing both the logframe and the ToC as part of the program’s first year of implementation.


The one-year review speaks directly to the need for a stronger integration between the logframe and the ToC for several reasons. As mentioned above, the water infrastructure program works with a range of institutions with different needs and varying capacities. While the logframe simply requires a quantifiable indicator as proof of progress, an initial draft of the ToC did not recognize the nuances of these institutions, suggesting that all institutions would be engaged in the same manner. A year into the program, Freer recognized that the institutional differences require subtly different, distinct steps, and as a result, when providing evidence against the quantifiable indicator, the evaluation will be able to demonstrate these distinctions. A revised draft of the ToC takes these distinctions into account. In a similar vein, these institutional differences might require a refining of the logframe targets. The logframe itself thus becomes a more nuanced management and reporting tool, reflecting implementation distinctions. In the fluid, less rigid world of programs in development contexts, the best a logframe can do is provide a snapshot of the program’s status at a certain point in time. If you have a series of logframes for the same program, you see a series of snapshots of that program. In a dynamic, shifting world filled with individuals and institutions driven by their own agendas and objectives, such as the multiple partners on a water infrastructure program, a program’s progress cannot be adequately captured when it is corralled by a program logic developed and debated in an office a continent away. To determine how a program has implemented its activities or processes and how these have led to changes in people’s lives, you don’t need a tool; you need a toolbox.
In the present example, using the ToC and the logframe in an integrated, supportive manner permits insights into program progress, challenges, and refinements, giving more granular detail of what works, what does not, and why. Moreover, the case also illustrates how the issue of perverse incentives pervades programs that are bound to immoveable logframe targets. Programs allocate significant effort and resources to achieving targets, sometimes with little thought to the rationale or story underlying program design or activity implementation. A complementary use of a logframe and a ToC may go some way toward reducing the tendency to respond to perverse incentives. In the example of the water infrastructure program provided above, by using the ToC as a framework to determine how a target is reached, the program might determine that in spite of reaching high numbers of institutions, the depth of impact on these institutions is not as great as envisaged. A number of mid-course corrections could be proposed: to reduce the quantity of organizations being reached but increase the opportunity for impact; or to recognize that the impact will be influenced by a range of factors beyond the control of the program and decide to reach as many organizations as possible, accepting the limited impact; or to try to influence some factors currently beyond program control to increase impact on the already identified organizations. Each scenario would map a slightly different path through the ToC and would require viewing the logframe targets through different

lenses. When reviewed together, the harmonization of the two tools’ stories would present a more textured program history. If one deliberates on these nuances, the opportunities for program (and sector) learning are enhanced. A retroactive overview of program documentation would illustrate that the program opted to engage with a specific range of institutions for a variety of reasons, rather than trying to meet all of the potential organizations’ needs. How much more valuable would this be than a report stating that a program achieved a numerical target? A future iteration of the program, or another initiative in a similar field, might take this lesson on board and build institutional variance into its design, potentially making it more effective and efficient. As the example illustrates, the integration of logframes and theories of change may serve well to more accurately track program implementation rationale, give more substance and credence to program targets, and promote a more textured and accurate program history. The challenges, then, for program designers, donors, funders, implementers, and evaluators can be framed by the following questions:

• How to build the complementarity of the tools at each stage of the program, from conceptualization through to design and implementation?
• How to draw on both tools to supplement and complement the information contained in the other?

Informed by the integration of logframes and ToCs in the preceding case, we propose five steps that can be taken to advance this type of integration. The purpose of these suggested actions, developed on the basis of a real-world example, is to provide food for thought and a possible scaffolding for integrating the program tools in other contexts, rather than to prescribe a blueprinted process.

Step one: Recognize the distinct purposes of each tool

The first step toward a stronger integration is for the various stakeholders to understand the purpose and functionality of both tools, as well as their respective strengths and weaknesses. Logframes are generally well understood in terms of purpose and reporting, but their weaknesses might be less explicit. We have already mentioned a few of these weaknesses above, including unidimensional measurement and an assumed logic that imposes itself through implementation. Theories of change, in contrast, are less well understood but are common parlance in international development circles, with appropriate definitions, purposes, and uses regularly being debated. Aside from the weaknesses that can arise from this lack of clear parameters (something that the logframe, by its very structured nature, has managed to eliminate), we need to understand that a theory of change takes as its background the social, economic, political, and cultural characteristics of the program environment and tries to distil the most pertinent aspects that either will be influenced by or will influence the program.


Step two: Recognize that in spite of their different purposes, the tools can be used in a complementary fashion

Program implementers (and donors) need to move away from reliance on the logframe to measure success and perhaps instead take to heart the common adage, “Not everything that counts can be counted, and not everything that can be counted counts.” The planned impact that the program has on the targeted population is important and is reflected in logframe reporting. The way in which this impact is achieved, and its intended and unintended consequences, is similarly important, but this process cannot be quantitatively measured and reflects a different facet of the program’s success. Programs report regularly on their progress. In many cases, changes to or implications of the planned implementation are explained, sometimes in great detail. Donors have often been participants in this ongoing conversation and agree with the recorded decisions. However, rarely are these changes or implications mapped against the program theory of change, with alternatives to the planned route (and their implications) explained. If we intend developmental programs to be replicable, or to learn from their failures, then recording these decisions should receive more attention. “How?” and “Why?” should be asked and answered regularly and should carry similar weight to the current refrain of “How many?” and “How much?”

Step three: Seek complementarity at the outcome level

In the water infrastructure example, program outcomes request quantifiable proof of institutional initiatives. However, as mentioned, there is a range of institutional capacities with which the program is working, increasing the complexity of achieving these targets. The program might choose to work with the easier but less strategic organizations. Or it might work with more demanding and more strategic institutions. As a result, it might fail to achieve its logframe targets but lay solid groundwork for future institutional arrangements. Should the program “fail” for choosing a more difficult but more considered institutional partnership? These options of either working with more difficult but strategically more important organizations or choosing the easier “low-hanging fruit” (and the implications of such a choice) need to be recorded. One of the places to do so is in aligning the logframe outcomes with the program theory of change, providing a map of alternative routes and explaining the choice to follow one over the other.

Step four: Revise the tools in tandem

To gain any benefit from the complementarity of the tools, they should be brought into play at the same time, both for reporting and for revision. One way of ensuring this is to review the theory of change regularly, at least as often as the logframe—annually for many programs. The tools need to be finalized in conjunction with one another, and changes to one tool should be checked for implications against the other. While one small change might not need a corresponding change

in the second, a series of small changes, incremental on their own, might have a profound impact on the second tool. During the review process, we need to ascertain whether the planned theory of change was followed, whether the program was influenced or was an influencer, and the implications of these for the program’s quantifiable targets, recording any deviations from the planned process. Similarly, a logframe review of the program’s progress should try to uncover reasons for achievement (or under- or overachievement) and map these answers to the theory of change. Questions such as “At what point did we realize that ...?” and “Did this decision help (or hinder) ...?” begin to weave the reviews together. In some cases, decisions reflecting questions like these may lie buried in program progress reports or meeting minutes, justifying a particular decision, but the surrounding context and reasons (and dissenting opinions) might not be recorded or reflected in the program theory of change. As a result, at the end of a program, when we ask, “How did we end up here?,” we have little institutional memory on which to draw, weakening our evaluation with statements starting with “it is possible” and “we reasonably suggest.”

Step five: Call for both tools to present evidence when making strategic decisions

This, we think, will possibly be the most difficult step for both implementers and donors to adopt. But it might also be the most important. Quantifiable indicators often drive decisions regarding success and failure. We do not deny the importance of these indicators but rather point out that in some cases, alternative quantifiable indicators can be presented, if the opportunity arises. Programs not achieving set targets have been closed when, in some cases, alternative quantifiable targets could have been considered. Instead of building on these “alternative successes,” closing programs writes off the sunk costs of infrastructure and forfeits the irrecoverable investment in established relationships. Development programs regularly need to make strategic decisions about their future—to expand, withdraw, close out, extend a timeframe, and so on. Some of these decisions require evidence of progress, provided by logframe reporting. What is often not called upon to lend context and colour to the stark numbers is progress (or the lack thereof) against the theory of change. At worst, calling for and reviewing this evidence in parallel with logframe evidence would provide no added value. At best, an extra source of substantiated explanatory evidence would add value to the process, give alternative explanations, and possibly provide alternative options for the decision makers. Given the number of international development programs currently underway, we should regularly ask a range of questions of them. During program implementation we often ask, “How fast did we get here?” and “How much did it cost?” Only on completion do we think about “Was that the best route?” and “Is this our planned destination?” We suggest that we might want to ask ourselves these (and other) questions more often during implementation, as well as on

program completion, and that one way of doing so might be to use program management and evaluation instruments as complementary tools providing related evidence to better serve our ultimate developmental goal. Toward this end, stakeholders, from donors to designers to implementers, need to understand the purpose of both tools and to call on their use: implementers to report against progress, and donors to hold the implementers to account.

CONCLUDING THOUGHTS
Development programs take place in complex environments, and developmental goals are growing in complexity and interconnectedness. Yet the tools we use to plan, monitor, report on, and evaluate these same programs are stand-alone and static, some little evolved from their initial debut almost 50 years ago. As planners, as donors, as implementers, and as evaluators, in our commitment to development we should recognize the limitations that these tools impose and seek to overcome them. One possibility is to view the tools differently. Rather than seeing them as authorities to which the program must prostrate itself, we should view them as collaborative and supportive tools that the program can utilize in parallel in order to more fully document and explain its implementation. Used collaboratively, a program logframe and its theory of change can track not only the progress and speed at which a program is travelling but also its direction and whether it has encountered a more scenic route. The five steps we outline above suggest one method of establishing this collaborative working relationship.

REFERENCES
Bamberger, M., Rugh, J., & Mabry, L. S. (2012). RealWorld evaluation: Working under budget, time, data, and political constraints. Thousand Oaks, CA: Sage.
Channell, W. (2005). Lessons not learned: Problems with Western aid for law reform in post-communist countries. Carnegie Papers: Rule of Law Series, 57. Retrieved from https://carnegieendowment.org/files/CP57.Channell.FINAL.pdf
Connell, J. P., & Kubisch, A. C. (1998). Applying a theory of change approach. In K. Fulbright-Anderson, A. C. Kubisch, & J. P. Connell (Eds.), New approaches to evaluating community initiatives (Volume 2): Theory, measurement, and analysis (pp. 15–45). Washington, DC: The Aspen Institute.
DFID. (2011). Guidance on using the revised Logical Framework. DFID Practice Paper. London, England: Department for International Development (DFID). Retrieved from https://www.gov.uk/government/publications/dfid-how-to-note-guidance-on-using-the-revised-logical-framework
Eyben, R., Kidder, T., Rowlands, J., & Bronstein, A. (2008). Thinking about change for development practice: A case study from Oxfam GB. Development in Practice, 18(2), 201–212. https://doi.org/10.1080/09614520801898996


Freer, G., & Lemire, S. (2016, April). The keystone node approach: Conducting theory-based evaluations of complex programmes. United Kingdom Evaluation Society, London, England.
Fujita, N. (Ed.). (2010). Beyond logframe: Using systems concepts in evaluation. Tokyo, Japan: Foundation for Advanced Studies on International Development. Retrieved from https://www.fasid.or.jp/_files/publication/oda_21/h21-3.pdf
Gasper, D. (1997). Logical frameworks: A critical assessment: Managerial theory, pluralistic practice. Working Paper 264. The Hague, Netherlands: Institute of Social Studies.
Gasper, D. (2000a). Evaluating the "logical framework approach" towards learning-oriented development evaluation. Public Administration and Development, 20(1), 17–28. https://doi.org/10.1002/1099-162X(200002)20:1<17::AID-PAD89>3.0.CO;2-5
Gasper, D. (2000b). Logical frameworks: Problems and potentials. Institute of Social Studies. Retrieved from http://www.academia.edu/6665953/Logical_Frameworks_Problems_and_Potentials
Hummelbrunner, R. (2010). Beyond logframe: Critique, variations, and alternatives. In N. Fujita (Ed.), Beyond logframe: Using systems concepts in evaluation (pp. 1–33). Tokyo, Japan: Foundation for Advanced Studies on International Development. Retrieved from https://www.fasid.or.jp/_files/publication/oda_21/h21-3.pdf
Imas, L. G. M., & Rist, R. C. (2009). The road to results: Designing and conducting effective development evaluations. Washington, DC: World Bank.
James, C. (2011). Theory of change review: A report commissioned by Comic Relief. Comic Relief. Retrieved from http://www.actknowledge.org/resources/documents/James_ToC.pdf
Mayne, J. (2012). Contribution analysis: Coming of age? Evaluation, 18(3), 270–280. https://doi.org/10.1177/1356389012451663
Perrin, B. (2003). Implementing the vision: Addressing the challenges to results-focused management and budgeting. Paris, France: OECD. Retrieved from http://www.oecd.org/dataoecd/4/10/2497163.pdf
Prinsen, G., & Nijhof, S. (2015). Between logframes and theory of change: Reviewing debates and a practical experience. Development in Practice, 25(2), 234–246. https://doi.org/10.1080/09614524.2015.1003532
Rogers, P. (2008). Using programme theory to evaluate complicated and complex aspects of interventions. Evaluation, 14(1), 29–48. https://doi.org/10.1177/1356389007084674
Rosenberg, L. J., & Posner, L. D. (1979). The logical framework: A manager's guide to a scientific approach to design and evaluation. USAID.
Stein, D., & Valters, C. (2012). Understanding 'Theory of Change' in international development: A review of existing knowledge. The Asia Foundation and the Justice and Security Research Programme. Retrieved from http://www.theoryofchange.org/wp-content/uploads/toco_library/pdf/UNDERSTANDINGTHEORYOFChangeSteinValtersPN.pdf
Vogel, I. (2012). Review of the use of 'Theory of Change' in international development. DFID. Retrieved from https://assets.publishing.service.gov.uk/media/57a08a5ded915d3cfd00071a/DFID_ToC_Review_VogelV7.pdf


White, H. (2009). Theory-based impact evaluation: Principles and practice. Working Paper 3. International Initiative for Impact Evaluation. Retrieved from http://www.3ieimpact.org/media/filer_public/2012/05/07/Working_Paper_3.pdf

AUTHOR INFORMATION
Gordon Freer is a lecturer in international relations at the University of the Witwatersrand, Johannesburg, South Africa. Sebastian Lemire is a postdoctoral scholar in the Social Research Methodology Division in the Graduate School of Education and Information Studies, University of California, Los Angeles.

Knitting Theory in STEM Performance Stories: Experiences in Developing a Performance Framework

Jane Whynot, Catherine Mavriplis University of Ottawa Annemieke Farenhorst University of Manitoba Eve Langelier University of Sherbrooke Tamara Franz-Odendaal Mount Saint Vincent University Lesley Shannon Simon Fraser University

Abstract: Gender equality has made its way to the forefront of discussions across various sectors in the Canadian context. Yet the intentional inclusion of gender and other intersectional identity dimensions is just beginning to permeate the realities of performance measurement and evaluation practitioners, particularly those using program theory. There is a vast body of knowledge regarding the measurement of women's empowerment, a gradually declining availability of resources targeting the inclusion of gender in theory, and even less guidance on integrating gender in theory in the context of gendered programming. Similarly, coordinated efforts from multiple sectors have resulted in an abundance of theory regarding girls' and women's representation, recruitment, retention, and promotion within STEM (Science, Technology, Engineering, and Math) but less guidance on measurement and evaluation in these areas. This article shares recent efforts to bridge the divide using theory knitting to develop a performance measurement framework addressing the decreasing representation of girls and women across the STEM "leaky pipeline" using the COM-B theory of change model. Keywords: engineering, gender, gender equality, integrating gender in theory, knitting, science, STEM

Résumé : La question du genre est dorénavant un sujet central dans plusieurs secteurs au Canada. Pourtant, la considération intentionnelle du genre ainsi que d'autres

Corresponding Author: Jane Whynot, Feminist and Gender Studies, University of Ottawa, 120 University, Social Sciences Building, Room 11002, Ottawa, ON, K1N 6N5; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d'évaluation de programme 33.3 (Special Issue / Numéro spécial), 354–374 doi: 10.3138/cjpe.53011


dimensions intersectionnelles de l'identité n'est qu'à ses débuts chez les praticiens de l'évaluation pour ce qui concerne la mesure de la performance, particulièrement chez les adeptes de l'évaluation axée sur la théorie (theory-based). Il existe un vaste corpus de connaissances au sujet de la mesure de l'émancipation des femmes. Il y a moins de ressources pour l'inclusion du genre dans les théories d'intervention, et encore moins d'information sur l'intégration du genre dans les théories dans le contexte de la planification visant à tenir compte de la dimension du genre. De même, les efforts coordonnés de divers secteurs ont mené à une abondance de théories liées à la représentation, au recrutement, à la rétention et à la promotion des filles et des femmes au sein des STIM (sciences, technologies, ingénierie et mathématiques), mais il y a peu de données sur la mesure et l'évaluation dans ces domaines. Le présent article décrit les efforts récents entrepris pour pallier cette lacune en combinant des théories pour développer un cadre de mesure de la performance prenant en compte la représentation décroissante des filles et des femmes due au phénomène de « tuyau percé » (leaky pipeline) en STIM, en utilisant le modèle de théorie du changement COM-B. Mots clés : ingénierie, genre, égalité homme-femme, intégration du genre dans une théorie, combinaison, science, STIM

CONTEXT AND PROGRAM BACKGROUND
Since the election of the Canadian Liberal Party under Justin Trudeau in 2015, gender equality has been reimagined as a federal government policy priority in a significant way. This extends across all federal government sectors and applies to all organizations and their expenditures. It is reflected in discussions emphasizing foundational concepts related to gender, diversity, and inclusion at both the policy and program levels. Furthermore, gender equality and one of its measurement mechanisms, gender-based analysis, have been mandatorily embedded in the policy cycle in innovative ways that include federal government budgets, Treasury Board Submissions, and Memoranda to Cabinet (TBS, 2016, 2017). In addition, federal government efforts are underway to legislate gender-based analysis in the near future (SWC, personal communication, 2018), which means that these once-foreign concepts will become part of an institutionalized performance dialogue for federal government programs, those in receipt of their funding, and those involved in measuring and reporting on performance. It is with UNESCO's words of advice that this effort begins: "the way in which data related to STEM are currently predominantly collected renders women and their concerns, needs, and responsibilities relatively invisible" (UNESCO, n.d.). Without specifically articulating and making women and girls visible in performance products, performance practitioners can be guilty of being gender-blind in their inattention to gender issues, specifically those related to program theory and the gendered nature of accompanying assumptions (Hivos, 2014). Such disregard for these aspects affecting performance serves to perpetuate experienced individual and systemic barriers; the Government of



Canada has committed to addressing these barriers with its renewed commitment to gender equality and gender-based analysis plus (GBA+) through its lead department on gender equality, Status of Women Canada (SWC). For the first time since the government's first evaluation policy circular in 1977, gender equality has been specifically articulated as a federal government priority within the results domain (Whynot, 2015). While slower to follow, this is starting to be reflected in associated performance guidance documents in the federal government, including the Interim Guidance on the Policy on Results (TBS, 2017). As a funding organization, the Natural Sciences and Engineering Research Council of Canada (NSERC) deserves recognition for its early leadership on gendered reporting; in 2010 it undertook its first study regarding Women in Science and Engineering. That study contributed to the implementation of the multi-year Gender Equity Action Plan in 2016 (NSERC, 2016) and the tabling of its Equity, Diversity and Inclusion Framework in 2017. Cumulatively, these organizational commitments provide a strong basis for making women and girls in STEM visible. The NSERC Chairs for Women in Science and Engineering (CWSE) Program is one of these initiatives. The CWSE Program was launched in 1989 with one position in engineering and later expanded to five Chairs, to include both science and engineering, with the recognition that the scope of the challenge was too significant for one individual to address alone. Individual Chairs from the Atlantic, Quebec, Ontario, Prairies, and British Columbia/Yukon regions have been delivering unique regional programs targeting various aspects of what has historically been referred to as the "leaky pipeline" of girls and women in STEM.
The original thinking behind the CWSE Program's creation was to address the underrepresentation of women and to provide successful, accomplished, and recognized mentors (NSERC, personal communication, 2017). This thinking progressed to include addressing barriers encountered by women in STEM. As a critical dimension in this endeavour, it is necessary to make the distinction between gender equity and gender equality. Simply put, gender equality in STEM means equal representation, whereas gender equity recognizes that women, men, and gender-diverse individuals have different needs that require different intervention supports. The CWSE Program's goal is to "increase the participation of women in science and engineering, and to provide role models for women active in, and considering, careers in these fields" (CWSE, 2012). Program objectives emphasize (1) the development, implementation, and communication of strategies to raise the level of participation of women in science and engineering as students and professionals in the field; (2) the provision of female role models who are accomplished, successful, and recognized researchers in science and engineering; and (3) the development and implementation of a communication and networking strategy to ensure regional and national impact on opportunities for women in science and engineering (NSERC, 2017). Chair activities are balanced amongst science promotion, research into factors and institutional mechanisms that influence the participation rates of women in science and engineering, public advocacy, role-modelling, and more. The balance is highly dependent upon the



research interests of the individual Chairs, the regional/national contextual factors, and the needs within their regional academic institutions.

LINKING THEORIES OF AND ABOUT THE ROLE OF WOMEN IN SCIENCE AND ENGINEERING PERFORMANCE STORIES
This effort begins by engaging with the various theories surrounding the roles of women in and for science as a precursor to the development of a program theory that will comprise the backbone of a performance measurement framework. Conceptually, the "leaky pipeline" notion can be traced back to the 1970s U.S. education sector, in which STEM fields were envisioned to contribute to both development and workforce diversity (Brown, Brown, Reardon, & Merrill, 2011). Concerns arose when it appeared that an insufficient number of individuals would be available to fulfill future STEM jobs, careers, economic, and educational competitiveness projections, as women's presence decreased as career stages advanced. This concern was not isolated to the United States, and Canada found itself sharing these same apprehensions, which continue to the present day (Carey, 2014; CCA, 2012, 2014, 2015; Krug, 2012; Mishagina, 2012; PEA, 2012; Plesca & Summerfield, 2014). Early related research found evidence of both vertical and horizontal segregation experienced by women in STEM education and careers (see, e.g., Schiebinger, 1999). The solution regarding girls' and women's underrepresentation and retention in STEM fields is not a simple one. Nobel laureate Carol Greider summarizes: "[o]nce women have entered STEM, at every subsequent stage of their career, they run a gauntlet of subtle practical, psychological, and social holes in the way of their promotions, appointments to boards, and other indicators of seniority. While slapping patches on the pipe may help stop some of the leaks, and help women get ahead, it is often a simplistic fix because the root of the problem isn't just practical" (Future Tense, 2014). A review of the literature found that girls and women in STEM study and careers have been significantly researched both in Canada and internationally.
Broadly, the research includes the identification of context regarding how girls and women are situated in these areas of study/work (Barbercheck, 2001; Blickenstaff, 2005; Harding, 1991; Hill, 2010; Huhman, 2012; Keller, 1985; NSERC, 2010; Polacheck, 1979); structural, social, and economic barriers encountered in these environments (Bebbington, 2002; Chang, 2014; Lane, Goh, & Driver-Linn, 2012; OECD, 2015; Polacheck, 1987; Settles, 2014; Settles, Cortina, Buchanan, & Miner, 2013; Storrie, 2012); and potential policy and strategies supporting solutions (Battison, 2015; Dasgupta & Stout, 2014; Müller, Castano, González, & Palmén, 2011; Simard & Gammal, 2012). Each of these areas could independently create years of reading. Crasnow, Wylie, Bauchspies, and Potter (2016) summarize that "the presence of women in the sciences, feminist critiques and feminist theories have contributed to changes in modern science as well as the studies of science." Cumulatively, this research points out that the analogy of a linear leaky pipeline is not perfect and recognizes



that a more systems-focused, non-linear approach, incorporating dynamic social norms and values as key facets of change and of its measurement, is necessary. Taken together, the research identifies and situates these efforts across various settings, such as different levels of education, life in the academy, and other workplace settings. These explorations serve to highlight the potential role of theory at both the micro- and meso-levels, which are important given the non-linearity of girls' and women's occupation of various spaces, at various junctures, across the leaky pipeline. While relevant to STEM, this may also be important to other areas in which gender-equity struggles have been experienced. The intent of this article is to build on these contributions, made by diverse theorists, to bridge feminist, science, and evaluation contributions. Given the increasing worldwide emphasis on gender equality and girls' and women's participation in STEM, it is surprising how little related performance measurement and evaluation information is available globally to support the evaluation of strategies and actions used to address the leaky pipeline. Synthesis efforts directed at examining the impacts of regional and national award schemes in North America and Europe were undertaken by GENDER-NET (2015). This policy-focused effort, funded by the European Commission under the Science in Society initiative, found that only two of the eight regional and national award schemes had completed robust evaluations. These two programs were Athena SWAN and Project JUNO. Recent theory-based evaluation efforts of Athena SWAN echoed these sentiments, noting that "empirical research on this process, and its impact is rare" (Ovseiko, Chapple, Edmunds, & Ziebland, 2017). Measurement and reporting of progress have been hampered by inconsistent measurement indicators across jurisdictions and stakeholders; UNESCO (n.d.)
has identified that "[a]s a consequence, the lack of data and indicators, as well as of the availability of analytical studies, can obstruct the design, monitoring, and evaluation of policies aimed at successfully tackling gender inequality in STEM." Their STEM and Gender Advancement (SAGA) initiative is intended to address these shortcomings by standardizing measurement dimensions; however, final study results are not anticipated until after the publication date of this article. The first working paper issued as part of this initiative proposes policy-level, standardized, predominantly quantitative indicators for use at various stages of the leaky pipeline. Echoing this demand for performance-measurement and evaluation information capturing both individual and structural changes, the National Science Foundation's (NSF) ADVANCE program included a longitudinal evaluation stream in its recent call for funding. No applications were received in response, however, nor does ADVANCE have an evaluation strategy in place at the program level (J. Dearo, personal communication, 2018), although it is on the agenda for future discussion. While the issues regarding the measurement of empowerment of girls and women in STEM are presented briefly here, in this article we take one step backward to reflect on the integration of elements of gender, and on what gender entails as a social construct, in program theory. Program theory is not a clear-cut issue (Leeuw & Donaldson, 2015; Astbury & Leeuw, 2010), although efforts promoting



its use in the context of complexity have gained traction in the last decade, something that certainly applies in gender equality discussions as well. Despite cautionary advice from Leeuw and Donaldson that knitting "largely normative evaluation theories with explanatory theories may be difficult or impossible" (p. 474), the theories brought together in this initiative, addressing women's empowerment, women in and of science, and evaluation, are well aligned. This is partly attributable to similarities in orientation, which are described in further detail below. Women's empowerment theories of change materialized from the development context, in large part attributable to the evolving efforts of feminist theorists including Boserup (1970), Moser (1989), and Overholt, Anderson, Cloud, and Austin (1984). It is here that measurement and reporting dimensions stemming from these various models began to emerge (Podems, 2010), with the appearance of gender roles and relationships (including power and influence) as key analytical variables. As these theories have matured and related understandings of key dimensions have become more sophisticated, so too have measurement and reporting had to evolve to keep pace. Despite these advancements, measuring women's empowerment in the development context has recently been described as "measuring the immeasurable" (Kloosterman, Benning, & Fyles, 2012), "a distorted metric" (Anderson & Langford, 2012), and the "art of the impossible" (Langford, 2012). The argument could be made, however, that program theory as conceptualized in the context of this special issue remains a relatively unexplored domain. Only recently have program-theory-related policy briefs and guidance materials emphasizing the inclusion of gender as a unique concept begun to emerge (CCAFS, 2015; Hivos, 2014) to address measurement shortcomings.
These materials identify numerous reasons driving the integration of gender into program theory: an increasing reliance on theories of change to guide monitoring, evaluation, and learning efforts; a need for gender-transformative results to address power imbalances; and, perhaps most significantly, the potential of theories of change to circumvent some of the challenges presented by traditional gender mainstreaming approaches. Similar applications of theory knitting may be beneficial to other initiatives grappling with experienced inequalities.

THE NEED FOR A PERFORMANCE MEASUREMENT STRATEGY
In September 2006, the five regional CWSE Chairs were formally linked through the creation of a National Network Grant (NSERC, 2017), which facilitates interaction among the five Chairs in order to (a) increase the effectiveness of the five regional programs through shared information and resources; (b) enhance communication among the regional Chairs through regular face-to-face meetings; (c) increase the visibility and impact of activities at a national level; and (d) undertake research activities that support the common objectives of the CWSE Program. Towards these goals, the National Network undertakes a range of collaborative research, communication, and networking activities. Since 2011, the regional



Chairs have annually reported to funders on twelve indicators, predominantly focusing on activities, outputs, and reach. What differentiates each of the regional Chairs' programs, in addition to individual research and expertise, are the students; the industry, partner, and program-beneficiary stakeholders; the jurisdiction (including rural/urban differences); the target audiences; the budgets; and the delivery mechanisms that are specific to the juncture of the leaky STEM pipeline the Chairs target. Program-recipient direct-delivery mechanisms include camps, information/sensitization and capacity-building training sessions and workshops, mentoring activities, panels/talks, and academic publications. Figure 1 graphically depicts the geographic dispersion of activities, including direct interventions, catalysts, and representations undertaken by the five Chairs. However, regional Chairs are also involved in a wider, but no less time-consuming, array of indirect delivery mechanisms, including behind-the-scenes influencing activities that extend from institutional hiring and diversity committees to industry- and sector-specific gender-equality initiatives, professional association initiatives, and the array of initiatives that these stakeholders promote. Additionally, regional

Figure 1: The CWSE Program Chair activities



Chairs individually submit progress reports every 24 months based on elements highlighted in their individual action plans; these include activity reporting and monitoring mechanisms, as well as assessments of the impact and effectiveness of regional activities. In sum, the diverse and varied Chair activities collectively address various, and sometimes simultaneous, components of the leaky pipeline. Specific goals related to the development of a program performance measurement strategy included analyzing common goals and elements of individual Chair programs to assist in reporting on the impact and effectiveness of each Chair in a coordinated manner (CWSE, 2017). Prior to the implementation of the Policy on Results (TBS, 2016), guidance materials for federal government programs on the development of performance measurement strategies were tabled, identifying that strategy components should include a program profile, a logic model, a performance measurement strategy, and an evaluation strategy (TBS, 2010). The CWSE Program itself has never been formally evaluated as part of its parent organization's portfolio, as it was seen as a lower-risk initiative of low materiality. Although the organization implemented an expansion of the CWSE Program in its 2016–17 tabling of its Report on Plans and Priorities (NSERC, 2016), an evaluation plan was not subsequently included in the performance measurement strategy components, as this is generally undertaken by federal departments with input from the program. In this article, we highlight the development of one dimension of the performance measurement framework, the theory of change, which has been substituted for the logic model. Accepted definitions of a logic model identify "the depiction of the causal or logical relationships between activities, inputs, outputs, and the outcomes of a given policy, program or initiative" (TBS, 2015).
Logic models and theories of change are interrelated, but the latter is expected to unpack how and why an intervention is expected to achieve the anticipated result, rather than solely providing a simplistic description (TBS, 2012). The complexity of addressing the entire leaky STEM pipeline through the various regional Chair programs requires performance tools beyond a logic model's simplistic description of anticipated results, which led to the adoption of a theory of change approach to frame current and future performance discussions.

APPROACH AND THEORETICAL FRAMEWORK SELECTION
Measuring changes in social norms and dynamics has long captured the attention of those involved in women's empowerment. Related methodological advancements accompanying the measurement of women's empowerment, however, have not yet translated into the development of program theory. One of the key dimensions of program theories is their situation in, and identification of, a specific context. When this is done for girls and women of STEM, not only is their visibility enhanced but the social norms and dynamics that may impede gender equality (CCAFS, 2015) are also identified. Evolving understandings of gender imply related understandings of structural and relational factors (Hankivsky et al., 2014), which are the touchstones



of the current federal government's approach to gender equality and the application of its GBA+ tool. Few federal government organizations have established related competencies; others are initiating plans for the growth and development of this competency. Program-specific explorations into these structural and relational factors include social-identity threats in professional naturalistic environments outside academia (Hall, Schmader, & Croft, 2015); the benefits of organizational and workforce diversity (Croft & Pelletier, 2012); hiring practices and career progression, including tenure and promotion (Smit Quosai, Davidson, Ghazzali, Moloney, & Vassileva, 2009); graduate study and career commitment (Darisi, Davidson, Korabik, & Desmarais, 2010); career choices and influences (Franz-Odendaal, Blotnicky, French, & Joy, 2016; Blotnicky, Franz-Odendaal, French, & Joy, 2018); representation in STEM study fields (Perreault, Franz-Odendaal, Langelier, Farenhorst, Mavriplis, & Shannon, 2018); and multiple infographics/fact sheets on related topics such as unconscious bias, mentoring, stereotype threats, microaggressions, gendered communications, and supporting diversity in the workplace (Parker, Pelletier, & Croft, 2015). Needless to say, consideration of the various social norms and dynamics adds another layer of complexity to the design of a program theory and related performance measurement strategy. This holds true especially given that different social norms and dynamics are in play for each of the regional Chair programs. Embedding the knowledge, experience, and expertise of girls and women in STEM was a key facet of program theory construction, hence the significant investment in the literature review. Bringing theories from women in, and of, science together with evaluation resulted in theory knitting, in which integrative strengths were emphasized.
In theory knitting, researchers integrate "the best aspects of a set of given theories with one's own ideas regarding the domain under investigation" (Kalmar & Sternberg, 2008). Doing this effectively set aside the debate over whether a positivist or a constructivist epistemological orientation would take precedence. Additionally, a theoretical framework was required that was flexible enough to respond to both positions. Program theory has been recognized for its ability to do this; it additionally addresses both the individual and systemic shifts required to enact the behavioural change needed for the empowerment of girls and women in STEM. Options considered included the Bennett Hierarchy, based on knowledge, attitudes, skills, and aspirations (KASA) (Mayne, 2015); Sen's (2004) and Nussbaum's (2011) Capability Approach (CA), with strong ties to policy and structural changes; and Michie, van Stralen, and West's (2011) COM-B approach, elaborated by Mayne (2017), which holds that behavioural (B) change occurs only when the three dimensions of capacity change, namely Capability (C), Opportunity (O), and Motivation (M), are present. Consistent across these models is the introduction of the opportunity dimension as an integral component of results achievement. The CA was ultimately not selected for two reasons. First, it has a normative orientation: "it is not a theory that will explain poverty, inequality, or well-being, but rather a theory that helps us to conceptualize these notions" (Robeyns, 2016; emphasis in original). Second, despite its use in studies comparing economic impacts in STEM


areas (Battison, 2015), economic impact was not flagged as a longer-term outcome in early discussions regarding the performance measurement strategy with CWSE program stakeholders.

Balancing Theoretical Framework Selection and Practical Application

Any theoretical framework serving as the foundation for program theory requires both the ability and sufficient flexibility to coalesce multiple theories so that the experiences and expertise of girls and women are centrally located in performance discussions. Early discussions with CWSE program stakeholders, including the regional Chairs and the funding organization, were initiated to gather various perspectives for consideration in the development of a performance measurement strategy, including program theory. These discussions generally echoed what was found in the related literature discussed above. It was noted that the return on the investment in the Chairs program was perceived to be significant, given the level of resourcing previously mentioned in relation to the activities undertaken. It was also acknowledged that the limited number of Chair positions, and the consequent breadth of their activities across Canada, would likely not be sufficient to enact sustainable, structural change. This supports conclusions drawn by the GENDER-NET study, which stated that "impact has been demonstrated within schemes that are adequately resourced, and so consideration must be given to how a transnational gender equality award scheme is resourced to be sustainable" (2015, p. 96). Addressing the multiple barriers to girls' and women's representation, recruitment, retention, and promotion in STEM takes time; it was therefore important for the theory to reflect the entire anticipated change process, rather than just what occurred during the occupation of individual positions. This is also consistent with findings from the literature on integrating gender in theories of change to address structural and relational barriers and support increased representation of women at higher levels, where power and influence accumulate.
It was noted that a measure of the Chairs' influence would be helpful to incorporate in subsequent performance frameworks and that, while the CWSE National Network was particularly strong in its quantitative reporting, it would also be helpful to build on existing qualitative reporting dimensions in any future efforts. Recent briefings on integrating gender in program theory have identified that the impact of requiring qualitative data has yet to be demonstrated (GENDER-NET, 2015, p. 101); however, anecdotal lessons learned suggest that social-learning processes have been the most effective in developing gender capacities in partners (CCAFS, 2015, p. 4). The selection of the COM-B framework and discussions with the funding organization provided a point of departure for sketching the various components of the theory of change. The COM-B program theory model includes the following dimensions: outputs/activities; stakeholder reach and reaction; capacity change, composed of motivation, capability, and opportunity; and behavioural


change. Each of these phases in the causal-impact pathway is accompanied by related assumptions (Michie et al., 2011; Mayne, 2017). Discussions surrounding the development of the causal-impact pathway involved highlighting possible indicators for inclusion in the performance measurement framework. Indicators tabled for discussion included gender, both explicitly and implicitly articulated. It should be noted that while "reach" is an important early phase in the causal-impact pathway, flagging only gender at this juncture can contribute to gender-blindness, which is discussed in further detail below. To support the development of a skeleton causal-impact pathway, each of the regional Chairs' websites was reviewed, and direct outreach to Chairs was undertaken to request related program documentation. These resources were then used to draft a CWSE COM-B program theory causal-impact pathway to serve as the foundation for the future performance measurement strategy. This process was facilitated by the regional Chairs' prior development of individual performance strategies to fulfil their jurisdictional action plans and related reporting efforts, as well as by National Network updates undertaken at regular intervals. Beyond the components previously identified in the COM-B model, the first iteration of the causal-impact pathway also contained related conceptual areas (such as awareness, access, and participation) aligned to the affected pathway dimensions. Additionally, the causal-impact pathway highlighted what was under the control or influence of the CWSE Program. This was to facilitate understanding among Chairs who have limited experience with performance measurement and evaluation. It was found that the early stages of the causal-impact pathway (activities/outputs/reach and reaction) of the COM-B model were easily addressed, but later stages required articulating various criteria to assist in facilitating a common understanding.
For example, communication-related metrics regarding online social media presence were included at the activity phase, but discussion was required to define and differentiate between information or awareness-raising sessions and capacity-building activities. To support differentiating between the two, the level of program effort required was used as a defining criterion. The first round of input from stakeholders was garnered through site visits to each of the jurisdictions accommodating the regional Chairs. The purpose of the site visits was twofold: observation of or participation in program-sponsored interventions, and an opportunity to discuss with the regional Chairs and their program staff the overall state of performance measurement in their jurisdictions, the vision for a National Network performance measurement strategy, and any feedback they may have had on the draft version of the National CWSE Network COM-B program theory. These site visits ranged in duration from one to two days and involved discussions with various program stakeholders in attendance at events, including individuals from industry, representatives of academic institutions, and program beneficiaries and their parents, as well as all of the Chairs and their program personnel. The site visits were not considered to be representative of Chair programs but rather provided a snapshot of their activities.


Continual updating of the draft occurred throughout the site visits based on information obtained. It is important to note that, in the development of outcome statements associated with specific causal-impact pathway stages, girls and women were specifically articulated at every stage to avoid gender-blindness. This ensured that girls and women were not omitted from the performance discussions at any stage of the causal-impact pathway. While this appears to be a rather practical, common-sense approach, it is critical for several reasons. These reasons were referenced by keynote speakers Maria Klawe, at the fall 2017 President's Dream Colloquium held at Simon Fraser University, and Yves Desjardins-Siciliano, at the 2017 Gender Summit 11, both of whom observed that making these things explicit in performance stories ensures not only visibility but also continued visibility. If gender and diversity are not specifically articulated, then attention can be diverted by emerging and/or competing priorities, and whatever achievements have been made may not be sustained.

Process Findings on Knitting Theory in STEM: Lessons Learned

Several key process-oriented findings emerged that have both practical and more conceptual performance-measurement and evaluation implications for the integration of gender and theory for the CWSE Program. These findings may also have implications for others who are considering integrating gender in their respective theories of change.

At the indicator level

At the indicator level, the knitting of various program theories was very apparent, incorporating both quantitative and qualitative information for both individual and structural environments, based on various aspects of the representation, recruitment, retention, and advancement of women and girls in STEM. Examples of these structural indicators include the total number of policy changes within Chairs' academic institutions (influenced by Chairs), as well as the total number of policy changes outside the Chairs' academic institutions that support girls' and women's representation, recruitment, retention, and advancement in STEM. As part of the causal-impact pathway-development process, identifying assumptions for each of the stages was also a critical discussion point.

Regarding the identification of assumptions

Integrating gender considerations in assumptions requires making explicit what has remained hidden in the past, allows for discussion focusing on learning, and allows for the creation of evidence as part of a "collective construction" (Hivos, 2014, p. 8). It was interesting to note that gendered assumptions did not begin to explicitly emerge until the capacity-change stage of the causal-impact pathway, which is the juncture at which the Chairs' control is relinquished and influence


begins. The inclusion of specific gendered assumptions at this stage of the causal-impact pathway reflects the need for supportive environments, as well as for changing social norms and values, to support girls' and women's representation, recruitment, retention, and advancement in STEM. Making the assumptions explicit also necessarily involved identifying assumptions related to the roles and responsibilities of stakeholders involved in the various aspects of the STEM leaky-pipeline continuum.

Reporting at regional Chair levels

As previously discussed, regional Chairs contribute to multiple-level reporting efforts. Reporting at the regional Chair level is mandated at 24-month intervals. These regional reports are predicated on the action plans comprising individual Chair funding agreements, which are valid for an initial five-year period and are renewable for an additional two- to five-year period. After several years, and after the implementation of effective individual reporting strategies, the development of a program theory and accompanying performance measurement strategy has forecast impending reporting changes. It is anticipated that the indicators identified as critical for telling the National Network performance story will be both additional and complementary to individual Chair reports. The addition of these new indicators adds a qualitative element that was previously less explicit, and it also holds space for reporting across the entire leaky pipeline to address changes beyond the tenure of individual Chairs. This is important, as the regional focus may also shift when new Chairs are introduced in a jurisdiction. The inclusion of these new indicators has associated methodological implications for tool development, data collection, and subsequent analysis, which will have to be balanced with existing resources in order to be successfully implemented.
Creating common understandings

To gather feedback on the draft COM-B program theory and elements of the performance measurement strategy, teleconferences were held and an electronic survey was undertaken to garner input from the regional Chairs on the various aspects of the outcome statements and associated indicators at each results stage of the COM-B model. As part of the electronic survey, Chairs were asked, for each of the suggested indicators, whether they currently reported on it, whether it was viewed as important to the National Network, whether it was viewed as important to their region, the frequency of reporting, whether their region's performance story could be told without it, whether a tool had been developed to support reporting, and whether the indicator was reflective of the outcome, to address the plausibility and coherence factors. All of the regional Chairs, or their personnel, responded to the survey, and their responses provided useful information. It was also critical to note that key questions were raised regarding the language used in the outcome, indicator, and assumptions components of the causal-impact pathway. This flagged a need for greater specificity for many


of the terms used and for how they would be reported on in the future. It was noted that differing interpretations of terminology existed among the regional Chairs. Ultimately, the questions posed contributed to the development of much more specific indicators associated with the various outcome stages in the COM-B framework, for inclusion as the basis of the performance measurement strategy. The previously mentioned policy briefs and guidance on integrating gender and theory (CCAFS, 2015; Hivos, 2014) specifically reference program theory as a mechanism for creating opportunities to facilitate understanding across various situated knowledges, which is what occurred in this instance.

Conclusions and Next Steps

The academic literature can be broadly grouped into specific areas that include the representation, recruitment, retention, and promotion of girls and women in STEM; explorations of the environmental factors, including challenges and enablers, affecting progression in these areas; and research synthesizing this information to guide future policy directions. The vast array of literature alludes to, and confirms, the complexity of the leaky-pipeline STEM continuum. This same literature also serves to highlight the deficit of resources on integrating gender in performance discussions, particularly when using program theory. Sector-specific areas have also emphasized this emerging area, one that will undoubtedly continue to flourish given the emphasis on gender equality in STEM, both within Canada and internationally. Theory can assist in highlighting and explaining perverse effects, such as those recently reported by Stoet and Geary (2018), who noted that "countries with lower levels of gender equality had relatively more women among STEM graduates than did more gender-equal countries" (p. 590). This is an opportunity to draw on theories of change from multiple sources. Scientists, performance practitioners, and feminists can bridge intersections across these disciplines to contribute to the conversation regarding integrating gender in program theory for girls and women in STEM. Dedicated segments of these populations all have long-standing traditions of representing the interests of individuals who continue to be underrepresented in various aspects of the leaky pipeline. In part, this is attributable to a lack of access to, and participation in, opportunities that others have had, as a result of structural impediments that block accumulations of power and influence. It is these structural impediments that have garnered the attention of authorities internationally, coalescing efforts to standardize performance measurement and evaluation related to girls' and women's representation, recruitment, retention, and promotion in STEM. These efforts have situated gender as a key analytical variable in performance discussions, and specific efforts related to performance measurement and evaluation have begun exploring integrating gender in program theory to reflect the more sophisticated metrics and understandings required.


A significant opportunity exists for performance practitioners, as well as for the CWSE Program, given the dynamic landscape of standardizing gender-equality measurement in STEM. This rings particularly true given the federal government's emphasis on the Gender Results Framework informing budget discussions and the roles made visible for girls and women in STEM. Not only are girls and women specifically articulated in STEM, but future directions emphasizing new data-collection methods in this area have also been flagged by the federal government. Longitudinal data collection at the individual level throughout the entire STEM continuum is beyond the current ability of the Chairs, given their limited financial resources and capacities and their existing responsibilities, but a window exists in which this opportunity could be explored to spread longitudinal data-collection burdens across multiple stakeholders, including government and academia. This initial effort to develop a performance measurement strategy that embeds gender across the outcome, indicator, and assumption dimensions of program theory is a novel undertaking. Guidance materials (CCAFS, 2015; Hivos, 2014; TBS, 2017) reflecting on either gender equality or the related gender-based analysis outline a direct requirement to consider the needs of diverse program beneficiaries in the context of performance discussions. The experience offers, and in some instances validates, early insights on the process of integrating gender in program theory resulting from applied experience. This has important implications for performance practitioners, both conceptually and practically, which can guide future related actions for this sector and, more importantly, for other sectors as they begin to think about the various complexities associated with gender equality and ensuring its representation in performance stories.
One key dimension that remains unaddressed, but is alluded to by Podems (2010), is being specific about language, particularly in distinguishing between gendered and feminist approaches. Canada's current government has declared itself "feminist," with gender equality and gender-based analysis entrenched across the policy cycle in novel ways. This, however, may not be the case for future political leadership. The repercussions of feminist declarations have been well noted worldwide (Chant & Sweetman, 2012; Podems, 2014), and practising feminist approaches without labelling them as such has been advised in order to contribute to sustainable practice. This has been reiterated in more recent studies of the federal government's Gender Focal Points, in which identified discursive strategies included delinking gender from feminism (Paterson & Scala, 2016) and refocusing instead on evidence-based decision making. At a practical level, the draft CWSE COM-B performance measurement strategy is moving toward finalization. Next steps include coordinating with the National Network to review the next iteration of the performance measurement strategy matrix and supporting regional Chairs in aligning their current individual reporting frameworks with the National Network model to ensure that consistency is maintained and that efficiencies are maximized where possible.


Acknowledgements

The authors gratefully acknowledge the contributions of the Chairs for Women in Science and Engineering Program's personnel, including Danniele Livengood, Sally Marchand, and Mahalia Lepage.

References

Anderson, E., & Langford, M. (2012). A distorted metric: The MDGs and state capacity. University of Oslo Faculty of Law Research Paper No. 2013-10. Retrieved from https://ssrn.com/abstract=2217772

Astbury, B., & Leeuw, F. (2010). Unpacking black boxes: Mechanisms and theory building in evaluation. American Journal of Evaluation, 31(3), 363–381.

Barbercheck, M. (2001). Mixed messages: Men and women in advertisements in science. In M. Wyer, M. Barbercheck, D. Geisman, H. O. Ozturk, & M. Wayne (Eds.), Women, science, and technology: A reader in feminist science studies (pp. 117–131). New York, NY: Routledge.

Battison, S. (2015). Fixing the leaky pipeline to break the glass ceiling: An exploratory study on the economic impacts of gender equality initiatives targeting female representation in the STEM industries. University of Hertfordshire Business School.

Bebbington, D. (2002). Women in science, engineering and technology: A review of the issues. Higher Education Quarterly, 56(4), 360–375. https://doi.org/10.1111/1468-2273.00225

Blickenstaff, J. C. (2005). Women and science careers: Leaky pipeline or gender filter? Gender and Education, 17(4), 369–386.

Blotnicky, K. A., Franz-Odendaal, T. A., French, F., & Joy, P. (2018). A study of the correlation between STEM career knowledge, mathematics self-efficacy, career interests, and career activities on the likelihood of pursuing a STEM career among middle school students. International Journal of STEM Education, 5, 22. https://doi.org/10.1186/s40594-018-0118-3

Boserup, E. (1970). Women's role in economic development. London, England: Allen & Unwin.

Brown, R., Brown, J., Reardon, K., & Merrill, C. (2011). Understanding STEM: Current perceptions. Technology and Engineering Teacher, 70(6), 5–9.

Carey, D. (2014). Overcoming skills shortages in Canada. OECD Economics Department Working Paper No. 1143. Paris, France: OECD.

Chairs for Women in Science and Engineering [CWSE]. (2017).
Performance management strategy advisory job posting.

Chairs for Women in Science and Engineering [CWSE]. (2012). CWSE coast to coast update 2012.

Chang, J. (2014). Young women avoid STEM careers due to negative images, stereotypes. Control Engineering, 61(10), 341.

Chant, S., & Sweetman, C. (2012). Fixing women or fixing the world: "Smart economics," efficiency approaches, and gender equality in development. Gender & Development, 20(3), 517–529.


Climate Change Agriculture and Food Security [CCAFS]. (2015). Lessons in theory of change: Gender and inclusion. CCSL Learning Brief No. 14. Retrieved from https://cgspace.cgiar.org/bitstream/handle/10568/61900/Learning%20Brief%2014.pdf

Council of Canadian Academies [CCA]. (2012). Strengthening Canada's research capacity: The gender dimension. Ottawa, ON: The Expert Panel on Women in University Research.

Council of Canadian Academies [CCA]. (2014). Science culture: Where Canada stands. Ottawa, ON: The Expert Panel on the State of Canada's Science Culture.

Council of Canadian Academies [CCA]. (2015). Some assembly required: STEM skills and Canada's economic productivity. Ottawa, ON: The Expert Panel on STEM Skills for the Future.

Crasnow, S., Wylie, A., Bauchspies, W., & Potter, E. (2016). Feminist perspectives on science. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/archives/win2016/entries/feminist-science/

Croft, E., & Pelletier, J. (2012, September/October). Diversity in organizations: Why and how. Innovation, 18–20.

Darisi, T., Davidson, V., Korabik, K., & Desmarais, S. (2010). Commitment to graduate studies and careers in science and engineering: Examining women's and men's experiences. International Journal of Gender, Science and Technology, 2(1), 48–64.

Dasgupta, N., & Stout, J. (2014). Girls and women in science, technology, engineering, and mathematics: STEMing the tide and broadening participation in STEM careers. Policy Insights from the Behavioral and Brain Sciences, 1(1), 21–29. https://doi.org/10.1177/2372732214549471

Franz-Odendaal, T. A., Blotnicky, K., French, F., & Joy, P. (2016). Experiences and perceptions of STEM subjects and careers and engagement in STEM activities among middle school students. Canadian Journal of Science, Mathematics and Technology Education, 16(2), 153–168.

Future Tense. (2014). "From nowhere to Nobels": A Future Tense event about women in STEM.
Retrieved from www.slate.com/blogs/future_tense/2014/03/14/from_nowhere_to_nobels_a_future_tense_event_about_women_in_stem.html

GENDER-NET. (2015). Analysis report: Award schemes, gender equality and structural change. Retrieved from http://www.ecu.ac.uk/wp-content/uploads/2015/10/ECU-Gendernet-full-report-Oct-2015.pdf

Hall, W. M., Schmader, T., & Croft, E. (2015). Engineering exchanges: Daily social identity threat predicts burnout among female engineers. Social Psychological & Personality Science, 6(5), 528–534. https://doi.org/10.1177/1948550615572637

Hankivsky, O., Grace, D., Hunting, G., Giesbrecht, M., Fridkin, A., Rudrum, S., ... & Clark, N. (2014). An intersectionality-based policy analysis framework: Critical reflections on a methodology for advancing equity. International Journal for Equity in Health, 13, 119. https://doi.org/10.1186/s12939-014-0119-x

Harding, S. (1991). Whose science? Whose knowledge? Thinking from women's lives. Milton Keynes, England: Open University Press.

Hill, C. (2010). Why so few? Women in science, technology, engineering, and mathematics. Washington, DC: AAUW.


Hivos. (2014). Gender and theories of change. Retrieved from https://www.academia.edu/11475035/Gender_and_THEORIES_OF_CHANGE

Huhman, H. (2012). STEM fields and the gender gap: Where are the women? Forbes. Retrieved from https://www.forbes.com/sites/work-in-progress/2012/06/20/stem-fields-and-the-gender-gap-where-are-the-women/#15811b8141ba

Kalmar, D. A., & Sternberg, R. J. (2008). Theory knitting: An integrative approach to theory development. Philosophical Psychology, 1(2), 153–170. https://doi.org/10.1080/09515088808572934

Keller, E. F. (1985). Reflections on gender and science. New Haven, CT: Yale University Press.

Kloosterman, J., Benning, E., & Fyles, R. (2012). "Measuring the unmeasurable": Gender mainstreaming and cultural change. Gender & Development, 20(3), 531–545. http://doi.org/10.1080/13552074.2012.731752

Krug, D. (2012). STEM education and sustainability in Canada. Paper presented at the 2nd International STEM in Education Conference, Beijing, China.

Lane, K. A., Goh, J. X., & Driver-Linn, E. (2012). Implicit science stereotypes mediate the relationship between gender and academic participation. Sex Roles, 66(3–4), 220–234. http://doi.org/10.1007/s11199-011-0036-z

Langford, M. (2012). The art of the impossible: Measurement choices and the post-2015 development agenda background paper. New York, NY: United Nations Development Programme.

Leeuw, F., & Donaldson, S. (2015). Theory in evaluation: Reducing confusion and encouraging debate. Evaluation, 21(4), 467–480.

Mayne, J. (2015). Useful theories of change. Canadian Journal of Program Evaluation, 32(2), 119–142.

Mayne, J. (2017). Working draft: The COM-B theory of change model. Retrieved from https://www.researchgate.net/publication/314086441_The_COM-B_Theory_of_Change_Model_V3

Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6, 42.

Mishagina, N. (2012).
The state of STEM labour markets in Canada: Literature review. Montreal, QC: Centre interuniversitaire de recherche en analyse des organisations.

Moser, C. O. N. (1989). Gender planning in the Third World: Meeting practical and strategic gender needs. World Development, 17(11), 1799–1825. https://doi.org/10.1016/0305-750X(89)90201-5

Müller, J., Castano, C., González, A., & Palmén, R. (2011). Policy towards gender equality in science and research. Brussels Economic Review, 54(2–3), 295–316.

Natural Sciences and Engineering Research Council of Canada [NSERC]. (2010). Women in science and engineering in Canada. Retrieved from http://www.nserc-crsng.gc.ca/_doc/ReportsRapports/Women_Science_Engineering_e.pdf

Natural Sciences and Engineering Research Council of Canada [NSERC]. (2016). Report on plans and priorities—2016–17. Retrieved from http://www.nserc-crsng.gc.ca/NSERC-CRSNG/Reports-Rapports/RPP-PPR/2016-2017/index_eng.asp


Natural Sciences and Engineering Research Council of Canada [NSERC]. (2017). Chairs for Women in Science and Engineering Program. Retrieved from http://www.nserc-crsng.gc.ca/Professors-Professeurs/CFS-PCP/CWSE-CFSG_eng.asp

Nussbaum, M. (2011). Creating capabilities: The human development approach. Cambridge, MA: The Belknap Press of Harvard University Press.

Organisation for Economic Co-operation and Development [OECD]. (2015). Main science and technology indicators. OECD StatExtracts. Retrieved from http://stats.oecd.org/Index.aspx?DataSetCode=MSTI_PUB

Overholt, C., Anderson, M. B., Cloud, K., & Austin, J. E. (1984). Gender roles in development projects: A case book. West Hartford, CT: Kumarian Press.

Ovseiko, P. V., Chapple, A., Edmunds, L., & Ziebland, S. (2017). Advancing gender equality through the Athena SWAN Charter for Women in Science: An exploratory study of women's and men's perceptions. Health Research Policy and Systems, 15(12), 1–13.

Parker, R., Pelletier, J., & Croft, E. (2015). WWEST's gender diversity in STEM: A briefing on women in science and engineering. San Francisco, CA: Blurb.

Paterson, S., & Scala, F. (2016). The prospects and challenges of GBA+: Stories from the frontlines. Presentation.

Perreault, A., Franz-Odendaal, T., Langelier, E., Farenhorst, A., Mavriplis, C., & Shannon, L. (2018). Analysis of the distribution of females and males in STEM fields in Canada. Retrieved from http://wiseatlantic.ca/wp-content/uploads/2018/03/WISEReport2017_final.pdf

Plesca, M., & Summerfield, F. (2014). Skill demand, supply, and mismatch in the Canadian economy: Knowledge synthesis report. Ottawa, ON: Social Sciences and Humanities Research Council (SSHRC).

Podems, D. (2010). Feminist evaluation and gender approaches: There's a difference? Journal of Multidisciplinary Evaluation, 6(14), 1–17.

Podems, D. (2014). Feminist evaluation for nonfeminists. In S. Brisolara, D. Seigart, & S.
SenGupta, (Eds.), Feminist evaluation and research: Theory and practice (pp. 113–142). New York, NY: Guildford Press. Polacheck, S. W. P. (1979). Occupational segregation among women: Theory, evidence and a prognosis. In C. B. Lloyd, E. S. Andrews, & C. L. Gilroy (Eds.), Women in the labor market (pp. 137–157). New York, NY: Columbia University Press. Polacheck, S. W. P. (1987). Occupational segregation and the gender wage gap. Population Research and Policy Review, 6(1), 47–67. https://doi.org/10.1007/BF00124802 Prism Economics and Analysis [PEA]. (2012). The engineering labour market in Canada: Projections to 2020. Ottawa, ON: Engineers Canada. Robeyns, I. (2016). The capability approach. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Retrieved from https://plato.stanford.edu/archives/win2016/entries/ capability-approach/ Schiebinger, L. (1999). Has feminism changed science? Cambridge, MA: Harvard University Press. Sen, A. (2004). Capabilities, lists, and public reason: Continuing the conversation. Feminist Economics, 10(3), 77–80.

© 2019 CJPE 33.3, 354–374 doi: 10.3138/cjpe.53011



Author information

Jane Whynot is the past president of the National Capital Chapter of the Canadian Evaluation Society, a contract instructor at Carleton University, and a Ph.D. candidate at the University of Ottawa.

Catherine Mavriplis is a professor of Mechanical Engineering at the University of Ottawa, where she has held the NSERC Chair for Women in Science and Engineering for Ontario since 2011. She has led nationally funded leadership development activities for women in


STEM since 1996 in the US and since 2007 in Canada. She is the co-author and co-editor of FORWARD to Professorship: Inclusive Faculty Development Strategies That Work (Elsevier, 2016). Her engineering research specialty is computational fluid dynamics and, in this capacity, she has been President of the Computational Fluid Dynamics Society of Canada and a Councilor for the Canadian Aeronautics and Space Institute.

Annemieke Farenhorst is a professor of Soil Science and the Associate Dean Research in the Faculty of Agricultural and Food Sciences at the University of Manitoba. She holds the Prairie NSERC Chair for Women in Science and Engineering. Dr. Farenhorst conducts research on pesticides, natural steroid estrogens, and antibiotics in soil and water, and on drinking water quality in First Nations communities. She is the Canadian representative on the Division VI Chemistry and the Environment Committee of the International Union of Pure and Applied Chemistry and an associate editor for the Journal of Environmental Science and Health, Part B: Pesticides, Food Contaminants, and Agricultural Wastes. Dr. Farenhorst has earned a range of teaching, research, and outreach awards, including a WXN 2016 Canada's Most Powerful Women Top 100 Award. She serves on the Board of Directors of the Canadian Centre for Women in Science, Engineering, Trades and Technology.

Eve Langelier is a professor in the Mechanical Engineering Department at Université de Sherbrooke. She holds the Quebec NSERC Chair for Women in Science and Engineering. Dr. Langelier conducts research on human mobility, more specifically on soft tissue biomechanics and mechanobiology and adapted sports equipment. She serves on several committees, including the Advisory Committee on Equity, Diversity and Inclusion Policy of the Canada Research Chairs and the 30by30 Committee of the Ordre des ingénieurs du Canada.
Tamara Franz-Odendaal is a professor of Biology at Mount Saint Vincent University. She has held the NSERC Chair for Women in Science and Engineering (Atlantic) since 2011. Dr. Franz-Odendaal's research focuses on the evolution and development of the vertebrate skull. Her program, WISEatlantic, aims to shift gendered STEM stereotypes: it empowers girls to consider STEM-based careers by raising their awareness of the diversity of jobs within these fields, and supports early-career women in STEM through professional development and networking opportunities. She received the Mount's Research Excellence Award (2015) for her outstanding contributions to the research community at the Mount and the Crystal Award from the Diversity Journal as a Woman Worth Watching in STEM (2017). She currently serves on the Board of Directors of the Canadian Coalition of Women in Science, Engineering, Trades and Technology (CCWESTT) and as the Diversity Officer on the Executive of the PanAmerican Society for Evolutionary-Developmental Biology.

Lesley Shannon is an associate professor and Chair for the Option in the School of Engineering Science at Simon Fraser University (SFU). She is also the NSERC Chair for Women in Science and Engineering for the British Columbia/Yukon region. Dr. Shannon conducts research on computer systems design, a rapidly growing field that combines custom computing hardware with software to design and implement application-specific computer systems. Her research has applications in a wide range of areas, including robotics, machine learning, aerospace and biomedical systems, multimedia applications, and cloud computing. Dr. Shannon received the 2014 EGBC Teaching Award of Excellence in recognition of her mentoring activities and her leadership of a redesign of the School's undergraduate curriculum at SFU. Dr.
Shannon also received the 2017 Greater Vancouver Board of Trade’s Wendy McDonald Award for Diversity Champion in recognition of her work as a sponsor of diversity and equality and promoting systematic change.


Till Time (and Poor Planning) Do Us Part: Programs as Dynamic Systems—Incorporating Planning of Sustainability into Theories of Change

Sanjeev Sridharan
Bill and Melinda Gates Foundation

April Nakaima
Evaluation Centre for Complex Health Interventions

Abstract: This article describes the need for, and challenge of, representing the sustainability of a program as a dynamic process. Part of what enhances the complexity of programs is the challenge of dynamic complexity: the complexity of the program evolves over time through the interaction of actors and their environment. The problem is not just one of representation but also of planning, specifically planning for sustainability. We argue that an essential part of any accountability regime is planning for sustainability. Using the concept of programs as dynamic processes, we argue that planning for sustainability needs to be a critical aspect of the impact chains of all theories of change. Both the representation and testing aspects of such a formulation are discussed.

Keywords: dynamic change, evaluative thinking, planning, sustainability, theory of change

Résumé : Cet article décrit la nécessité et le défi de représenter la pérennité d'un programme en tant que processus dynamique. La dynamique de la complexité – le fait que le programme évolue dans le temps en raison de l'interaction entre les acteurs et leur environnement – explique la complexité croissante des programmes. Le problème n'en est pas seulement un de représentation, mais aussi de planification, particulièrement quand il s'agit de planifier pour favoriser la pérennité du programme. Tout système d'imputabilité se doit de planifier la pérennité. En nous appuyant sur une conception dynamique des programmes, nous soutenons que la planification à des fins de pérennité doit être un aspect important des chaines de résultats pour toutes les théories du changement. Nous abordons à la fois la question de la représentation de cette proposition et la question de sa mise à l'épreuve.

Mots clés : changement dynamique, pensée évaluative, planification, pérennité, théorie du changement

Corresponding author: April Nakaima, The Evaluation Centre for Complex Health Interventions, St. Michael’s Hospital, 30 Bond Street, Toronto, ON, M5B 1W8, Canada; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro spécial), 375–394 doi: 10.3138/cjpe.53055


Interventions are “complex systems thrust amidst an existing complex system” (Pawson, 2006, p. 106; Pawson, Greenhalgh, Harvey, & Walshe, 2004). The focus of evaluation is frequently on immediate impacts; whether such impacts are sustained after the intervention’s funding has ended is often not formally explored (Cekan, 2016). For example, researchers at Valuing Voices reported that, according to the Organisation for Economic Co-operation and Development (OECD), governments have spent $5 trillion on development aid since 1945, including $137 billion in 2014 alone. Yet in reviewing approximately 4,000 documents from the databases of major donors and major multilateral banks, the Valuing Voices researchers found only 370 publicly available ex-post or post-completion evaluations out of 950 evaluations said to have been conducted after the projects ended (presumably to determine what remained or what was sustained), and nearly 600 of these were desk studies (Cekan, 2017). The challenge, however, does not rest simply on developing empirical evidence of the connection between the intervention and its context in producing sustained impacts. A theoretical imagination is often missing in conceptualizing planning for sustainability as part of the impact chain. Sustainability can be thought of in three ways: sustaining components of the intervention, sustaining impacts after the program ends, and mainstreaming or incorporating active ingredients of the intervention into other programs. We also subscribe to the following definition of sustainability by Scheirer and Dearing (2011, p. 2060):

the continued use of program components and activities for the continued achievement of desirable program and population outcomes. Other terms that have been used by previous researchers in this domain include continuation, confirmation, maintenance, durability, continuance, and institutionalization. There are some nuanced differences among these terms, but they all usually refer to the continued use of program components and activities beyond their initial funding period and sometimes to continuation of desired intended outcomes; this is what we mean by sustainability. Generally speaking, the likelihood of sustainability is heightened when there is an alignment, compatibility, or convergence of (1) problem recognition in the external organizational environment or community, (2) the program in question, and (3) internal organizational objectives and capacities.

Our main interest in the above definition is focused on the “continuation of desired intended outcomes.” We explore the role that planning for sustainability can play in sustaining impacts and argue for the inclusion of sustainability considerations in theories of change (ToCs), both for program planning and for evaluative purposes in assessing programs. Shediac-Rizkallah and Bone (1998) argue for the importance of moving from a passive to an active approach to sustainability: sustainability is not an accident but requires active planning. In their words, “[u]nderstanding the conditions under which programs are most likely to continue is required to move from a ‘latent’ or passive approach to sustainability towards active attempts to modify



conditions to maximize the potential for long-term sustainability” (p. 98). They further identify that the factors influencing planning for sustainability include project-design and implementation factors, organizational factors, and factors in the community environment. In this paper we use a case example of a poverty-reduction project and examine the intervention against the framework of the COM-B (Capabilities, Opportunities, Motivation, and Behaviour) Based Theory of Change model developed by John Mayne (2017). The COM-B model is a further iteration of Mayne’s (2015) paper on useful theories of change, which has been well received in the evaluation community. Compounding the above-mentioned challenge in complex problems like poverty reduction or improving maternal health in resource-constrained environments is the fact that the “impact journey” typically encountered by a client rarely takes the form of a linear trajectory. For example, consider the case of a poor client living in a disadvantaged community with very few job opportunities, trying to set up a viable business with the assistance of a community development agency. From the client’s perspective, the impact journey rarely corresponds to just a series of training workshops building the knowledge and skills that lead to setting up a business and earning an income. The metaphor we find useful is that the typical journey is one across a “rugged landscape” with multiple hills to climb and descend. The process by which an intervention can change clients’ capabilities, influence their motivation, and provide them with multiple opportunities to set up and then run a business can be circuitous and heterogeneous; in other words, different clients might have very different journeys (Sridharan, Jones, Caudill, & Nakaima, 2016).
The evaluator’s challenge in exploring this impact journey is further compounded by the fact that impacts might occur well after the time frame of the evaluation (Sridharan, Campbell, & Zinzow, 2006). Interventions need to be planned with an understanding of the temporalities involved in affecting capacities, motivations, and opportunities for these difficult impact journeys (Mayne, 2017; Michie, 2015). Despite the short length of some interventions, they need to incorporate some planning for sustainability (Sridharan, Go, Zinzow, Gray, & Gutierrez Barrett, 2007) that integrates knowledge of the timelines and heterogeneous nature of different clients’ impact journeys. Providing support at different stages of the impact journey, which involves “multiple hills” that need to be traversed, is part of the successful implementation of complex interventions. This paper focuses on how thinking about theories of change can help facilitate such complex impact journeys. We focus on planning for sustainability as an explicit part of how interventions can affect individuals over time. The paper has important implications for programs that seek to take a person-centered lens (see, e.g., the American Geriatrics Society Expert Panel on Person-Centered Care, 2016) in program planning. A person-centered lens recognizes that different individuals have different needs, preferences, and values, and that programs



need to incorporate knowledge of such heterogeneities into their planning and implementation. Our focus is on the need to incorporate planning for sustainability (Sridharan et al., 2007) as part of theories of change. These ideas are important given the push toward evidence-based programming. In this context, there is an implicit assumption in the commissioning of most evaluations that the evaluation findings will help in deciding whether or not to sustain a program. The implicit causal chain of programming decision making, and the role that evaluations play in it, is often as follows:

Plan intervention → Implement → Evaluate → Decide whether to sustain the program

In our experience, the above sequence rarely corresponds to how programming decisions are made. Decisions about sustainability often need to be made well before the evaluation provides evidence of impacts. This is because, as noted above, realistic timelines of impacts are often not considered in planning evaluations (Cook, 2000), and because policymakers’ decision-making cycles may not be aligned with the timing of evaluation results (Leviton & Hughes, 1981). Hence, planning for sustainability needs to happen much earlier, and we argue that it should be an integral part of what we consider a useful theory of change. This is important because incorporating planning for sustainability can change the nature of the program itself (Sridharan & Gillespie, 2004). We believe that these arguments may have far-reaching consequences. The arguments in this paper are important because most guidance on theories of change (Chen, 2015; Funnell & Rogers, 2011; Mayne, 2015, 2017; Morra Imas & Rist, 2009; Patton, 2008; Rossi, Lipsey, & Freeman, 2004) ignores planning for sustainability as a construct that needs to be incorporated into the theory of change itself. As stated above, there is a need to move beyond demonstrating immediate impacts towards a greater focus on sustained impacts (Cekan, 2016). The title of our paper, “Till Time (and Poor Planning) Do Us Part,” is driven by our experience in multiple evaluations of interventions focused on reducing poverty and enhancing health equity. In our experience, programs often come to a sudden halt as funding runs out, without attention to providing continuity of care to clients or implementing strategies to better ensure that the outcomes achieved during the intervention can be maintained after the program ends.
We think that theory-driven evaluation, with its potential focus on theorizing clients’ impact journeys, can help mitigate this problem by more explicitly recognizing that individuals have very heterogeneous landscapes underpinning such impact journeys and that periods of success and failure are typically part of participants’ impact journeys. Our theories of change therefore need to be driven by knowledge of such heterogeneous impact journeys and of the context of supports, capacities, opportunities, and motivational incentives clients might need after the program ends. This is not to suggest that a program is fully responsible for what happens after it ends, but



rather that planners and policymakers should envision what needs to be in place (perhaps at the system level) to at least maintain the gains achieved, and then plan and take steps toward, for example, partnerships with other agencies that have the capacity to carry on, or equipping participants with the capabilities and resources needed to carry on themselves. We argue that planning for sustainability is one useful device for addressing such challenges. This paper is organized as follows. We first demonstrate our ideas through an economic empowerment program aimed at reducing poverty for women in an immigrant community in Canada. We demonstrate how a linear, somewhat mechanical view of the change process can be contrasted with a view that explicitly incorporates planning for sustainability, arguing that the economic empowerment program would have been very different if planning for sustainability had been taken more seriously at the initial planning and implementation stages. We end by discussing what planning for sustainability would mean for the practice of theory-driven evaluation and how, as a field, we can focus more on sustainable impacts by incorporating issues of sustainability into theories of change.

Case study: An example of an economic empowerment program

We use the case example of a women’s economic empowerment and entrepreneurial program in a neighbourhood with a predominantly immigrant population in a large Canadian city to demonstrate both the importance of planning for sustainability and how the program could have benefitted from a more serious application of the COM-B model. The strength of the COM-B model is that it gets program planners and implementers to reflect on how the program will affect participants’ behaviour change by addressing individuals’ capabilities, opportunities, and motivation. One shortcoming of the COM-B model as currently represented in the literature (Mayne, 2017) is that, as a tool for planners and implementers of programs or for use in analysis by evaluators, it does not explicitly lead the user to consider notions of sustained impacts, or long-term and dynamic notions of capabilities, opportunities, and motivation. For example, what can a program do now that may influence a participant’s motivation after contact with the program has ended? The case example is a place-based intervention that was developed to help lift residents out of low-income ranges by providing appropriate supports to assist them in generating more income through home-based businesses. The implementing community organization was aware of informal networks in the community composed of women supporting each other with provisions of child care, food preparation, and selling clothing and jewellery to one another from their homes. The main idea of the intervention was to leverage these skills and build other entrepreneurial skills needed by individual women, and also to help facilitate their access to business financing and markets outside of the neighbourhood.



The program supports included one-on-one counseling; training workshops on topics such as business planning, financing and loans, and social media, as well as a food handler’s course delivered by a public-health unit; a resource “library” for borrowing sewing machines, chafing dishes, and large cooking pots; referrals to other services as needed, such as language classes; facilitation of group meetings to encourage cooperative or collective business arrangements between women with complementary skills, such as fashion designers with seamstresses; communications sent out to participants about upcoming community markets, opportunities to showcase one’s business, and so on; and coordination of business-related field trips. The program was run by a director, a case manager who delivered all of the client counseling, and two part-time community animators. The neighbourhood is a multicultural community where 70% of the residents are immigrants; some are recent immigrants, but the majority have lived in Canada for 8 to 20 years. Reflecting the difficulty immigrants face in entering the Canadian job market, the neighbourhood has a high unemployment rate (more than 1.5 times the average for the city as a whole) and a substantial low-income population (twice the average for the city as a whole). Most of the women participating in the economic empowerment program were well educated (holding university degrees), had substantial work experience in their former countries, and spoke English. At baseline, a clear majority of participants rated their competence in various business-related skills and leadership as average to very high. Many of the women had left employment to care for their young children at home, but many more than expected either did not have children or had grown adult children (so the need for child care was not a barrier to employment).
One view of the entrepreneurial program is depicted below (see Figure 1) using the COM-B (Capabilities, Opportunities, Motivation, and Behaviour) model, which “postulates that behaviour (B) occurs as the result of interaction between three necessary conditions, capabilities (C), opportunities (O), and motivation (M)” (Mayne, 2017):

Capability is defined as the individual’s psychological and physical capacity to engage in the activity concerned. It includes having the necessary knowledge and skills. Motivation is defined as all those brain processes that energize and direct behaviour, not just goals and conscious decision-making. It includes habitual processes, emotional responding, as well as analytical decision-making. Opportunity is defined as all the factors that lie outside the individual that make the behaviour possible or prompt it. (Michie, Van Stralen, & West, 2011, p. 4).

Mayne (2015) states that the research suggests that the causal package for behaviour change needs to include each of the components: knowledge, skills, aspirations, attitudes, and opportunities. One of the strengths of the COM-B model is its recognition that for a program to affect behaviour and benefit the participant, it requires attention to multiple types of capacities, including the individual’s capabilities, opportunities, and motivation.



If not for the COM-B model, the evaluation team likely would have focused on the delivery of the services and the results for participants in terms of changes in their knowledge, skills, attitudes, and behaviour; the focus on opportunities might have been missed. Many of the well-accepted theories in the health-promotion and behaviour-change fields—for example, the Health Belief model (Becker, Maiman, Kirscht, Haefner, & Drachman, 1977; Rosenstock, 1974) and the Transtheoretical model (Prochaska, DiClemente, & Norcross, 1992; Prochaska & Velicer, 1997)—do not explicitly focus on the idea of “opportunities” (although the literature often identifies a lack of opportunities under the construct of “barriers to access”—for example, see Daly, Sindone, Thompson, Hancock, Chang, & Davidson, 2002; Walker, Keane, & Burke, 2010). A key challenge often cited in the health-promotion and behaviour-change literature is long-term maintenance of behaviour change (Kwasnicka, Dombrowski, White, & Sniehotta, 2016; Middleton, Anton, & Perri, 2013); long-term maintenance is a sustainability issue. There has, for example, been a great deal of research on cardiac rehabilitation (Sridharan et al., 2008); participants are often highly motivated after the scare of a potentially fatal heart attack. The literature is rife with studies of exercise programs that show good results at the end of 12-week programs; however, after these programs end, follow-up studies show that very few participants have maintained their practice of exercise (Daly et al., 2002). Some of the reasons provided have to do with opportunities—for example, there are no appropriate gyms, safe walking paths, or convenient transportation near participants’ homes.
Some initiatives have been undertaken to address these barriers that limit individuals’ opportunities to exercise—for example, cardiac rehab program staff at hospitals have trained neighbourhood gym staff in how to safely and appropriately accommodate elderly clients who may have experienced a heart attack (Sridharan et al., 2008). What we learn from the cardiac rehab literature is that while individuals may be highly motivated and their capabilities enhanced (e.g., through exercise training), if opportunities are lacking, sustained behaviour change is unlikely. Consequently, individual benefits and well-being will not be affected, despite successful accomplishment of all the previous steps along the theory of change. The COM-B representation is useful in thinking about the economic empowerment program described in this paper. In discussion with the program director, we were able to highlight early on that participants would need opportunities both to showcase their products and services and to access markets outside of their neighbourhood (see Figure 1). Although the program was very successful in its reach efforts and in initially counseling participants to help them find their focus, direction, and action steps, it failed to create opportunities to reach the marketplace. This was partly because the program budget was too limited to hire a staff member with the expertise to facilitate access to markets, but it was also due to a lack of sufficient planning for sustainability. The extent of the planning for sustainability consisted of the organization looking to apply for additional funding (with unsuccessful results)


Figure 1: The COM-B model for women’s economic empowerment (the figure lays out, along a timeline: Reach and Reaction; Services and Activities; Partnerships; Capacity Change—Capability, Opportunity, Motivation; Behaviour Change; Direct Benefits; and Wellbeing).

© 2019 CJPE 33.3, 375–394 doi: 10.3138/cjpe.53055


to keep the program staff employed (not to say that's not important). Some ad hoc attempts were made to increase connections outside of the community; for example, when municipal representatives visited the community organization, the food-catering entrepreneurs were invited to cater the event. However, such attempts at providing opportunities for the women were not executed with the same strategic vigour applied to the reach and counselling activities. The plan was to sustain the one-to-one counselling and training workshops (more of the same), but little thought was given to facilitating opportunities so that participants could make progress in their business development beyond needing the organization for more support. At the end of the program, clients were thankful for the support they received and had very good things to say about the program staff, and some reported that their knowledge and skills had increased, but no one had appreciably increased their sales, income, or standard of living through their entrepreneurial efforts (a few individuals found jobs during the course of the intervention and reported increased income).

Some challenges of implementing the economic empowerment program
Most of the key challenges in the implementation of the program related to a lack of planning for sustainability. By this we mean an absence of an explicit process of thinking about the dynamic nature of supports needed for clients through their impact journeys, despite the good intentions of the organization and the program staff and their commitment to improving the lives of clients. Key lessons for us from this case study included the following:

(a) Change is not a mechanical process: We think that the problem of enhancing capacities, whether individual capabilities, motivation, or opportunities, requires very careful thought about the level of support involved. It is not clear that a set of trainings would suffice to build an individual's capability to develop a business. It might require more sustained efforts in which the set of workshops is supplemented with a number of other dynamic supports to build the client's capabilities (including supports that might be needed after the end of the program funding), locate opportunities, and sustain the client's motivation. Similarly, as any job seeker knows, developing linkages to employers or markets is a dynamic problem that requires a sustained strategy. In a similar sense, there is also a need to understand what it takes to change an individual's motivation. Our key point is that a set of dynamic supports is needed to help facilitate the impact journey. Thinking about sustainability should occur not only at the organizational level but also at the client level: the theory of change is one instrument to promote such thinking. Further, there was little discussion within the program leadership regarding which of the mechanisms related to building capabilities and motivations and enhancing opportunities would directly benefit specific clients. There was a mechanical view that training could enhance the skill sets needed to develop a business plan, without paying attention to the inspirational and motivational aspects needed to do the work of writing a business plan. The sequencing between the capability, opportunity, and motivation components was not considered or deliberated on. Further, there was little discussion about how the offered supports aligned with the staff's strengths in terms of helping clients to access markets. For example, there was a sewing-and-design group component of the program in which the facilitator did not have deep knowledge of design businesses (she did not know where the fashion industry was located in the city, or where to purchase wholesale fabric, and suggested that the client search online; and because some of the keenest clients were newcomers, they needed help locating the streets where wholesale outlets were in relation to their neighbourhood and information on how to get there). Unfortunately, the evaluation abounds with such examples of simple client needs not being addressed. It was unclear how the opportunities for clients could be enhanced without such knowledge. Given that establishing a new business, making contacts and developing relationships with customers, getting advertising out, testing products, getting a handle on the flow of production, setting up systems, and so on all take a tremendous amount of time, and might take one to two years or longer, having support during the various uphill struggles seems necessary if the organization is serious about clients establishing businesses: not just developing a business plan or developing samples, but actually getting the machinery rolling and income coming in. There is a need to pay attention to the supports that clients would need over time in navigating these complex processes.
(b) Not paying attention to heterogeneities: There must be an active focus both on the heterogeneous needs of clients and on the capacities staff require to address such a diversity of needs. In the intervention, there was no formal systematic process adopted to understand and respond to the heterogeneity of person-centered needs, or to the contexts and supports required to address those needs.

(c) The limited role of relationships: In our experience, one mechanism for sustaining the impact journey is building relationships between clients and staff. Person-centered care and impact journeys require a focus on relationships. For example, credit for the successful reach effort can be given both to the reach strategy and to the two community animators hired specifically for this task. Reach was ongoing, so the community animators were employed until nearly the end of the funded term. However, as they and the case counsellor developed relationships with the participants who engaged more with the program, more thought should have been given to how the organization could retain these staff members in some capacity. Instead, the community animators were let go with little notice when their contracts were ending. Such abrupt departures had consequences for clients' engagement with the program over time.

(d) Not paying sufficient attention to timelines: The timeline of the intervention was 21 months. The assumption was that this timeframe would suffice to enhance an individual's capacity to develop a business. When it did occur, discussion of timelines focused on the timeline of activities; there was very limited focus on the timelines of impacts or of client trajectories. For example, there was very limited focus on what it would take to enhance capabilities (and consequently very limited discussion of the timelines over which capabilities would develop). A discussion around the mechanisms by which project-level funds could serve as a catalyst to build longer-term relationships with the clients did not occur. As noted earlier, the assumption was that just by providing these services and activities there would be an almost mechanical movement toward the benefits.

(e) Building and enhancing organizational capacities: It is also important to pay attention to organizational capacities themselves in order to implement such complex poverty-reduction initiatives. For example, although there was discussion around what staff backgrounds would be needed to increase motivation, build opportunities to access markets, and enhance capabilities, the actual execution was only partially realized. The program staff knew that their team lacked expertise in accessing markets, yet they did not bring in an expert consultant; they made funding decisions in favour of more general workshops. The "opportunity costs" of such decisions meant that clients did not have the benefit of expert advice, coaching, or in-person introductions to markets outside of the immediate community.

Thinking about sustainability
There are multiple definitions of sustainability in the literature. Lennox, Maher, and Reed (2018) divide the multiple dimensions of sustainability into five distinct classes of definitions: continued program activities, continued benefits, capacity building, further adaptation, and recovering costs. Johnson, Hays, Center, and Daley (2004) describe ten terms related to sustainability: confirmation, continuation, durability, incorporation, institutionalization, level of use, maintenance, routinization, stabilization, and sustained use. Rather than delving too deeply into the multiple definitions in the literature for each dimension, our interest in sustainability is related to the concept of an impact journey. We think the important question to ask is this: What would it take to make and sustain the journey (in our example, from not having employment toward starting a business and earning an income)?



Our focus in this paper is informed by three dimensions of sustainability, each concerned with how the organization can continue to help clients with their impact journeys. We raise the following three questions related to sustainability:

• Sustainability as mainstreaming: What is being done to ensure that the interventions (in some form) are sustained over time?
• Planning for sustainability: What specific processes are adopted to plan for sustainability? How does planning for sustainability figure into the theory of change?
• Sustainable impacts (and not just immediate impacts): Will the intervention have sustainable impacts on clients after the funding has ended?

As described earlier, given the heterogeneous landscapes associated with such a journey, the question becomes: what would it take to sustain supports for a client to make the journey? From a mainstreaming perspective, what can be done to ensure that the supports from a project whose funding is scheduled to end are mainstreamed into the organization to provide post-funding supports? From a planning-for-sustainability perspective, how does the organization plan to mainstream the support structure that was developed to support a specific intervention? From a sustainable-impacts perspective, how does the organization ensure that the impacts are sustainable post-funding? The causation model that guides the view of sustainability is also important. It is useful to differentiate between a successionist and a generative view of causation:

Sucessionists [sic] locate and identify vital causal agents as “variables” or “treatments.” Research seeks to observe the association between such variables by means of surveys or experimental trials. Explanation is a matter of distinguishing between associations that are real or direct.... Generativists, too, begin with measurable patterns and uniformities. It is assumed that these are brought about by the action of some underlying “mechanism.” Mechanisms are not variables or attributes and thus not always directly measurable. They are processes describing the human actions that have led to the uniformity. Because they depend on this choice making capacity of individuals and groups, the emergence of social uniformities is always highly conditional. Causal explanation is thus a matter of producing theories of the mechanisms that explain both the presence and absence of the “uniformity.” (Pawson, 2008, n.p.)

This distinction between the different models of causation is important because we need to move from a preoccupation with the variables that generate sustainability to the mechanisms that are informed by the “choice making capacity of individuals.” A view of mechanisms that pays attention to the resources and reasoning of individuals as they make the impact journey needs to inform our theoretical understandings of sustainability. The question from a generative perspective is this: What kinds of resources (supports, etc.) does a program



provide to strengthen the "choice making capacity of individuals" to navigate the long-term impact journey? The key idea here is that sustaining an impact journey, given limited funding, is not an accidental phenomenon. It requires recognition of the following:

1) Planning for sustainability needs to be an explicit part of the consideration of the impact journey.
2) An intentional, explicit plan for sustainability is needed. Such a plan needs to focus on mechanisms that can help sustain the impact journey.
3) There needs to be clarity on how the implemented program connects with the rest of the organizational system; the nature of such connections matters.

In other work, we have focused on planning for sustainability in the setting of comprehensive community initiatives (Sridharan et al., 2007). We used eight items to analyze multiple community strategic plans and uncover their plans for sustainability: (i) a plan and timetable for ongoing data collection; (ii) a process to revisit goals in an ongoing manner; (iii) a clear organizational structure to oversee implementation of recommendations; (iv) clear communication mechanisms established between collaborating members; (v) an understanding of staff needs and anticipated turnover; (vi) clear identification of funding sources; (vii) clear establishment of accountability mechanisms; and (viii) proof of interagency collaboration established through a memorandum of understanding. In light of the concept of an impact journey, planning for sustainability needs to look substantially different from the comprehensive-community-initiative example above. Key questions that need to guide planning for sustainability to make the journey include the following:

• Is there an appreciation of the heterogeneous needs of clients? Are the planned activities able to address such needs?
• Do the program and the organization have the capacities to address the heterogeneous needs of clients?
• What resources are available from the program's organization to help clients with their impact journeys? Is the funding enough to bring about change? If not, what is the role of the evaluator in helping bring realism to what impacts can be expected?
• What plans are in place to provide supports to the clients after the funding for the program ends? Will partnering organizations be able to provide such supports after the program's funding runs out?

In our view, such questions need to be answered as the theory of change is developed. A theory of impact that aims to explain the potential impact journeys of clients should be informed by answers to the above questions. There is a need to move from a theory of impacts to a theory of sustained impacts.



Discussion
In this paper we have described the importance of planning for sustainability using the concept of clients' impact journeys. Adopting the COM-B approach, and using a case study of the economic empowerment of immigrant women, we have argued that increases in capacities are rarely a mechanical process. We have argued that theories of change need to be better guided by the barriers, and by the metaphors of a heterogeneous landscape and multiple hills, involved in the journey toward achieving outcomes. These observations are especially true when addressing difficult problems like health inequities, poverty, and maternal and neonatal mortality in resource-constrained contexts. Another focus we have found missing in guidance on developing theories of change is that theories of change in most settings we have encountered are highly incomplete (Sridharan & Nakaima, 2012; Sridharan et al., 2016). For example, we tend to have only partial knowledge of the timelines of impacts and of heterogeneities at the outset of an evaluation (Sridharan et al., 2016). We think there needs to be an explicit attempt to understand both the heterogeneities and the timelines of impact in developing such theories of change. We have argued in prior papers (Sridharan & Nakaima, 2012; Sridharan et al., 2016) that, given that initial theories of change usually are incomplete, we need to be more explicit about the initial uncertainties in the theory of change, as illustrated in Figure 2. This concern about incompleteness is of course related to uncertainties in our understanding of the theory of change at the outset of the intervention. One area where such uncertainties abound is in planning how the supports needed for clients' heterogeneous impact journeys can be sustained over time. There is a need to be far more explicit about our uncertainties about the theory of change, including knowledge of how best to plan for sustainability.
Our view is that the evaluation has a role in making such uncertainties explicit. Figure 3 describes some of the uncertainties that evaluators need to be more explicit about. Another implication of our thinking is that the implied causal logic in models like the one presented in Figure 1 needs to more comprehensively address the relationship between the causal impacts of the program and long-term outcomes, as well as the specific mechanisms by which the program can assist clients to achieve such long-term outcomes. One important mechanism by which such outcomes can be achieved is to plan to sustain supports over time, including after the funding for the program has ended. It is possible that some of these supports need to be provided by organizations outside the program.

[Figure 2 shows a cycle: an Initial Program Theory produces Initial Impacts; Areas of Uncertainty, including planning for sustainability, are addressed through Learning from Innovative Methods about Uncertainties, yielding an Emergent Program Theory.]
Figure 2: An approach to being explicit about uncertainties in developing a theory of change

[Figure 3 lists areas for taking stock of the intervention theory of change:
• Map the uncertainties in the theory of change
• Map the connections between the intervention and the rest of the organizational system
• Explore the heterogeneities of client-level needs
• Explore key barriers that individuals will face in the intended impact journey
• Explore whether the program resources are consistent with the resources required to make the impact journey
• Explore the potential timelines of impact
• Explore the dynamic supports needed both during and after the funding stage to assist with capabilities, motivation, and opportunities
• Explore the role of boundary partners in enhancing organizational capacities and providing dynamic supports over time]
Figure 3: Areas of initial uncertainties that need to be made explicit in developing the theories of change

Revising the theory of change: Incorporating sustainability
How would incorporating a planning-for-sustainability lens alter the theory of change and the implemented program in the case study described above? In our view, there needed to be a more explicit focus on how the project-level funds could be leveraged to expand opportunities in the wider community, build relationships between program staff and clients, and retain key staff in the organization well beyond the life of the project. Key staff members were told with only one week's notice that the project funding for their positions was ending. Other aspects that needed greater attention include the following, which build upon the recommendations in Figure 3:

a) a greater focus on mapping individuals and understanding needs;
b) more deliberate processes (Sridharan et al., 2016) to better understand the heterogeneities of impact journeys;
c) a better understanding of the dynamics of supports needed;
d) a better understanding of the sequencing needed for the multiple interventions targeting capacity, motivation, and opportunities;



e) a better upfront understanding of the resources needed to make the impact journey;
f) a clearer role for partners: one important shortcoming in the causality implied in Figure 1 is that a single organization might not have the capacity to facilitate client access to markets on its own. The theory of change needs to explicitly recognize the importance of building partnerships to enhance opportunities for clients.

Figure 4 describes one illustrative approach to modifying the COM-B model to incorporate a planning-for-sustainability lens, explicit about dynamic post-funding supports, into the case study described in this paper.

[Figure 4 sketches the revised model: Capacity Change (Capability, Opportunity, Motivation) leads to Behaviour Change, Direct Benefits, and Wellbeing, with Dynamic Support accompanying the chain; after the end of funding, Dynamic Support continues to link Capacity Change and Behaviour Change.]
Figure 4: Incorporating a dynamic support perspective into the theory of change: A conceptual illustrative model

Rethinking what constitutes a useful theory of change
If the arguments we have made in this paper are correct, then one of the consequences is the need to rethink what constitutes a useful theory of change (Mayne, 2015). Our concern is that many of the theories of change we encounter tend to focus on the activities and services of the program without enough representation of the client's perspective and journey, and without explicitly depicting how long-term outcomes are theorized to be affected. In Mayne's (2015) paper on useful theories of change, he explains that "[t]heories of change represent how and why it is expected that an intervention will contribute to an intended result" (p. 127). Often the intended result is meant to be ongoing and to remain in the long term (for example, healthy eating, adherence to an exercise routine, or income from a business); therefore, in evaluating interventions with such intended results, issues of sustainability have to be taken into account, and as the focus on learning and accountability grows in Canada and internationally (see, e.g., the UN Sustainable Development Goals [United Nations, n.d.]), accountability agreements will likely include achieving sustainable impacts. Mayne goes on to explain that "[t]he intervention activities can then be said to be a contributory cause to the results. In these terms, a theory of change is a model of the intervention as a contributory cause; it is a model of the causal package showing just how the contribution to the results are [sic] to be brought about" (pp. 127–128). This is compatible with what we argue in this paper: that sustainability considerations should be included in developing and assessing theories of change. We suggest that a useful theory of change should:

(i) be grounded in the realism of clients' heterogeneous impact journeys;
(ii) highlight the barriers, challenges, and heterogeneous landscape of the impact journeys;
(iii) help programs plan for sustainable supports to help clients make the journeys, bring greater reflection on the organizational capacity to support such complex impact journeys, and, if such capacities are found wanting, go through a process of planning that can complement the organizational capacity by bringing in other boundary partners;
(iv) help understand the sequencing between the different activities and build a better understanding of the resources needed to make the impact journey.

While the mechanisms of capacity, capability, motivation, and opportunity are useful, they imply multiple different causal processes. For example, it is important to be clear about the skill sets needed for lead staff and frontline staff who are charged with building capacities, enhancing motivations, and strengthening opportunities. Organizations need to be encouraged to sustain these capacities, recognizing that the roles played by these actors are very complex and that proper incentives need to be given to retain such individuals.

Conclusions
To summarize, we suggest that taking a planning-for-sustainability lens will encourage a recognition of programs as complex systems thrust into other complex organizations. A sustainability lens will not simply argue for the causal impacts of a specific program but will also demonstrate how a program implemented within specific organizational contexts can serve as a catalyst for a variety of organizational-level interventions and inputs that provide supports for clients through their impact journeys. A sustainability lens will also pay attention to realistic timelines of impact (Sridharan et al., 2006), rather than seek to establish impacts within a pre-determined bureaucratic frame. A sustainability lens will also be guided by the heterogeneity of needs and capabilities of clients. Different individuals will need different levels of support, and the potential timelines of



impacts might depend both on individual-level capabilities and on the capabilities of the organization to provide the needed supports. The ideas presented in this paper suggest that there also needs to be greater recognition of organizational-level capabilities to actually bring about sustainable change. Programs should not be seen as a mechanical set of activities that can automatically build individual-level capabilities. An organization needs to pay careful attention to the types of capabilities that can help clients make the journey from needs to outcomes.

References
American Geriatrics Society Expert Panel on Person-Centered Care. (2016). Person-centered care: A definition and essential elements. Journal of the American Geriatrics Society, 64(1), 15–18. https://doi.org/10.1111/jgs.13866
Becker, M. H., Maiman, L. A., Kirscht, J. P., Haefner, D. P., & Drachman, R. H. (1977). The health belief model and prediction of dietary compliance: A field experiment. Journal of Health and Social Behavior, 18(4), 348–366. https://doi.org/10.2307/2955344
Cekan, J. (2016). How to foster sustainability. Global Policy, 7(2), 293–295. https://doi.org/10.1111/1758-5899.12284
Cekan, J. (2017, April 27). Sustained and emerging impacts. Presentation at the Evaluation Centre for Complex Health Interventions, Li Ka Shing Knowledge Institute at St. Michael's Hospital, Toronto, ON.
Chen, H. T. (2015). Practical program evaluation: Theory-driven evaluation and the integrated evaluation perspective (2nd ed.). Thousand Oaks, CA: Sage.
Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. New Directions in Evaluation, 87, 27–34. https://doi.org/10.1002/ev.1179
Daly, J., Sindone, A. P., Thompson, D. R., Hancock, K., Chang, E., & Davidson, P. (2002). Barriers to participation in and adherence to cardiac rehabilitation programs: A critical literature review. Progress in Cardiovascular Nursing, 17(1), 8–17. https://doi.org/10.1111/j.0889-7204.2002.00614.x
Funnell, S., & Rogers, P. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco, CA: Jossey-Bass.
Johnson, K., Hays, C., Center, H., & Daley, C. (2004). Building capacity and sustainable prevention innovations: A sustainability planning model. Evaluation and Program Planning, 27(2), 135–149. https://doi.org/10.1016/j.evalprogplan.2004.01.002
Kwasnicka, D., Dombrowski, S. U., White, M., & Sniehotta, F. (2016). Theoretical explanations for maintenance of behaviour change: A systematic review of behaviour theories. Health Psychology Review, 10(3), 277–296. https://doi.org/10.1080/17437199.2016.1151372
Lennox, L., Maher, L., & Reed, J. (2018). Navigating the sustainability landscape: A systematic review of sustainability approaches in healthcare. Implementation Science, 13(1), 1–17. https://doi.org/10.1186/s13012-017-0707-4
Leviton, L. C., & Hughes, E. F. X. (1981). Research on the utilization of evaluations: A review and synthesis. Evaluation Review, 5(4), 525–548. https://doi.org/10.1177/0193841X8100500405



Mayne, J. (2015). Useful theory of change models. Canadian Journal of Program Evaluation, 30(2), 119–142. Retrieved from https://evaluationcanada.ca/system/files/cjpe-entries/30-2-119_0.pdf
Mayne, J. (2017). Theory of change analysis: Building robust theories of change. Canadian Journal of Program Evaluation, 32(2), 155–173. https://doi.org/10.3138/cjpe.31122
Michie, S. (2015). The behaviour change wheel: A new method for characterising and designing behaviour change interventions [Presentation slides]. Retrieved from https://ktcanada.org/wp-content/uploads/2016/03/Susan-Michie-slides_nov_12_2015.pdf
Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6(42). Retrieved from http://www.implementationscience.com/content/pdf/1748-5908-6-42.pdf
Middleton, K. R., Anton, S. D., & Perri, M. G. (2013). Long-term adherence to health behavior change. American Journal of Lifestyle Medicine, 7(6), 395–404. https://doi.org/10.1177/1559827613488867
Morra Imas, L. G., & Rist, R. C. (2009). The road to results: Designing and conducting effective development evaluations. Washington, DC: World Bank.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: Sage.
Pawson, R. (2006). Evidence-based policy: A realist perspective. London, England: Sage.
Pawson, R. (2008). Causality for beginners. NCRM Research Methods Festival 2008. Retrieved from http://eprints.ncrm.ac.uk/245/
Pawson, R., Greenhalgh, T., Harvey, G., & Walshe, K. (2004). Realist synthesis: An introduction. ESRC Research Methods Programme Working Paper Series.
Prochaska, J. O., DiClemente, C. C., & Norcross, J. C. (1992). In search of how people change: Applications to addictive behaviors. American Psychologist, 47(9), 1102–1114. https://doi.org/10.1037/0003-066X.47.9.1102
Prochaska, J. O., & Velicer, W. F. (1997). The transtheoretical model of health behavior change. American Journal of Health Promotion, 12(1), 38–48. https://doi.org/10.4278/0890-1171-12.1.38
Rosenstock, I. M. (1974). The health belief model and preventive health behavior. Health Education Monographs, 2(4), 354–386. https://doi.org/10.1177/109019817400200405
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage.
Scheirer, M. A., & Dearing, J. W. (2011). An agenda for research on the sustainability of public health programs. American Journal of Public Health, 101(11), 2059–2067. https://doi.org/10.2105/AJPH.2011.300193
Shediac-Rizkallah, M. C., & Bone, L. R. (1998). Planning for the sustainability of community-based health programs: Conceptual frameworks and future directions for research, practice and policy. Health Education Research, 13(1), 87–108. https://doi.org/10.1093/her/13.1.87
Sridharan, S., Campbell, B., & Zinzow, H. (2006). Developing a stakeholder-driven anticipated timeline of impact for evaluation of social programs. American Journal of Evaluation, 27(2), 148–162. https://doi.org/10.1177/1098214006287990

doi: 10.3138/cjpe.53055 CJPE 33.3, 375–394 © 2019


Sridharan, S., & Gillespie, D. (2004). Sustaining collaborative problem solving capacity. Criminology and Public Policy, 3(2), 601–631.
Sridharan, S., Gnich, W., Moffat, V., Bolton, J., Harkins, C., Hume, M., . . . Docherty, P. (2008). Evaluation of cardiac rehabilitation intervention: Have a Heart Paisley phase 2. Glasgow, Scotland: NHS Health Scotland.
Sridharan, S., Go, S., Zinzow, H., Gray, A., & Gutierrez Barrett, M. (2007). Analysis of strategic plans to assess planning for sustainability of comprehensive community initiatives. Evaluation and Program Planning, 30(1), 105–113. https://doi.org/10.1016/j.evalprogplan.2006.10.006
Sridharan, S., Jones, B., Caudill, B., & Nakaima, A. (2016). Steps towards incorporating heterogeneities into program theory: A case study of a data-driven approach. Evaluation and Program Planning, 58, 88–97. https://doi.org/10.1016/j.evalprogplan.2016.05.002
Sridharan, S., & Nakaima, A. (2012). Towards an evidence base of theory-driven evaluations: Some questions for proponents of theory-driven evaluation. Evaluation, 18(3), 378–395. https://doi.org/10.1177/1356389012453289
United Nations. (n.d.). About the sustainable development goals. Retrieved from https://www.un.org/sustainabledevelopment/sustainable-development-goals/
Walker, R. E., Keane, C. R., & Burke, J. G. (2010). Disparities and access to healthy food in the United States: A review of food deserts literature. Health & Place, 16(5), 876–884. https://doi.org/10.1016/j.healthplace.2010.04.013

AUTHOR INFORMATION

Sanjeev Sridharan was the director of the Evaluation Centre for Complex Health Interventions at St. Michael's Hospital and associate professor at the Institute of Health Policy, Management and Evaluation at the University of Toronto. He has recently taken a position as Country Lead, Systems Evaluation and Learning Systems, in the India Country Office of the Bill and Melinda Gates Foundation.

April Nakaima is an evaluation specialist at the Evaluation Centre for Complex Health Interventions at St. Michael's Hospital. She is a graduate of the University of California, Irvine and Santa Cruz, and the Kamehameha Schools.


Meta-Modeling Housing First: A Theory-Based Synthesis Approach

Sebastian Lemire and Christina A. Christie University of California, Los Angeles

Abstract: Research synthesis has become an increasingly popular approach for summarizing primary research. In the past two decades, interest in mixed methods reviews has steadily grown, followed, more recently, by an increased attention to theory-based syntheses. This article advances and illustrates a practical application of meta-modeling—a mixed methods, theory-based synthesis approach. The proposed methodology combines meta-analytic and qualitative comparative techniques in developing a program theory—a meta-model—of how and why a program works. As the article illustrates, meta-modeling provides for a structured and transparent synthesis approach for building program theories across existing studies.

Keywords: causation coding, Housing First, meta-analysis, meta-modeling, mixed methods, qualitative comparative analysis, theory-based synthesis

Résumé : Les synthèses sont de plus en plus populaires pour résumer les travaux de recherche. Au cours des vingt dernières années, on a observé un intérêt croissant pour les méthodes mixtes puis, plus récemment, pour les synthèses basées sur la théorie. Cet article décrit une application pratique de la méta-modélisation – une approche de synthèse basée sur la théorie qui repose sur l'utilisation de méthodes mixtes. La méthodologie proposée combine des techniques de comparaison méta-analytiques et qualitatives visant à élaborer la théorie d'une intervention – un méta-modèle – expliquant le comment et le pourquoi du fonctionnement d'un programme. Comme le montre cet article, la méta-modélisation permet une synthèse des résultats d'études qui est à la fois structurée et transparente et qui permet l'élaboration de théories de programmes.

Mots clés : causalité, Housing First, méta-analyse, méta-modélisation, méthodes mixtes, analyse qualitative comparative, synthèse théorique

Corresponding author: Sebastian Lemire, Social Research Methods, University of California, Los Angeles, Moore Hall 2005, Los Angeles, CA, 90095-1521, USA; [email protected]

Summarizing existing research has a long and rich tradition in the social sciences. Over the past two decades, interest in mixed methods and theory-based reviews has steadily grown (Bronson & Davis, 2012). As a result, a burgeoning literature showcasing approaches and methods for conducting these kinds of syntheses has emerged (Saini & Shlonsky, 2012). Emphasizing the need to move beyond identifying "what works" (the sine qua non of traditional systematic reviews), these new approaches share a commitment to synthesizing a broader range of evidence with

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d'évaluation de programme 33.3 (Special Issue / Numéro spécial), 395–413 doi: 10.3138/cjpe.52945


the aim of answering a broader range of questions, including "how" and "why" interventions work (Pawson, 2006). Despite the growing and sustained interest in mixed methods and theory-based synthesis approaches, published applications are still relatively scarce: only a few illustrative examples of mixed methods and theory-based reviews have been published in evaluation journals. Notable examples include two applications of realist syntheses (Pawson, 2002; van der Knaap, Leeuw, Bogaerts, & Nijssen, 2008) and an application of a meta-analysis combined with a narrative review (Scott-Little, Hamann, & Jurs, 2002).

Motivated by the growing interest in explaining how and why programs work on the basis of existing studies, we introduce and present an application of meta-modeling—an operational approach for mixed methods, theory-based syntheses. Meta-modeling structures the integration of findings from different types of studies around the development of a "meta-model"—a visualization of the program components and mechanisms that generate a specific program outcome. Meta-modeling also relies on transparent and systematic procedures for integrating mixed evidence when developing and testing hypotheses about the extent to which and how these program components work (or fail to work). In this way, meta-modeling offers procedural guidance on how and in what way to extract, analyze, and integrate findings from different types of studies.1

The remainder of the article is structured as follows. We first situate the meta-modeling approach within the broader landscape of mixed methods and theory-based synthesis approaches, paying particular attention to the EPPI-Centre and realist synthesis approaches. We then provide an outline of the six steps comprising the meta-modeling approach. Advancing toward operational guidance, we then illustrate these six steps in a meta-modeling application on Housing First—a popular and widely implemented housing model for homeless individuals.
We conclude with a discussion of the benefits, limitations, and further development of the meta-modeling approach.

THE META-MODELING APPROACH: INTELLECTUAL ROOTS AND PROCEDURAL STEPS

The intellectual roots of meta-modeling

The meta-modeling approach emerges from the growing literature on mixed methods and theory-based synthesis approaches. A comprehensive presentation of the growing range of these approaches is beyond the scope of the present article (see Saini & Shlonsky, 2012, for a masterful review of these). For the present purposes, two distinct and commonly cited approaches to mixed methods reviews are worth considering in more detail: the EPPI-Centre approach and the realist synthesis approach—both of which provide the intellectual foundation for meta-modeling.

The most well-developed and empirically tested approach for mixed methods synthesis is arguably the EPPI-review (Saini & Shlonsky, 2012). Promoted by Harden and Thomas (2005), and labeled according to their affiliation with the


Evidence for Policy and Practice Information and Coordinating Centre (EPPI-Centre) at the University of London, the approach is structured around the parallel development of individual syntheses of qualitative and quantitative evidence, subsequently merged into a combined synthesis (Thomas et al., 2004). The latter combined synthesis takes the form of a thematic triangulation of quantitative and qualitative data. Following Thomas et al., this integration involves the juxtaposition of findings in a matrix, that is, the matching of "barriers, facilitators, and implied recommendations against the actual interventions that had been implemented and evaluated" (p. 1011). As Thomas et al. note, the resultant matrix allows for a better understanding of the experiences of the target groups, which in turn "could lead to the development of more appropriate and effective interventions" (p. 1012).

As an extension of the EPPI-reviews, more recent applications of mixed methods reviews have emphasized the use of logic models as a way to integrate findings across different types of studies (Allmark, Baxter, Goyder, Guillaume, & Crofton-Martin, 2013; Anderson et al., 2011; Baxter, Blank, Woods, Payne, Rimmer, & Goyder, 2014; Baxter, Killoran, Kelly, & Goyder, 2010). These applications utilize thematic coding and analysis techniques, often combined with matrices for structuring and summarizing findings, to develop and refine logic models across existing studies. As described by Baxter et al. (2014, p. 3),

In our approach, extracted data from the included papers across study designs are combined and treated as textual (qualitative) data. A process of charting, categorizing and thematic synthesis of the extracted quantitative intervention and qualitative data is used in order to identify individual elements of the model.

The resultant logic model is in some applications further verified and refined on the basis of feedback from relevant stakeholders (see Baxter et al., 2014, for an illustrative example).

Another prevalent approach—and one that has gained significant traction in evaluation circles—is that of realist synthesis (Pawson & Boaz, 2004; Pawson, Greenhalgh, Harvey, & Walshe, 2005). Developed in response to traditional systematic reviews, the premise for Pawson's (2006) realist synthesis is the emphasis on understanding how, for whom, and under what circumstances programs work. More specifically, the realist synthesis revolves around the development of context-mechanism-outcome configurations (CMOs)2 corresponding to the underlying logic of the program under study. In its practical application, the realist modus operandi is to develop an initial CMO configuration on the basis of a subset of findings, qualitative as well as quantitative, and then through iterative rounds of inclusion and synthesis of additional findings, again qualitative as well as quantitative, to refine the initial CMO configuration of the program. The underlying idea is that this step-wise, reiterative synthesis of findings will serve to refute or confirm salient aspects of the CMO, resulting in an increasingly refined understanding of how the program works.


The meta-modeling approach is both informed by and extends beyond the promising and inspiring approaches outlined above. In its purpose, the meta-modeling approach shares much with the realist synthesis approach, among others, in its aim of better understanding how and why programs work (or fail to work). Extending its scope further, the meta-modeling approach also aims to address the extent to which programs generate a specified set of outcomes by calculating standardized effect sizes as part of the synthesis (a meta-analytic technique typically associated with more traditional systematic reviews). The position we hold is that the latter provides salient information for a more complete understanding of the extent to which and how programs work.

In its structure, the meta-modeling approach aligns with the EPPI-review in that it emphasizes separate syntheses of quantitative and qualitative evidence, before merging these into a fully integrated mixed-evidence synthesis (Harden & Thomas, 2005). However, in marked contrast with the EPPI-review approach, the qualitative synthesis is intentionally conducted prior to the quantitative synthesis in meta-modeling (see Table 1). As illustrated in the application presented later in this article, this sequential approach allows hypothetical causal strands about how the programs work to be developed on the basis of qualitative findings, followed by the subsequent testing of these on the basis of quantitative findings. This sequential approach also aligns with a common social scientific principle that the same data should not be used to both develop and test hypotheses.

Table 1: The six steps of meta-modeling

Step 1: Define the research question
• Define the research question in terms of Population, Intervention, Context, and Outcome (the PICO standard)

Step 2: Search and retrieve relevant studies using explicit search parameters
• Define search terms and inclusion/exclusion criteria
• Conduct the search for empirical papers using multiple avenues

Step 3: Conduct a relevance appraisal of the studies
• Appraise each study abstract for its relevance to the research question

Step 4: Qualitative synthesis (identify causal chains)
• For each study, apply causation coding to identify causal chains
• Summarize the causal chains in a causal chain matrix

Step 5: Quantitative synthesis (compute effect sizes)
• For each study, estimate relevant effect sizes
• Summarize the effect sizes using meta-analytic techniques

Step 6: Develop integrated meta-model
• Apply QCA to identify the causal recipe for the intervention
• Develop meta-models

Adapted from Greenhalgh, Robert, Macfarlane, Bate, and Kyriakidou (2004)


Finally, the meta-modeling approach departs from the existing approaches in its emphasis on using structured analytical strategies for the extraction, analysis, and integration of findings from different types of studies. More specifically, and as illustrated in the case example below, meta-modeling applies causation coding, standardized effect-size calculations, and qualitative comparative analysis to ensure a more transparent and systematic synthesis. The end product is a systematic, transparent, and operational approach for mixed methods, theory-based syntheses. Advancing toward operational guidance on the meta-modeling approach, we now turn to an application on Housing First programs.

META-MODELING THE INNER WORKINGS OF HOUSING FIRST

In its procedural approach, meta-modeling consists of six steps: (1) defining the research question(s), (2) searching for and retrieving candidate studies, (3) conducting a relevance appraisal, (4) synthesizing qualitative findings, (5) synthesizing quantitative findings, and (6) developing an integrated synthesis using Qualitative Comparative Analysis (QCA). The steps are briefly outlined in Table 1 and illustrated in more detail in the case application on Housing First presented in what follows. Before advancing the case application, a brief description of the Housing First program is provided.

The case: Housing First

Housing First (HF) is a widely used approach to addressing homelessness. Currently, HF programs exist in major cities across Canada, the United States, and most European countries (Groton, 2013). The core idea of HF is to provide homeless individuals with immediate housing of their own choice. In support of sustained housing retention, supportive services (e.g., substance use treatment) are made available but not required by HF programs (Tsemberis, 1999). The provision of immediate housing stands in marked contrast to traditional housing programs that require homeless individuals to progress and graduate through different steps of treatment and/or sobriety before earning their access to permanent housing.

In their implementation, HF programs are guided by five principles: (1) provide immediate, low-barrier access to independent, permanent housing; (2) provide comprehensive case management; (3) provide housing in buildings with less than 15% HF tenants; (4) emphasize client choice in regard to supportive services; and (5) support community involvement in the transition from homeless to housed (Tsemberis & Eisenberg, 2000).
Underlying these five principles is a philosophy of promoting self-efficacy and independence among homeless individuals as a pathway to sustaining permanent housing. The HF model is considered by many researchers and practitioners to be "best practice" and is increasingly referred to as "evidence-based" (Pearson, Montgomery, & Locke, 2009). In support of this coveted label, over the past 20 years a diverse body of research has examined the effectiveness of HF programs on


a number of housing-related outcomes (Groton, 2013). While two systematic reviews have been conducted to determine the effectiveness of HF programs (Leff, Chow, Pepin, Conley, Allen, & Seaman, 2009; Nelson, Aubry, & Lafrance, 2007), a systematic mixed-methods synthesis of the program components and mechanisms by which such programs work has not been undertaken. The present mixed-methods theory-based synthesis of HF programs is the first of its kind.

Meta-modeling Housing First: A worked example

In the following, each of the six steps of the meta-modeling approach is illustrated. The motivation for the meta-modeling application on HF programs—while addressing a gap in the existing literature on HF programs—was primarily methodological: to develop and apply a systematic and transparent approach for mixed methods, theory-based synthesis. Because the novel aspects of meta-modeling—at least in the context of building program theories—primarily pertain to the application of causal coding of qualitative findings and the use of qualitative comparative techniques in developing the meta-model, these steps are covered in more detail.

Step 1: Define the research question. Defining the research question constitutes an important first step of meta-modeling. Informed by the existing literature on HF programs, the research question driving the present systematic review was two-fold:

1. To what extent do HF programs increase independent housing tenure among chronically homeless individuals, as compared with alternative continuum of care housing programs?
2. What are the critical ingredients in HF programs that drive increased housing tenure among chronically homeless individuals?

These questions not only concern whether HF programs promote housing tenure but also demand information about how HF programs promote housing tenure—two equally relevant types of information when trying to understand the extent to which and how programs work. These two questions also fall right between the purviews of traditional systematic reviews (which tend to focus on the first question) and existing mixed methods approaches (which tend to focus on the second question).

Step 2: Search and retrieve studies. The second step in meta-modeling revolves around the search and retrieval of relevant primary studies—the empirical foundation for the subsequent analyses. In the present synthesis, the studies were identified through an electronic literature search using Scopus, PsycINFO, Web of Science, and Sociological Abstracts. The key word "housing first" was used for the search. No restriction was placed on the date or the location of the studies. In addition to the electronic search, manual searches of relevant studies in the most salient journals were carried out. These journals included the American Journal of Community Psychology, the American Journal of Public Health, the Journal of


Community Psychology, Psychiatric Services, and Research on Social Work Practice. This manual search was motivated by the expected time lag between journal publication (e.g., online first publication) and subsequent inclusion in the databases listed above. Finally, seven identified literature reviews on supported housing were examined for relevant studies. This manual citation search allowed us to reach a point of saturation—a moment where reference lists were no longer providing new, additional studies.

The results of the search and retrieval are presented in Figure 1. A total of 346 unique titles and abstracts were initially identified and retrieved. The expansion of the search to include grey literature (i.e., studies not published in peer-reviewed journals) is compatible with, and even encouraged in, the context of meta-modeling. However, for the present purposes of developing and applying a new methodological approach, focusing on published studies was deemed sufficient.

[Figure 1: Flowchart of meta-modeling of Housing First. Of 346 titles and abstracts retrieved, 316 were excluded at the relevance screening; the remaining studies split into 16 quantitative studies (meta-analysis, yielding effect sizes) and 14 qualitative studies (causation coding, yielding causal conditions), which feed into the QCA.]


Step 3: Relevance appraisal. The third step in meta-modeling is a relevance appraisal. In the present synthesis, primary studies for the quantitative synthesis were included if they (1) focused on an HF intervention as treatment, (2) involved a comparison group design, (3) included housing tenure as an outcome measure, and (4) contained sufficient information to compute standardized effect sizes. This includes any experimental study design, including randomized controlled trials as well as quasi-experimental designs (with non-equivalent comparison groups). A total of 16 studies matched these criteria.

For the qualitative synthesis, studies were included if they (1) focused on a Housing First intervention as treatment, and (2) provided qualitative data on the experience of individuals in HF programs (sufficient information to support causation coding). While these included a broad range of studies—qualitative as well as mixed methods—only studies that provided rich, detailed descriptions of how and why participants benefited from HF programs were included (e.g., verbatim descriptions or testimonials of the lived experiences of HF program participation). A total of 14 studies satisfied these criteria.

Informed by these relevance criteria, a total of 316 studies were excluded from the synthesis. Collectively, all the excluded studies failed either to provide quantitative data to support effect-size computation or to provide qualitative data to support causation coding.

Step 4: Qualitative synthesis. The first synthesis track in meta-modeling is the qualitative synthesis. The primary aim of this synthesis is to identify candidate "critical ingredients" driving the program's desired outcomes (i.e., housing tenure). Informed by Saldaña's (2013) "causation coding," causally relevant information is identified using causal chain codes.
These are codes capturing the causally relevant information in the primary studies, typically from sections of the articles describing how and why the HF program works. More specifically, the coding aims to map out causal chains (CODE1 → CODE2 → CODE3), corresponding to a causal catalyst, an outcome, and a mechanism linking the causal catalyst and outcome. Moderators (e.g., influencing factors) may also be coded and included. As Saldaña reminds us, these causal triplets are often made more complex by involving interactions between multiple causal catalysts, multiple mechanisms and moderators, and multiple outcomes. As such, the causal chains may include subsets of codes: (CODE1A + CODE1B → CODE2A + CODE2B → CODE3 → CODE4).

An example might serve to illustrate these causal chains. In the present synthesis, several studies describe how a service philosophy of client choice in relation to frequency and duration of supportive services, without any formal requirement of participation, results in a sense of empowerment among the homeless individuals, which in turn generates an incentive among them to actively pursue and participate in supportive services. This in turn supports sustained housing. To illustrate, the causal chain is composed of a process in which "client choice for supportive services" (causal catalyst #1), in combination with "no requirement for participation" (causal catalyst #2), leads to a "sense of empowerment" (mechanism


#1), which in turn results in the clients "actively pursuing supportive services" (mechanism #2) and "participating in supportive services" (mechanism #3), both of which promote sustained housing (outcome #1). By describing how core activities of the HF programs generate a sequence of attitudinal and behavioral changes, these causal chains shed light on how and why HF programs promote housing tenure.

Echoing Saldaña (2013), one practical point about causation coding is that it is highly interpretive. This is in part because the causal chains are rarely summarized in a neat three-part sequence from causal catalyst(s) to mechanism(s) to outcome(s). In our experience, and as correctly noted by Saldaña, the authors "may tell you the outcome first, followed by what came before or what led up to it, and sometimes explain the multiple causes and outcomes in a reverberative back-and-forth manner" (p. 164). As such, causation coding often involves a high degree of sensitivity to words such as "because," "in effect," "therefore," and "since," which might indicate an underlying causal logic (Saldaña; see also Lemire & Freer, 2015, for a discussion on this).

Another equally important practical guideline is to resist the urge to code causal chains during the first read-through of the studies. Rather, it is advisable to read through all the sampled studies once before initiating the causation coding. In the first read-through, the purpose is simply to make note of the types of causal catalysts mentioned and the general language and terminology used by the authors and participants to define these core program components. On a similar note, we also found it useful to focus on the causal catalysts that cut across multiple studies, suggesting their broader salience and potential importance in explaining how HF programs bring about change. We thus identified four prevalent causal catalysts:

1. Housing choice and structure: The provision of immediate access to independent, scattered-site permanent housing with less than 15% other HF program participants in the building.
2. Supportive services: The provision of a broad range of supportive services, such as substance-abuse services, employment services, educational services, volunteer services, medical services, social integration, and so forth.
3. Harm reduction: The reliance on low-threshold admission and no sobriety/treatment/medication requirements to access or maintain housing, as well as limited staff crossover between housing and supportive services.
4. Client choice: The emphasis on client choice of duration, frequency, and intensity of treatment, harm reduction, and no sobriety/treatment/medication requirements.
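During coding, chains of this kind can be captured in a small data structure, which also eases later recoding for the comparative analysis. A minimal sketch, assuming the arrow notation used above; the class and the example codes are ours, drawn from the client-choice/empowerment illustration, not from any primary study:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CausalChain:
    """One coded causal chain: catalyst(s) -> mechanism(s) -> outcome(s)."""
    catalysts: List[str]
    mechanisms: List[str]
    outcomes: List[str]
    moderators: List[str] = field(default_factory=list)  # influencing factors

    def __str__(self) -> str:
        # Render the chain in the CODE1A + CODE1B -> CODE2 -> CODE3 notation
        return " → ".join(
            " + ".join(codes)
            for codes in (self.catalysts, self.mechanisms, self.outcomes)
        )

# Hypothetical coding of the client-choice/empowerment chain
chain = CausalChain(
    catalysts=["client choice of services", "no participation requirement"],
    mechanisms=["sense of empowerment", "pursuit of supportive services"],
    outcomes=["sustained housing"],
)
print(chain)
```

Printing the chain yields the arrow notation used in the text; a causal chain matrix (Table 2) is then simply a collection of such objects grouped by catalyst, with the verbatim explanation attached.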

Each of these causal catalysts represents a core component of HF programs and is thus identifiable as causally relevant across multiple primary studies. Collectively, then, these four identified causal catalysts serve as candidate core components of


how and why HF programs promote sustained housing tenure among chronically homeless individuals. More than that, the causal catalysts provide the building blocks for a more in-depth understanding of how these core components connect with the outcome(s) of interest.

As part of this analysis, we found it helpful to organize the identified causal chains for each catalyst in a causal chain matrix (Miles, Huberman, & Saldaña, 2014), providing an overview and facilitating the identification of patterns (see Table 2 below). The matrix summarizes the causal chains identified for each causal catalyst. The matrix specifies the "causal catalyst," the "causal chain" for each catalyst, and a specification of any "influencing factors" inhibiting or enhancing the causal chain. A final column contains a verbatim description of how the mechanism functions, as described in the primary study. This anchoring of each causal chain with the language from the individual studies serves double duty: (1) it provides analytical depth to the causal chains, and (2) it provides a transparent chain-of-evidence that allows other researchers to examine the grounding for the final synthesis and conclusions drawn. This latter point is important for the purpose of methodological transparency.

The testing of these causal catalysts will be the focal point of the final meta-modeling synthesis in step 6. However, before advancing this integration, the quantitative synthesis is to be completed.

Step 5: The quantitative synthesis. The aim of the quantitative synthesis is to examine the overall effectiveness of HF programs through a meta-analysis of the experimental and quasi-experimental studies identified. The effectiveness of HF is considered in terms of housing tenure. In the present review, a total of 16 comparison-group studies, covering the period from 2000 to 2016, were identified. Most of the HF studies used an experimental design with comparable treatment and control groups at baseline (11 studies). The remaining five studies were quasi-experimental studies, of which four used matching or other statistical techniques

Table 2: Causal chain matrix

Causal catalyst: Provision of housing
Causal chain: home + stability → self-efficacy
Influencing factor(s): Permanent housing
Explanation: "The housing is there for a couple of years so . . . it lends a little stability at least to your life for a short period of time and enables you to get some things done."

Causal catalyst: Provision of housing
Causal chain: home → stability → recovery → employment
Influencing factor(s): Permanent housing; access to training; access to jobs
Explanation: "A place to live and then from there I can start doing my things, like getting better and going out. Getting into a routine. Finding a job, getting the training for something else."


to adjust for baseline differences. One quasi-experiment did not use any statistical adjustment for baseline differences (Tsai, Mares, & Rosenheck, 2010). None of the studies reported large baseline differences between the comparison/control groups, showed high or uneven attrition rates, or indicated any other major implementation issues that potentially could bias the effect-size estimates.

Each of the studies was reviewed, and information relevant for the estimation of standardized effect sizes (Cohen's d) was retrieved, including study sample size, mean housing-tenure statistics for treatment and control/comparison, as well as corresponding standard deviations/standard errors. On the basis of the retrieved information, effect sizes were calculated for each study (Cohen's d, the standardized mean difference statistic). Individual effect sizes were calculated using the Practical Meta-Analysis Effect Size Calculator (Lipsey & Wilson, 2001) and adjusted for small sample bias using the Hedges g correction (Hedges & Olkin, 1985). Inverse variance weighting was used when calculating combined effect sizes across the primary studies, whereby each study is weighted by the precision of its respective effect-size estimate (Lipsey & Wilson, 2001).

The estimated effect sizes are provided in Figure 2. As shown in the figure, the studies reveal consistently positive effect sizes, favoring the HF programs in comparison with continuum of care programs. More specifically, the combined effect size of 0.97 (95% CI: 0.72–1.22) indicates a markedly stronger effect on housing tenure among participants in HF programs, as compared with participants in continuum of care programs. The primary purpose of the effect-size estimates is, in combination with the causal strands identified in the qualitative synthesis, to comprise the building blocks for the final meta-modeling synthesis.
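The computations in this step follow standard formulas. A sketch, using the standardized mean difference with the Hedges small-sample correction and fixed-effect inverse-variance pooling as described in Lipsey and Wilson (2001) and Hedges and Olkin (1985); the sample numbers are illustrative only, not the review's data:

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Small-sample-corrected standardized mean difference and its variance."""
    # Pooled standard deviation across treatment and control groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp                      # Cohen's d
    j = 1 - 3 / (4 * (n_t + n_c - 2) - 1)           # Hedges correction factor
    g = j * d
    var_g = (n_t + n_c) / (n_t * n_c) + g**2 / (2 * (n_t + n_c))
    return g, var_g

def pooled_effect(effects):
    """Fixed-effect inverse-variance pooling with a 95% confidence interval."""
    weights = [1 / var for _, var in effects]
    pooled = sum(w * g for (g, _), w in zip(effects, weights)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Two hypothetical studies: mean months housed, SD, and n per arm
e1 = hedges_g(9.5, 6.0, 3.2, 3.5, 45, 42)
e2 = hedges_g(10.1, 7.3, 4.0, 3.8, 30, 31)
combined, ci = pooled_effect([e1, e2])
```

A random-effects model would add a between-study variance component to each weight; the fixed-effect version above is the simpler building block.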

figure 2: Effect-size estimates for Housing First (housing tenure)
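The effect-size workflow described above (Cohen’s d from pooled standard deviations, the Hedges’ g small-sample correction, and inverse-variance weighting) can be sketched as follows. This is an illustrative reconstruction, not the authors’ actual computation, and the study figures below are hypothetical, not data from the synthesis.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

def hedges_g(d, n1, n2):
    """Small-sample bias correction for d (Hedges & Olkin, 1985)."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

def d_variance(d, n1, n2):
    """Approximate sampling variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def combine(effects):
    """Inverse-variance weighted mean effect with a 95% confidence interval.

    `effects` is a list of (g, variance) tuples, one per primary study;
    each study is weighted by the precision (1/variance) of its estimate.
    """
    weights = [1 / v for _, v in effects]
    mean = sum(w * g for (g, _), w in zip(effects, weights)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Illustrative: three hypothetical studies comparing mean housing tenure
# (treatment mean, SD, n; comparison mean, SD, n).
studies = [(8.1, 3.0, 40, 5.2, 3.2, 38),
           (7.4, 2.8, 55, 5.0, 3.1, 60),
           (9.0, 3.5, 25, 6.1, 3.3, 27)]
effects = []
for m1, s1, n1, m2, s2, n2 in studies:
    d = cohens_d(m1, s1, n1, m2, s2, n2)
    g = hedges_g(d, n1, n2)
    effects.append((g, d_variance(g, n1, n2)))
mean, ci = combine(effects)
print(f"combined g = {mean:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

The same quantities are what a tool such as the Practical Meta-Analysis Effect Size Calculator produces for each study before pooling.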



Step 6: Develop the meta-model. The final step in the synthesis is the integration of findings from the qualitative and quantitative syntheses. To integrate the findings from these two syntheses, Qualitative Comparative Analysis (QCA) was applied. Developed by Charles Ragin, QCA is perhaps best described as a set of comparative analytical techniques that aim to identify the sets of causal conditions that trigger a specific outcome (Ragin, 2014; Schneider & Wagemann, 2013). In the present synthesis, QCA allowed us to identify the configuration(s) of the four causal conditions that promote housing tenure. Informed by Rihoux and Ragin (2009), the QCA in the present synthesis involves six steps:

1. gather evidence on core program components and outcomes for each study in the review (extracted from primary studies);
2. develop a matrix with core program components and outcomes (calibration);
3. use QCA software to create a “truth table”;
4. minimize solutions;
5. resolve contradictory configurations; and
6. present the final interpretation of solutions.

These are the standard steps in QCA, adapted slightly for the purpose of research synthesis. In the first step, the 16 studies in the quantitative synthesis were recoded according to the causal chains identified in the qualitative synthesis. Recall that these causal chains were identified as potential explanations of how and why HF programs promote housing tenure. Recall also that these causal chains involved four primary causal conditions: housing choice and structure, separation of services, service philosophy, and service array. To illustrate, the qualitative synthesis revealed that the provision of immediate access to scattered-site, independent housing is a salient catalyst for sustained housing tenure. As such, the extent to which each of the 16 HF studies involves immediate access to scattered-site, independent housing is relevant to code and test as part of the QCA.

A note on this assessment and coding is called for. In traditional crisp-set QCA, cases (i.e., the 16 studies in the present synthesis) are coded on a binary scale, whereby a “zero” or a “one” denotes the absence or presence of a given causal condition. However, this type of coding does not reflect the fact that the presence of causal conditions in HF programs is often a matter of degree. Accordingly, the extent to which each of the four causal conditions is present in the studies is more appropriately assessed on four levels: 0 (no presence), .33 (low presence), .67 (high presence), and 1 (full presence).

The results of the coding are provided in Table 3. Each row represents a primary study. The extent to which the four causal conditions are present is noted for each study. For instance, the HF program in the study by Tsai, Mares, and Rosenheck (2010) is characterized by full adherence to housing choice and structure (1) and relatively low adherence to supportive services (.33), harm reduction



Table 3: QCA coding of studies

Study      Housing  Harm reduction  Supportive services  Client choice  Outcome

TSE(2000)  1        .67             1                    1              1
GUL(2003)  1        1               1                    1              .67
TSE(2003)  1        1               1                    1              1
TSE(2004)  1        1               1                    1              .67
GRE(2005)  1        1               1                    .67            1
SIE(2006)  .67      .67             .67                  .67            .67
STE(2007)  1        1               1                    1              .67
TSA(2010)  1        .33             .33                  .33            0
HAN(2011)  1        .33             .33                  .33            1
APP(2012)  1        .33             1                    .33            1
MON(2013)  1        .67             1                    1              .67
PAL(2013)  1        .67             1                    1              1
SOM(2015)  1        .67             1                    1              1
STE(2015)  1        .67             1                    1              .67
AUB(2016)  1        .67             1                    1              .67
BRO(2016)  .67      .67             .67                  .67            1

(.33), and client choice (.33). In addition to these codes of causal conditions, the effect-size estimates were recoded according to the four-level coding scheme.

A couple of important points about this recoding are called for. First and foremost, the coding is intentionally qualitative in the sense that each code represents a qualitative judgment by the researcher. As such, the presence (or lack) of relevant information curbs the confidence in these judgments. Another important point relates to the importance of not simply using the estimated effect sizes as the outcomes. As noted by Schneider and Wagemann (2013), the outcome scores in QCA should always emerge from a qualitative judgment. For instance, and as demonstrated in the present case, simply relying on the individual effect-size estimates would fail to account for the information provided in the corresponding confidence intervals, which offer important information about the outcome variations for each HF program. As such, these ranges should be taken into consideration when defining the outcome scores for the purpose of QCA.

In the second step of QCA, the values for each study on these codes are arranged in a truth table, displaying the logical configurations of causal conditions that elicit a positive outcome. In the present synthesis, fsQCA (software developed by Ragin, Drass, & Davey, 2006) was applied to produce the truth table presented in Table 4. In this truth table, each row represents a specific configuration



Table 4: Truth table

Housing  Harm reduction  Supportive services  Client choice  Number     Consistency

1        1               1                    1              13 (81%)   0.88
1        0               0                    0              2 (93%)    0.67
1        0               1                    0              1 (100%)   0.83
                                                             1 (94%)    0.79
                                                             1 (100%)   0.49

of causal conditions (i.e., a “causal recipe”) that elicits a positive outcome. To illustrate, the first row represents adherence to all four causal conditions; 13 (81%) of the studies reflect this combination. Of these, 88% elicit a high outcome (indicated by the internal consistency score). In the subsequent step, the fsQCA software applies inferential logic (Boolean set algebra) to simplify the truth table into the causal recipes that are sufficient to produce a positive outcome. The results are summarized in Table 5. As the table shows, there appear to be two causal recipes:

1. ~Choice*Services*~Harm*Housing: Housing First programs with strong fidelity to the immediate housing and supportive services components, combined with low fidelity to client choice and harm reduction, promote housing tenure;
2. Choice*Services*Harm*Housing: Housing First programs with high fidelity to all four program components: provision of immediate housing, supportive services, harm reduction, and client choice (i.e., the full Housing First model).

In summary, then, these two causal recipes indicate the critical ingredients in HF programs that promote housing tenure among chronically homeless individuals. Moreover, the first QCA solution suggests that the provision of immediate access to independent, scattered-site permanent housing, in combination with a broad range of supportive services, constitutes a sufficient set of critical ingredients for positive housing tenure.
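To make the truth-table step concrete, the following is a minimal sketch, not the fsQCA implementation: fuzzy-set codes in the style of Table 3 are grouped into configurations (a study belongs to the configuration where each of its condition memberships exceeds 0.5), and each configuration is scored for sufficiency consistency, sum(min(X, Y)) / sum(X), where X is a case's fuzzy membership in the configuration (negation is 1 − x, conjunction is the minimum). The condition names and codes below are hypothetical.

```python
from collections import defaultdict

def row_membership(case, signs):
    """Fuzzy membership of a case in a configuration.

    `signs` maps condition name -> True (present) / False (negated);
    negation is 1 - x, conjunction is the minimum across conditions.
    """
    return min(x if signs[c] else 1 - x for c, x in case["conditions"].items())

def truth_table(cases):
    """Group cases into configurations and compute sufficiency consistency.

    Consistency uses all cases' memberships, per the standard fuzzy-set
    definition: sum(min(X_i, Y_i)) / sum(X_i).
    """
    rows = defaultdict(list)
    for case in cases:
        key = tuple(c for c, x in sorted(case["conditions"].items()) if x > 0.5)
        rows[key].append(case)
    table = {}
    for key, members in rows.items():
        signs = {c: (c in key) for c in cases[0]["conditions"]}
        num = sum(min(row_membership(m, signs), m["outcome"]) for m in cases)
        den = sum(row_membership(m, signs) for m in cases)
        table[key] = {"n": len(members), "consistency": num / den}
    return table

# Hypothetical codes in the style of Table 3 (not the actual study data).
cases = [
    {"conditions": {"housing": 1, "harm": 1, "services": 1, "choice": 1}, "outcome": 1},
    {"conditions": {"housing": 1, "harm": 0.67, "services": 1, "choice": 1}, "outcome": 0.67},
    {"conditions": {"housing": 1, "harm": 0.33, "services": 0.33, "choice": 0.33}, "outcome": 0},
]
for config, row in truth_table(cases).items():
    print(config, row)
```

The subsequent Boolean minimization step (collapsing consistent rows into shorter recipes) is what fsQCA adds on top of a table like this.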

Table 5: Causal recipes for Housing First

                                  Coverage   Unique coverage   Consistency

~CHOICE*SERVICES*~HARM*HOUSING    0.13       0.03              0.83
CHOICE*SERVICES*HARM*HOUSING      0.76       0.66              0.88
Solution coverage: 0.79
Solution consistency: 0.88
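The coverage and consistency figures reported in Table 5 follow the standard fuzzy-set definitions (Ragin, 2014). A minimal sketch, using hypothetical membership scores rather than the synthesis data:

```python
# R holds each case's fuzzy membership in a causal recipe;
# Y holds its membership in the outcome set.

def consistency(R, Y):
    """Degree to which recipe membership is a subset of outcome membership."""
    return sum(min(r, y) for r, y in zip(R, Y)) / sum(R)

def coverage(R, Y):
    """Share of total outcome membership accounted for by the recipe."""
    return sum(min(r, y) for r, y in zip(R, Y)) / sum(Y)

R = [1.0, 0.67, 0.33]  # e.g., membership in CHOICE*SERVICES*HARM*HOUSING
Y = [1.0, 0.67, 0.0]   # membership in the outcome (sustained housing tenure)
print(round(consistency(R, Y), 3), round(coverage(R, Y), 3))
```

Unique coverage, also reported in Table 5, subtracts the coverage shared with the other recipes in the solution, isolating what each recipe alone accounts for.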



[Figure: a model linking the program components Housing, Supportive Services, Client choice, and Harm Reduction, via the intermediate constructs Stability, Perseverance, Hope, Self-efficacy, Positive identity, Empowerment, and Recovery, to the outcome Housing Tenure]

figure 3: Meta-model for Housing First

Extending the above findings, these causal recipes are visualized in the development of a final meta-model, that is, a program theory of the causal recipes identified above. To illustrate, the meta-model for the second causal recipe is presented in Figure 3. The meta-model was structured around the critical ingredients identified above and substantiated by the causal chains identified in the qualitative synthesis (Step 4).

DISCUSSION

The intent of this article is to present meta-modeling as an operational and promising approach for theory-based synthesis, an approach for developing program theories across existing studies. As illustrated in the preceding pages, the meta-modeling approach relies on a structured and sequential synthesis process, in which critical program components are first identified within individual studies (as part of the qualitative synthesis) and subsequently verified (with effect sizes from the quantitative synthesis) as part of a final integrated synthesis. Meta-modeling furthermore relies upon established analytical approaches and techniques—causation coding, effect-size calculations, and qualitative comparative analysis—to ensure a methodical and transparent synthesis.

As also indicated in the preceding pages, the meta-modeling approach comes with both benefits and limitations. One benefit of the meta-modeling approach is that it not only allows for the identification of the most salient critical ingredients in HF programs (other synthesis approaches do this as well) but also pushes for a more transparent and systematic integration of qualitative and quantitative findings in identifying and testing these ingredients. The use of systematic and transparent procedures for the extraction, analysis, and integration of different types of findings provides for a more systematic and transparent synthesis. The causal chain coding, the summary tables, and the qualitative comparative techniques collectively provide a visible chain of evidence, which in turn allows for more verifiable and transparent findings. From a methodological perspective, this is an important benefit.



Another central benefit of the meta-modeling approach is that it relies on a firm division of evidentiary labor, a principle corresponding with a core tenet of scientific investigation: “once data have been used to develop a theory they cannot be used to test it” (Wachter & Straf, 1990, p. xxv). Following this principle, the meta-modeling approach relies on one body of evidence to generate hypotheses (causal strands identified in studies as part of the qualitative synthesis) and another body of evidence to test these hypotheses (effect sizes from studies in the quantitative synthesis). From our perspective, clearly demarcating the evidentiary roles optimizes the advantages of having different types of studies, providing different types of evidence, by having them serve different—yet complementary—purposes within the same integrated synthesis.

No methodological approach is without its limitations. One practical and important limitation concerns the difficulty of distinguishing between the implementation of the primary study and the reporting of the primary study. These two can be very different. Many potentially important aspects of studies are never reported. This lack of reporting on salient aspects of the program studied is particularly problematic in relation to the recoding of the studies in the quantitative synthesis as part of the final integrated qualitative comparative analysis. The best strategy to counter limited program information in the published studies is to seek out additional information from program websites, other publications on the program, the authors of the studies, or even fieldwork on the program sites. However, all of these strategies can be time-consuming, even impossible within the time and resource constraints of a commissioned synthesis.

Another limitation pertains to the limited real-world applications of meta-modeling. At the time of this writing, the meta-modeling approach has been applied and refined in only three different contexts (one of which is described in the present article). To be sure, the approach is still in its infancy, a work in progress. To earn a place among the burgeoning array of mixed methods synthesis approaches, it has to be applied across a broad range of contexts and settings and must show comparative methodological and practical advantages in relation to other mixed methods, theory-based approaches. Our modest hope is that the present case application will serve to motivate and further advance the practical application and examination of meta-modeling across different programs, studies, settings, and contexts. Future applications and modifications of the approach are therefore highly encouraged and warmly welcomed.

NOTES
1. Studies included in the synthesis can be obtained by contacting the first author.
2. While there is no definitive consensus on the definition of these terms, “mechanism” generally refers to the underlying social or psychological processes that generate one or more outcomes of interest. Outcomes typically refer to changes in attitudes, knowledge, and/or behaviors. Context usually involves any contextual factors that enable, prevent, or in any way influence the mechanism’s ability to generate the outcome(s).



REFERENCES
Allmark, P., Baxter, S., Goyder, E., Guillaume, L., & Crofton-Martin, G. (2013). Assessing the health benefits of advice services: Using research evidence and logic model methods to explore complex pathways. Health & Social Care in the Community, 21(1), 59–68. http://doi.org/10.1111/j.1365-2524.2012.01087.x
Anderson, L. M., Petticrew, M., Rehfuess, E., Armstrong, R., Ueffing, E., Baker, P., Francis, D., & Tugwell, D. (2011). Using logic models to capture complexity in systematic reviews. Research Synthesis Methods, 2(1), 33–42. http://doi.org/10.1002/jrsm.32
Baxter, S. K., Blank, L., Woods, H. B., Payne, N., Rimmer, M., & Goyder, E. (2014). Using logic model methods in systematic review synthesis: Describing complex pathways in referral management interventions. BMC Medical Research Methodology, 14(62). http://doi.org/10.1186/1471-2288-14-62
Baxter, S. K., Killoran, A., Kelly, M. P., & Goyder, E. (2010). Synthesizing diverse evidence: The use of primary qualitative data analysis methods and logic models in public health reviews. Public Health, 124(2), 99–106. http://doi.org/10.1016/j.puhe.2010.01.002
Bronson, D. E., & Davis, T. S. (2012). Finding and evaluating evidence: Systematic reviews and evidence-based practice. New York, NY: Oxford University Press.
Greenhalgh, T., Robert, G., Macfarlane, F., Bate, P., & Kyriakidou, O. (2004). Diffusion of innovations in service organizations: Systematic review and recommendations. Milbank Quarterly, 82(4), 581–629. http://doi.org/10.1111/j.0887-378X.2004.00325.x
Groton, D. (2013). Are Housing First programs effective? A research note. Journal of Sociology & Social Welfare, 40(1), 51–63. Retrieved from https://scholarworks.wmich.edu/jssw/vol40/iss1/4
Harden, A., & Thomas, J. (2005). Methodological issues in combining diverse study types in systematic reviews. International Journal of Social Research Methodology, 8(3), 257–271. https://doi.org/10.1080/13645570500155078
Hedges, L., & Olkin, I. (1985). Statistical methods for meta-analysis. New York, NY: Academic Press.
Leff, H. S., Chow, C. M., Pepin, R., Conley, J., Allen, I. E., & Seaman, C. A. (2009). Does one size fit all? What we can and can’t learn from a meta-analysis of housing models for persons with mental illness. Psychiatric Services, 60(4), 473–482. http://doi.org/10.1176/appi.ps.60.4.473
Lemire, S., & Freer, G. (2015). Inside the black box—modeling the inner workings of social development programs. In V. Jakupek & M. Kelly (Eds.), Assessing the impact of foreign aid: Value for money and aid for trade (pp. 149–168). Cambridge, MA: Elsevier Academic Press.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Miles, M. B., Huberman, A. M., & Saldaña, J. (2014). Qualitative data analysis: A methods sourcebook (3rd ed.). Thousand Oaks, CA: Sage.
Nelson, G. F., Aubry, T., & Lafrance, A. (2007). A review of the literature on the effectiveness of housing and support, assertive community treatment, and intensive case management interventions for persons with mental illness who have been homeless. American Journal of Orthopsychiatry, 77(3), 350–361. http://doi.org/10.1037/0002-9432.77.3.350



Pawson, R. (2002). Evidence-based policy: The promise of “realist synthesis.” Evaluation, 8(3), 340–358. https://doi.org/10.1177/135638902401462448
Pawson, R. (2006). Evidence-based policy: A realist perspective. London, England: Sage.
Pawson, R., & Boaz, A. (2004). Evidence-based policy, theory-based synthesis, user-led review. ESRC Research Methods Programme Project.
Pawson, R., Greenhalgh, T., Harvey, G., & Walshe, K. (2005). Realist review—A new method of systematic review designed for complex policy interventions. Journal of Health Services Research & Policy, 10(Suppl. 1), 21–34. http://doi.org/10.1258/1355819054308530
Pearson, C., Montgomery, A. E., & Locke, G. (2009). Housing stability among homeless individuals with serious mental illness participating in Housing First programs. Journal of Community Psychology, 37(3), 404–417. http://doi.org/10.1002/jcop.20303
Ragin, C. C. (2014). The comparative method: Moving beyond qualitative and quantitative strategies. Oakland, CA: University of California Press.
Ragin, C. C., Drass, K. A., & Davey, S. (2006). Fuzzy-Set/Qualitative Comparative Analysis 2.0. Tucson, AZ: Department of Sociology, University of Arizona.
Rihoux, B., & Ragin, C. C. (2009). Configurational comparative methods: Qualitative Comparative Analysis (QCA) and related techniques. Applied Social Research Methods Series, Vol. 51. Thousand Oaks, CA: Sage.
Saini, M., & Shlonsky, A. (2012). Systematic synthesis of qualitative research. New York, NY: Oxford University Press.
Saldaña, J. (2013). The coding manual for qualitative researchers (2nd ed.). London, England: Sage.
Schneider, C. Q., & Wagemann, C. (2013). Set-theoretic methods for the social sciences: A guide to qualitative comparative analysis. New York, NY: Cambridge University Press.
Scott-Little, C., Hamann, M. S., & Jurs, S. G. (2002). Evaluations of after-school programs: A meta-evaluation of methodologies and narrative synthesis of findings. American Journal of Evaluation, 23(4), 387–419. https://doi.org/10.1177/109821400202300403
Thomas, J., Harden, A., Oakley, A., Oliver, S., Sutcliffe, K., Rees, R., Brunton, G., & Kavanagh, J. (2004). Integrating qualitative research with trials in systematic reviews. British Medical Journal, 328, 1010–1012. https://doi.org/10.1136/bmj.328.7446.1010
Tsai, J., Mares, A. S., & Rosenheck, R. A. (2010). A multisite comparison of supported housing for chronically homeless adults: “Housing First” versus “Residential Treatment First.” Psychological Services, 7(4), 219–232. http://doi.org/10.1037/a0020460
Tsemberis, S. (1999). From streets to homes: An innovative approach to supported housing for homeless adults with psychiatric disabilities. Journal of Community Psychology, 27(2), 225–241. https://doi.org/10.1002/(SICI)1520-6629(199903)27:2<225::AID-JCOP9>3.0.CO;2-Y
Tsemberis, S., & Eisenberg, R. F. (2000). Pathways to housing: Supported housing for street-dwelling homeless individuals with psychiatric disabilities. Psychiatric Services, 51(4), 487–493. http://doi.org/10.1176/appi.ps.51.4.487
van der Knaap, L. M., Leeuw, F. L., Bogaerts, S., & Nijssen, L. T. J. (2008). Combining Campbell Standards and the realist evaluation approach: The best of both worlds? American Journal of Evaluation, 29(1), 48–57. https://doi.org/10.1177/1098214007313024



Wachter, K. W., & Straf, M. L. (1990). The future of meta-analysis. New York, NY: Russell Sage Foundation.

AUTHOR INFORMATION
Sebastian Lemire is a postdoctoral scholar in the Social Research Methodology Division in the Graduate School of Education and Information Studies, University of California, Los Angeles.
Christina A. Christie is professor and chair of the Department of Education in the Graduate School of Education and Information Studies, University of California, Los Angeles.


How We Model Matters: A Manifesto for the Next Generation of Program Theorizing

Sebastian Lemire University of California, Los Angeles Jane Whynot University of Ottawa Steve Montague Performance Management Network

Abstract: In this concluding article, grounded in the exemplary contributions contained in the preceding pages, the guest editors scale the proverbial soapbox and present a manifesto to guide the pursuit and advancement of the next generation of program theorizing. Formulating ten declarations for program theory development and examination, the modest hope of the authors is to motivate and inspire reflective evaluation practitioners to broaden their views, approaches, and techniques for future program theorizing.

Keywords: contribution analysis, manifesto, program theorizing, program theory, realist evaluation, reflective practice, theory-based evaluation

Résumé : Pour conclure, en s’inspirant des contributions exceptionnelles des articles précédents, les éditeurs de ce numéro spécial se lancent et soumettent un manifeste pour orienter la poursuite et l’avancement de la prochaine génération de théories de programmes. En formulant 10 propositions sur l’élaboration et l’analyse des théories de programmes, les auteurs entretiennent l’espoir de motiver et d’inspirer les évaluatrices et évaluateurs pour qu’ils élargissent leurs horizons, leurs approches et leurs techniques pour la théorisation future des interventions.

Mots clés : analyse de contribution, manifeste, théorie d’intervention, élaboration de théories d’intervention, évaluation réaliste, évaluation fondée sur la théorie

REFLECTIVE PRACTICE AND PROGRAM THEORIZING IN EVALUATION

As stated in the Introduction, the overarching aim of this special issue is to promote reflective practice in program theorizing: to expand and strengthen both the conceptual and technical foundations of program theories in evaluation. This

Corresponding Author: Sebastian Lemire, Social Research Methodology Division, University of California, Los Angeles, Moore Hall 2005, Los Angeles, CA 90095-1521, USA; [email protected]

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme 33.3 (Special Issue / Numéro spécial), 414–433 doi: 10.3138/cjpe.53070

emphasis on reflective practice emerges in part from a broader push in evaluation circles toward evaluative thinking (Vo & Archibald, 2018). More specifically, the notion of reflective practice is rooted in the recent work by Thomas Schwandt (2015), who describes the reflective practitioner as someone who develops the knowledge and skills to design and implement methodologies in a variety of unfamiliar situations, has the capacity to comprehend local contexts, knows how to adapt methodological principles and practices accordingly, and has the ability to determine courses of action and move forward in a way that balances methodological, practical, and ethical considerations.

As scholar-practitioners, we have over the past years collectively developed hundreds of program theories across a broad range of settings, geographical locations, and contexts, deploying a wide range of methods and approaches for developing and testing our program theories, experiencing both successes and failures. And as is true of most practitioners, we have learned through trial and error about what has worked (and what has not), allowing us over time to advance and refine our understanding and appreciation of real-world program theorizing. Our firsthand experience of how program theories unfold (and sometimes fold) in real-world settings has in fundamental ways shaped our scholarly and practical work.

As aspiring reflective practitioners, we have also come to realize that the methodological quality (however we choose to define it) of our theory-based evaluations was never the product or property of a specific design, data-collection method, analytic approach, or even specific features of our program theories, although we falsely assumed this to be the case on more than one occasion.
Instead, we have come to realize that rigor more commonly resided in, or emerged from, the analytical reasoning embedded in our program theorizing—the active interplay and integration of analytical competencies, technical know-how of methods and techniques, contextual and ethical awareness, and, perhaps most important, our integration of these in developing and verifying the program theories. As Van Melle, Gruppen, Holmboe, Flynn, Oandasan, and Frank (2017, p. 752) declare, “it is all about rigor in thinking.” With these observations as our backdrop, the position we hold is that the notion of reflective practice provides a worthwhile framework for the proposed pursuit of next-generation program theorizing. And with the notion of reflective practice as our guide, we now turn to our program theory manifesto.

A SOAPBOX MANIFESTO FOR THE NEXT GENERATION OF PROGRAM THEORIZING

From our viewpoint, striking changes are called for in program theorizing for it to rise to its potential. For far too long, program theorizing in evaluation has been characterized and depressed by linear, overly simplistic program depictions (Coryn, Noakes, Westine, & Schröter, 2011; Weiss, 1997). The all-too-common mechanistic and perhaps ritualistic development of less than inspired program theories, followed by limited, if any, attention in subsequent program implementation, has resulted in piles of half-implemented or even unused program theories. Admittedly, we too have added pebbles to this pile. As such, and as theory-driven soapbox orators, we seek to preach our practice (and practice our preach) toward widening the range of strategies for improving the development and potential use of program theories. Toward these ends, the following 10 declarations collectively comprise what we take to be a “Program Theory Manifesto”:

1. Promote inclusion and representation
2. Strengthen the linkage between evaluation and decision making
3. Develop situated, contextualized, systems-oriented models
4. Focus on program archetypes
5. Explore different types of causal explanations
6. Flip, stack, and layer program theories
7. Put “theory” back into program theory
8. Strengthen the testability of program theories
9. Pursue theory-based synthesis and accumulation of knowledge
10. Do more with less, more quickly

These declarations represent what we take to be useful avenues to pursue in future program theorizing. They are grounded in, and inspired by, the important work of other scholars and practitioners, informed by our experiences as practicing evaluators, and shaped by our ongoing exchanges on the topic of theory-based evaluation. The declarations range in scope and focus; some promote specific methods or techniques; some advocate for more fundamental perspectives on what program theories are, or even what program theories could or should be; yet others simply call attention to ideas or practices in the work of other theory-based practitioners and scholars that we find inspiring. Underlying this diversity is a shared commitment to Schwandt’s (2015, p. 33) idea of evaluation practice as “actions informed by situated judgements.” As Schwandt suggests,

[p]ractice decisions can be enlightened by conceptual or theoretical knowledge that serves as an aid in thinking through options in a situation that a practitioner faces. It is helpful to think of this kind of knowledge as a repertoire of principles, concepts, insights, and explanations that professional practitioners can use as heuristic tools “to think with.”

We wholeheartedly agree. And in line with the above, it deserves to be said that the declarations are just that: aids in thinking about future program theorizing. Accordingly, these declarations are not to be viewed as off-the-shelf solutions or recipes for the many pesky pitfalls of program theorizing, nor do the declarations collectively represent an exhaustive list of worthwhile practices to pursue. Rather, the declarations—individually and collectively—represent what we take to be improvements and advancements that are both grounded, yet innovative in the context of current practice, as well as feasible and desirable avenues for future program theorizing.

DECLARATION 1: PROMOTE INCLUSION AND REPRESENTATION

We envision program theory as something that is most valuable when it is shared. This requires both broad and targeted engagement of a diverse set of stakeholders at various junctures and levels throughout the development, implementation, reporting, and use of program theory. Indeed, calls have repeatedly been made for participatory approaches to theory-based evaluation (see Hansen & Vedung, 2010, on theory-based stakeholder evaluation; and Koleros & Mayne, this issue, on actor-based theories of change). This engagement requires the adoption of a pluralistic framework that is sufficiently flexible to include conceptual, technical, and practical contributions from various stakeholders, including program beneficiaries as experts on their own experience, sources of expert input including academics from other sectors/disciplines, and the decision makers responsible for allocations of funding. Such engagement cuts to the core of reflective practice. While we have learned a significant amount from those who have gone before us, the capacity for learning is unlimited; there is a potential contribution to be made by each stakeholder. Moving from stage to stage within a detailed theory of change, from the level of inputs through to the ultimate outcomes, without engaging others in the development, implementation, reporting, and use dimensions seems foolish. Opening both evaluators and the evaluation function to learning allows diverse groups of stakeholders from a multitude of disciplines to collaboratively contribute to the articulation of how interventions are anticipated to work and, perhaps more importantly, to offer insight into why interventions do not work. This is not to say that all stakeholders need to be involved at all stages or throughout all levels.
Homework is required to determine which stakeholders can contribute most effectively at the various program-theory junctures. At various points in the process we should be asking the following questions: What would happen if a broader set of stakeholders were involved in the various stages and levels of program theory? Would we uncover key assumptions faster? Would we challenge our own assumptions more effectively? Would we waste fewer already scarce resources on implementation if the theory of implementation was flawed and program beneficiaries could tell us this from day one? The recent coverage of Zimbardo’s debunked Stanford Prison Experiment (Reicher, Haslam, & Van Bavel, 2018), which was widely acclaimed in psychology circles, suggests that the study’s impacts could have been much different if only the researchers’ and participants’ views had been heard, recognized, and valued differently, much earlier. And the Stanford Prison Experiment is not unique in this regard.

While it is useful to bring diverse viewpoints to bear on our theories, we should remind ourselves that the participation of diverse groups of stakeholders

doi: 10.3138/cjpe.53070 CJPE 33.3, 414–433 © 2019 Lemire, Whynot, and Montague

does not automatically lead to the achievement of equitable outcomes. Equity concerns and efforts need to be reflected in program theory in much more systematic ways to ensure transparency regarding inclusion/exclusion decisions, particularly in identifying the relevant assumptions underlying, and the potential adverse consequences emerging from, our evaluations. This involves providing space to ensure that multiple voices are represented. Rogers (2016, p. 202) draws attention to this theory/equity deficit, observing that "while theories of change are increasingly used in evaluation, they rarely address equity issues adequately." Bledsoe's (2005) contribution on using program theory in underserved communities, and more recent efforts that actively promote the role of program theory in equity discussions (Bledsoe & Donaldson, 2015; Donaldson & Picciotto, 2016), encourage evaluators to step forward and explore what culturally responsive evaluation means in all facets and types of evaluation, from program to policy. Evaluators are further encouraged to think about how all evaluation approaches are imbued with culture (Bledsoe & Donaldson, p. 23). We can be more aware.

While specific attention is paid to the engagement of stakeholders in developing impact theory, the implementation dimension of program theory is just as important, as is its alignment with systemic barriers to access and participation, as noted by Greene (2016). The relationship among program theory as an approach, participation, and equitable outcomes and valuation needs to be explored in greater detail. The intentional inclusion of elements affecting equitable outcomes, including gender (Whynot et al., this issue), needs to be explicitly incorporated into program theory.
Incorporating reach into program theory is an early step in the right direction (Mayne, 2015; Montague & Porteous, 2013). Conceptually, equity may be embedded in program theories as key components and expressed through the articulation of assumptions, but its representation is muted when readers are left to infer it. More can be done to situate equity issues explicitly in program theory. Promoting advancement in related areas requires building the capacity to learn more effectively from many voices in a program-theory-specific context; it also requires a mindset open to hearing from others who have not, perhaps, been part of these discussions to date. Certain epistemological orientations, and the evaluator roles they imply, lend themselves to this more readily than others. Regardless of paradigm, we advocate recognizing the dynamics of power and influence in theorizing; these dynamics sway the value assigned to who is speaking and who is being heard. Program theory has historically been held in two sets of hands: those of the intervention and those of the evaluators. Enacting change to incorporate the collaboration of a broader set of stakeholders is not an easy undertaking, particularly in siloed bureaucracies that depend heavily on processes and protocols to engage with other functions. The message regarding the value of engaging a diverse set of stakeholders needs to be clearly communicated by organizational leadership, and for that to happen there need to be related accountability mechanisms in place,

but, furthermore, for sustainability, the value associated with broader engagement needs to be demonstrated.

DECLARATION 2: STRENGTHEN THE LINKAGE BETWEEN EVALUATION AND DECISION MAKING

We aim to strengthen the linkage between evaluation and decision making through program theorizing. The evaluation function in policy environments has historically failed to maximize its potential contribution to decision making (Leeuw, 1991). This is not a new challenge, but we posit that program theory could be part of the solution. The challenge emerges in Weiss's (1998) critical work, where she notes that evaluation is one of many inputs in the governance of public funds, concluding that evaluation and research are used in a manner that could be characterized as indirect at best. Dobell and Zussman's (1981) articulation of the specific challenges encountered by the evaluation function includes its misunderstanding by senior managers and program staff, tacit resistance by program managers, and the failure to consider the true nature and information needs of the user. These challenges, in addition to existing evaluation policy requirements, the program-level focus of evaluations, and the public nature of evaluation reports, have also been identified as factors impeding strategic evaluation use by decision makers (Bourgeois & Whynot, 2018). The many disconnects between evaluation and decision making are relevant here because we suggest that program theory, as envisioned in this issue, could provide one means of bridging the divide to evaluative thinking. Too often, how we identify our evaluand and how it interacts with its surrounding systems is a lost opportunity. Internationally, work on the Sustainable Development Goals (SDGs) reminds us of this: gender equality, for example, is not only a standalone goal but is also woven through multiple other SDGs.
These goals remind us that policy makers are dealing not with simple problems but with complex ones, increasingly referred to as "wicked policy problems" (Head, 2008; McGrail, 2014). With cross-cutting issues such as climate change facing decision makers worldwide, these policy problems will not simplify themselves, nor will they disappear of their own volition. Complexity and wickedness necessarily involve a broader view of the landscape to address non-linearity and its potential paths, the inputs and influences of multiple actors, their associated technical and political considerations, and, most importantly, the "small shifts that may produce large differences in the outcomes of the systemic dynamics" (Peters, 2017, p. 386). With more sophisticated taxonomies of complexity and wickedness evolving (Alford & Head, 2017), we suggest that more sophisticated understandings of program theory can contribute to fostering evaluative thinking amongst decision makers. Ultimately, it is people who will reap the benefits or bear the burden of policy and decision makers getting theory "right."

DECLARATION 3: DEVELOP SITUATED, CONTEXTUALIZED, SYSTEMS-ORIENTED MODELS

Experience suggests that logic models that set out as stark box-and-wire diagrams, with limited attention to narratives, explanations, and contexts, can do more harm than good (Freer & Lemire, this issue). Without elaboration, these depictions become no more than desires or slogans in boxes, subject to political "force fitting." Schwandt (2018, p. 131) importantly reminds us that as evaluators we can never "investigate something like a policy, program, practice or strategy in its totality—that is, in terms of all of its interconnections and relationships, and from all different perspectives and viewpoints." Rather, what we can do, and what we should always strive for, is to be transparent about how we select which aspects of our evaluand are highlighted and who is engaged in making these decisions. Building on this proposition, we would emphasize three elements as key to model improvement:

1. theory situating;
2. contextualizing; and
3. systems orientation.

Depictions of theory at the program, policy, or broader initiative level must be situated in terms of the level and focus of analysis and juxtaposed with other levels of theory. Theories can, and should, be contemplated at micro-, meso-, and meta-levels. Much of the discussion in this volume has been at a meso-level, although actor-based theories, as well as theories that situate and position girls' and women's experiences in STEM, may touch on micro-theories to some extent. Lemire and Christie in this issue directly illustrate an approach to meta-modeling, potentially theorizing across multiple local program theories. The point is that different levels of theory can work to support or detract from each other. For example, evaluators need to be cognizant of whether, and to what extent, a micro-theory about how to reduce individual drug-addiction harm is congruent with broader program theories of deterrence and remediation for drug addiction and, in turn, how these are consistent with meta-theories (e.g., can harm-reduction strategies and theories work within a broader strategy that amounts to a "war" on drugs?).

Following the situating of theories, contextualizing is crucial. The notion of a singular best practice should be eliminated; all practices need to be judged in context. The realist mantra of "what works (to what extent) for whom under what conditions and why?" can be a powerful means of recognizing the inherent complexity in most human-based systems (Pawson, 2006). It also ensures the consideration of context as part of the performance story of any initiative, and it should encourage a fundamental focus on learning and improvement in addition to accountability and a normative assessment of value. Whether identified as "factors," "assumptions," "pre-conditions," or "risks," good program theory should include contextual considerations.


Ultimately, a systems orientation needs to be fostered while avoiding hyper-complexity in description. Several articles in this special issue have, in addition to advocating the adoption of a systems view, noted the value of recognizing key system elements such as the change theory and the action or implementation theory and strategy (e.g., Montague, this issue). The match or mismatch of these elements can explain a lot of observed performance. Another key feature of systems is that they have actors. Koleros and Mayne (this issue) note the benefits of actor-based theories in the conduct of contribution analysis in complex settings. One could argue that such actor-based modeling can also help explain another key element of systems—the relationships and feedback loops that create virtuous and vicious circles, performance distortions, as well as emergent outcomes and "bends" in implementation (Pawson, 2006). Finally, it deserves mention that another useful strategy may involve leaving room for ambiguity in certain aspects of the program theory (Dahler-Larsen, 2018). As Dahler-Larsen succinctly argues, there are concrete steps that evaluators can take to include and use specific types of ambiguity in their development and examination of program theories, including the use of what he refers to as "Janus variables"—variables that work in two ways (p. 6). The implications of not getting program theory right may not be felt in the short term at the individual program level, but they will undoubtedly surface in years to come at broader policy, country, or global levels. The achievement of results is a shared responsibility that necessitates moving beyond the stakeholders who have historically been consulted in the course of evaluation activities.

DECLARATION 4: FOCUS ON PROGRAM ARCHETYPES

Evaluation has for the most part been fixated on programs, and how to think about and best portray programs is no trivial task. The problem is that program evaluation as commonly practiced does not encourage a systematic approach to accumulating knowledge about programs. Evaluations are all too often narrow and "local" in scope; they often miss obvious similarities in their impact pathways with other programs simply because the subject matter and clientele may be different. Moreover, program theories and logic models have often been developed from scratch. Realist evaluators like Pawson have noted that

• all programs are associated with some kind of theory or theories (they are theories incarnate);
• although there are as many unique programs as there are programs, there are only a limited number of program theories; and, therefore,
• one can accumulate a good deal of diversified knowledge about the circumstances and success of program theories, which can be of value to programs.


Program theories, in turn, can be classified by the nature of the program as a policy instrument. Take, for example, Bemelmans-Videc, Rist, and Vedung's (1998) definitions of carrots, sticks, and sermons, representing incentive, deterrent, and information programs. Each of these instruments has distinct results and delivery logic, characteristics, and recognized circumstances associated with success and failure. Funnell and Rogers (2011) propose that archetypal models can be built for these policy instruments, within which refined categories can be established. For example, the deterrence (stick) archetype might include subcategories such as summary conviction notices for variance-type actions, with little or no penalty. The relative success of such actions has been found to depend heavily on other contextual factors, such as the potential for shame ("reputational loss" is the term often used) through public exposure. It has also been associated with the understanding that more significant penalties can and will be used—thereby forming a deterrence pyramid (Ayres & Braithwaite, 1992). Other types of deterrents may more easily stand on their own. Over time, one can begin to glean from study, work, and experience the factors that recur in association with certain types of outcomes for a given mechanism. One can also consider the implementation and design characteristics that best fit a given policy instrument and its success characteristics. The notion of archetypal program theories applies both to policy instruments and to action/implementation, including governance and delivery dimensions. Is governance and/or delivery through a multi-government and multi-agency agreement? Unilateral delivery? Delivery with privately contracted elements? Delivery with voluntary or semi-voluntary third-party elements? Is the delivery mode hierarchical, matrix, or network-based?
These are but a few of the key questions and factors that are important for understanding the complex effects that action theories and change theories can have on each other, and as such they demand our attention. While the identification of archetypes and correlational success factors may experience a few false starts, this orientation may well produce some profound insights.

DECLARATION 5: EXPLORE DIFFERENT TYPES OF CAUSAL EXPLANATIONS

A central aim of theory-based evaluation is to explain how (or why) programs work (or fail to work). Despite this central role of causal explanation, limited attention has been paid to the different kinds of causal explanation potentially pursued in the context of theory-based evaluation. For instance, an important distinction could and should be made between how and why a program brings about a desired outcome. Explaining how a program works entails determining and describing the "active ingredients" that either individually or collectively elicit the desired outcome. These active ingredients typically take the form of program activities or outputs that in different configurations elicit a specific outcome. By identifying and describing these critical ingredients, or (for lack of a better term) causal

recipes, theory-based evaluation offers important insights into how the program works. This is a configurational explanation of the outcomes.

In marked distinction, explaining why a program works entails determining and describing the program mechanism(s), that is, the underlying processes generating a specific outcome. This entails making explicit the individual or social psychological processes triggered by the program. By specifying these underlying mechanisms, theory-based evaluation offers important insights into why the program works—why the program makes a difference for the participants. This is a mechanism-based explanation.

Our position is that these are fundamentally different types of explanations, requiring different types of data and analysis, resulting in different types of information, and potentially supporting different kinds of decision making. Accordingly, the type of explanation pursued should emerge from the type of informational need motivating the evaluation. As just one illustration, if the aim of the theory-based evaluation is to support future program design, a configurational explanation focusing on how individual program components (or configurations of these) lead to a desired outcome might be advantageous. This type of information supports the identification of specific program components (critical ingredients) that, all things considered, should be included in future programs. Conversely, if the aim of the theory-based evaluation is to understand the underlying reason(s) why program participants change their behavior (or fail to change) in response to the program, a mechanism-based explanation is perhaps more appropriate. Of course, we may also pursue an understanding of both how and why the program works (see Lemire & Christie, this issue).
There are, of course, several other types of explanations to be pursued in theory-based evaluation, and many situations might call for a combination of explanations, especially in layered program theories (see Koleros & Mayne, this issue). The point to be made here is that evaluators should be reflective about the types of causal explanations pursued, the type of information provided, and the extent to which these match the information needs motivating the evaluation. What makes something an explanation in theory-based evaluation? What makes one explanation better than another? What is the explanatory logic of these different types of explanation? And wherein lies their explanatory strength? These are but some of the fundamental questions that must be pursued if we are to better understand and further advance the explanatory potential of theory-based evaluation.

DECLARATION 6: FLIP, STACK, AND LAYER PROGRAM THEORIES

How we model matters. Simply consider the marked difference between a program modeled according to a simple logic model, an embedded program theory, or a Context-Mechanism-Outcome (CMO) configuration. Each of these types of models depicts a markedly different representation and, in effect, understanding of the program being evaluated.

Table 1: Common types of contributory relationships in program theories

Model | Type | Explanation | Description
Simple causal model | successionist | A causal model depicting a contributory relationship between a program activity and one or more program outcomes | A leads to C
Simple causal package model | configurational | A causal model depicting a contributory relationship between two or more program activities and a program outcome | A plus B leads to C
Simple causal mechanism model | generative | A causal model depicting a contributory relationship between a program activity and one or more outcomes, specifying the underlying mechanisms of the relationship | A leads to C because of D
Complex causal model | configurational/generative | A causal model depicting a contributory relationship between two or more program activities and one or more outcomes, specifying the underlying mechanisms of the relationship | A plus B leads to C because of D
Embedded causal package model | contextual configurational | A causal model depicting a contributory relationship between two or more program activities and a program outcome, specifying the contextual conditions under which the relationship holds | A plus B leads to C, under condition E
Embedded causal mechanism model | contextual generative | A causal model depicting a contributory relationship between a program activity and one or more outcomes, specifying the underlying mechanisms of the relationship as well as the contextual conditions under which the relationship holds | A leads to C because of D, under condition E
Embedded complex causal model | contextual configurational/generative | A causal model depicting a contributory relationship between two or more program activities and one or more outcomes, specifying the underlying mechanisms of the relationship as well as the contextual conditions under which the relationship holds | A plus B leads to C because of D, under condition E
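The graded specification in Table 1 lends itself to a simple structural reading: each row is a causal claim with optional slots for additional activities (B), a mechanism (D), and a contextual condition (E). The sketch below is purely illustrative and of our own making (the class and field names are not drawn from the literature); it shows how the seven archetypes differ only in which slots are filled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CausalClaim:
    """One contributory relationship in the style of Table 1.

    activities : the program activities (A, and optionally B) claimed to contribute
    outcome    : the outcome (C) they contribute to
    mechanism  : D, the underlying process (generative models only)
    condition  : E, the contextual condition (embedded models only)
    """
    activities: List[str]
    outcome: str
    mechanism: Optional[str] = None
    condition: Optional[str] = None

    def depict(self) -> str:
        # Build the shorthand depiction used in the table's right-hand column.
        s = " plus ".join(self.activities) + f" leads to {self.outcome}"
        if self.mechanism:
            s += f" because of {self.mechanism}"
        if self.condition:
            s += f", under condition {self.condition}"
        return s

# The least and the most specified archetypes from Table 1:
simple = CausalClaim(["A"], "C")
embedded_complex = CausalClaim(["A", "B"], "C", mechanism="D", condition="E")

print(simple.depict())            # A leads to C
print(embedded_complex.depict())  # A plus B leads to C because of D, under condition E
```

The point of the sketch is only that "archetype" here means a pattern of which elements are specified, not a different kind of object altogether.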
Underlying all the creative labels for program theories, our position is that practical applications of program theories typically involve seven fundamental (arche)types of models, as described in Table 1. The point to be made here is not that one type of model is inherently better than another. What the table illustrates is that there is a broad array of types of relationships to be modeled as part of program theorizing—many different

opportunities for modeling and understanding how and why programs work (or fail to work). Moreover, most program theories might advantageously use a combination of the above model types, employing one type of model for some parts of the program theory and other types for other parts, depending on the degree of specification called for (this type of differentiation is illustrated by Koleros & Mayne, this issue).

In addition to these types of models, there is also a wide range of modelling techniques available to evaluators, including simple box-and-arrow diagrams, causal loop diagrams, and interconnected systems of stock-and-flow diagrams (Williams & Hummelbrunner, 2011), or some fanciful combination of these. Each of these modelling techniques would depict the underlying program logic in markedly different ways, promoting different program understandings and, in effect, resulting in dramatically different insights and conclusions about how and in what way the program works (or fails to work). The different models offer different lenses through which we see the program. Accordingly, we need a reflective conceptual and practical pluralism in how we model programs, whereby the use of different modelling techniques emerges from active reflection and decision making regarding the specific purpose and role of our program theories. To illustrate, we may, as Koleros and Mayne propose in this issue, use multiple layers in our models, using different types of modelling for each layer; one theory is then nested (Mayne, 2015) in another theory of change within the same model. We may also use hybrid models by combining modelling techniques within the same theory of change (without layers). Yet other innovative modelling techniques include "stacked" and even "3D" logic models (Grim, Castillo, & O'Quinn, 2018).
The complexity in and surrounding social programs necessarily demands a broader range of modelling techniques. By broadening the range of models we employ in our program theorizing, we actively move away from the all-too-common linear, one-way, overly simplified box-and-wire diagrams. We also move closer to the real-world complexity of social programs, which are often characterized by feedback loops (balancing and reinforcing loops), reverse causality, causal interactions, asymmetrical causal relationships, contextual contingencies, and so on. The simple aim here is to reinvigorate our toolbox and practice by considering a broader range of modelling techniques when developing program theories, recognizing the relative strengths and weaknesses of each technique. We call for reflective practice in our modeling.
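What a feedback loop adds over a linear box-and-wire chain can be made concrete with a deliberately toy stock-and-flow sketch (entirely our own invention, with hypothetical parameter names, not a model from any cited source): a single stock of program participants grows through a reinforcing loop (word-of-mouth referrals) and is checked by a balancing loop (saturation as the stock nears capacity).

```python
def simulate(periods=20, capacity=1000.0, referral_rate=0.5, initial=10.0):
    """Toy stock-and-flow model with one stock (participants),
    a reinforcing loop (referrals) and a balancing loop (saturation).
    Returns the stock's trajectory over time."""
    participants = initial
    history = [participants]
    for _ in range(periods):
        # Reinforcing loop: more participants generate more referrals.
        inflow = referral_rate * participants
        # Balancing loop: the inflow is damped as the stock nears capacity.
        inflow *= (1 - participants / capacity)
        participants += inflow
        history.append(participants)
    return history

history = simulate()
# Growth is roughly exponential at first, then flattens near capacity.
print([round(x) for x in history])
```

No linear logic model reproduces this S-shaped trajectory, because the outcome at each step feeds back into the activity level; that is precisely the dynamic that causal loop and stock-and-flow notations are built to express.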

DECLARATION 7: PUT "THEORY" BACK INTO PROGRAM THEORY

We need to put the "theory" back into program theory. The potential role and value of social science theory is a persistent topic in theory-based evaluation discussions. Yet, despite sustained interest, both the extent to which and the ways in which evaluators use social science theory in their program theorizing remain limited. The

selection of which social science theories to use also matters, as seen in the case of equity considerations (see Whynot et al., this issue). Our position is that social science theory, especially behavioral change theory, can serve well to inform and enhance our program theorizing. Inspired by Riemer and Bickman (2011), we strongly recommend a stronger push for "theory knitting," that is, using "one or more social science theories as the conceptual grounding for both the design of the program and the program theory" (Lemire, Christie, & Nielsen, 2019). The idea of theory knitting is nothing new in the social sciences and can be traced back to work on theory development in psychology by Kalmar and Sternberg (1988). In the context of evaluation, Leeuw and Donaldson (2015) have further promoted this linking of domain theories, theories on policy-making processes, and behavioral theories, offering a compelling example of theory knitting in relation to producing public goods. In a forthcoming publication, Lemire et al. (2019) review and showcase published examples of different types of theory knitting in theory-based evaluations.

Indeed, the case for theory knitting is easily made. As noted by Mark, Donaldson, and Campbell (2011), the active and explicit use of social science theory promotes the specificity of key program-theory components, including purported mechanisms, whereby the testability of the program theories is enhanced. On a similar note, Vaessen and Leeuw (2009) compellingly argue that using specific behavioral theories to substantiate and explain causal mechanisms can serve to enhance the internal validity of causal claims. Equally important, and in relation to external validity, the use of social science theory also supports the generalizability of these conclusions.
As argued by Donaldson and Lipsey, the integration of social science theory and program theory "constitutes a (if not the) major way that evaluation contributes to social betterment by way of knowledge development" (cited in Donaldson & Crano, 2011, p. 145). We agree.

DECLARATION 8: STRENGTHEN THE TESTABILITY OF PROGRAM THEORIES

A central aim of theory-based evaluation is to establish what Mayne (2001) terms "plausible association" between a specific set of program components and outcomes. As Mayne goes on to argue, this involves evidentiary confirmation of key elements of the program theory, examination of influencing factors, and disproof of alternative explanations (2001, p. 7). These are necessary steps in constructing credible contribution stories, regardless of the type of theory-based evaluation that is pursued. There are many potential ways to address this concern.

One central stepping stone is to enhance program-theory specification by developing more fine-grained descriptions of the most salient aspects of the program theory to be empirically examined (Vaessen, 2016). As indicated in Table 1, different levels of model specification (granularity) can be pursued, ranging from simple models (depicting A leads to C) to more complex models (depicting A plus B leads to C because of D,

under condition E). The methodological point here is that specifying these aspects of a program theory with more precision enhances the evaluator's ability to make predictions about confirmatory and disconfirmatory patterns in the data, which in turn allows for more robust empirical verification of the program theory. To aid in this empirical verification, Mayne (2017) has developed and promoted theory-robustness criteria that emphasize, first, overall understanding, structural logic and clarity, and alignment between activities and anticipated results, followed by specific result and assumption criteria.

In logical extension of enhancing the specificity of program theories, another way to enhance their testability is to use structured analytical approaches and techniques. Brousselle and Champagne (2011) have proposed logic analysis as an approach for using existing social science theory to examine the plausibility and validity of a program theory. Logic analysis critically examines "the program's strengths and weaknesses, elucidates the links between the program's design and the production of desired outcomes, and identifies contextual influences" (Brousselle & Buregeya, 2018, p. 156). Another strategy is to consider the strength of evidence in favor of one or more key elements of the program theory—for instance, by gauging the breadth of evidence in favor of its most salient components, including the span of evidence across different data sources, methods, and implementation sites. There is a rich and broad literature on how to conduct quality assessment of qualitative, quantitative, or mixed-methods evidence, including operational frameworks to structure and guide such appraisals.
Lemire, Nielsen, and Dybdal (2012) have proposed the Relevant Explanation Finder (REF), an operational framework promoting a more systematic and transparent assessment of rival explanations and influencing factors. The framework has been applied and further developed in a wide range of settings (e.g., Biggs, Farrell, Lawrence, & Johnson, 2014). The use of the REF or similar frameworks may serve to establish a stronger chain of evidence in support of plausible association and lend transparency and structure to the identification and examination of rival explanations and influencing factors, ultimately lending credibility to the causal conclusions drawn.

Yet a third strategy is to pursue analytical approaches such as process tracing, whereby more fine-grained causal mechanisms can be examined and verified (Schmitt & Beach, 2015). A key strength of process tracing is that it relies on clearly specified tests, whereby the presence or absence of hypothesized mechanisms can be determined. In this way, the method makes explicit the weight of evidence for or against the specific mechanisms, which in turn enhances the credibility of the conclusions drawn. In a similar vein, Qualitative Comparative Analysis, a case-based analytical approach, can also be partnered with program theorizing to identify the causal conditions (e.g., specific program components) that either individually or collectively promote a specific outcome (Befani, Ledermann, & Sager, 2007; Lemire & Christie, this issue).

There are, of course, many more structured analytical strategies and techniques for both developing and empirically verifying program theories. By using
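The general logic of weighing evidence for rival explanations can be illustrated with a deliberately generic sketch. This is our own simplification for illustration only; it is not the Relevant Explanation Finder's actual procedure, and the explanation and source names are hypothetical. Each candidate explanation is scored by the balance of independent evidence sources that support or contradict it, and candidates are then ranked.

```python
def rank_explanations(evidence):
    """evidence maps each candidate explanation to a list of
    (source, supports) pairs. Score = supporting sources minus
    contradicting sources; returns candidates ranked best first."""
    scores = {}
    for explanation, findings in evidence.items():
        scores[explanation] = sum(1 if supports else -1
                                  for _source, supports in findings)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical evidence table for one program outcome:
evidence = {
    "program mechanism":      [("interviews", True), ("survey", True), ("site visits", True)],
    "rival: self-selection":  [("interviews", False), ("survey", True)],
    "rival: secular trend":   [("admin data", False), ("survey", False)],
}
print(rank_explanations(evidence))
# ['program mechanism', 'rival: self-selection', 'rival: secular trend']
```

Even this crude tally makes the transparency point: the basis for favoring the program explanation over its rivals is written down, source by source, rather than asserted.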

these structured analytical strategies, we ensure a more systematic and transparent approach to program theorizing, whereby the conclusiveness and credibility of our causal conclusions are enhanced. We strongly encourage more applied work on this topic.

DECLARATION 9: PURSUE THEORY-BASED SYNTHESIS AND ACCUMULATION OF KNOWLEDGE

Taking stock of what we know is important for learning. The findings included in this special issue suggest that there is an immense opportunity for cumulative learning at this time. The calls for archetypal program theories and theory knitting are at least in part motivated by the opportunity to learn cumulatively from theory-based evaluations in ways not yet systematically or widely pursued in evaluation circles. Compared with the Cochrane Collaboration, for example, which might focus on addressing specific research questions using data from controlled trials and rigorously pre-screened statistical studies, the accumulation of evidence under this kind of effort would accept a wider range of studies but would in turn encourage transparency, discussion, and dialogue about both data and interpretations of theory.

Over the past few years, there has also been a growing interest in mixed-methods synthesis approaches and, in logical extension, theory-based synthesis approaches. The most common approach, at least in evaluation circles, is Pawson's realist synthesis (Pawson, 2006). In the present special issue, Lemire and Christie illustrate and promote meta-modeling, an alternative approach to developing program theories across existing evaluations (see also Lemire, 2017).

Pursuing theory-based syntheses comes with a number of benefits. For one, systematically combining the findings from a comprehensive pool of diverse studies provides robust information on the extent to which, and how, programs work across different settings, populations, and times. Moreover, synthesizing a broader range of evidence—qualitative and quantitative—allows for answers to a broader range of questions, including how and why programs work (or fail to work).
These types of questions are difficult to answer in traditional systematic reviews relying on meta-analysis of quantitative findings. For another, theory- based synthesis approaches also hold potential to move beyond “a thin description of the evidence to produce higher order syntheses resulting in the production of new knowledge and/or theory” ( Pope, Mays, & Popay, 2007 , p. 6). Th e pursuit of “thicker” understanding of how and why programs work across diff erent settings, contexts, and times is in our view a worthwhile eff ort.

DECLARATION 10: DO MORE WITH LESS, MORE QUICKLY

There is an iterative quality to program theory's development, implementation, reporting, and use that has significant resource implications. These resources include capacity, time, and finances, which are typically in short supply in most organizations, including the various orders of government. These resource implications are heightened in an era in which the evaluation function is pressured "to do more with less, and—do it more quickly, please" (we are Canadian, after all). In the Canadian federal government, resources allocated to the evaluation function, including contracting dollars and full-time equivalents (FTEs), have been steadily declining over the last decade. Coupled with the former 2009 Policy on Evaluation's requirement to address all direct program spending (DPS) with no new funding, and the recently implemented 2016 Policy on Results overhaul of the government's results structure, this tension is very real and unlikely to change in the foreseeable future.

Practically speaking, we suggest that several factors will contribute to making program theory more efficient: (a) building program-theory capacity and competencies, including the resulting ability to scope appropriately; (b) the adoption of a longer-term perspective on the part of decision makers, one that features the accumulation of learning about what works; and (c) the demonstration of program theory's value. As with the development of any other competency, we suggest that the more program theory is used, the better we will get at it. This applies first at the individual level and then within institutional settings, based on principles advanced in double- and triple-loop learning. Building competency, learning cumulatively from ourselves and others, and scoping where and how program theory may be employed most effectively should all reduce the resource burden by better targeting the expensive parts of evaluation: data collection and analysis. By having strong theory-based knowledge about the key actors and critical influence points in our interventions, we can more efficiently target the people and resources required to understand how things are working. Additionally, getting program theory right and, accordingly, investing in its development will ultimately make more efficient use of government resources. It is our experience that organizations sometimes spend millions of dollars doing the wrong thing by design. Investing resources up front to develop an understanding of the key preconditions and factors that would allow a given design to work in a given context seems far superior to spending on patchwork fixes and corrective actions downstream.

CONCLUDING THOUGHTS

A central aim of this issue is to promote reflective practice in program theorizing: to expand and strengthen both the conceptual and technical foundations of program theories in evaluation. Grounded in the idea of reflective practice, we have proposed 10 declarations that, in our view, will serve to advance reflective practice around the development, refinement, examination, and use of program theories. Ultimately, our manifesto and this issue as a whole both elaborate on Carol Weiss's (1995) famous adage that, when it comes to modern evaluation practice, there is truly nothing as practical as good theory.

REFERENCES

Alford, J., & Head, B. W. (2017). Wicked and less wicked problems: A typology and a contingency framework. Policy and Society, 36(3), 397–413. https://doi.org/10.1080/14494035.2017.1361634
Ayres, I., & Braithwaite, J. (1992). Responsive regulation: Transcending the deregulation debate. New York, NY: Oxford University Press.
Befani, B., Ledermann, S., & Sager, F. (2007). Realistic evaluation and QCA: Conceptual parallels and an empirical application. Evaluation, 13(2), 171–192. https://doi.org/10.1177/1356389007075222
Bemelmans-Videc, M.-L., Rist, R. C., & Vedung, E. (Eds.). (1998). Carrots, sticks & sermons: Policy instruments and their evaluation. New Brunswick, NJ: Transaction Publishers.
Biggs, J. S., Farrell, L., Lawrence, G., & Johnson, J. K. (2014). A practical example of contribution analysis to a public health intervention. Evaluation, 20(2), 214–229. https://doi.org/10.1177/1356389014527527
Bledsoe, K. (2005). Using theory-driven evaluation with underserved communities: Promoting program development and program sustainability. In S. Hood, R. H. Hopson, & H. T. Frierson (Eds.), The role of culture and cultural context: A mandate for inclusion, the discovery of truth and understanding in evaluation theory and practice (pp. 175–196). Charlotte, NC: Information Age Publishing.
Bledsoe, K., & Donaldson, S. (2015). Culturally responsive theory-driven evaluation. In S. Hood, R. H. Hopson, & H. T. Frierson (Eds.), Continuing the journey to reposition culture and cultural context in evaluation theory and practice (pp. 3–28). Charlotte, NC: Information Age Publishing.
Bourgeois, I., & Whynot, J. (2018). Strategic evaluation utilization in the Canadian federal government. Canadian Journal of Program Evaluation, 32(3), 327–346. https://doi.org/10.3138/cjpe.43179
Brousselle, A., & Champagne, F. (2011). Program theory evaluation: Logic analysis. Evaluation and Program Planning, 34(1), 69–78. https://doi.org/10.1016/j.evalprogplan.2010.04.001
Brousselle, A., & Buregeya, J.-M. (2018). Theory-based evaluations: Framing the existence of the new theory in evaluation and the rise of the 5th generation. Evaluation, 24(2), 153–168. https://doi.org/10.1177/1356389018765487
Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schröter, D. C. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226.
Dahler-Larsen, P. (2018). Theory-based evaluation meets ambiguity: The role of Janus variables. American Journal of Evaluation, 39(1), 6–23. https://doi.org/10.1177/1098214017716325
Dobell, R., & Zussman, D. (1981). An evaluation system for government: If politics is theatre, then evaluation is (mostly) art. Canadian Public Administration, 24(3), 404–427. https://doi.org/10.1111/j.1754-7121.1981.tb00341.x
Donaldson, S. I., & Crano, W. D. (2011). Theory-driven evaluation science and applied social psychology: Exploring the intersection. In M. M. Mark, S. I. Donaldson, & B. Campbell (Eds.), Social psychology and evaluation (pp. 141–160). New York, NY: Guilford Press.
Funnell, S. C., & Rogers, P. (2011). Purposeful program theory: Effective use of theories of change and logic models. San Francisco, CA: Jossey-Bass.
Greene, J. C. (2016). Advancing equity: Cultivating an evaluation habit. In S. I. Donaldson & R. Picciotto (Eds.), Evaluation for an equitable society (pp. 49–66). Charlotte, NC: Information Age Publishing.
Grim, E., Castillo, I., & O'Quinn, E. P. (2018). The logic model repair shop: An introduction to 3D logic models. Presentation at EERS 2018. Retrieved from https://www.slideshare.net/IsaacCastillo6/the-logic-model-repair-shop-an-introduction-to-3d-logic-models
Hansen, M. B., & Vedung, E. (2010). Theory-based stakeholder evaluation. American Journal of Evaluation, 31(3), 295–313. https://doi.org/10.1177/1098214010366174
Head, B. W. (2008). Wicked problems in public policy. Public Policy, 3(2), 101–118. Retrieved from https://search.informit.com.au/documentSummary;dn=662880306504754;res=IELHSS
Kalmar, D. A., & Sternberg, R. J. (1988). Theory knitting: An integrative approach to theory development. Philosophical Psychology, 1(2), 153–170. https://doi.org/10.1080/09515088808572934
Leeuw, F. L. (1991). Policy theories, knowledge utilization, and evaluation. Knowledge and Policy, 4(3), 73–91. https://doi.org/10.1007/BF02693089
Leeuw, F. L., & Donaldson, S. I. (2015). Theory in evaluation: Reducing confusion and encouraging debate. Evaluation, 21(4), 467–480. https://doi.org/10.1177/1356389015607712
Lemire, S. (2017). Meta-modeling social programs: Methodological reflections on a practical example (Doctoral dissertation). University of California, Los Angeles.
Lemire, S., Christie, C. A., & Nielsen, S. B. (2019). Mending the theory gap in evaluation—moving towards theory knitting. In M. Palenberg & A. Paulson (Eds.), Evaluation and the pursuit of impact. New York, NY: Taylor & Francis.
Lemire, S., Nielsen, S. B., & Dybdal, L. (2012). Making contribution analysis work: A practical framework for handling influencing factors and alternative explanations. Evaluation, 18(3), 294–309. https://doi.org/10.1177/1356389012450654
Mark, M. M., Donaldson, S. I., & Campbell, B. (2011). The past, the present, and possible futures of social psychology and evaluation. In M. M. Mark, S. I. Donaldson, & B. Campbell (Eds.), Social psychology and evaluation (pp. 4–27). New York, NY: Guilford Press.
Mayne, J. (2001). Addressing attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program Evaluation, 16(1), 1–24.
Mayne, J. (2015). Useful theory of change models. Canadian Journal of Program Evaluation, 30(2), 119–142. https://evaluationcanada.ca/system/files/cjpe-entries/30-2-119_0.pdf
Mayne, J. (2017). Theory of change analysis: Building robust theories of change. Canadian Journal of Program Evaluation, 32(2), 155–173. https://doi.org/10.3138/cjpe.31122
McGrail, S. (2014). Rethinking the roles of evaluation in learning how to solve "wicked" policy problems: The case of anticipatory techniques used to support climate change mitigation and adaptation. Evaluation Journal of Australasia, 14(2), 4–16. https://doi.org/10.1177/1035719X1401400202


Montague, S., & Porteous, N. L. (2013). The case for including reach as a key element of program theory. Evaluation and Program Planning, 36(1), 177–183. https://doi.org/10.1016/j.evalprogplan.2012.03.005
Pawson, R. (2006). Evidence-based policy: A realist perspective. London, England: Sage.
Peters, B. (2017). What is so wicked about wicked problems? A conceptual analysis and a research program. Policy and Society, 36(3), 385–396. https://doi.org/10.1080/14494035.2017.1361633
Pope, C., Mays, N., & Popay, J. (2007). Synthesizing qualitative and quantitative evidence: A guide to methods. Milton Keynes, England: Open University Press.
Reicher, S., Haslam, A., & Van Bavel, J. (2018). Time to change the story. The Psychologist. Retrieved from http://thepsychologist.bps.org.uk/time-change-story
Riemer, M., & Bickman, L. (2011). Using program theory to link social psychology and program evaluation. In M. M. Mark, S. I. Donaldson, & B. Campbell (Eds.), Social psychology and evaluation (pp. 104–138). New York, NY: Guilford Press.
Rogers, P. (2016). Understanding and supporting equity: Implications of methodological and procedural choices in equity-focused evaluations. In S. I. Donaldson & R. Picciotto (Eds.), Evaluation for an equitable society (pp. 199–216). Charlotte, NC: Information Age Publishing.
Schmitt, J., & Beach, D. (2015). The contribution of process tracing to theory-based evaluations of complex aid instruments. Evaluation, 21(4), 429–447. https://doi.org/10.1177/1356389015607739
Schwandt, T. (2015). Evaluation foundations revisited: Cultivating a life of the mind for practice. Stanford, CA: Stanford University Press.
Schwandt, T. (2018). Evaluative thinking as a collaborative social practice: The case of boundary judgment making. In A. T. Vo & T. Archibald (Eds.), New Directions for Evaluation, 158, 125–138.
Vaessen, J. (2016). What is (good) program theory in international development? [Blog post]. Retrieved from http://ieg.worldbankgroup.org/blog/what-good-program-theory-international-development
Vaessen, J., & Leeuw, F. L. (2009). Interventions as theories: Closing the gap between evaluation and the disciplines. In J. Vaessen & F. L. Leeuw (Eds.), Mind the gap: Perspectives on policy evaluation and the social sciences (pp. 141–170). New Brunswick, NJ: Transaction Publishers.
Van Melle, E., Gruppen, L., Holmboe, E. S., Flynn, L., Oandasan, I., & Frank, J. R. (2017). Using contribution analysis to evaluate competency-based medical education programs: It's all about rigor in thinking. Academic Medicine, 92(6), 752–758. https://doi.org/10.1097/ACM.0000000000001479
Vo, A. T., & Archibald, T. (2018). Evaluative thinking. New Directions for Evaluation, 158, 139–147. https://doi.org/10.1002/ev.20317
Weiss, C. H. (1995). Nothing as practical as good theory. Washington, DC: Aspen Institute.
Weiss, C. H. (1997). How can theory-based evaluation make greater headway? Evaluation Review, 21(4), 501–524. https://doi.org/10.1177/0193841X9702100405
Weiss, C. H. (1988). If program decisions hinged only on information: A response to Patton. Evaluation Practice, 9(3), 15–28. https://doi.org/10.1177/109821408800900302


Williams, B., & Hummelbrunner, R. (2011). Systems concepts in action: A practitioner’s toolkit. Stanford, CA: Stanford University Press.

AUTHOR INFORMATION

Sebastian Lemire is a postdoctoral scholar in the Social Research Methodology Division of the Graduate School of Education and Information Studies, University of California, Los Angeles.

Jane Whynot is the past president of the National Capital Chapter of the Canadian Evaluation Society, a contract instructor at Carleton University, and a Ph.D. candidate at the University of Ottawa.

Steve Montague is a partner in Performance Management Network Inc. and an adjunct professor at Carleton University.


Peer Reviewers for Volume 33 and Manuscripts Submitted in 2018 / Examinateurs des manuscrits du volume 33 et des manuscrits soumis en 2018

Marc Alain, Université du Québec à Trois-Rivières
Courtney Amo, Atlantic Canada Opportunities Agency
Thomas Archibald, Virginia Tech
Brad Astbury, University of Melbourne
Tim Aubry, University of Ottawa
Tarek Azzam, Claremont Graduate University
Alexey Babayan, Brant County Health Unit
Herman Bakris, University of Victoria
Gail Barrington, Barrington Research Group
Marie-Josée Beauséjour, University of Montreal
Andrealisa Belzer, Health Canada
Lynda Benhadj, Université de Sherbrooke
Annie Bérubé, Université du Québec en Outaouais
Frédéric Bertrand, Independent Consultant
Steffen Bohni Nielsen, Consultant
Normand Boucher, Université Laval
Ayesha Boyce, University of North Carolina at Greensboro
Richard Boyle, Institute of Public Administration
Paul Brandon, University of Hawai'i at Mānoa
Leann Brosius, Kansas State University
Anthony Buckley, University of Leeds
Jean Marie Buregeya, Université de Sherbrooke
Valerie Caracelli, U.S. Government Accountability Office
Fred Carden, Independent Consultant
Joanne Carman, University of North Carolina at Charlotte
Annie Carrier, Université de Sherbrooke
Nancy Carter, Nova Scotia Health Research Foundation
Catherine Charette, Winnipeg Regional Health Authority
Kaireen Chaytor, University of Prince Edward Island
Rhonda Cockerill, University of Toronto
Damien Contandriopoulos, Université de Montréal
Chris Coryn, Western Michigan University
Brad Cousins, University of Ottawa
Christian Dagenais, Université de Montréal
Peter Dahler-Larsen, University of Southern Denmark
Pierre-Marc Daigneault, Université Laval
Evangeline Danseco, Ontario Centre of Excellence for Child and Youth Mental Health
Randall Davies, Brigham Young University

© 2019 Canadian Journal of Program Evaluation / La Revue canadienne d'évaluation de programme 33.3 (Special Issue / Numéro spécial), 435–437 doi: 10.3138/cjpe.33.3.435

Thomas Delahais, Independent Researcher
Christian de Visscher, UC Louvain
Raisa Deber, University of Toronto
Lori Diemert, University of Toronto
Rob Downie, Fanshawe College
Diane Dubeau, Université du Québec en Outaouais
Natalie Dubois, Université du Québec à Montréal
François Dumaine, Prairie Research Associates
Sarah Earl, YMCA
Pearl Eliadis, McGill University
Amy Etherington, International Development Research Centre
Paul Favaro, York University
Marie-Josée Fleury, Douglas Institute
Kim Forss, Consultant
Kimberly Fredericks, Sage Colleges
John Garcia, University of Waterloo
Joseph Garcia, Consultant
Nathalie Gilbert, University of Ottawa
Brenda Gladstone, Hospital for Sick Children
Swee Goh, University of Ottawa
Jennifer Greene, University of Illinois at Urbana-Champaign
Harold Hanson, Consultant
Eman Hassan, University of British Columbia
Rodney Hopson, George Mason University
Carol Hubberstey, Nota Bene Consulting
Hamid Jorjani, International Research, Innovation, and Development Inc.
Myriam Laventure, Université de Sherbrooke
Bernard-Simon Leclerc, Université de Montréal
Chris Lovato, University of British Columbia
Gregory Marchildon, University of Toronto
Melvin Mark, Penn State University
Jim McDavid, University of Victoria
Kate McKegg, The Knowledge Institute
Céline Mercier, McGill University
Donna Mertens, Gallaudet University
Nadia Minian, Centre for Addiction and Mental Health
Anita Myers, University of Waterloo
Cameron Norman, University of Toronto
Michael Obrecht, Consultant
John Owen, University of Melbourne
Burt Perrin, Consultant
Cheryl Poth, University of Alberta
Hallie Preskill, Consultant
Ralph Renger, University of North Dakota


Kay Rockwell, University of Nebraska-Lincoln
Lynda Rey, École nationale d'administration publique
Michelle Searle, Queen's University
Mark Seasons, University of Waterloo
Robert Shepherd, Carleton University
Mary Elizabeth Snow, Simon Fraser University
Nicoletta Stame, University of Rome
Kenneth Watson, Consultant
Ingrid Weigold, University of Akron
Ros Woodhouse, York University
Jennifer Yessis, University of Waterloo


INSTRUCTIONS TO AUTHORS

Journal Objectives
The Journal seeks to promote the theory and practice of program evaluation in Canada by publishing:
• Articles of up to 6,000 words on all aspects of the theory and practice of evaluation, including methodology; standards of practice; strategies to enhance the implementation, reporting, and use of evaluations; and evaluation audits/meta-evaluations. Articles reporting original empirical research on evaluation are of particular interest.
• Addressing Challenges in Evaluation Practice manuscripts of 1,500–3,000 words that present real-life cases by evaluation practitioners.
• Evaluation Practice Notes of 1,000–3,000 words on all aspects of evaluation practice, with the goal of sharing practical knowledge, experiences, and lessons learned of benefit to the evaluation community.
• Book Reviews of up to 1,000 words that provide a critique of authored and edited volumes of interest to the field.

Articles
Submitted manuscripts will be evaluated in relation to:
• Relevance to the Canadian context with respect to either the programs subjected to evaluation or issues applicable to the practice of evaluation in Canada and elsewhere;
• Clarity and conciseness; articles of less than 6,000 words are encouraged;
• Originality;
• The mix of theoretical, methodological, and reported findings available for publication in any particular issue of the Journal;
• The "Manuscript Evaluation Sheet" that can be found at the end of this issue.

Addressing Challenges in Evaluation Practice
We encourage submissions that highlight real-life challenges encountered in evaluation design, conduct, reporting, knowledge transfer, and utilization. Rich descriptions of challenges and of approaches to addressing them are invited. Articles must include the following three sections and provide answers to all of the following "interview" questions:
• Description of Case and Evaluation Context
– Why was the evaluation conducted? What did the client want to learn?
– What resources (time, money, in-kind, etc.) were available for conducting the evaluation? Were they suitable for answering the evaluation questions?
• Description of Challenges and How They Impede the Evaluation Process
– What challenges did you face in conducting this evaluation?
– To what extent did you or could you have anticipated these challenges?
– How did these challenges affect the implementation of the evaluation?
• Description of How Challenges Were Addressed
– How did you address each of these challenges?
– What should evaluators do to avoid these challenges in the first place?
– What would you recommend for others faced with similar challenges?
– What, if any, are the systemic issues that the evaluation community should address?
Submissions must follow this general structure and be written as interview questions and answers. Authors may add additional "interview" questions. Review comments connecting the case to the evaluation research literature will be published alongside the article.

Evaluation Practice Notes
The purpose of Evaluation Practice Notes is to promote the sharing of knowledge, experience, and insight; to build a repository of the collective knowledge of the Canadian evaluation community of practice; and to foster the ongoing development of a strong Canadian evaluation capacity. Manuscripts will be reviewed in relation to the following:
• Relevance to the practice of evaluation in Canada and elsewhere
• Credibility of the analysis that led to lessons learned, best practices, etc.
• Validity of conclusions and implications
• Originality
• Clarity and conciseness (between 1,000 and 3,000 words)
• Reader interest

Book Reviews
Reviews of current publications relevant to program evaluation in Canada are reported in each issue. Reviewers are solicited by the Book Review Editor.

Language
Manuscripts accepted for publication will be published in either official language, with an abstract in the other official language.

Manuscript Submissions
• Questions (e.g., about suitability of manuscript topic) can be sent directly to the Editor-in-Chief, Isabelle Bourgeois, at .
• Submissions must not have been previously published, nor should they be under consideration at another journal (unless an explanation has been provided to the Editor).
• Manuscripts considered appropriate for the journal review process can be submitted electronically by registering for the CJPE Online Journal System (OJS) at .
• Manuscripts should be submitted in electronic format as a Microsoft Word file.
• Manuscripts should not exceed a maximum of 30 double-spaced pages.
• Manuscripts must include an abstract in both official languages of no more than 100 words.
• All copy must be double-spaced on 8½ × 11 inch pages. One-inch margins are required on all four sides.
• All tables and figures must be numbered separately and grouped together at the end of the manuscript. Figures must also be submitted as individual JPG, TIF, or PDF files with a minimum resolution of 300 dpi. Clearly visible notes within the text should indicate their approximate placement.
• Manuscripts must conform to the referencing format of the sixth edition of the Publication Manual of the American Psychological Association (2009). All authors of manuscripts accepted for publication that do not conform to APA style will be charged a copyediting fee that must be paid before the manuscript is published.
• Where available, URLs for the references should be provided.
• Content footnotes are discouraged and should be used only when absolutely necessary.
• All submitted manuscripts will be subject to blind review by up to four (4) expert members of the evaluation community. Authors are required to ensure that all clues to their identity are removed from manuscripts submitted for potential publication. Copies of the reviewers' comments will be sent to all authors with identities withheld.

INSTRUCTIONS AUX AUTEURS

Objectifs de la Revue
La Revue vise à promouvoir la théorie et la pratique de l'évaluation de programmes au Canada en publiant :
• des articles d'au plus 6 000 mots sur tous les aspects de la théorie et de la pratique de l'évaluation, y compris la méthodologie, les normes d'évaluation, l'application des évaluations, le compte rendu et l'utilisation des évaluations, ainsi que des articles portant sur des vérifications ou des méta-évaluations. Nous nous intéressons particulièrement aux rapports sur des travaux de recherche empiriques sur l'évaluation.
• Surmonter les défis rencontrés lors d'une évaluation : un manuscrit de 1 500 à 3 000 mots présentant des situations réelles vécues par des évaluateurs.
• Notes pratiques d'évaluation : entre 1 000 et 3 000 mots portant sur tous les aspects de la pratique de l'évaluation, dans le but de partager les connaissances pratiques, les expériences et les leçons apprises dans l'intérêt de l'ensemble de la communauté des évaluateurs.
• des comptes rendus de livres d'au plus 1 000 mots qui fournissent un examen des livres d'intérêt pour le domaine.
Articles
Les manuscrits soumis seront évalués en fonction d'un certain nombre de critères :
• La pertinence compte tenu du contexte canadien, que ce soit dans le cas de programmes faisant l'objet d'une évaluation ou dans le cas de questions applicables à la pratique de l'évaluation au Canada et ailleurs;
• La clarté et la concision; on encourage la présentation d'articles de moins de 6 000 mots;
• L'originalité;
• Le mélange de théories, de méthodes et de constatations pratiques disponibles aux fins de la publication dans chaque numéro de la Revue;
• La « feuille d'évaluation de manuscrit » figurant à la fin de ce numéro.

Surmonter les défis de l'évaluation
Nous encourageons les auteurs à soumettre des manuscrits portant sur de vraies difficultés rencontrées dans les domaines de la conception, de la conduite, de l'utilisation et du compte rendu de l'évaluation, ainsi que du transfert de connaissances à ce sujet. Nous invitons les auteurs à fournir une description substantielle de ces difficultés et des démarches adoptées pour les résoudre. Ces articles doivent inclure les trois sections ci-dessous et fournir des réponses à toutes les questions d'entrevue suivantes :
• Description du cas et du contexte de l'évaluation
– Pour quels motifs a-t-on mené cette évaluation? Quels renseignements le client voulait-il obtenir?
– Quelles ressources (en temps, en argent, en nature, etc.) a-t-on fournies pour assurer la conduite de cette évaluation? Ont-elles permis de répondre aux questions d'évaluation?
• Description des difficultés et de leur entrave au processus d'évaluation
– Quelles difficultés avez-vous affrontées lors de la conduite de cette évaluation?
– À quel point avez-vous ou auriez-vous pu anticiper ces difficultés?
– Comment ces difficultés ont-elles influé sur l'exécution de l'évaluation?
• Description de la démarche de résolution des difficultés
– Comment avez-vous résolu chacune de ces difficultés?
– Que devraient faire les évaluateurs pour éviter ces difficultés, d'abord et avant tout?
– Que recommanderiez-vous aux autres évaluateurs confrontés à des difficultés similaires?
– Quels sont les problèmes systémiques auxquels le milieu de l'évaluation devrait s'attaquer, le cas échéant?
Les soumissions doivent respecter cette structure générale, en étant rédigées sous la forme de questions et de réponses d'entrevue. Les auteurs peuvent également ajouter d'autres questions d'« entrevue ». Des commentaires d'analyse établissant un lien entre le cas et des comptes rendus de recherche sur l'évaluation seront publiés en parallèle avec chaque article.

Notes pratiques d'évaluation
Le but des « Notes pratiques d'évaluation » est de promouvoir le partage des connaissances, de l'expérience et des recommandations, afin de construire un répertoire des connaissances collectives de la communauté canadienne des évaluateurs et dans le but de favoriser le développement constant et optimal de la capacité évaluative canadienne. Les manuscrits seront examinés en fonction des éléments suivants :
• Pertinence par rapport à la pratique de l'évaluation au Canada et ailleurs
• Crédibilité de l'analyse menant aux leçons apprises, aux meilleures pratiques, etc.
• Validité des conclusions et de leurs implications
• Originalité
• Clarté et concision (entre 1 000 et 3 000 mots)
• L'intérêt pour le lecteur

Comptes rendus de livres
Des comptes rendus de publications récentes dans le domaine de l'évaluation de programmes au Canada sont publiés dans chaque numéro. Le rédacteur de cette rubrique fait appel aux personnes en mesure de juger les publications en cause.

Langues
Les manuscrits acceptés seront publiés dans l'une ou l'autre des deux langues officielles, accompagnés d'un résumé dans l'autre langue.

Présentation de manuscrits
• Les questions (quant à la pertinence du sujet du manuscrit) peuvent être envoyées directement à la rédactrice en chef, Isabelle Bourgeois, à .
• Les manuscrits soumis ne doivent pas avoir déjà été publiés ni être soumis à une autre revue aux fins d'évaluation (à moins qu'une explication n'ait été fournie à l'éditeur).
• Les manuscrits considérés comme pertinents pour être soumis au processus d'examen de la revue peuvent être transmis par voie électronique en vous inscrivant au système de gestion électronique (OJS) de la RCÉP à .
• Les manuscrits doivent être soumis en format électronique sous forme de fichier Microsoft Word.
• Les manuscrits ne doivent pas comporter plus de 30 pages à double interligne.
• Tout manuscrit doit être accompagné de deux résumés rédigés dans chacune des deux langues officielles et comportant 100 mots chacun au maximum.
• Tous les manuscrits doivent être à double interligne sur du papier de format lettre. Laisser une marge d'un pouce en haut, en bas et de chaque côté.
• Tous les tableaux et figures doivent être numérotés individuellement et regroupés à la fin du manuscrit. Les figures doivent également être transmises en fichiers individuels (JPG, TIF ou PDF) en résolution minimale de 300 dpi. Des notes indiquant clairement l'emplacement approximatif des tableaux doivent figurer dans le texte.
• Les manuscrits doivent être conformes au format de référence apparaissant dans la sixième édition du Publication Manual of the American Psychological Association (2009). Tout auteur dont le manuscrit a été accepté mais n'est pas en tous points conforme aux stipulations de l'APA devra assumer les frais de rédaction avant que le manuscrit ne soit publié.
• Lorsque disponibles, les liens URL des références doivent être fournis.
• Les renvois en bas de page sont à déconseiller. Prière de n'y avoir recours qu'en cas d'absolue nécessité.
• Tous les manuscrits seront soumis à une lecture à l'aveugle effectuée par un maximum de quatre (4) membres experts de la communauté des évaluateurs. Les auteurs doivent veiller à ce que tous les indices pouvant révéler leur identité soient retirés du manuscrit soumis en vue d'une éventuelle publication. Des exemplaires des commentaires des examinateurs seront envoyés à tous les auteurs sans révélation d'identité.

ADVERTISING
The Council of the Canadian Evaluation Society has decided that a limited amount of space in the Journal will be made available for advertising. At present the Journal is published three or four times a year and has a circulation of approximately 1,600, including over 75 libraries.

Rates          1 Issue   2 Issues
Full Page      $300      $500
Half Page      $200      $300
Quarter Page   $125      $175

The above rates include typesetting and makeup, as the Council has decided that Journal advertising should match the makeup of the rest of the publication.

Terms
• All advertisements must be accompanied by payment. No cash or agency discounts are offered.
• Artwork is returned promptly to advertisers for corrections and final approval.

Contract Regulations
• All advertising copy is subject to the approval of the editors.
• Neither the editors nor the Canadian Evaluation Society shall be subject to any liability for any failure to publish or circulate all or any part of any issue because of strikes, work stoppages, fire, accidents, acts of God, or any other circumstances not within their control.
• The Canadian Evaluation Society reserves the right to increase advertising rates or change closing dates at any time upon 30 days’ notice in writing, and all conditions are subject to this reservation.
• Notice of cancellation must be received by closing date.

Contact
Canadian Evaluation Society
#3, 247 Barr Street
Renfrew, Ontario K7V 1J6

PUBLICITÉ

Le conseil de la Société canadienne d’évaluation a décidé de réserver dans la Revue un espace limité pour la publication d’une rubrique consacrée aux messages publicitaires. À l’heure actuelle, la Revue est publiée trois ou quatre fois par an et a environ 1 600 abonnés, y compris plus de 75 bibliothèques.

Tarifs          1 numéro   2 numéros
Page entière    300 $      500 $
Demi-page       200 $      300 $
Quart de page   125 $      175 $

Ces tarifs comprennent les frais de composition et de mise en page, étant donné que le Conseil a décidé que la publicité paraissant dans la Revue devait être conforme à la mise en pages du reste de la publication.

Conditions
• Toutes les publicités doivent être accompagnées du paiement. Nous ne donnons d’escompte ni aux agences publicitaires ni en cas de paiement comptant.
• Les textes et illustrations sont renvoyés rapidement aux annonceurs pour corrections éventuelles et approbation finale.

Modalités du contrat
• Tout message publicitaire est accepté sous réserve de l’approbation du comité de rédaction.
• Les rédacteurs et la Société canadienne d’évaluation n’acceptent aucune responsabilité légale dans l’éventualité où des grèves, des arrêts de travail, des incendies, des accidents, des désastres naturels ou toutes autres circonstances indépendantes de leur volonté empêchent la parution ou la distribution, en entier ou en partie, de n’importe quel numéro de la Revue.
• La Société canadienne d’évaluation se réserve le droit d’augmenter les tarifs publicitaires ou de modifier les dates limites à n’importe quel moment en donnant 30 jours de préavis par écrit, et tous les termes du contrat sont soumis à cette clause.
• Tout avis d’annulation doit nous parvenir avant la date limite.

Liaison
Société canadienne d’évaluation
#3, 247 Barr Street
Renfrew, Ontario K7V 1J6

The Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme
Manuscript Evaluation Sheet

Reviewer ______________________  Title ______________________

Date Sent ______________________  Due Back ______________________

Evaluation (rate each item: Not Applicable / Excellent / Good / Adequate / Marginal / Poor)
A. Significance of topic ______
B. Literature review ______
C. Conceptualization ______
D. Methodology ______
E. Data analyses ______
F. Interpretation ______
G. Clarity of presentation ______
H. Validity of conclusions ______
I. Reader interest ______

Narrative: (comments, reasons for ratings, suggestions for revisions, etc.)

Recommendation: (circle one)
1. Excellent: accept as is
2. Accept: needs minor revisions
3. Has worth, but reject as is: suggest major revisions and resubmission
4. Reject: definitely not publishable
5. Not appropriate for The Canadian Journal of Program Evaluation; suggest submission to: ______

Confidential remarks to the Editor:

Return with manuscript to: Isabelle Bourgeois, Editor-in-Chief, The Canadian Journal of Program Evaluation

Please use additional sheets if necessary.

The Canadian Journal of Program Evaluation / La Revue canadienne d’évaluation de programme
Feuille d’évaluation de manuscrit

Nom ______________________  Titre ______________________

Date d’envoi ______________________  Date de retour ______________________

Évaluation (cotez chaque élément : Sans objet / Excellent / Bon / Passable / Marginal / Faible)
A. Importance du sujet ______
B. Examen de la littérature ______
C. Conceptualisation ______
D. Méthodologie ______
E. Analyses des données ______
F. Interprétation ______
G. Clarté de la présentation ______
H. Validité des conclusions ______
I. Intérêt pour le lecteur ______

Narratif : (commentaires, raisons appuyant la cotation, révisions suggérées, etc.)

Recommandation : (encerclez-en une)
1. Excellent : accepter tel quel
2. Accepter : doit subir de petites révisions
3. A du mérite, mais à rejeter tel que présenté : recommandons révisions importantes et re-soumission
4. Rejeter : ne doit définitivement pas être publié
5. Ne convient pas à La Revue canadienne d’évaluation de programme; recommandons soumission à : ______

Faites parvenir les commentaires confidentiels accompagnés du manuscrit à la rédactrice associée francophone :

Astrid Brousselle, Directrice, École d’administration publique, Université de Victoria, astrid@uvic.ca

Veuillez utiliser des feuilles supplémentaires si nécessaire.