
Philosophies and types of evaluation research

Elliot Stern

In: Descy, P.; Tessaring, M. (eds). The foundations of evaluation and impact research. Third report on vocational training research in Europe: background report. Luxembourg: Office for Official Publications of the European Communities, 2004 (Cedefop Reference series, 58).

Reproduction is authorised provided the source is acknowledged

Additional information on Cedefop’s research reports can be found on: http://www.trainingvillage.gr/etv/Projects_Networks/ResearchLab/

For your information:
· The background report to the third report on vocational training research in Europe contains original contributions from researchers. They are grouped in three volumes published separately in English only. A list of contents appears below.
· A synthesis report based on these contributions and with additional research findings is being published in English, French and German.

Bibliographical reference of the English version: Descy, P.; Tessaring, M. Evaluation and impact of education and training: the value of learning. Third report on vocational training research in Europe: synthesis report. Luxembourg: Office for Official Publications of the European Communities (Cedefop Reference series).
· In addition, an executive summary in all EU languages will be available.

The background and synthesis reports will be available from national EU sales offices or from Cedefop.

For further information contact:
Cedefop, PO Box 22427, GR-55102 Thessaloniki
Tel.: (30) 2310 490 111
Fax: (30) 2310 490 102
E-mail: [email protected]
Homepage: www.cedefop.eu.int
Interactive website: www.trainingvillage.gr

Contributions to the background report of the third research report

Impact of education and training

Preface
The impact of human capital on economic growth: a review (Rob A. Wilson, Geoff Briscoe)
Empirical analysis of human capital development and economic growth in European regions (Hiro Izushi, Robert Huggins)
Non-material benefits of education, training and skills at a macro level (Andy Green, John Preston, Lars-Erik Malmberg)
Macroeconometric evaluation of active labour-market policy – a case study for Germany (Reinhard Hujer, Marco Caliendo, Christopher Zeiss)
Active policies and measures: impact on integration and reintegration in the labour market and social life (Kenneth Walsh and David J. Parsons)
The impact of human capital and human capital investments on company performance. Evidence from literature and European survey results (Bo Hansson, Ulf Johanson, Karl-Heinz Leitner)
The benefits of education, training and skills from an individual life-course perspective with a particular focus on life-course and biographical research (Maren Heise, Wolfgang Meyer)

The foundations of evaluation and impact research

Preface
Philosophies and types of evaluation research (Elliot Stern)
Developing standards to evaluate vocational education and training programmes (Wolfgang Beywl; Sandra Speer)
Methods and limitations of evaluation and impact research (Reinhard Hujer, Marco Caliendo, Dubravko Radic)
From project to policy evaluation in vocational education and training – possible concepts and tools. Evidence from countries in transition (Evelyn Viertel, Søren P. Nielsen, David L. Parkes, Søren Poulsen)
Look, listen and learn: an international evaluation of adult learning (Beatriz Pont and Patrick Werquin)
Measurement and evaluation of competence (Gerald A. Straka)
An overarching conceptual framework for assessing key competences. Lessons from an interdisciplinary and policy-oriented approach (Dominique Simone Rychen)

Evaluation of systems and programmes

Preface
Evaluating the impact of reforms of vocational education and training: examples of practice (Mike Coles)
Evaluating systems’ reform in vocational education and training. Learning from Danish and Dutch cases (Loek Nieuwenhuis, Hanne Shapiro)
Evaluation of EU and international programmes and initiatives promoting mobility – selected case studies (Wolfgang Hellwig, Uwe Lauterbach, Hermann-Günter Hesse, Sabine Fabriz)
Consultancy for free? Evaluation practice in the European Union and central and eastern Europe. Findings from selected EU programmes (Bernd Baumgartl, Olga Strietska-Ilina, Gerhard Schaumberger)
Quasi-market reforms in employment and training services: first experiences and evaluation results (Ludo Struyven, Geert Steurs)
Evaluation activities in the European Commission (Josep Molsosa)

Philosophies and types of evaluation research

Elliot Stern

Abstract

This chapter considers different types of evaluation in vocational education and training (VET). It does so from two standpoints: debates among evaluation researchers and the way contexts of use and evaluation capacity shape evaluation in practice. The nature of VET as an evaluation object is discussed and theories of evaluation are located in wider debates about the nature of knowledge and philosophies of science. The various roles of evaluation in steering and regulating decentralised policy systems are discussed, as is the way evaluation itself is regulated through the development of standards and professional codes of behaviour.

Table of contents

1. Introduction
   1.1. Scope of this chapter
2. Can evaluation be defined?
   2.1. Assessing or explaining outcomes
   2.2. Evaluation, change and values
   2.3. Quantitative and qualitative methods
   2.4. Evaluation types
   2.5. A focus on results and outcomes
   2.6. Participatory methods and devolved evaluations
   2.7. Theory and practice
3. The object of VET evaluation
   3.1. The domain of VET
   3.2. Overarching characteristics of evaluation objects
4. Philosophical foundations
   4.1. Positivism, observation and theory
   4.2. Scientific realism
   4.3. Constructivists
5. Evaluation theory
   5.1. Programme theory
   5.2. Theory based evaluation and theories of change
   5.3. A wider theoretical frame
6. Determinants of evaluation use
   6.1. Instrumental versus cumulative use
   6.2. Process use of evaluation
   6.3. The importance of
   6.4. Evaluation and learning
   6.5. The institutionalisation of evaluation
   6.6. Organisation of evaluation in public agencies
   6.7. Evaluation as dialogue
   6.8. Strategies and types of evaluation use
7. Evaluation standards and regulation
   7.1. Evaluation codes and standards
   7.2. Evaluation as a method for regulating decentralised systems
8. Conclusions and future scenarios
List of abbreviations
References

List of tables
Table 1: Overlaps between methodology and purpose
Table 2: Evaluation types
Table 3: Rules guiding realistic enquiry and method
Table 4: Comparing constructivist and ‘conventional’ evaluation
Table 5: Grid for a synthetic assessment of quality of evaluation work

1. Introduction

Until quite recently, evaluation thinking has been centred in North America. Only over the last ten years have we seen the growth and spread of evaluation in Europe. This has been associated with a significant expansion of evaluations supported by the European Union (EU), especially in relation to Structural Funds (European Commission, 1999), and the establishment of evaluation societies at Member State and European levels. There have also been the beginnings of a tradition of evaluation publishing in Europe, including the emergence of a major new evaluation journal edited for the first time from a European base and with a high proportion of European content. Many factors account for the growth in evaluation activities in Europe in recent years (Leeuw et al., 1999; Rist et al., 2001; Toulemonde, 2001). These include both structural and management considerations. Expenditure pressures, both at national and European level, have increased demands for improved performance and greater effectiveness within the public sector. Furthermore, public action is becoming increasingly complex, both in terms of the goals of programmes and policies and the organisational arrangements through which they are delivered.

The decentralisation of public agencies, together with the introduction of results based management and other principles commonly described under the heading ‘new public management’, have created new demands for accountability; these are often in multi-agency and partnership environments. Without the bottom line of financial measures to judge success, new ways of demonstrating impacts and results are being demanded; so are new ways of regulating and steering decentralised systems. We see evaluation nowadays not only applied to programmes or policy instruments but also built into the routines of administration. This is often associated with standards that are set by policy-makers in relation to the performance of those expected to deliver public services. Standards, a concept central to evaluation, have also come to be applied to evaluation itself. Even evaluation is not free from demands to deliver reliable, high quality output.

1.1. Scope of this chapter

It is against this background that this chapter on ‘types and philosophies’ of evaluation has been prepared. The main sections are as follows.

Chapter 2 begins by seeking to define, or, more accurately, review attempts to define, evaluation through the work of various scholars and experts. It allows us to clarify the main types of evaluation that are in use in different institutional and administrative settings, even though the writings of scholars and experts focus on ‘pure types’ when compared with evaluation as practised. This section also begins to highlight issues related to standards and their role in evaluation more broadly.

Chapter 3 then considers the nature of evaluation in the context of vocational education and training (VET) and addresses the question: what characterises evaluation in this domain; is it in any way distinctive? The section includes both specific consideration of VET and more general characteristics of evaluation objects and configurations.

Chapter 4 seeks to locate evaluation theory within the broader setting of the nature of theory in the philosophy of science. Many of the debates in evaluation are shaped by, and reflect, these wider debates.

Evaluation theory – narrowly conceived – is then reviewed in Chapter 5. In many ways theory within evaluation (as will be discussed) is a very particular construction, though this does not invalidate wider understandings of the role of theory.

The reality and practice of evaluation are then considered in Chapter 6, bringing together a substantial body of research into evaluation use and institutionalisation in order to understand better different types of evaluation in situ.

Evaluation standards and codes of behaviour and ethics for evaluators are reviewed in Chapter 7, drawing on experience in North America, Australasia and, more recently, Europe.

Finally, in Chapter 8, the discussion returns to the question of evaluation standards and the role they play both in the evaluation process and in governance and regulatory processes to steer institutions and promote policies and reforms.

2. Can evaluation be defined?

There are numerous definitions and types of evaluation. There are, for example, many definitions of evaluation put forward in handbooks, evaluation guidelines and administrative procedures, by bodies that commission and use evaluation. All of these definitions draw selectively on a wider debate as to the scope and focus of evaluation. A recent book identifies 22 foundation models for 21st century programme evaluation (Stufflebeam, 2000a), although the authors suggest that a smaller subset of nine are the strongest. Rather than begin with types and models, this chapter begins with an attempt to review and bring together the main ideas and orientations that underpin evaluation thinking.

Indicating potential problems with ‘definition’ by a question mark in the title of this section warns the reader not to expect straightforward or consistent statements. Evaluation has grown up through different historical periods in different policy environments, with inputs from many disciplines, from diverse value positions and rooted in hard fought debates in philosophy of science and theories of knowledge. While there is some agreement, there is also persistent difference: evaluation is contested terrain. Most of these sources are from North America where evaluation has been established – as a discipline and practice – and debated for 30 or more years.

2.1. Assessing or explaining outcomes

Among the most frequently quoted definitions is that of Scriven, who has produced an evaluation Thesaurus, his own extensive handbook of evaluation terminology: ‘“evaluation” refers to the process of determining the merit, worth or value of something, or the product of that process […] The evaluation process normally involves some identification of relevant standards of merit, worth or value; some investigation of the performance of evaluands on these standards; and some integration or synthesis of the results to achieve an overall evaluation or set of associated evaluations.’ (Scriven, 1991; p. 139).

This definition prepares the way for what has been called ‘the logic of evaluation’ (Scriven, 1991; Fournier, 1995). This logic is expressed in a sequence of four stages:
(a) establishing evaluation criteria and related dimensions;
(b) constructing standards of performance in relation to these criteria and dimensions;
(c) measuring performance in practice;
(d) reaching a conclusion about the worth of the object in question.

This logic is not without its critics (e.g. Schwandt, 1997), especially among those of a naturalistic or constructivist turn who cast doubt on the claims of evaluators to know, to judge and ultimately to control. Other stakeholders, it is argued, have a role and this changed relationship with stakeholders is discussed further below.

The most popular textbook definition of evaluation can be found in Rossi et al.’s book Evaluation – a systematic approach: ‘Program evaluation is the use of social research procedures to systematically investigate the effectiveness of social intervention programs. More specifically, evaluation researchers (evaluators) use social research methods to study, appraise, and help improve social programmes in all their important aspects, including the diagnosis of the social problems they address, their conceptualization and design, their implementation and administration, their outcomes, and their efficiency.’ (Rossi et al., 1999; p. 4).

Using words such as effectiveness rather than Scriven’s favoured ‘merit, worth or value’ begins to shift the perspective of this definition towards the explanation of outcomes and impacts. This is partly because Rossi and his colleagues identify helping improve social programmes as one of the purposes of evaluation. Once there is an intention to make programmes more effective, the need to explain how they work becomes more important.
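Scriven’s four-stage logic of evaluation set out above is essentially procedural, and it can be restated as a small worked sketch. The following Python fragment is purely illustrative – the criteria, standards and scores are invented for a notional VET programme and are not drawn from Scriven or Fournier – but it shows how criteria, performance standards, measurement and synthesis relate as successive steps:

    # Illustrative sketch of the four-stage 'logic of evaluation'
    # (Scriven, 1991; Fournier, 1995). All names and numbers are
    # hypothetical, invented for this example.

    # Stage (a): establish evaluation criteria and related dimensions.
    criteria = ["placement_rate", "trainee_satisfaction", "cost_efficiency"]

    # Stage (b): construct standards of performance for each criterion
    # (here, a minimum acceptable score on a 0-100 scale).
    standards = {"placement_rate": 60, "trainee_satisfaction": 70, "cost_efficiency": 50}

    # Stage (c): measure performance in practice (hypothetical scores).
    performance = {"placement_rate": 72, "trainee_satisfaction": 65, "cost_efficiency": 58}

    def evaluate(criteria, standards, performance):
        """Stage (d): synthesise the comparisons into a conclusion about worth."""
        judgements = {c: performance[c] >= standards[c] for c in criteria}
        return judgements, all(judgements.values())

    judgements, meets_all_standards = evaluate(criteria, standards, performance)
    print(judgements)
    print("meets all standards:", meets_all_standards)

Real evaluations rarely reduce to such a mechanical test – weighting, qualitative judgement and synthesis across stakeholders intervene at every stage – but the sequence of steps is exactly what the critics cited above contest: the evaluator fixes the criteria and standards, and so retains the authority to judge.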

Yet, explanation is an important and intentionally absent element in Scriven’s definitions of evaluation: ‘By contrast with evaluation, which identifies the value of something, explanation involves answering a Why or How question about it or a call for some other type of understanding. Often, explanation involves identifying the cause of a phenomenon, rather than its effects (which is a major part of evaluation). When it is possible, without jeopardizing the main goals of an evaluation, a good evaluation design tries to uncover microexplanations (e.g. by identifying those components of the curriculum package that are producing the major part of the good or bad effects, and/or those that are having little effect). The first priority, however, is to resolve the evaluation issues (is the package any good at all, the best available? etc.). Too often the research orientation and training of evaluators leads them to do a poor job on evaluation because they became interested in explanation.’ (Scriven, 1991, p. 158).

Scriven himself recognises that one pressure moving evaluation to pay greater attention to explanation is the emergence of programme theory, with its concern about how programmes operate so that they can be improved or better implemented. A parallel pressure comes from the uptake of impact assessment associated with the growth of performance management and other managerial reforms within public sector administrations. The intellectual basis for this work was most consistently elaborated by Wholey and colleagues. They start from the position that evaluation should be concerned with the efficiency and effectiveness of the way governments deliver public services. A core concept within this approach is what is called ‘evaluability assessment’ (Wholey, 1981). The starting point for this assessment is a critical review of the logic of programmes and the assumptions that underpin them. This work constitutes the foundation for most of the thinking about programme theory and logical frameworks. It also prefigures a later generation of evaluation thinking rooted more in policy analysis that is concerned with the institutionalisation of evaluation within public agencies (Boyle and Lemaire, 1999), as discussed further below.

These management reforms generally link interventions with outcomes. As Rossi et al. recognise, this takes us to the heart of broader debates in the social sciences about causality: ‘The problem of establishing a program’s impact is identical to the problem of establishing that the program is a cause of some specified effect. Hence, establishing impact essentially amounts to establishing causality.’ (Rossi et al., 1999).

The difficulties of establishing perfect, rather than good enough, impact assessments are recognised by Rossi and colleagues. This takes us into the territory of experimentation and causal inference associated with some of the most influential founders of North American evaluation, such as Campbell, with his interest in experimental and quasi-experimental designs, but also his interest in later years in the explanatory potential of qualitative evaluation methods. The debate about experimentation and causality in evaluation continues to be vigorously pursued in various guises. For example, in a recent authoritative text on experimentation and causal inference (Shadish et al., 2002), the authors begin to take on board contemporary criticisms of experimental methods that have come from the philosophy of science and the social sciences more generally. In recent years, we have also seen a sustained realist critique of experimental methods led in Europe by Pawson and Tilley (1997). But, whatever their orientations to experimentation and causal inference, explanations remain at the heart of the concerns of an important constituency within evaluation.

2.2. Evaluation, change and values

Another important strand in evaluation thinking concerns the relationship between evaluation and action or change. One comparison is between ‘summative’ and ‘formative’ evaluation methods, terms also coined by Scriven. The former assesses or judges results and the latter seeks to influence or promote change. Various authors have contributed to an understanding of the role of evaluation and change. For example, Cronbach (1982, 1989), rooted in policy analysis and education, sees an important if limited role for evaluation in shaping policy ‘at the margins’ through ‘piecemeal adaptations’. The role of evaluation in Cronbach’s framework is to inform policies and programmes through the generation of knowledge that feeds into the ‘policy shaping community’ of experts, administrators and policy-makers.

Stake (1996), on the other hand, with his notion of ‘responsive evaluation’, sees this as a ‘service’ to programme stakeholders and to participants. By working with those who are directly involved in a programme, Stake sees the evaluator as supporting their participation and possibilities for initiating change. This contrasts with Cronbach’s position and even more strongly with that of Wholey (referred to earlier), given Stake’s scepticism about the possibilities of change at the level of large scale national (or in the US context Federal and State) programmes and their management. Similarly, Patton (1997 and earlier editions), who has tended to eschew work at programme and national level, shares with Stake a commitment to working with stakeholders and (local) users. His concern is for ‘intended use by intended users’.

Virtually everyone in the field recognises the political and value basis of much evaluation activity, albeit in different ways. While Stake, Cronbach and Wholey may recognise the importance of values within evaluation, the values that they recognise are variously those of stakeholders, participants and programme managers. There is another strand within the general orientation towards evaluation and change which is decidedly normative. This category includes House, with his emphasis on evaluation for social justice, and the emancipatory logic of Fetterman et al. (1996) and ‘empowerment evaluation’. Within the view of Fetterman and his colleagues, evaluation itself is not undertaken by external experts but rather is a self-help activity in which – because people empower themselves – the role of any external input is to support self-help. So, one of the main differences among those evaluators who explicitly address issues of programme and societal change is in terms of the role of evaluators, be they experts who act, facilitators and advocates, or enablers of self-help.

2.3. Quantitative and qualitative methods

Much of the literature that forms the foundation of the explanatory strand in evaluation is quantitative, even though we have noted that the later Campbell began to emphasise more the importance of qualitative understanding. However, among those concerned with formative, responsive and other change-oriented evaluations, there is a predominance of qualitative methods. The scope of ‘qualitative’ includes processes as well as phenomena, which by their nature require qualitative description. This would include, for example, the means through which a programme was being implemented as well as the dynamics that occur during the course of evaluation (e.g. learning to do things better, improving procedures, overcoming resistance).

Stake emphasises qualitative methods. He is often associated with the introduction of case studies into evaluation practice, although he also advocates a full range of observational, interview and conversational techniques (Stake, 1995). Patton’s commitment to qualitative methods is reinforced by his interest in the use by programme managers of the results of evaluations. ‘They must be interested in the stories, experiences and perceptions of programme participants’ (Patton, 2002; p. 10).

2.4. Evaluation types

After this ‘tour’ around some of the main arguments and positions in evaluation, it becomes possible to return to the matter of definition and types of evaluation. There is no simple or single definition, but types of evaluation can be seen to cohere around two main axes. The first axis is methodological and the second concerns purposes.

In terms of methodologies, looking across the different approaches to evaluation discussed above, we can distinguish three methodological positions:
(a) the criteria or standards based position, which is concerned with judging success and performance by the application of standards;
(b) the causal inference position, which is concerned with explaining programme impacts and success;
(c) the formative or change oriented position, which seeks to bring about improvements both for programmes and for those who participate in them.

Alongside these methodological distinctions are a series of definitions that are concerned with evaluation purposes.

Distinguishing evaluation in terms of purpose has been taken up by many authors including Vedung (1997), evaluators at the Tavistock Institute (Stern, 1992; Stern et al., 1992) and Chelimsky (1995, 1997). Most of these authors distinguish between different evaluation purposes that are clearly consistent with the overview presented above. Along this axis, we can distinguish between the following purposes:
(a) accountability, where the intention is to give an account to sponsors and policy-makers of the achievements of a programme or policy;
(b) development, where the intention is to improve the delivery or management of a programme during its term;
(c) knowledge production, where the intention is to develop new knowledge and understanding;
(d) social improvement, where the intention is to improve the situation of the presumed beneficiaries of public interventions.

There is a degree of correlation between these two axes, as Table 1 suggests.

Table 1: Overlaps between methodology and purpose

- Accountability (criteria and standards): outcome and impact evaluations, mainly summative.
- Development (change orientation): formative evaluation of programmes.
- Knowledge production (causal inference): ‘what works’ – improving future policy/practice.
- Social improvement (change orientation): empowerment and participative evaluations.

Evaluation for the purpose of accountability tends to be concerned with criteria and standards (or indicator studies). Development evaluations use change oriented methods to pursue the desired improvements in programme delivery. Evaluations for the purpose of knowledge production are often concerned with drawing causal inference from evaluation data. Finally, evaluations for the purposes of social improvement are also preoccupied with change oriented methods, though to improve the circumstances of programme participants and citizens rather than programme management per se. However, this is not to suggest a one-to-one association between methodologies and purposes.

In the world of evaluation in practice, there are also incompatibilities and tensions. Thus, the accountability driven goal of evaluation often sits alongside, and sometimes competes with, management and delivery logic. Evaluation is often seen by programme managers as a means of supporting improved effectiveness of implementation. In many institutional settings, funds are committed for evaluation to meet accountability purposes (Vedung, 1997), but are spent mainly for managerial, formative and developmental purposes. Nor is causal inference always absent from evaluation purposes concerned with social improvement. Nonetheless, the clusterings represented in Table 1 do summarise the main types of evaluation. These are:

(a) accountability for policy-making evaluations that rely on criteria, standards and indicators;
(b) development evaluations that adopt a change orientated approach in order to improve programmes;
(c) knowledge production evaluations that are concerned to establish causal links, explanations and valid knowledge;
(d) social improvement evaluations that seek to improve the circumstances of beneficiaries by deploying change, advocacy and facilitation skills.

Already implicit in the above discussions and definitions is the dimension of time. Evaluations for the purpose of accountability tend to occur at the end of a programme cycle. Development oriented evaluations tend to occur while the programme is ongoing, and knowledge production evaluations can continue long after the initial programme cycle has ended. Wholey’s concept of evaluability focuses attention on the initial programme logic while Cronbach’s interest in the ‘policy shaping community’ carries over into the long term and the periods of transition between one programme and another. Notions of ex-ante evaluation (and appraisal or needs analysis), ongoing or mid-term evaluations, and ex-post evaluations that have been adopted as a basic framework by the European Commission and other agencies derive from these different understandings of when evaluation activity is most relevant.

The main types of evaluation identified above can be further elaborated in terms of the kinds of questions they ask, the stakeholders that are included and the focus of their activities.

Accountability for policy-making evaluation meets the needs of external stakeholders who require the delivery of programme or policy outputs. Management may also demand accountability, but here we mean external management rather than management internal to a programme or policy area. This has become a dominant form of evaluation in public administrations, consistent with the growth of performance management philosophies more generally. Evaluations of this type tend to occur at the end of a programme or policy cycle and focus on results.

Development evaluation follows the lifecycle of an initiative with the aim of improving how it is managed and delivered. These evaluations are more likely to meet the needs of internal managers and partners rather than external stakeholders. Formative evaluations and process evaluations tend to fall into this category.

Knowledge production evaluation is mainly concerned with understanding in the longer term. These evaluations often seek to synthesise understanding coming from a number of evaluations. While both of the previous evaluation types are expected to affect current programme learning and knowledge production, this type looks to apply lessons to future programmes and policies.

Social improvement evaluation can take many forms. Many social and economic programmes depend for their success on consensus among the intended beneficiaries. Participative evaluations that seek to involve target groups contribute to the development of consensus and consent. This type of evaluation may also take on an advocacy role: promoting certain interests or groups. It is within this evaluation purpose that programme beneficiaries are most likely to be directly involved, not merely consulted.

These different evaluation types can be further elaborated in terms of the following questions:
(a) who are the stakeholders?
(b) what is the focus of the evaluation?
(c) what are the main approaches and methods?
(d) what are the key questions that can be asked?

Table 2 presents the main elements of the four evaluation types in relation to these questions. However, we would not wish to suggest that these types do full justice to the diversity of evaluation models; rather they summarise the main high level differences. It is possible, for example, to see the emergence of sometimes contradictory evaluation subtypes in recent years. Two examples of these are outcome focused evaluations and participation focused evaluations.

2.5. A focus on results and outcomes

The concern that public interventions should lead to specific and measurable results is mirrored in the development of evaluation practice. In complex socioeconomic programmes in particular, the tendency is often to focus on intermediate outcomes and processes of implementation. Sometimes this is inevitable, when the final results of interventions will only be discernible in the long term. Contemporary models of public management create a demand for methods that focus on results and there has been considerable investment in such methodologies in recent years. These methodologies tend to be in three areas.

Table 2: Evaluation types

Accountability for policy-makers
- Stakeholders: Parliaments, Ministers, funders/sponsors, Management Boards
- Focus: impacts, outcomes, achievement of targets, value for money
- Main evaluation approaches: indicators, performance measures, value for money studies, quantitative surveys
- Key questions: What have been the results? Are they intended or unintended? Are resources well-used?

Development for programme improvement
- Stakeholders: project coordinators, partner organisations, programme managers
- Focus: identifying constraints and how they should be overcome; delivery and implementation strategies
- Main evaluation approaches: relating inputs to outputs, qualitative description, following processes over time
- Key questions: How well is the programme being managed? Can it be implemented better?

Knowledge production and explanation
- Stakeholders: programme planners, policy-makers, academics
- Focus: dissemination of good practice; what works; organisational change
- Main evaluation approaches: experimental and quasi-experimental studies, case studies, systematic reviews and syntheses
- Key questions: What is being learnt? Are there lessons that can be applied elsewhere? How would we do it next time?

Social improvement and social change
- Stakeholders: programme beneficiaries, civil society and affected groups
- Focus: ensuring full involvement, influence and control by citizens
- Main evaluation approaches: stakeholder involvement, participative reviews, advocacy
- Key questions: What is the best way to involve affected groups? How can equal opportunities and social inclusion be ensured?

The first deals with systematic reviews. Reaching policy conclusions and taking actions on the basis of the evaluation of single projects, or even programmes, has long been criticised. The evidence-based policy movement works on the assumption that it is necessary to aggregate the results of different evaluations through systematic reviews in order to produce reliable evidence.

Next is results based management. This is now a feature of most public management systems and can be variously expressed in terms of targets, league tables, payment-by-results and outcome funding. Within the Commission there has been a move in this direction, under the label of activity based management. It is also the underlying principle of the performance reserve within the Structural Funds and relevant to current debates about impact assessment.

Finally there are macro and micro economic models. These seek to simulate the relationship between key variables and explain outputs through a mixture of available data and assumed causal relationships. Such models are especially useful where data sources are incomplete and results have to be estimated rather than precisely measured.

2.6. Participatory methods and devolved evaluations

There is a general tendency in programme and policy evaluation for multiple stakeholder and citizen involvement. These general developments have led to a spate of innovations among evaluators, who are now able to draw on an extensive repertoire of participative methods and techniques, many of them pioneered in international development contexts. They include: rapid appraisal methods, empowerment evaluation, methods for involving stakeholders, and user-focused evaluations.

Evaluation is often seen as an instrument for developing social consensus and strengthening social cohesion.

The expectation is that ownership and commitment by citizens to the priorities will be maximised when they have been involved in setting these priorities and evaluating the outcomes of interventions. There is a strong managerial logic within large scale decentralised programmes to use evaluation as an instrument to strengthen programme management by diffusing the culture of evaluation among all programme participants.

This also focuses attention more generally on who undertakes evaluations and where evaluation is located. Among the types outlined above, the assumption is that some outside expert occupies the evaluator role. Already within the participative subtype just referred to, the role of the evaluator is far less prominent. The role of the evaluator – as orchestrator, facilitator and enabler – is further elaborated in the discussion of constructivist evaluation in a later section of this chapter. However, even within other types of evaluation there are different possible operationalisations and locations of the evaluation role. One important variant is devolved evaluation.

It is becoming increasingly common for evaluation to become a devolved ‘obligation’ for programme beneficiaries. In the European Structural Funds, requirements for ex-ante and mid-term evaluations are now explicitly the responsibility of Member States and monitoring committees. The same is true of international development aid within the CEC, where project evaluation is consistently devolved to beneficiaries. Often, those who evaluate on such a ‘self-help’ basis are required to undertake the evaluations and must demonstrate that they incorporate and use findings. In fact, the ‘devolution chain’ is far more extended. In the European Structural Funds, monitoring committees will often require beneficiaries and programme managers to conduct their own evaluations. The same is true for national programmes. In the UK, for example, ‘local evaluations’ conducted by projects within a programme are the norm. These are variously intended to focus on local concerns, inform local management and generate data that will be useful for accountability purposes.

These intentions are not conflict-free. For example, top-down demands by the EU or by central governments can easily undermine the local focus on local needs (Biott and Cook, 2000). Nonetheless, the role of devolved evaluation in the management and ‘steering’ of programmes has also been a noticeable trend over the last ten years. By requiring programme participants to clarify their priorities, collect information, interpret findings and reflect on the implications, it is assumed that programme management at a systemic level will be improved.

2.7. Theory and practice

It has not been the intention to focus on evaluation practice in this chapter. However, it is worth reflecting briefly on how evaluation practice relates to some of the main debates outlined above:
(a) evaluations in the public sector are firmly within ‘accountability’ and ‘programme management’ purposes (e.g. Nagarajan and Vanheukelen, 1997);
(b) the notion of ‘goal-free’ evaluation that does not start from the objectives of programmes has never been favoured by public administrations in Europe or elsewhere. Although there is often scope to examine overall impacts and to consider ‘unintentional consequences’, the design of most evaluations is firmly anchored around goals and objectives;
(c) there is a trend to take on board evaluation criteria such as relevance, efficiency, effectiveness, impact and sustainability (the now standard World Bank and OECD criteria). In the EU guide referred to above, these are applied as evaluative judgements, in relation to programme objectives and in relation to socioeconomic problems as they affect target populations;
(d) there is sometimes confusion between economic appraisal and evaluation. In most public administrations judgements have to be made, before new policy initiatives are launched, on whether to proceed or not (e.g. the UK Treasury’s Green Book and recent EU guidance on impact assessment). For many economists, this pre-launch appraisal is seen as the same as evaluation. In general it is sensible to confine the term evaluation to what happens once a programme or policy has been decided on;

(e) macro-economic methods in particular are more difficult to apply when the resource input is relatively small. Where policy inputs are large scale and can be isolated from other inputs (e.g. in Objective 1, but not Objective 2 or 3, within EU Structural Funds), they can be more easily applied;
(f) the status of stakeholders has undoubtedly been enhanced in most evaluations in recent years. However, the role of stakeholders is generally as informants rather than sources of evaluative criteria, let alone as judges of merit and worth. There is considerable scope within decentralised and devolved evaluation systems for participative and constructivist approaches. What is more common is close working with stakeholders to define criteria for evaluation and contribute to a consensus process;
(g) the boundaries between research and evaluation remain clouded. Many studies commissioned as evaluation contribute to knowledge production and are indistinguishable from research. Various distinctions have been proposed, including the short- rather than long-term nature of evaluation and its mainly instrumental intent. However, few of these distinctions are watertight. For example, many of the elements within this overall study could be defined as research and, arguably, once a research-generated study is deployed for evaluative purposes, its character changes.

3. The object of VET evaluation

What is evaluated is a factor in how evaluation is practised. The object of evaluation is different in different domains (health, transport, education, vocational training, etc.) and this partly shapes what we call evaluation in these various domains. At the same time, there are overarching characteristics of evaluation objects that are similar across domains. In this section, we consider both approaches to the object of evaluation: those that follow from the nature of the domain and those that follow from the characteristics of what is being evaluated.

3.1. The domain of VET

VET is a broad field that, at a minimum, includes initial vocational training, continuing vocational training, work-based learning and VET systems. However, this selection understates the scope of VET as an object of evaluation. VET has become more complex and multi-faceted over the years, as can be seen in previous Cedefop reports on vocational training research in Europe. This is mainly because there has been a shift from decontextualised studies of impact to studies that increasingly incorporate context. So VET, even at the level of the firm, is seen as being embedded in other corporate policies and procedures such as marketing, the organisation of production, supervisory and managerial practice and human resource management. In order to describe, let alone explain, the impact of VET, this broader set of factors needs to be considered. The same is true for policy level interventions. For example, what is called ‘active labour-market policies’, especially for those who are marginalised in the labour market, usually includes VET, but this is embedded in a raft of other policies including subsidies to employers, restructuring of benefits and new screening and matching processes.

This recontextualisation of the objects of evaluation is happening across many fields of evaluative enquiry. Evaluations of health are no longer confined to studies of illness. The ‘new public health’ incorporates environmental, lifestyle and policy elements alongside data on illness topics such as morbidity and mortality. Similarly, evaluation in education is now more likely to include learning processes, socioeconomic and cultural factors and broader pedagogic understandings alongside studies of classroom behaviour. It is probably more useful to think of evaluation configurations as composites of contingent evaluation objects rather than a single evaluation object.

As we shall see below, methodological developments within evaluation mirror this contextualisation. There is a move away from ceteris paribus assumptions, to focus increasingly on impacts in context. It is likely that the broadening conception of evaluation configurations such as VET is the result of new methods and theories helping redefine the core concept. As is often the case, methods and methodologies interact with core content, which they also help to shape.

Overall, most classes of evaluation object can be found under the umbrella of VET and it is not possible to associate the evaluation and impact of VET with a particular type of evaluation object or configuration. What is clear is that VET, as an object of evaluation, calls on a vast range of disciplinary understandings, levels of analysis and potential areas of impact. The importance of interdisciplinary evaluation efforts is highlighted by this discussion.

The scope of VET itself is further complicated by the different understandings of impact that characterise the field. We have particular studies of the impact of continuing vocational training (CVT): on company performance; on active labour markets; on individual employment and pay prospects; pedagogic methods as they influence learning-outcomes and competences; VET system reform affecting training outcomes; and knowledge and qualifications as an influence on national economic performance.

It is these clusters of interest – the preoccupations of a domain at any given time – that circumscribe the object of evaluation. It is the sets of objects and understandings around what has been called configurations that best describe what distinguishes evaluation of VET from other domains.

The impacts of CVT on company performance, the way in which it is possible to improve initial vocational training through changing qualification systems, and how VET affects economic performance and social integration are all examples of what defines the object of evaluation within VET. Such preoccupations also change over time. It is worth adding that such preoccupations are also encapsulated in theoretical form. Topical theories – such as social exclusion, human capital, cultural capital, corporate innovation – will be widely accepted in the VET domain as in others. Today’s theories also help define the evaluation object (see below for a more general discussion of theory in evaluation).

3.2. Overarching characteristics of evaluation objects

Although it is not possible within the scope of this chapter to offer a full typology of evaluation objects, it is worth highlighting the kinds of differences that occur not only in the evaluation of VET but also in many other evaluation domains. There are many ways in which evaluation configurations can be differentiated; for simplicity’s sake the following examples concentrate on common dimensions such as similarity or difference, more or less, etc. Of course, there are also much more complex descriptions of evaluation configurations.

There are a number of important dimensions of evaluation configurations, including input characteristics. Most programmes are operationalised through inputs or policy instruments; in VET these include new curricula, new forms of funding for enterprise-based training or new training courses. Such inputs may be standardised across a programme or may be more or less diverse. It is, for example, common for inputs to be carefully tailored to individual, local or enterprise needs. This will have consequences for the sampling and scale of an evaluation. More seriously, it will have implications for the possibilities of generalisations that can be made on the basis of evaluation findings.

Another dimension is the immediate context. The context or setting within which an input is located can also be relatively standardised or relatively diverse. This statement might apply at a spatial level (characteristics of the area) or in terms of the context of delivery or the institutional setting within which programmes are located or policies are expected to have an impact. In VET, the relevant context may be a labour market, a training provider or an enterprise. A highly diversified initiative may be located across different kinds of contexts and, even within a single context, there may be considerable variety. The diversity or standardisation of the immediate context will have many implications, in particular for how policies and programmes are implemented and how much effort needs to be devoted to the evaluation of implementation.

Modes of delivery are also important, since the same input or instrument can be delivered in very different ways. For example, a needs analysis may be undertaken through a local survey as part of the recruitment process of potential trainees or by a company reanalysing its personnel data. Nowadays it would be more common for programmes to be delivered through partnership arrangements rather than through a single administrative chain. This will often be the case, for example, in VET measures delivered through EU Structural Funds.

Settings need consideration as well, given the embedded and contextualised nature of many evaluation objects and the fact that isolated evaluation objects are increasingly rare. With conceptualisations that incorporate context, evaluation objects have a tendency to become configurations. A classic example of an evaluation object that is presumed to be isolated is classroom-based studies that ignore the overall school context or the socioeconomic characteristics of a catchment area. By contrast, a VET measure that is bundled together with a package of incentives, vocational guidance measures and qualifications will need to be evaluated in this wider context.

A further dimension is the number of stakeholders. In any evaluation, there will be those who have an interest in the evaluation and what is being evaluated. Within decentralised, multi-agency programmes there are often many stakeholders, each with their own evaluation questions and judgement criteria. These might, for example, include regional authorities, training providers, sectoral representatives, social partners and European institutions.

Finally, there is the degree of consensus. Policies and programmes may be contentious and will be supported by a greater or lesser degree of consensus among stakeholders. Numerous stakeholders are often associated with lower levels of consensus. Evaluations which draw a high level of consensus will be able easily to apply agreed criteria. Where there is lower consensus, quite different criteria may need to be applied to evaluation data and more work may need to be done to bring together different interests and perspectives. This shapes not only the methodology but also the work required of evaluators.

While each of these characteristics or dimensions has consequences for the design of an evaluation and how it is organised, they also interact. For example, we can envisage two different scenarios. In the first, a single subsidy is available to employers within firms in the retail sector to provide additional training to young apprentices following a recognised national qualification. In the second, a package of measures locally determined by partnerships of companies, training providers and regional authorities is available to firms, colleges and private training providers, to improve the vocational skills and work preparedness of the young unemployed.

Within the first scenario it would be possible and appropriate to assess success in terms of a limited range of output and outcome measures and possibly to apply experimental and random assignment techniques as part of the evaluation procedure. Within this scenario there would be limited resources devoted to the evaluation of the processes of implementation. There is also likely to be a limited number of stakeholders involved in the evaluation.

Within the second scenario there will be a need for several different measures of output and outcomes. Comparisons across the programme will be difficult to standardise given the diversity of modes of delivery and types of input or policy instrument. There is also likely to be limited consensus among the many different stakeholders involved in the programme and its implementation. The use of experimental methods (e.g. control groups and before and after measures) may be possible in such a configuration. It is also likely that case studies that illustrate the way all the various dimensions come together will be appropriate.

4. Philosophical foundations

4.1. Positivism, observation and theory

Before addressing particular aspects of evaluation theory it is important to locate the role of theory in evaluation within the broader set of debates within the philosophy of science. The dominant school, much criticised but of continuing influence in the way we understand the world, is logical positivism. Despite being largely discredited in academic circles for some 50 years, this school still holds sway in policy debates. It constitutes the base model around which variants are positioned. With a history that stretches back to Comte, Hume, Locke, Hobbes and Mill, positivism emerged partly as a reaction to metaphysical explanations: that there was an ‘essence’ of a phenomenon that could be distinguished from its appearance. At the heart of positivism, therefore, is a belief that it is possible to obtain objective knowledge through observation and that such knowledge is verified by statements about the circumstances in which such knowledge is true.

In the field of evaluation, House (1983) has discussed this tradition under the label of objectivism: ‘Evaluation information is considered to be “scientifically objective.” This objectivity is achieved by using “objective” instruments like tests or questionnaires. Presumably, results produced with these instruments are reproducible. The data are analysed by quantitative techniques which are also “objective” in the sense that they can be verified by logical inspection regardless of who uses the techniques.’ (House, 1983; p. 51).

House goes on to emphasise that part of the objectivist tradition that he calls ‘methodological individualism’, in Mill’s work in particular. Thus, repeated observation of individual phenomena is the way to identify uniformity within a category of phenomena. This is one important strand in the mainstream of explanations within the social and economic sciences. It is the basis for reductionism: the belief that it is possible to understand the whole by investigating its constituent parts. ‘By methodological individualism, I mean whatever methodologically useful doctrine is asserted in the vague claim that social explanations should be ultimately reducible to explanations in terms of people’s beliefs, dispositions, and situations. […] It is a working doctrine of most economists, political scientists, and political historians in North America and Britain.’ (Miller, 1991; p. 749).

In this world-view, explanations rest on the aggregation of individual elements and their behaviours and interactions. It is worth noting that this has been described as a ‘doctrine’ as well as a methodological statement. It underpins many of the survey based and economic models that are used in evaluation.

There is now widespread agreement that empirical work cannot rely only on observations. There are difficulties empirically observing the entirety of any phenomena; all description is partial and incomplete, with important unobservable elements. ‘Scientists must be understood as engaged in a metaphysical project whose very rules are irretrievably determined by theoretical conceptions regarding largely unobservable phenomena.’ (Boyd, 1991; p. 12). This is even more true for mechanisms, which it is generally recognised can be imputed but not observed. As Boyd goes on to say, ‘it is an important fact, now universally accepted, that many or all of the central methods of science are theory dependent’.

This recognition of the theory dependence of all scientific inquiry underpins the now familiar critiques of logical positivism, even though there is considerable difference between the alternatives that the critics of positivism advocate. The two most familiar critiques of positivism are scientific realism and constructivism.

4.2. Scientific realism

Scientific realism, while acknowledging the limits of what we can know about phenomena, asserts that theory describes real features of a not fully observable world. Not all realists are the same and the European tradition, currently inspired mainly by the work of Pawson (Pawson and Tilley, 1997; Pawson, 2002a and b), can be distinguished in various ways from US realist thinking.

For example, some prominent North American realists, commenting on Pawson and Tilley’s work, have questioned the extent to which realists need completely to reject experimental and quasi-experimental designs, and suggest that more attention should be paid in the realist project to values. This is especially important if, in addition to explanation, realists are to influence decisions (Julnes et al., 1998). Nonetheless, this chapter draws mainly on the work of Pawson and Tilley to describe the realist position in evaluation.

In some ways realism continues the positivist project: it too seeks explanation and believes in the possibility of accumulating reliable knowledge about the real world, albeit through different methodological spectacles. According to Pawson and Tilley, it seeks to open the ‘black-box’ within programmes or policies to uncover the mechanisms that account for what brings about change. It does so by situating such mechanisms in contexts and attributing to contexts the key to what makes mechanisms work or not work. This is especially important in domains such as VET where the evaluation objects are varied and drawn from different elements into different configurations in differentiated contexts.

‘What we want to resist here is the notion that programs are targeted at subjects and that as a consequence program efficacy is simply a matter of changing the individual subject.’ (Pawson and Tilley, 1997; p. 64).

Rather than accept a logic that sees programmes and policies as simple chains of cause and effect, they are better seen as embedded in multilayered (or stratified) social and organisational processes. Evaluators need to focus on ‘underlying mechanisms’: those decisions or actions that lead to change, which is embedded in a broader social reality. However, these mechanisms are not uniform or consistent even within a single programme. Different mechanisms come into play in different contexts, which is why some programmes or policy instruments work in some, but not all, situations.

Like all those interested in causal inference, realists are also interested in making sense of patterns or regularities. These are not seen at the level of some programme level aggregation but rather at the underlying level where mechanisms operate. As Pawson and Tilley (1997; p. 71) note: ‘regularity = mechanism + context’. Outcomes are the results of mechanisms unleashed by particular programmes. It is the mechanisms that bring about change and any programme will probably rely on more than one mechanism, not all of which may be evident to programme architects or policy-makers.

As Pawson and Tilley summarise the logic of realist explanation: ‘The basic task of social inquiry is to explain interesting, puzzling, socially significant regularities (R). Explanation takes the form of positing some underlying mechanism (M) which generates the regularity and thus consists of propositions about how the interplay between structure and agency has constituted the regularity. Within realist investigation there is also investigation of how the workings of such mechanisms are contingent and conditional, and thus only fired in particular local, historical or institutional contexts (C)’ (Pawson and Tilley, 1997; p. 71).

Applying this logic to VET, we may note, for example, that subsidies to increase work-based learning and CVT in firms sometimes lead to greater uptake by the intended beneficiaries. This need not lead to the assessment of the programme as ineffective because, for example, positive outcomes can only be observed in 30 % of cases. We try rather to understand the mechanisms and contexts which lead to success. Is the context one where firms showing positive outcomes are in a particular sector or value chain or type of region? Or is it more to do with the skill composition of the firms concerned? Are the mechanisms that work in these contexts effective because a previous investment has been made in work-based learning at the firm level, or is it because of the local or regional training infrastructure? Which mechanisms are at play and in what context:
(a) the competitive instincts of managers (mechanism), who fear that their competitors will benefit (context) unless they also increase their CVT efforts?
(b) the demands of trade unions concerned about the professionalisation and labour-market strength of their members (mechanism), sparked off by their awareness of the availability of subsidies (context)?
(c) the increased effectiveness of the marketing efforts of training providers (mechanism) made possible by the subsidies they have received (context)?

According to the realists, it is by examining and comparing the mechanisms and contexts in which they operate in relation to observed outcomes that it becomes possible to understand success and describe it.

they operate in relation to observed outcomes, that (b) differentiating a programme and its instru- it becomes possible to understand success and ments more clearly to ensure that different describe it. For Pawson and Tilley, all revolves mechanisms that work in different contexts around these CMO (context, mechanism, outcome) are adequately covered; configurations. (c) seeking to influence the contexts within which Policy-makers are then in a position to the programme aims to be effective. consider options such as: The table below, taken from the concluding (a) focusing the programme more narrowly at chapter of Pawson and Tilley’s book, provides a beneficiaries that are likely to change brief summary of the realist position, in terms of because of the mechanisms that work in the eight ‘rules’ that are seen as encapsulating the contexts they inhabit; key ideas of realistic enquiry and method.

Table 3: Rules guiding realistic enquiry and method

Rule 1: Generative causation – Evaluators need to attend to how and why social programmes have the potential to cause change.
Rule 2: Ontological depth – Evaluators need to penetrate beneath the surface of observable inputs and outputs of a programme.
Rule 3: Mechanisms – Evaluators need to focus on how the causal mechanisms which generate social and behavioural problems are removed or countered through the alternative causal mechanisms introduced in a social programme.
Rule 4: Contexts – Evaluators need to understand the contexts within which problem mechanisms are activated and in which programme mechanisms can be successfully fired.
Rule 5: Outcomes – Evaluators need to understand what are the outcomes of an initiative and how they are produced.
Rule 6: CMO configurations – In order to develop transferable and cumulative lessons from research, evaluators need to orient their thinking to context-mechanism-outcome pattern configurations (CMO configurations).
Rule 7: Teacher-learner processes – In order to construct and test context-mechanism-outcome pattern explanations, evaluators need to engage in a teacher-learner relationship with program policy-makers, practitioners and participants.
Rule 8: Open systems – Evaluators need to acknowledge that programmes are implemented in a changing and permeable social world, and that programme effectiveness may thus be subverted or enhanced through the unanticipated intrusion of new contexts and new causal powers.

Source: adapted from Pawson and Tilley (1997)
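To make the CMO logic more tangible for readers who think in data structures, the sketch below (in Python) records the CVT-subsidy example above as a set of context-mechanism-outcome configurations. It is a minimal illustration under our own assumptions: the class, field names and entries are invented for exposition and are not part of Pawson and Tilley's method.

    # Minimal sketch of realist CMO (context-mechanism-outcome) configurations,
    # based on the CVT-subsidy example in the text. All entries are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class CMOConfiguration:
        context: str    # the setting in which the mechanism can 'fire'
        mechanism: str  # what actually generates the change
        outcome: str    # the regularity observed when the mechanism fires

    configurations = [
        CMOConfiguration(
            context="firms that see competitors take up subsidised CVT",
            mechanism="competitive instincts of managers",
            outcome="increased work-based learning effort"),
        CMOConfiguration(
            context="union members aware of the availability of subsidies",
            mechanism="trade union demands for professionalisation",
            outcome="greater uptake of training by the workforce"),
        CMOConfiguration(
            context="training providers in receipt of subsidies",
            mechanism="more effective marketing of training offers",
            outcome="higher enrolment by intended beneficiaries"),
    ]

    # Realist explanation searches for the mechanism/context pairings
    # that could have generated an observed regularity.
    def explain(observed_outcome: str) -> list:
        return [c for c in configurations if c.outcome == observed_outcome]

    for c in explain("higher enrolment by intended beneficiaries"):
        print(f"outcome '{c.outcome}' <- mechanism '{c.mechanism}' firing in context '{c.context}'")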

4.3. Constructivists

Constructivists deny the possibility of objective knowledge about the world. They follow more in the tradition of Kant and other continental European philosophers than the mainly Anglo-Saxon school that underpins positivism and realism. It is only through the theorisations of the observer that the world can be understood. 'Socially constructed causal and metaphysical phenomena are, according to the constructivist, real. They are as real as anything scientists can study ever gets. The impression that there is some sort of socially unconstructed reality that is somehow deeper than the socially constructed variety rests, the constructivist maintains, on a failure to appreciate the theory-dependence of all our methods. The only sort of reality any of our methods are good for studying is a theory-dependent reality.' (Boyd, 1991; p. 13).

The way we know, whatever the instruments and methods we use, is constructed by human actors or stakeholders. According to Stufflebeam in his review of Foundation models for 21st century program evaluation: 'Constructivism rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable, and constantly problematic and changing. It places the evaluators and program stakeholder at the centre of the inquiry process, employing all of them as the evaluation's "human instruments". The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised.' (Stufflebeam, 2000a; pp. 71-72).

The most articulate advocates of constructivism in evaluation are Guba and Lincoln. They have mapped out the main differences between constructivists and the 'conventional' position (as they label positivists) in their well-known text Fourth generation evaluation (Guba and Lincoln, 1989). The highlights of this comparison are summarised in the table below:

Table 4: Comparing constructivist and ‘conventional’ evaluation

Nature of truth
Conventional: The truth of any proposition (its factual quality) can be determined by testing it empirically in the natural world. Any proposition that has withstood such a test is true; such truth is absolute.
Constructivist: The truth of any proposition (its credibility) can be determined by submitting it semiotically to the judgement of a group of informed and sophisticated holders of what may be different constructions. Any proposition that has achieved consensus through such a test is regarded as true until reconstructed in the light of more information or increased sophistication; any truth is relative.

Limits of truth
Conventional: A proposition that has not been tested empirically cannot be known to be true. Likewise, a proposition incapable of empirical test can never be confirmed to be true.
Constructivist: A proposition is neither tested nor untested. It can only be known to be true (credible) in relation to and in terms of informed and sophisticated constructions.

Measurability
Conventional: Whatever exists, exists in some measurable amount. If it cannot be measured, it does not exist.
Constructivist: Constructions exist only in the minds of constructors and typically cannot be divided into measurable entities. If something can be measured, the measurement may fit into some constructions but it is likely, at best, to play a supportive role.

Independence of facts and theories
Conventional: Facts are aspects of the natural world that do not depend on theories that happen to guide any given inquiry. Observational and theoretical languages are independent.
Constructivist: Facts are always theory-laden, that is, they have no independent meaning except within some theoretical framework. There can be no separate observational and theoretical languages.

Independence of facts and values
Conventional: Facts and values are independent. Facts can be uncovered and arrayed independently of the values that may later be brought to bear to interpret or give meaning to them. There are separate factual and valuational languages, the former describing 'isness' and the latter 'oughtness'.
Constructivist: Facts and values are interdependent. Facts have no meaning except within some value framework; they are value-laden. There can be no separate observational and valuational languages.

Source: adapted from Guba and Lincoln, 1989

According to Guba and Lincoln, when considering the purpose of evaluations, one needs to distinguish both between merit and worth and between summative and formative intent:
(a) a formative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for modernity, integrity, continuity, sequence, and so on, for the sake of discovering ways in which those characteristics might be improved;
(b) a formative worth evaluation is one concerned with assessing the extrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for the extent to which desired outcomes are produced in some actual context of application, for the sake of discovering ways in which its performance might be improved;
(c) a summative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of determining whether it meets some minimal (or normative or optimal) standard for modernity, integrity, and so on. A positive evaluation results in the evaluand being warranted as meeting its internal design specifications;
(d) a summative worth evaluation is one concerned with assessing the extrinsic value of some evaluand for use in some actual context of application. A positive evaluation results in the evaluand being warranted for use in that context. (Guba and Lincoln, 1989; pp. 189-190).

In practical terms, when considering what the evaluator should do, Guba and Lincoln start from the 'claims, concerns and issues' that are identified by stakeholders, people 'who are put at some risk by the evaluation'. It is therefore necessary for evaluators to be 'responsive'. 'One of the major tasks for the evaluator is to conduct the evaluation in such a way that each group must confront and deal with the constructions of all others, a process we shall refer to as hermeneutic dialectic. […] Ideally responsive evaluation seeks to reach consensus on all claims, concerns and issues […]' (Guba and Lincoln, 1989; p. 41).

A distinctive role of the evaluator, therefore, is to help put together 'hermeneutic' circles. This is defined by Guba and Lincoln as a process that brings together divergent views and seeks to interpret and synthesise them mainly to 'allow their mutual exploration by all parties' (Guba and Lincoln, 1989; p. 149). As Schwandt has argued from a postmodernist standpoint, 'only through situated use in discursive practices or language games do human actions acquire meaning' (Schwandt, 1997; p. 69). Applied to evaluation, this position argues for the importance of the 'dialogic encounter' in which evaluators are 'becoming partners in an ethically informed, reasoned conversation about essentially contested concepts […]' (Schwandt, 1997; p. 79).

In more down-to-earth terms, Guba and Lincoln emphasise the role of the evaluator to:
(a) prioritise those unresolved claims, concerns and issues of stakeholders that have survived earlier rounds of dialogue orchestrated by the evaluator;
(b) collect information through a variety of means – collating the results of other evaluations, reanalysing the information previously generated in dialogue among stakeholders, conducting further studies – that may lead to the 'reconstruction' of understandings among stakeholders;
(c) prepare and carry out negotiations that, as far as possible and within the resources available, resolve that which can be resolved and (possibly) identify new issues that the stakeholders wish to take further in another evaluation round.

So how might this be exemplified in the VET domain? It should be noted that what follows does not fully conform to Guba and Lincoln's vision of constructivist evaluation, largely because it is situated in a larger-scale socioeconomic policy context than many of their own smaller-scale case examples. But it should also be noted that constructivist thinking is, to some extent, relevant to many contemporary evaluation challenges, and the example below is intended to illustrate such potential relevance.

So, to apply this logic to VET, constructivist thinking can be especially helpful where there is a problem area with many stakeholders and the entire system will only be able to progress if there is a broad consensus. For example, there may be a political desire to become more inclusive and involve previously marginalised groups in training opportunities. The problem is how to ensure that certain groups such as women, young people and ethnic communities are given a higher profile in VET. Here, the involvement of many stakeholders will be inevitable. Furthermore, the views of these stakeholders are more than data for the evaluator: they are the determinants and shapers of possible action and change. Unless the trainers, employers, advocacy groups, funding authorities and employment services responsible for job-matching and the groups being 'targeted' cooperate, change will not occur. It is also likely that these stakeholders hold vital information and insights into the past experience of similar efforts; what went wrong and right and what could be done to bring about improvements in the future.

The evaluator might then follow much of the constructivist logic outlined above:
(a) identify the different stakeholders who potentially have a stake in these areas of concern;
(b) conduct a series of initial discussions to clarify what they know, what they want and what their interests are;
(c) feed back to all stakeholders their own and each other's interests, knowledge and concerns in a way that emphasises the similarities and differences;
(d) clarify areas of agreement and disagreement and initiate discussions among the stakeholders and their representatives to clarify areas of consensus and continuing dissent;
(e) agree what other sources of information could help move the stakeholders forward – perhaps by synthesising other available studies, perhaps by initiating new studies;
(f) reach the best possible consensus about what should be done to improve VET provision and participation for the groups concerned.

It is worth highlighting that the balance of activities within constructivist evaluation is very different from both positivist and realist variants. It emphasises the responsive, interactive, dialogic and 'orchestrating' role of the evaluator because the sources of data that are privileged are seen to reside with stakeholders, as much as with new studies and externally generated data.

5. Evaluation theory

There has been a strong bias within evaluation as it has evolved to focus on method, technique and, to a lesser extent, methodology. It is only in recent years that there has been an upsurge in interest in the role of theory in evaluation. To some extent, this reflects the wider debates from within the philosophy of science that have been sketched out above. From the early 1990s onwards there has been a re-balancing of attention towards theory. Chen's book, Theory-driven evaluations (1990), has become a landmark in this shift in focus towards theory. The now classic text Foundations of evaluation (Shadish et al., 1991) is organised around five main bodies of theory: social programming, knowledge construction, valuing, knowledge use and evaluation practice.

As has been widely recognised, Weiss was among the first to direct our attention to the importance of theory (Weiss, 1972) and has actively carried forward this debate under the umbrella of the Aspen Institute's New approaches to evaluating community initiatives (Connel et al., 1995; Fulbright-Anderson et al., 1998). While the starting point of the discussion that follows is these authors, it still needs to be situated in the broader philosophical debates outlined earlier.

5.1. Programme theory

The dominant school of theory in evaluation is 'programme theory'. This is concerned with opening up the programme 'black-box', going beyond input/output descriptions and seeking to understand how programmes do and do not work. Chen's conceptualisation distinguishes 'normative' and 'causative' components of programme theory, which he defines as 'a specification of what must be done to achieve the desired goals, what other important impacts may also be anticipated and how these goals and impacts would be generated' (Chen, 1990; p. 43). Chen's conceptualisation extends to what he identifies as 'six domains'.

'The following three domain theories are part of the general normative theory: 1) treatment theory specifies what the nature of the program treatment should be; 2) implementation environment theory specifies the nature of the contextual environment within which the program should be implemented; 3) outcomes theory specifies what the nature of the program outcomes should be. The following three domain theories are related to the general causative theory: 1) impact theory specifies the causal effect between the treatment and the outcome; 2) intervening mechanism theory specifies how the underlying intervening processes operate; 3) generalization theory specifies the generalizability of evaluation results to the topics or circumstances of interest to stakeholders.' (Chen, 1990; pp. 49 and 51)

Although avowedly seeking to escape from the limitation of input/output thinking, Chen's conceptualisation is still linear. His domains follow the treatment/implementation/outcome logic and incorporate concepts such as intervening mechanisms: 'the causal processes underlying a program so that the reasons a programme does or does not work can be understood' (Chen, 1990; p. 191).

The underlying logic of causality favoured by Chen is essentially consistent with classic experimental thinking. For example, with regard to a programme that used comic books to influence adolescent smoking: 'The underlying causal mechanism of this program is the assumption that the comic book will attract adolescents' interest and attention and that they will read it closely and frequently, and thereby pick up the important anti-smoking message contained in it. The message will in turn change their attitudes, beliefs, and behaviour regarding smoking. The causal structure of this program is that the program treatment variable (exposure to the comic book) attempts to affect the intervening variable (the intensity of […]), which in turn will affect the outcome variables (attitudes, beliefs, and behaviour toward smoking)' (Chen, 1990, p. 193).
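Chen's linear causal structure can be rendered as a simple chain. The short sketch below is our own illustration of that treatment / intervening variable / outcome logic, not Chen's notation; the label for the intervening variable (left incomplete in the source quotation) is an assumption.

    # Illustrative sketch of Chen's linear causal structure for the
    # comic-book programme: treatment -> intervening variable -> outcomes.
    # The labels are our own rendering; 'intensity of reading' is assumed,
    # since the source quotation leaves the intervening variable incomplete.
    causal_chain = [
        ("treatment", "exposure to the comic book"),
        ("intervening variable", "intensity of reading"),
        ("outcome variables", "attitudes, beliefs and behaviour toward smoking"),
    ]

    for (role, var), (next_role, next_var) in zip(causal_chain, causal_chain[1:]):
        print(f"{role} ({var}) --> {next_role} ({next_var})")

Note what the chain leaves out: it says nothing about the contexts in which each link holds, which is precisely the realist objection taken up in the discussion that follows.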

on the programme’s interventions rather than on can pinpoint where in the theory the assump- the mechanism operating within the context. For tions break down. This should enable the example, if we try to answer the question ‘Why program to take corrective action before too are some adolescents more likely to be influ- much time goes by; enced by such exposure?’, answers do not fall (c) The results of a evaluation out easily from traditional programme theory can be more readily generalized across logic. It is these ‘underlying mechanisms’ that are programs. Seeing the successes and the not explained in this framework. Nor are the failures between closely linked assumptions, contexts within which these interventions occur such as between greater parental attention to explored in detail. children and better child behaviour, is easier than between say, parenting education programs and better child behaviour.’ (Weiss, 2000). 5.2. Theory based evaluation and By focusing on the assumptions of ‘programme theories of change practitioners’ and aspiring to encourage consensus among them, this vision of evaluation theory shares many features with the participative The other main strand of theory in evaluation is labelled ‘theory based evaluation’ and latterly and even constructivist schools of evaluation. As ‘theory of change’ and is associated with the described by some of its main proponents, the Aspen round table on comprehensive community ‘theory of change’ evaluation approach – as it has initiatives. In Volume 1 of the Aspen collection, come to be called – is also a collaborative, dialogic Weiss identifies four main rationales for theory process. It takes the themes of practitioners, based evaluation: ‘1) It concentrates evaluation makes them coherent, explicit and testable and attention and resources on key aspects of the then seeks to measure and describe programme program; 2) it facilitates aggregation of evaluation outcomes in these terms. results into a broader base of theoretical and Overall, over the last seven or eight years in program knowledge; 3) it asks program practi- particular, there has been a gradual blurring of tioners to make their assumptions explicit and to what is meant by programme theory. The term reach consensus with their colleagues about now seems to encompass several approaches what they are trying to do and why; 4) evaluations that unpick the logic of programmes, make that address the theoretical assumptions explicit their assumptions, work with stake- embedded in programs may have more influence holders, monitor progress and explain the on both policy and popular opinion.’ (Weiss, outcomes that are observed (Rogers, 2000 as an 1995; p. 69). example of this tendency). As such, it has come This is also, in essence, a programme theory to include what is now classic programme theory approach. In a subsequent volume, Connel and with theory based evaluations and realistic evalu- Kubisch define this evaluation approach ‘as a ations such as those advocated by Pawson and systematic and cumulative study of the links Tilley (Rogers, 2000; p. 219). between activities, outcomes, and contexts of the The programme theory approach has also initiative.’ (Fulbright-Anderson et al., 1998; p. 16). been taken on board by those who advocate They emphasise collaborative working with logic models or logical frameworks that link stakeholders to bring to the surface underlying outcomes with programme activities and mechanisms. processes. Thus, a recent W.K. 
Kellogg Founda- Weiss herself takes these ideas of joint working tion guide (2000) also makes the link with ‘theo- further: ‘Three big advantages for pursuing a retical assumptions/principles of the programme’ theory of change evaluation are as follows: and devotes an entire chapter to ‘developing a (a) a theory of change evaluation allows evalua- theory of change logic model for your tors to give early word of events without programme’. This is an important development having to wait until the end of the whole given the power of logic models in the world of program sequence; evaluation. These were initiated by the World (b) the evaluators can identify which assump- Bank and taken on also by the EU. One of the tions are working out and which are not. They main criticisms of these models is their lack of 32 The foundations of evaluation and impact research
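As a hedged illustration of what such a logic model records, the sketch below lays out a hypothetical VET initiative as an inputs-activities-outputs-outcomes chain with an explicit list of theoretical assumptions. The structure and entries are our own invention, not the Kellogg Foundation's or the World Bank's template.

    # A minimal, invented logic-model record for a hypothetical VET initiative.
    # The 'assumptions' entry is where theory-based approaches would make the
    # programme theory explicit, addressing the criticism discussed below.
    logic_model = {
        "inputs": ["subsidy budget", "training providers", "guidance staff"],
        "activities": ["subsidise CVT courses", "market courses to firms"],
        "outputs": ["courses delivered", "employees trained"],
        "outcomes": ["higher skill levels", "improved labour-market position"],
        "assumptions": [
            "firms respond to subsidies by releasing staff for training",
            "skills acquired are actually used in the workplace",
        ],
    }

    for stage in ("inputs", "activities", "outputs", "outcomes"):
        print(f"{stage:>10}: {', '.join(logic_model[stage])}")
    print("assumptions:", "; ".join(logic_model["assumptions"]))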

One of the main criticisms of these models is their lack of explanatory power and their a-theoretical nature. Bringing theory-based approaches into the logic model framework begins to address some of these criticisms.

5.3. A wider theoretical frame

However, we would not wish to confine descriptions of theory solely to programme theory and associated elaborations. The Shadish et al. (1991) framework includes a theory of social programming. However, their focus is more on a theory of what social programmes do and how effective they are. This is consistent with their overall approach, which is to elaborate the theoretical basis for evaluation practice. Other theoretical focuses for Shadish et al. concern:
(a) the theory of use, i.e. what is known about how to encourage use;
(b) the theory of valuing, i.e. about judging outcomes and the role of values and stakeholder interests in such judgements;
(c) the theory of knowledge in evaluation, i.e. the familiar questions of what constitutes knowledge, explanation and valid method;
(d) the theory of practice in evaluation, i.e. the main decisions about resource allocation, the choice of methods, questions to ask and evaluation purposes.

Beyond the various focuses identified – that is, programme theory and its various elaborations – and theories of evaluation itself, as articulated by Shadish et al., there are a number of other bodies of theory that are undoubtedly relevant and come into the discourse of evaluators. In particular, there are domain theories and theories of implementation and change.

In every policy domain or field where evaluation occurs, there are bodies of theory unrelated to the practice of evaluation and to the logic of programmes. Thus, in social welfare, there are theories related to the welfare state, the nature of social solidarity, the behavioural consequences of different benefit regimes and the interactions between social welfare and labour-market performance. Similar bodies of theory exist in relation to VET, as they do in other domains such as research and development, regional planning, education and criminal justice. In the European context, at least, there seems to be an expectation that evaluators will have some knowledge of the domain contexts within which they work, including relevant domain theories.

There are also substantial bodies of relevant theory about policy change and implementation deriving mainly from political science and […]. For example, this extends beyond Chen's description of implementation environment evaluation (Chen, 1990). It is well exemplified by the work of people such as Sabatier (1988) and Sabatier and Jenkins-Smith (1993). There is also more generic literature on implementation and change, often encapsulated under the heading 'the diffusion of innovation' and following on from the work of Rogers (1995). This is particularly relevant to the issue of generalisability of innovations that are first broached on a pilot basis.

In summary, five bodies of theory appear to be relevant to evaluators:
(a) theories of evaluation. These would include programme theory, theories of change approaches and realist approaches which emphasise the identification of mechanisms underlying successful change, which have to be understood in specific contexts and settings;
(b) theories about evaluation. Thus there is a growing literature on evaluation practice, use, design and capacity. Included in this category would be particular aspects of practice identified by Shadish et al., such as theories of valuing;
(c) theories of knowledge, including the main debates about the nature of knowledge, epistemology, methodology, etc., and about the nature of causal inference;
(d) domain and thematic theories, which could be described as theory of the evaluation object. This would include bodies of theory about domains such as human resource development, skill acquisition, the development of human capital and equal opportunities that could inform evaluation design, programme/policy implementation and outcomes;
(e) theories of implementation and change often seen as relevant by evaluators. We would include here understandings of policy change, the diffusion of innovation and administrative behaviour. Such bodies of theory are likely to condition the success of programme interventions and can be quite separate from the kind of programme theories referred to above.

Finally, it is worth restating the main reasons that theory is seen as important in evaluation:
(a) theory can help support interpretations. This follows from the widespread recognition that all evaluations are based on data that can be interpreted in different ways. Theory provides an explicit framework for such an interpretation;
(b) theory can help fill in the gaps in incomplete data. This follows from the recognition that, however thorough evaluators may be, they will never have the complete picture. Theory can provide a plausible way of filling gaps in available evaluation data;
(c) theory can provide a framework to work with stakeholders. This follows from the increasingly common practice of dialogue and collaboration between evaluators, presumed beneficiaries and others who are affected by programmes and policy instruments;
(d) theory can help prediction and explanation. This is the classic scientific role of theory: to suggest and explain causal links and likely outcomes;
(e) theory can make explicit the constructed objects of evaluation. Many contemporary objects of evaluation are constructed. They are abstracted ideas which do not have a direct empirical referent (e.g. efficiency, a learning organisation and an enterprise culture would all be examples of constructed objects). In order to describe and measure these objects, theory is needed.

At a different level, we are also beginning to see theoretical development around the issue of complexity in socioeconomic programmes (Sanderson, 2000). Many interventions are not self-contained; they interact with other programmes and with other social and organisational processes. Thus in VET, a new training system is embedded in an institutional and educational context which supports and constrains this system. Similarly, a training initiative at the level of a particular work group is mediated by the way work is organised, different management styles and the labour-market behaviour of employers and workers. Often there are multiple programmes and interventions operating simultaneously on a particular target group or area. The interaction of these various processes and programmes is one of the greatest challenges that evaluators face. Complexity theory is a very new entrant into evaluation thinking. Questions are raised more often than answers are provided.

6. Determinants of evaluation use

Evaluation types and philosophies are shaped by contexts of use. This term encompasses both factors that encourage use and what is sometimes called 'evaluation capacity development'. These contexts and capacities are not only conceived of differently by different scholars but also change as context changes. It is these kinds of use and capacity considerations that ultimately decide how evaluation contributes to policy-making and the delivery of public policies.

6.1. Instrumental versus cumulative use

There has been debate about the use of evaluation for over 25 years. This is primarily associated with two main scholars: Weiss (1976) and Patton (1997, 2002 and earlier editions). In the mid-1970s Weiss began to focus on evaluation use, criticising simplistic notions of instrumental use. Based on empirical studies of how policy-makers use evaluation, Weiss has been associated with a complex understanding of decision-making and policy-making in which evaluation findings are internalised, selectively used and rarely lead directly to specific decisions or changes in policy. This theory, which is sometimes labelled 'an enlightenment view of evaluation', takes a long-term incremental view about the way evaluation findings feed through to policy-making. In this regard, she is close to Rossi's understanding of the conceptual use of evaluation, i.e. 'the use of evaluations to influence thinking about issues in a general way' (Rossi et al., 1999). She challenges the rational model of decision-making and bases her conclusions on studies of how organisations actually work. She effectively argues for cumulative learning across many evaluations rather than direct use from particular evaluations. In some of her more recent work (Weiss, 1999) she associates herself with the arguments of Sabatier and Jenkins-Smith (1993) about the importance of policy forums that bring together academics and policy-makers and can act as a vehicle for the absorption of evaluation findings. She nonetheless remains true to her earlier arguments that evaluators should never expect their inputs to override political agendas and administrative necessities, which may push decisions in quite different directions from their recommendations.

Patton is more committed to the instrumental purposes of evaluation. 'By utilization I mean intended use by intended users' (Patton, 1997, 2002 and earlier editions). He places considerable emphasis on understanding the priorities of decision-makers, engaging with them, and encouraging them to own evaluations and their results in order to enhance use. Furthermore, Patton tends to emphasise the example of individual decision-makers, particular people and not the decision itself. Indeed, he is interested in individual decision-makers' 'cognitive style and logic'. Unlike Weiss, he is less interested in organisational and administrative processes and more in the act of decision and the particular decision-maker.

It has been commonly observed (e.g. Alkin, 1990) that an important determinant of the difference in perspective between Weiss and Patton is their respective fields of practice and study. Patton has worked mainly in local community and voluntary organisations; where he has worked with public administrations, these have also been at a local level. Weiss has been more occupied with large-scale national programmes (beginning, for example, with the 1970s' war on poverty in the US). This probably goes a long way to explaining Weiss's preoccupation with complex organisational processes and Patton's preoccupation with individual decision-makers and their decisions.

6.2. Process use of evaluation

Although the above summarises, in a simplified way, the main debate between Weiss and Patton, both have contributed far more to our understanding of evaluation use.

In particular, Patton is probably the originator of the term 'process use', i.e. 'changes […] that occur among those involved in evaluation as a result of learning that occurs during the evaluation process' (Patton, 1997). Considerations of the process aspects of evaluation also open up discussions of different, more action-orientated, views of evaluation purpose. For example, Patton also discusses intervention-orientated evaluation, participatory evaluation and empowerment evaluation. These kinds of evaluation approaches are less relevant to the immediate concerns of this particular study, but do raise interesting methodological questions about how, in the course of evaluations, one can increase commitment and ownership by attending to the details of the evaluation process.

6.3. The importance of methodology

Weiss, especially in her early work, opens up another aspect of evaluation use: the importance of sound methodology and reliable data (Weiss and Bucuvalas, 1980). Indeed, in some of this early work Weiss anticipates some very contemporary discussions about evidence-based policy-making and the use of experimental methods. It should be acknowledged, however, that in later years, as she investigated more carefully the actual uses made of evaluation, she shifted towards the position outlined above: that we had to understand more about use in an organisational context. In her concern for sound methods and valid research, Weiss is more 'hands on' than many of her contemporaries: Campbell, for example, sees use as being essentially part of politics and of little concern for evaluators, who should be driven by 'truth'.

In many ways we can perceive a pendulum swing, between organisational and implementation concerns on the one hand and methodological and quality assurance concerns on the other, back and forth over the last 25 years. The current interest in evidence-based policy, the rediscovery of 'the gold standard' of experimental methods and randomised control trials (RCTs), and the 'realistic critique' of such trials and experiments represent a renewed movement toward the methodological end of the pendulum. Thus the 'What works?' school (Davies et al., 2000) is pre-eminently concerned with the nature of research evidence and the need for rigorous methodology. The adherents of this school often regard RCTs as the most reliable form of primary evidence, and systematic reviews and meta-analysis – which make quantitative estimates of the overall effects of interventions by aggregating primary studies – as the most reliable approach for secondary evidence. However, already in Davies et al. (2000) the editors of this collection themselves begin to address questions of better understanding the policy process. This is taken further by Nutley et al. (2003) when they move beyond data and findings to consider how evidence is disseminated and incorporated into practice. Thus these authors also follow the pendulum between methodological and implementation concerns.

However, the central components of the evidence-based policy debate remain methodological, as in the debate around meta-analysis and systematic reviews (Gallo, 1978; Pawson, 2002a). The core of this debate is the importance of drawing together evidence from previous related initiatives before embarking on new initiatives. For some, narrative reviews (essentially structured qualitative literature reviews drawing together lessons) are the preferred approach. Others seek to quantify precise effects by looking to quantitative and experimental studies along the lines of RCTs in medical drug trials. Of course, the possibility of undertaking such quantitative studies is limited to a subset of policy areas, usually in the social welfare, social benefit and welfare-to-work areas. Despite the methodological disagreements among the proponents of different approaches to such secondary analysis, there is a widespread consensus that bringing together evidence from across many evaluations is an important guarantor of the reliability of findings.

6.4. Evaluation and learning

The most recent attempt to revisit the issue of evaluation use in a comprehensive fashion is a publication of the American Evaluation Association (AEA), The expanding scope of evaluation use (Caracelli and Preskill, 2000).

In Chapter 2, Preskill and Torres consider 'the learning dimension of evaluation use'. These authors highlight, in particular, learning at an organisational level that occurs as part of the evaluation process. They draw on constructivist theories of learning, in which learners themselves engage in an active process of interpretation and dialogue in order to construct their own meanings. They link this perspective with Lave and Wenger's (1991) notion of 'communities of practice', i.e. groups of individuals working together, who are interdependent and whose tacit knowledge and problem-solving capacities are integrated into their social and professional life. The authors suggest that learning from evaluations 'will most likely occur when evaluation is collaborative, is grounded in constructivist and transformational learning theories, and builds communities of evaluation practice.' The implication of this argument for evaluation use is to reinforce the importance of developing communities of both evaluators and users within organisations, if evaluation is to become part of an active learning process. It also steers us away from the narrow view of intended use that certainly informed the earlier work of Patton. Given that communities of practice will interpret and construct their own meanings using data and findings that they bring into their own context, the use of evaluation may not be as evaluators or the commissioners of evaluation originally intended. However, transforming such evaluative outputs and processes into their own organisational context still constitutes evaluation use.

6.5. The institutionalisation of evaluation

There is an extensive literature around the role of evaluation in policy systems. This literature is largely based in policy analysis and public administration rather than in evaluation per se. The preoccupations of this body of literature can be seen as mainly about evaluation capacity, including: organisational learning in the public sector; how to build evaluation into public administrations; the nature of effective evaluation capacity; institutionalising and making evaluation more professional; and analyses of the policy process itself. This latter set of preoccupations is the closest 'cousin' to the earlier discussion of evaluation use. However, within this literature the starting point is the nature of the policy process, which then moves on to considerations of evaluation practice. Within the earlier cited literature, the starting point is usually evaluation practice, which then moves on to considerations of policy-making.

Organisational learning has become a common metaphor within studies of various kinds of organisations, although most of this work has centred on the private sector. Leeuw et al. (1994) bring together a range of experiences from the public sector across Canada, the Netherlands, Norway, Sweden and the US that seeks to apply the concept of organisational learning through evaluation to public sector institutions. Among the themes highlighted in this study is the role of internal evaluations within public bodies (see especially the chapters by Sonnichsen and by Mayne). The authors highlight the important role that can be performed by internal evaluation offices and how these can influence the overall organisation's evaluation practice and learning. They suggest that developing a 'double-loop learning' process – reporting not only to programme managers but also to top management within government departments – is an important part of the contribution that evaluation units can make. However, these studies also suggest that, while internal units can make an important contribution to organisational learning, they depend ultimately on what Sonnichsen calls 'a disposition towards critical self-examination'. His notion that 'self-reflection is crucial before organisations begin to learn' highlights the importance of creating a general evaluation culture within an organisation.

In the concluding chapter of this collection, Rist identifies two sets of preconditions for learning from evaluation. The first set arises from the importance of fitting in with the policy cycle. This recognises that organisations need 'information at different phases of the policy cycle'. Synchronising evaluation outputs with different policy needs over time is seen as an important means of encouraging learning. The second set of preconditions for learning emphasises how information is transmitted and filtered within public organisations.

Thus studies within this framework suggest that 'governmental organisations appear more receptive to information produced internally than that which comes from external sources'. This is especially so when the news is bad! Bad news is easier to receive from internal rather than external evaluators. Another precondition appears to be the credibility of the sources of information. This may, on occasion, favour internally generated evaluation findings, but may also depend on the sponsors or gatekeepers who bring information in to a public body. This can be seen as related to wider issues of relationship-building and trust. The existence of such organisational attributes seems an important precondition for receptivity to evaluation findings and to organisational learning within the public sector.

Another collection we have considered focuses directly on evaluation capacity (Boyle and Lemaire, 1999). Although some of the authors in this volume overlap with the Leeuw et al. collection referred to above, the concerns here are broader. Building evaluation capacity is seen in terms of 'national evaluation systems'. Thus, evaluation capacity goes beyond the internal organisation of public bodies to include the location of evaluation in the executive or the legislature (e.g. Parliament) and broader issues of governance and institutional arrangements. For our purposes a number of themes explored in this collection are particularly relevant.

One such theme is the design of evaluation functions and offices. Sonnichsen considers the advantages and disadvantages of a centralised and a decentralised model of evaluation functions. He notes the potential for greater independence and credibility of centralised functions and their ability to develop strategic evaluation plans. This model has been favoured in Canada (for example) since the late 1970s. The downside of centralised units and functions is also recognised. They are often seen as threatening by other units of administration, with attendant resistance to change and potentially strained relationships.

6.6. Organisation of evaluation in public agencies

How evaluation is organised appears to be an important factor in effective evaluation take-up and implementation. There are a number of major topics within this debate:
(a) it is a matter of values and attitudes, a belief that it is right to look critically at policies and programmes and to gather and consider evidence about what works and how to improve performance;
(b) it is a matter of administrative practice: how administrations are organised, how stakeholders are involved in the evaluation process and how appropriate levels of separation and integration are maintained between those who implement and those who evaluate public policies;
(c) it is a matter of system integration; systems that are supportive of an evaluation culture are usually networked through professional associations and adhere to common professional standards.

In the section that follows (which continues to draw mainly on Boyle and Lemaire), we concentrate mainly on the administrative arrangements.

Decentralised evaluation units are most likely to be intended to support decision-making and programme effectiveness at a programme level. They are usually aligned with programme management and are hence less threatening. However, issues of independence and bias can arise from their closeness to programme personnel. Another weakness may be a lack of methodological skills in evaluation, though the most important criticism raised by Sonnichsen is the possible lack of power that decentralised units have, especially where decisions about programme and policy futures are still made at a centralised level. Ultimately this debate resolves itself into one of evaluation purpose. Where the purpose is primarily to improve programme and policy implementation, there appear to be strong arguments for a decentralised model, which will also favour learning at a decentralised level. Where the purpose is primarily to support central strategies and policy-making, the argument for a centralised model appears to be stronger: here the learning would tend to occur centrally rather than at programme or policy division level. Issues of professional competence and skills acquisition appear to be stronger within a decentralised framework. This issue is further examined in the same volume by Boyle, who discusses the human resources aspect of evaluation specialists and the professionalisation of evaluation.

He highlights, in particular, the importance of appropriate training courses and education curricula, the way expertise is deployed, mixing different skills together, and the importance of continuous in-service training. From the point of view of this review, much of this discussion is generic rather than focused on use. However, from a use perspective, Boyle makes a strong argument for developing evaluation users. Evaluation users also need to be trained and developed to become consumers. In some ways this can be seen as one of the consequences of introducing managing-for-results approaches in public service organisations. This, in Boyle's terms, creates the link between the supply and demand sides of evaluation use.

The shift in the institutional practice of evaluation within the public sector from monitoring and management to accountability and performance is widely noted (see especially Bastoe, 1999). One definite tendency that this and other studies demonstrate is the integration of evaluation and monitoring with various approaches to performance management.

Among the elements of this approach, following an influential OECD paper (OECD, 1995), are:
(a) objective and target setting;
(b) management responsibility to implement against targets;
(c) the monitoring of performance;
(d) the feed-in of such performance into future policy-making and programming;
(e) the provision of information to external parliamentary and audit committees for ex-post review.

There is also disagreement about the need for and benefit of linking audit and evaluation, which Bastoe also recognises, especially when implementing performance management systems. It is, for example, important also to consider 'how learning actually takes place in organizations'. This attention to learning is also the preoccupation of other authors writing on the institutionalisation of evaluation in the public sector.

6.7. Evaluation as dialogue

A quite different approach to analysing evaluation use in policy settings is suggested by Van der Knaap (1995) in his article Policy evaluation and learning: feedback, enlightenment or argumentation? He challenges the traditional rational-objectivist model of policy evaluation, favouring rather a constructivist view in which policy-makers conduct dialogues about evaluation findings in order to reach their conclusions. Thus 'policy-making is conceived of as an ongoing dialogue, in which both governmental and societal actors contest their views on policy issues by exchanging arguments'. At heart, this argument challenges the 'positivist idea that policy evaluators may provide the policy-maker with neutral or objective feedback information or recommendations'. Rather than enlighten the policy-maker, 'at best, the evaluator might contribute to the quality of policy discourse by entering the negotiations that compose the policy-making processes with informed arguments and a willingness to listen, argue, and persuade or be persuaded'. This shift from the rational to the argumentative is, according to Van der Knaap, a way to 'institutionalise policy orientated learning'. This is not to suggest that the evaluator is relieved of the responsibility to provide reliable information and findings, but that there is a need also to supplement traditional analysis with material that will stimulate debate and allow different stakeholders to consider material presented from different perspectives.

A similar logic informs a recent article by Valovirta (2002). 'Rather than regarding evaluative information as indisputable knowledge, it is viewed as a collection of arguments, which can be debated, accepted and disputed'. According to the author, utilisation of evaluation should be regarded as a process that runs through four stages:
(a) familiarisation with evaluation results and involvement in the evaluation process;
(b) interpretations based on expectations, assessments of the quality of research (truth test) and the feasibility of actions proposed or implied (utility test);
(c) argumentation, in which 'individual interpretations are […] subject to collective deliberation, discussion, negotiation and decision-making';
(d) effects, which may take the form of decisions and actions, new shared comprehensions and increased awareness; and increased or undermined legitimacy.

Within this perspective on evaluation use, 'evaluations force people to present well-grounded arguments for refuting evaluation conclusions and recommendations. This opens up possibilities for new understandings to emerge.' (Valovirta, 2002; p. 77).

Valovirta makes an important distinction between the different contexts in which evaluation takes place. In particular, he distinguishes between settings where there is a high level of consensus versus those with a high level of conflict, and settings where there is low pressure for change versus those with high pressure for change. He suggests that these contextual differences will determine the nature of the argumentation that takes place around evaluation findings.

6.8. Strategies and types of evaluation use

In summary, and cutting across the theoretical debates described above, we can identify six main strategies or approaches to evaluation use from this discussion:
(a) instrumentalist, when evaluation is used instrumentally to achieve an intended and explicit type of use, e.g. to make recommendations that are then implemented;
(b) incrementalist, when evaluation becomes useful cumulatively over time by bringing together evaluation findings from different evaluations, e.g. through meta-analysis and synthesis, in order to inform action;
(c) process-oriented, when evaluation is useful as much because of the processes of engagement and debate it engenders among stakeholders as because of the results it produces: thinking changes even if recommendations are not implemented;
(d) administrative proceduralist, when the procedures through which evaluation is organised and delivered make evaluation use more likely, e.g. well-structured terms of reference plus requirements that programme managers act on or, at least, respond to evaluation findings;
(e) systemic proceduralist, when the wider system in which evaluation is embedded – including dissemination networks, communities of practice and administrative cultures – encourages evaluation use;
(f) performance management, when the demands of improving administrative performance and achieving targets create a market for evaluation outputs.

In the real world, these categories are not mutually exclusive and most public agencies will pursue more than one. However, they do constitute major alternatives; few users of evaluation attempt many of these strategies simultaneously. Some of these different approaches to evaluation use are more or less supportive of different types of evaluation. For example, instrumental approaches are more likely to be consistent with an accountability or outcome-and-impact type of evaluation; a performance management approach is mainly concerned with improving and developing programmes, evaluation being formative for these programmes. However, different contexts of use can support very different types of evaluation; processual strategies may conduct their debates and arguments about evaluations that are neither concerned with processes nor set out to be formative. The main implication of the above discussion is that evaluation is not simply an applied technology or method. Rather, it is embedded in contexts of use that shape what evaluation becomes in particular contexts and fields of application.

7. Evaluation standards and regulation

(b) feasibility standards ‘are intended to ensure 7.1. Evaluation codes and that an evaluation will be realistic, prudent, standards diplomatic, and frugal’; (c) propriety standards ‘are intended to ensure Questions about the roles of evaluators are that an evaluation will be conducted legally, unavoidable, partly because the dominant logic of ethically, and with due regard for the welfare evaluation sees them exercise so much control. As of those involved in the evaluation, as well as with any other debate about the responsibility of those affected by its results’; professionals (doctors, auditors, research scien- (d) accuracy standards ‘are intended to ensure tists) the question of quis custodiet ipso custodes? that an evaluation will reveal and convey is soon heard. Given that the usual answer among technically adequate information about the professionals is ‘through self-regulation’, the features that determine worth or merit of the subsequent question is: ‘through what means and program being evaluated’ (from Joint against what criteria and standards does regula- Committee on Standards for Educational tion occur?’. It is in order to establish some shared agreement among evaluators, commissioners and Evaluation, 1994). those affected by evaluation, that evaluation codes These standards are variously concerned with and standards have become so prevalent. sound methods, timely dissemination, the inde- However, codes and standards also have a pendence and impartiality of evaluators and the broader purpose. In a decentralised system necessary level of evaluator skill and compe- composed of many stakeholders, standards are a tence. way of regulating behaviour across organisational Perhaps the clearest indication of how such boundaries, provided, that is, that all parties standards are intended to establish norms that accept these norms. will influence the conduct of evaluation is to be As the previous section has tried to show, there found in the way ‘utility’ is elaborated as a stan- is widespread concern among evaluators about dard. Thus utility includes: evaluation use. This has partly been fuelled by the (a) being clear about stakeholders so that their academic debates reviewed above, which have needs can be addressed; highlighted the problem of evaluations not being (b) ensuring the credibility of the evaluators so used. This is one impetus behind the development that their results are likely to be accepted; of guidelines and standards, usually for evaluators (c) collecting relevant information from a broad but also for commissioners, that are intended to range of sources as understood by clients promote evaluation use directly and indirectly. and stakeholders; They are direct because they often concern use. (d) being clear about value judgement used to They are indirect because they always seek to interpret findings; enhance the quality of evaluation, which is widely (e) reporting clarity so that the information assumed to be a factor in enhancing evaluation provided in reports is easily understood; use and practice more generally. (f) disseminating reports to intended users in a The earliest of these efforts have been timely fashion; produced by the AEA (1995) and its precursor (g) planning evaluations from the outset in a way organisations, for example the Joint Committee that encourages follow-through from stake- on Standards for Educational Evaluation (1994). holders. 
The programme Evaluation standards, a guide for It is worth noting that other standards also have evaluators, particularly from a programme perspec- implications for the conduct of evaluation. For tive, identifies standards under four main headings: example, feasibility standards include a substan- (a) utility standards ‘are intended to ensure that dard entitled political viability. This is concerned an evaluation will serve the information needs with obtaining the cooperation of different interest of intended users’; groups in order to limit bias or misapplication of Philosophies and types of evaluation research 41

results. Similarly the propriety standards include a concerned with ethical behaviour and deci- substandard entitled service orientation which is sion-making among commissioners, users and designed to assist organisations to address and teachers of evaluation as well as evaluators effectively serve the needs of a full range of themselves. According to the AES, the primary targeted participants; and one entitled conflicts of groups addressed by their guidelines are interest which seeks to avoid compromising evalu- commissioners and evaluation teams or evalua- ations and their results. Accuracy standards are tors. In this regard also the AES diverges from also relevant, for example valid information, one of some of the other guidelines referred to above. the substandards of accuracy, is justified in terms They acknowledge that evaluators often work in of assuring the interpretation arrived at is valid for teams rather than mainly as individuals. the intended use. The AES Guidelines for the ethical conduct of This set of standards has been widely imitated evaluations follows the evaluation cycle by and adapted to different national contexts grouping its guidance under three main headings: including, most recently, by the Deutsche (a) commissioning and preparing for an evaluation; Gesellschaft für Evaluation (DeGEval; 2002; see (b) conducting an evaluation; also Beywl and Speer in this report). There are (c) reporting the evaluation results. also discussions and plans in France and the UK One view one can take about the AES guide- to develop standards. These discussions have, lines is that they constitute a quality assurance however, moved away from directly adopting the framework for the entire evaluation process. These North American model. guidelines focus on a number of ways in which Alongside the widespread adoption of the AEA credibility of evaluations can be enhanced, i.e.: Programme Evaluation Guidelines there has also (a) shared expectations between evaluators and been the emergence more recently of a set of commissioners about what can be delivered guidelines for the ethical conduct of evaluations. through an evaluation; These differ from programme guidelines, being (b) strengthening the basis for evaluation judge- more concerned with the ethical dilemmas that ments; both commissioners and evaluators face in the (c) reducing conflicts during the course of the course of the evaluation process. Such guidelines evaluation; have been variously prepared by the Australasian (d) ensuring balance and simplicity in the way Evaluation Society (AES, 1997), the Canadian reports are presented. Evaluation Society (CES) (1) and the AEA. These However, it should be noted that the AES ethical guidelines are relevant to a discussion regards these guidelines as complementary to, about evaluation itself because they, too, are rather than as a substitute for, other guidelines concerned with the credibility as well as the such as the Programme evaluation guidelines; feasibility of evaluation. indeed they encourage the use of the Guidelines Both the CES and the AEA direct their atten- for the ethical conduct for evaluations jointly with tion to evaluators. Thus the CES guidelines, the Programme evaluation standards. under the three main headings of competence, A set of evaluation guidelines and standards integrity and accountability, are concerned that that is firmly set within the concerns of the evaluators should be competent, act with programme managers and commissioners of integrity and be accountable. 
Similarly the AEA in evaluation, has recently been issued by the Euro- its Guiding principles for evaluators – under pean Commission (2). These are linked to the various headings of systematic inquiry, compe- introduction of ‘activity based management’, the tence, integrity/honesty, respect for people, Commission’s form of results-based manage- responsibilities for general and public welfare and ment, and are consistent with many of the recommendation for continued work – also direct approaches to strengthening evaluation capacity their attention to what evaluators ought to do. and evaluation use discussed above. They cover One set of ethical guidelines drawn up by eval- how evaluation ‘functions’ across the Commis- uators’ professional bodies, produced by the sion should be organised and resourced, how AES, stand out from the others. These are evaluations should be planned and managed,

(1) www.evaluationcanada.ca (2) http://europa.eu.int/comm/budget/evaluation/index_en.htm 42 The foundations of evaluation and impact research

how results should be used and disseminated and journalists). The Means collection authors also and how to ensure good quality reports. acknowledge the ‘absence of a direct short-term There are a variety of other guidelines for evalu- link between recommendations and decisions’. ation that are indicative of the widespread interest Evaluation standards are not only intended as in evaluation standards. For example, the Means a framework for the design of particular evalua- collection (European Commission, 1999), Volume 1, tions. They are also used in meta-evaluations, to Evaluation design and management, has a section try to describe the range of evaluation practice, on optimising the use of evaluation. This focuses and as a quality assessment tool in relation to on dissemination, distinguishing between both completed evaluations. In the European Commis- different communication channels (e.g. reports, sion, for example, the means ‘quality criteria’ are synthesis, article, confidential note) and audiences widely used as a framework for assessing evalu- (e.g. commissioners of the evaluation, steering ations, both at proposal and completion stages groups, managers, European institutions, citizens (Table 5 below).

Table 5: Grid for a synthetic assessment of quality of evaluation work

With regard to each criterion, the evaluation report is rated 1 (unacceptable), 2 (acceptable), 3 (good) or 4 (excellent).

Meeting needs: does the evaluation adequately address the requests for information formulated by the commissioners and does it correspond to the terms of reference?

Relevant scope: have the rationale of the programme, its outputs, results, impacts, interactions with other policies and unexpected effects been carefully studied?

Defensible design: is the design of the evaluation appropriate and adequate for obtaining the results (with their limits of validity) needed to answer the main evaluative questions?

Reliable data: are the primary and secondary data collected or selected suitable? Are they sufficiently reliable compared to the expected use?

Sound analysis: are quantitative and qualitative data analysed in accordance with established rules, and are they complete and appropriate for answering the evaluative questions correctly?

Credible results: are the results logical and justified by the analysis of data and by interpretations based on carefully presented explanatory hypotheses?

Impartial conclusions: are the conclusions just and non-biased by personal or partisan considerations, and are they detailed enough to be implemented concretely?

Clear report: does the report describe the context and goal, as well as the organisation and results of the evaluated programme, in such a way that the information provided is easily understood?

In view of the contextual constraints bearing on the evaluation, the evaluation report as a whole is then judged on the same four-point scale.

Source: European Commission, 1999
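One way to see how such a grid works in practice, for instance when assembling assessments of many completed evaluations in a meta-evaluation, is to treat it as a simple data structure. The sketch below (in Python; the function and variable names are the author’s own, and the averaged ‘overall’ figure is only a crude stand-in for the contextual judgement the grid actually asks for) records one rating per criterion on the four-point scale:

# Illustrative sketch of Table 5 as data: eight criteria, each rated 1-4.
MEANS_CRITERIA = [
    "meeting needs", "relevant scope", "defensible design", "reliable data",
    "sound analysis", "credible results", "impartial conclusions", "clear report",
]
SCALE = {1: "unacceptable", 2: "acceptable", 3: "good", 4: "excellent"}

def assess(ratings):
    """Print a filled-in grid and an indicative overall rating."""
    missing = [c for c in MEANS_CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    for criterion in MEANS_CRITERIA:
        score = ratings[criterion]
        print(f"{criterion:22} {score} ({SCALE[score]})")
    # Table 5 asks for an overall judgement 'in view of the contextual
    # constraints'; a rounded mean is only a rough proxy for that judgement.
    overall = round(sum(ratings.values()) / len(ratings))
    print(f"{'overall (indicative)':22} {overall} ({SCALE[overall]})")

assess({
    "meeting needs": 3, "relevant scope": 2, "defensible design": 3,
    "reliable data": 2, "sound analysis": 3, "credible results": 3,
    "impartial conclusions": 4, "clear report": 3,
})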

On the boundaries of the territory covered here is a wider literature on innovation and good practice. In many studies conducted at a European level there is a concern for the dissemination of innovation. An interesting example is found in Mannila et al. (2001). In this volume, Sonderberg discusses mainstreaming, innovation and knowledge networks. The way in which innovations are spread is close to the problem of how to encourage the take-up of the results of evaluations. Sonderberg also focuses on knowledge networks (which are close to notions of policy communities and communities of practice) as a means of ensuring the dissemination and mainstreaming of new ideas. Arguably this broader literature on the diffusion of innovation could be regarded more generally as a source of insights into how evaluation practice and findings are disseminated.

While there has been a great deal of discussion around standards and codes in North America, Australasia and, most recently, in Europe, the evidence of take-up of such standards and codes is sparse. Most evaluators can come up with examples of ‘breaches’ in most of their evaluations. The systematic application of standards is rare.

7.2. Evaluation as a method for regulating decentralised systems

There is a need to understand the pervasiveness of evaluation across sectors, branches of public administration and the public sector more widely, and its growth as a practice. As was suggested in the terms of reference for this study, part of the answer rests with the nature of public sector reform, including deregulation and decentralisation, that has occurred over the last two decades. This process has fragmented previously established mechanisms of control and management. No longer are programmes, let alone policies, delivered by single agencies. The de facto norms and standards that characterised well-established public bodies are no longer shared among those responsible for programmes and policy instruments. This is important not only for retrospective accounting for success or failure but also for planning and implementing major interagency initiatives. Evaluation has become one of the key elements in the new technology of coordination and interagency management required by the restructured public sector.

This has been noted by evaluation researchers in the education sector in particular. Henkel (1997) has noted how the external assessment of institutions in the UK has led to a reduction in their traditional autonomy and, arguably, a shift towards increased managerial intervention inside these institutions. Segerholm (2001) develops a more extensive theory of ‘national evaluations as governing instruments’. She describes how the evaluation of education programmes in Swedish higher education ‘worked as an administrative technique for disciplining and […] governing’. In this, Segerholm follows Foucault in arguing that diffused evaluation systems, with criteria that are internalised by institutions and practices that make the application of such criteria visible, increase opportunities for central control. Similar arguments have been advanced in other parts of the public sector that have been subject to wholesale reform and decentralisation – but are still held accountable – especially in northern Europe and North America.

A second explanation for the spread and take-up of evaluation has been the new demand for accountability that comes from a better informed and less deferential citizenry. Those responsible for delivering public policy are faced by conflicting demands. On the one hand their political masters require success, transparent operations, evidence of efficiency and explanations of failure. On the other hand, communities, presumed beneficiaries and those affected by policy initiatives, schooled in the market rhetoric of customers who receive a service, are increasingly sceptical of government action. The growth in ever more detailed management-oriented and explanatory evaluation frameworks, and the simultaneous growth of participative, bottom-up, stakeholder-led evaluation practice, is a response to these contradictory demands.

Evaluation in these contexts can be seen as a control process in the cybernetic meaning of the term. Terms such as steering, guidance and regulation are frequently attached to descriptions of evaluative activity. In these terms, standards become especially important. Standards are not only a tool for judging and valuing; they also become the means to develop and express consensus among fragmented policy communities and communities of practice.

The role of standards in evaluation, therefore, needs to be seen much more broadly than the debate about the standards that apply to evaluation practice and outputs. Four main understandings of standards in evaluation can be identified:
(a) those concerned with standards (and criteria) to judge outcomes and effectiveness;
(b) those concerned with required standards of performance in decentralised administrative systems engaged in programme delivery;
(c) those concerned with devolved self-evaluating systems operating within a predetermined framework;
(d) those concerned with the required behaviour of evaluators and those who commission evaluation.

Each of these has been discussed in this chapter. However, the nature of these standards differs subtly. For example:
(a) evaluation standards allow for judgements to be made based on norms and/or the beliefs of stakeholders;
(b) programme delivery standards are usually set by higher levels of an administration (or possibly through regulation) in order to exercise influence over the performance of others;
(c) the standards that operate in devolved systems are processual; they concern the obligation of those to whom powers are devolved to follow certain procedures, including making evaluation outputs available;
(d) standards for evaluators are part of the self-regulation agenda of a recently emerged professional group, mainly developed from within that profession.

Types of evaluation identified in this chapter derive in part from research and conceptualisation about evaluation. However, this research-based activity interacts with two others. First, evaluators as practitioners reflect on their own practice and have become more professional, to the point where issues of professional self-regulation have been highlighted. Second, evaluation is shaped by a complex web of contractual and institutional demands. Both of these activities, evaluators as reflective practitioners and evaluation as an institutionalised and market/network-based practice, determine what types of evaluation survive as well as what types are possible or advocated.

8. Conclusions and future scenarios

As this study is conceived within a European context, there are also questions to be asked as to the likely evolution of evaluation within Europe. This is not to suggest that evaluation in Europe is sui generis, rather that there are distinctive aspects of the EU’s institutions and the challenges that evaluators face in Europe. This distinctiveness is shaped by a number of underlying dynamics, the most important of which are discussed below.

The first is accountability and participation. The policy system, the main customer of evaluation, faces contradictory demands. On the one hand, the demand for accountability for monies spent and for promises made generates mounting pressure for top-down, performance-oriented, quasi-audit types of evaluation. On the other hand, citizens who often feel estranged from their politicians and question the basis of distant, context-free decisions demand closer involvement and participation in the way policies are designed and implemented. The extent to which evaluation evolves in an accountability or participative direction will depend on how these contradictory demands are – or are not – resolved. This is not to suggest that both variants cannot coexist, but the two cultures of evaluation are very different, and the balance of thinking and resources between them constitutes genuinely alternative scenarios.

Diversity and convergence is a second consideration. The ever-expanding European ‘space’ is increasingly diverse in its traditions, institutions, languages and culture. The European project was founded on a vision of integration that assumes convergence. Even though the contrary vision of subsidiarity emphasises continued diversity as much as convergence, both tendencies remain strong. This matters for evaluation because the diversity/convergence dynamic underpins many of the roles that evaluators are expected to take up. In particular it affects how evaluators relate to stakeholders. When diversity is accepted, value-sensitivity and working closely with stakeholders are taken for granted. When convergence is assumed, there is more likely to be a presumption of homogeneity among those affected by public intervention and a tendency to favour methods and modus operandi that take limited account of the particular circumstances of policy and programme implementation.

Reconciliation of evaluation cultures is also important. Alongside the cultural diversity of Europe, and not disconnected from it, is cultural diversity among evaluators. Within the social and economic sciences in particular there are familiar cleavages, many of which have been discussed in this chapter. For example, evaluation has its advocates for largely empirical and positivist methodologies as well as advocates for theory-based investigation in various forms. What constitutes evidence, validity, generalisability and legitimate conclusions is hotly debated among evaluators, as it is in other academic and professional circles. To an extent, these cultural divides among evaluators mirror, and are reinforced by, cultural divisions within Europe. The philosophical patrimony of Latin, Scandinavian and Anglo-Saxon countries can make them receptive to different evaluation approaches and methodologies. There is also a history of policy borrowing within Europe, with established patterns of shared professional and policy networks that predispose some countries to adopt practices more easily from particular countries than from others. (Italians are more likely to ‘borrow’ from the French, Scandinavians from each other and the British from North Americans.) So the alternatives of convergence or national – or at least regional – specificity are available to evaluators as they are to Europe’s citizens and their Member States. Among the influences that will shape these alternatives, the growth of evaluation societies at a European level and the policies of the EU are likely to be important, as is the dissemination of evaluation standards and procedures by European institutions.

Solidarity and social exclusion also feature. In many socioeconomic programmes, including VET, lack of social solidarity is a factor that both underpins the problems that the programme seeks to address and constrains the policy responses that are possible. In extremis, social exclusion is at the heart of these programmes. However, the methods of evaluation that are frequently deployed assume solidarity and make little contribution towards social cohesion. From within a presumption of solidarity, evaluators can easily become part of a regulatory or control regime. A counter-tendency within contemporary evaluation is to engage actively in formative, developmental and trust-building activities as part of the evaluation itself. This goes beyond participation or offering stakeholders a voice: it takes the opportunity to support social inclusion and solidarity by the way an evaluation is conducted. For example, not only do evaluators study the realities of partnership working, they also contribute to the strengthening of partnerships by the methods and research strategies they adopt.

There is also tension between complexity and linearity. Characteristically, much evaluation work is increasingly complex. Such complexity underpins a high level of uncertainty in policy interventions. Success is unsure and goals need to be redefined (or rationalised) along the way. However, many policy interventions are simple in their logic. They assume a linear input leading to a predictable output: increased investment in VET leads to higher wages, or more work-based learning leads to greater competitiveness. When reinforced by a strong accountability ethos, this places pressure on evaluators to attempt to calibrate the future (e.g. through impact assessment) and to demonstrate, through ‘success stories’, early wins and the achievement of anticipated goals. When the policy system is able to acknowledge uncertainty – which implies an unusual degree of openness to citizens and electorates – new possibilities are opened up also for evaluators. They become part of a more reflexive and iterative ‘learning culture’, with lessons learned from mistakes as well as from evidence of success.

Finally, there continue to be important divisions between policy-makers and evaluators. Some of these derive from the dynamics identified above: for example, policy-makers are driven by the ethos of accountability and the need to demonstrate success, while evaluators tend to be driven by a preoccupation with complexity and value difference. Bridging the cultural divide between policy-makers and evaluators has become a theme in many European debates about evaluation policy and practice. In the first generation of these debates, the emphasis has been on educating evaluators to understand the priorities and pressures on policy-makers: the importance of deadlines, the need for clear recommendations, the need to accept the parameters of current conventional wisdom. This is often accompanied by a demand from evaluators that ‘policy-makers become more like us’. The emergence of networks and communities of practice that span both evaluators and policy-makers opens up a second generation of debate. This offers the possibility of shared frameworks, an understanding of what evaluation can and cannot achieve, and an understanding of what policy-makers need in order to learn from and use evaluation outputs and processes. The extent to which this second-generation debate becomes more commonplace, so that European evaluators and policy-makers can move beyond the ‘why can’t they be more like us’ refrain, will also shape the way in which European evaluation evolves in the future.

Considering the possible futures of evaluation brings into focus many of the issues raised elsewhere in this article, but from a particular perspective. While evaluators, policy-makers and evaluation researchers advocate different evaluation approaches, models and practices, what this concluding section has sought to emphasise is that evaluation itself is shaped by societal dynamics and contingencies. There are choices between: types of evaluation and underlying philosophies of science; capacity development and evaluation use; evaluation theory and theory more generally; and the prominence of participation, empowerment and self-regulation on the one hand, and top-down, policy-driven variants of evaluation on the other. These choices are not open and unencumbered. Rather, like other (social) technologies, evaluation is shaped by wider societal, political and institutional dynamics. The future of evaluation in Europe, as elsewhere, should therefore be understood also in terms of the bigger picture: of the way policy systems adapt to socioeconomic, cultural and political challenges, and the way evaluators themselves engage as actors, making choices about how they can contribute to these challenges.

List of abbreviations

AEA   American Evaluation Association
AES   Australasian Evaluation Society
CES   Canadian Evaluation Society
CMO   Context, mechanism, outcome
CVT   Continuing vocational training
RCTs  Randomised control trials
VET   Vocational education and training

References

AEA – American Evaluation Association Task Force on Guiding Principles for Evaluators. Guiding principles for evaluators. In: Shadish, W. R. et al. (eds) New directions for program evaluation – guiding principles for evaluators, 1995, No 66, p. 19–24.
AES – Australasian Evaluation Society. Guidelines for the ethical conduct of evaluations, 1997. Available from Internet: http://www.aes.asn.au/aesguide02.pdf [cited 30.6.2003].
Alkin, M. C. Debates on evaluation. London: Sage, 1990.
Bastoe, P. O. Linking evaluation with strategic planning, budgeting, monitoring and auditing. In: Boyle, R.; Lemaire, D. (eds) Building effective evaluation capacity: lessons from practice. London: Transaction Publishers, 1999, p. 93–110 (Comparative policy analysis series).
Biott, C.; Cook, T. Local evaluation in national early years excellence centres pilot programme. Evaluation – the International Journal of Theory, Research and Practice, 2000, Vol. 6, No 4, p. 399–413.
Boyd, R. Confirmation, semantics and the interpretation of scientific theories. In: Boyd, R.; Gasper, P.; Trout, J. D. The philosophy of science. Cambridge, MA: MIT Press, 1991.
Boyle, R.; Lemaire, D. (eds) Building effective evaluation capacity: lessons from practice. London: Transaction Publishers, 1999 (Comparative policy analysis series).
Caracelli, V. J.; Preskill, H. (eds) New directions for evaluation: the expanding scope of evaluation use. San Francisco: Jossey-Bass, 2000.
Chelimsky, E. Politics, policy and research synthesis. Evaluation – the International Journal of Theory, Research and Practice, 1995, Vol. 1, No 1, p. 97–104.
Chelimsky, E. Thoughts for a new evaluation society. Evaluation – the International Journal of Theory, Research and Practice, 1997, Vol. 3, No 1, p. 97–118.
Chen, H.-T. Theory-driven evaluations. London: Sage, 1990.
Connell, J. P. et al. (eds) New approaches to evaluating community initiatives: concepts, methods and contexts. New York: The Aspen Institute, 1995.
Cronbach, L. J. Designing evaluations of educational and social programs. San Francisco: Jossey-Bass, 1982.
Cronbach, L. J. Construct validation after thirty years. In: Linn, R. L. (ed.) Intelligence: measurement, theory and public policy. Urbana: University of Illinois Press, 1989.
Davies, H. T. O.; Nutley, S. M.; Smith, P. C. (eds) What works? Bristol: The Policy Press, 2000.
DeGEval – Deutsche Gesellschaft für Evaluation [German Evaluation Society]. Standards für Evaluation. Köln: DeGEval, 2002. Available from Internet: www.degeval.de/standards/index.htm [cited 30.6.2003].
European Commission. Evaluation design and management. In: Means collection: evaluating socio-economic programmes. Luxembourg: Office for Official Publications of the European Communities, 1999, Vol. 1.
Fetterman, D. M.; Kaftarian, S. J.; Wandersman, A. (eds) Empowerment evaluation: knowledge and tools for self-assessment and accountability. London: Sage, 1996.
Fournier, D. M. Establishing evaluative conclusions: a distinction of general and working logic. In: Fournier, D. M. (ed.) Reasoning in evaluation: inferential links and leaps. San Francisco: Jossey-Bass, 1995, p. 15–32 (New directions for program evaluation, 68).
Fulbright-Anderson, K.; Kubisch, A. C.; Connell, J. P. New approaches to evaluating community initiatives, Vol. 2: theory, measurement and analysis. Washington, DC: The Aspen Institute, 1998.
Gallo, P. S. Meta-analysis – a mixed metaphor. American Psychologist, 1978, Vol. 33, p. 515–517.
Guba, E. G.; Lincoln, Y. S. Fourth generation evaluation. London: Sage, 1989.
Henkel, M. Teaching quality assessments: public accountability and academic autonomy in higher education. Evaluation, 1997, Vol. 3, No 1, p. 9–23.
House, E. R. Assumptions underlying evaluation models. In: Madaus, G. F.; Scriven, M.; Stufflebeam, D. L. Evaluation models: viewpoints on educational and human services evaluation. The Hague: Kluwer-Nijhoff Publishing, 1983.
Joint Committee on Standards for Educational Evaluation. The program evaluation standards: how to assess evaluations of educational programs. Thousand Oaks: Sage, 1994.
Julnes, G.; Mark, M. M.; Henry, G. T. Promoting realism in evaluation: realistic evaluation and the broader context. Evaluation – the International Journal of Theory, Research and Practice, 1998, Vol. 4, No 4, p. 483–504.
Lave, J.; Wenger, E. R. Situated learning: legitimate peripheral participation. Cambridge: Cambridge University Press, 1991.
Leeuw, F. L.; Rist, R. C.; Sonnichsen, R. C. Can governments learn? Comparative perspectives on evaluation and organization learning. Somerset, NJ: Transaction Publishers, 1994 (Comparative policy analysis series).
Leeuw, F. L.; Toulemonde, J.; Brouwers, A. Evaluation activities in Europe: a quick scan of the market in 1998. Evaluation – the International Journal of Theory, Research and Practice, 1999, Vol. 5, No 4, p. 487–496.
Mannila, S. M.; Ala-Kauhaluoma, M.; Valjakka, S. (eds) Good practice in finding good practice: international workshop evaluation. Naperville, IL: Rehabilitation Foundation, 2001.
Miller, R. W. Fact and method in the social sciences. In: Boyd, R.; Gasper, P.; Trout, J. D. The philosophy of science. Cambridge, MA: MIT Press, 1991.
Nagarajan, N.; Vanheukelen, M. Evaluating EU expenditure programmes: a guide to intermediate and ex-post evaluation. Luxembourg: Office for Official Publications of the European Communities, 1997.
Nutley, S. M.; Walter, I.; Davies, H. T. O. From knowing to doing: a framework for understanding the evidence-into-practice agenda. Evaluation – the International Journal of Theory, Research and Practice, 2003, Vol. 9, No 2.
OECD – Organisation for Economic Cooperation and Development. Budgeting for results: perspectives on public expenditure management. Paris: OECD, 1995.
Patton, M. Q. Utilization-focused evaluation: the new century text (3rd edition). London: Sage, 1997.
Patton, M. Q. Qualitative research and evaluation methods. London: Sage, 2002.
Pawson, R. Evidence based policy: in search of a method. Evaluation – the International Journal of Theory, Research and Practice, 2002(a), Vol. 8, No 2, p. 157–181.
Pawson, R. Evidence based policy: the promise of ‘realist synthesis’. Evaluation – the International Journal of Theory, Research and Practice, 2002(b), Vol. 8, No 3, p. 340–358.
Pawson, R.; Tilley, N. Realistic evaluation. London: Sage, 1997.
Rist, R. C.; Furubo, J. E.; Sandahl, R. (eds) The evaluation atlas. London, 2001.
Rogers, E. Diffusion of innovation. New York: The Free Press, 1995.
Rogers, P. J. Program theory: not whether programs work but how they work. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000.
Rossi, P. H.; Freeman, H. E.; Lipsey, M. W. Evaluation: a systematic approach (6th edition). London: Sage, 1999.
Sabatier, P. A. An advocacy coalition framework of policy change and the role of policy-oriented learning therein. Policy Sciences, 1988, Vol. 21, p. 129–168.
Sabatier, P. A.; Jenkins-Smith, H. C. Policy change and learning – an advocacy coalition approach. Boulder, CO: Westview Press, 1993.
Sanderson, I. Evaluation in complex policy systems. Evaluation – the International Journal of Theory, Research and Practice, 2000, Vol. 6, No 4, p. 433–454.
Schwandt, T. Evaluation as practical hermeneutics. Evaluation – the International Journal of Theory, Research and Practice, 1997, Vol. 3, No 1, p. 69–83.
Scriven, M. Evaluation thesaurus (4th edition). London: Sage, 1991.
Segerholm, C. National evaluations as governing instruments: how do they govern? Evaluation, 2001, Vol. 7, No 4, p. 427–438.
Shadish, W. R.; Cook, T. D.; Leviton, L. C. Foundations of program evaluation: theories of practice. London: Sage, 1991.
Shadish, W. R.; Cook, T. D.; Campbell, D. T. Experimental and quasi-experimental designs for generalised causal inference. Boston: Houghton Mifflin Company, 2002.
Stake, R. The art of case study research. London: Sage, 1995.
Stake, R. For all program evaluations there’s a criterion problem. Evaluation – the International Journal of Theory, Research and Practice, 1996, Vol. 2, No 1, p. 99–103.
Stern, E. The characteristics of programmes and their evaluation. In: OECD. National programmes in support of local initiatives. Paris: OECD, 1992.
Stern, E.; Kelleher, J.; Cullen, J. Preliminary common evaluation framework. London: The Tavistock Institute, 1992 (Articulate Deliverable, No 1).
Stufflebeam, D. L. Foundational models for 21st century program evaluation. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000(a).
Stufflebeam, D. L. Professional standards and principles for evaluations. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000(b).
Toulemonde, J. Evaluation identities, 2001.
Valovirta, V. Evaluation utilization as argumentation. Evaluation – the International Journal of Theory, Research and Practice, 2002, Vol. 8, No 1, p. 60–80.
Van der Knaap, P. Policy evaluation and learning: feedback, enlightenment or argumentation. Evaluation – the International Journal of Theory, Research and Practice, 1995, Vol. 1, No 2, p. 189–216.
Vedung, E. Public policy and programme evaluation. New Brunswick, NJ: Transaction, 1997.
W. K. Kellogg Foundation. Logic model development guide. Michigan: W. K. Kellogg Foundation, 2000.
Weiss, C. H. Evaluation research: methods of assessing program effectiveness. New Jersey: Prentice-Hall, 1972.
Weiss, C. H. Using research in the policy process: potential and constraints. Policy Studies Journal, 1976, Vol. 4, p. 224–228.
Weiss, C. H.; Bucuvalas, M. J. Social science research and decision-making. New York: Columbia University Press, 1980.
Weiss, C. H. Nothing as practical as good theory: exploring theory-based evaluation for comprehensive community initiatives for children and families. In: Connell, J. P. et al. (eds) New approaches to evaluating community initiatives: concepts, methods and contexts. New York: The Aspen Institute, 1995.
Weiss, C. H. The interface between evaluation and public policy. Evaluation – the International Journal of Theory, Research and Practice, 1999, Vol. 5, No 4, p. 468–486.
Weiss, C. H. Theory-based evaluation: theories of change for poverty reduction programs. In: Feinstein, O.; Picciotto, R. (eds) Evaluation and poverty reduction. Washington: World Bank, 2000.
Wholey, J. S. Using evaluation to improve program performance. In: Levine, R. A. et al. (eds) Evaluation research and practice: comparative and international perspectives. Beverly Hills, CA: Sage, 1981.