CHAPTER 4 EVALUATION QUESTIONS
The OECD/DAC definition of evaluation has been adopted by Danida and all major development agencies internationally. The definition contains five evaluation criteria that should be used in assessing development interventions: relevance, efficiency, effectiveness, impact and sustainability.
These are general criteria that should be used as a basis for developing evaluative questions through the full range of evaluation topics, i.e. from single intervention through to thematic, and ways of conducting the evaluation, e.g. joint evaluation.
Taken together, these five criteria should provide the decision-maker with the essential information and clues to understand the situation and determine what should be done next.
To the extent that specific evaluations have specific purposes, that there is no one right way to conduct an evaluation and that these criteria are interdependent and not mutually exclusive, their relative meaningfulness for a specific evaluation should be assessed and trade-offs discussed in each case to ensure that key questions are addressed and to avoid unnecessary effort and expense.
In addition, use of the five criteria does not exclude that other criteria might be used as well to better focus the evaluation on specific characteristics of the intervention and its context.
The criteria for the evaluation of humanitarian assistance are a case in point: because of some unique features of humanitarian intervention, the Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP, http://www.alnap.org) has introduced three additional evaluation criteria: connectedness, coherence and coverage.9
Relevance: The extent to which the objectives of a development intervention are consistent with beneficiaries’ requirements, country needs, global priorities and partners’ and donors’ policies.
Efficiency: A measure of how economically resources/inputs (funds, expertise, time, etc.) are converted to results.
Effectiveness: The extent to which the development intervention’s objectives were achieved, or are expected to be achieved, taking into account their relative importance.
Impact: The positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended.
Sustainability: The continuation of benefits from a development intervention after major development assistance has been completed. The probability of long-term benefits. The resilience to risk of the net benefit flows over time.
Additional criteria for evaluation of humanitarian action
Connectedness: The need to ensure that activities of a short-term emergency nature are carried out in a context that takes longer-term and interconnected problems into account.
Coherence: The need to assess security, developmental, trade and military policies as well as humanitarian policies, to ensure that there is consistency and, in particular, that all policies take into account humanitarian and human rights considerations.
Coverage: The need to reach major population groups facing life-threatening suffering wherever they are.
Relevance is a measure of the extent to which development interventions meet population needs and country priorities, and are consistent with donor policies.
For example, in a road project relevance could be assessed in terms of the rationale for constructing the road: was it to serve a political agenda of the few or to exploit real economic potential? In a sector programme to support agriculture, relevance could be assessed in terms of domestic market responses to new crops, farmers’ responses to the various programme initiatives, etc.
A change in society’s policies or priorities could imply that the development interventions are now accorded lower priority, or lose some of their rationale. Once an endemic disease has been eradicated, for instance, it could mean there is no longer any need for a special health programme.
In other words, relevance is basically a question of usefulness; in turn, the assessment of relevance leads to higher level decisions as to whether the development activities in question ought to be terminated or allowed to continue. And, if the latter is the case, what changes ought to be made, and in what direction? Are the agreed objectives still valid, and do they represent sufficient rationale for continuing the activities?
These questions should be addressed at various levels with reference to the partner country:
- At the higher level it concerns the relationship between the development activities and the development policy of the partner country, as well as whether the activities are in keeping with the priorities of the donor country.
- At the middle level it is a question of how development activities fit into a larger context, e.g. in relation to other development interventions and development efforts within a larger programme or sector.
- At the lower level it is a question of whether the development activities are directed towards areas accorded high priority by the affected parties.
Efficiency is a measure of the relationship between outputs, i.e. the products or services of an intervention, and inputs, i.e. the resources that it uses.
An output is a measure of effort; it is the immediate observable result of intervention processes over which the managers of the intervention, i.e. the implementers, have some measure of control. An intervention can be thought of as efficient if it uses the least costly resources that are appropriate and available to achieve the desired outputs, i.e. deliverables, in terms of quantity and quality.
The quality of the inputs and the outputs is an important consideration in assessing efficiency: the most economical resource is not necessarily the most appropriate and the trade-offs between the quantity of outputs and their quality are a key factor of overall performance.
Furthermore, assessing the efficiency of an intervention generally requires comparing alternative approaches to achieving the same outputs, and this will be easier for some types of intervention than for others.
In practice, the extent to which intervention activities are standardised or not, i.e. the factors of production are well known or not, usually determines how efficiency is measured and assessed.
In a road building project for example, where the methods of construction are fairly well established, a typical measure of efficiency would be the cost per km per class of road. As well, because other projects and jurisdictions are also likely to use that same measure of efficiency, among others, the bases for comparison and assessment, or benchmarks, are readily available in most cases.
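The cost-per-km measure described above can be illustrated with a short sketch. It is only an illustration, not a method prescribed by these guidelines: the road classes, cost figures and benchmark values are invented, and the names `RoadSection`, `cost_per_km_by_class` and `compare_to_benchmark` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoadSection:
    road_class: str   # hypothetical class labels, e.g. "gravel" or "paved"
    length_km: float  # length of the section built
    cost: float       # total construction cost, in one common currency


def cost_per_km_by_class(sections):
    """Return {road_class: total cost / total km} over all sections."""
    totals = {}
    for s in sections:
        cost, km = totals.get(s.road_class, (0.0, 0.0))
        totals[s.road_class] = (cost + s.cost, km + s.length_km)
    return {cls: cost / km for cls, (cost, km) in totals.items() if km > 0}


def compare_to_benchmark(unit_costs, benchmarks):
    """Return {road_class: observed unit cost / benchmark unit cost}.

    A ratio above 1.0 means the project spent more per km than the benchmark.
    Classes without a benchmark are skipped rather than guessed.
    """
    return {
        cls: unit_costs[cls] / benchmarks[cls]
        for cls in unit_costs
        if cls in benchmarks and benchmarks[cls] > 0
    }


# Illustrative figures only; not taken from any real project.
sections = [
    RoadSection("gravel", 10.0, 500_000.0),
    RoadSection("gravel", 30.0, 1_300_000.0),
    RoadSection("paved", 20.0, 4_000_000.0),
]
benchmarks = {"gravel": 50_000.0, "paved": 160_000.0}

unit_costs = cost_per_km_by_class(sections)
ratios = compare_to_benchmark(unit_costs, benchmarks)
```

With these invented figures, gravel roads come in below the benchmark and paved roads above it. Such a ratio says nothing about the quality of the roads built, which has to be assessed separately.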
On the other hand, a national initiative on women’s rights, for example, is not standardised across countries. In such cases, relevant measures of efficiency typically address waste, either at the level of inputs, i.e. economy (obtaining appropriate resources at least cost or fair market value), or at the level of process, e.g. duplication of activities, conflicting processes, or throughputs that do not link to outputs. As well, good practices, i.e. lessons learned from similar endeavours, can be used as benchmarks for assessing efficiency.
Some examples of useful and practical criteria for assessing the efficiency of a programme or a project are:
- Appropriate resources acquired with due regard for economy
- Activities carried out as simply as possible
- Decisions made as close as possible to where the products or services are delivered
- Overhead as low as possible
- Duplication or conflicts addressed and resolved
- Deliverables achieved on time and on budget.
Effectiveness is a measure of the extent to which the intervention’s intended outcomes, i.e. its specific objectives – intermediate results – have been achieved.
Explicitly, effectiveness is the relationship between the intervention’s outputs, i.e. its products or services – its immediate results – and its outcomes, meaning usually the intended benefits for a particular target group of beneficiaries.
As such, an intervention is considered effective when its outputs produce the desired outcomes; it is efficient when it uses resources appropriately and economically to produce the desired outputs.
For example, a teaching programme is considered effective if students learn, i.e. acquire intended knowledge, skills and abilities; it is considered efficient if it provides instruction, i.e. teaching time and materials, economically and at an acceptable level of quality.
An efficient intervention is not necessarily effective. Teaching may be provided economically and efficiently, but if it is not of good quality, e.g. appropriate to the needs and interests of the students, intended learning outcomes will not be achieved, i.e. it will not be effective.
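The distinction between efficiency and effectiveness in the teaching example can be made concrete with two simple indicators. The sketch below is purely illustrative: the two programmes, their costs, the test scores and the pass mark are invented, and the function names are hypothetical.

```python
def cost_per_teaching_hour(total_cost, teaching_hours):
    """Efficiency indicator: input cost per unit of output (teaching hour)."""
    if teaching_hours <= 0:
        raise ValueError("teaching_hours must be positive")
    return total_cost / teaching_hours


def share_reaching_target(scores, pass_mark):
    """Effectiveness indicator: share of students at or above the pass mark."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s >= pass_mark) / len(scores)


# Illustrative figures only; both programmes are invented.
PASS_MARK = 50
programme_a = {"cost": 20_000.0, "hours": 1_000, "scores": [35, 42, 48, 55, 38]}
programme_b = {"cost": 30_000.0, "hours": 1_000, "scores": [62, 71, 48, 80, 66]}

a_unit_cost = cost_per_teaching_hour(programme_a["cost"], programme_a["hours"])
b_unit_cost = cost_per_teaching_hour(programme_b["cost"], programme_b["hours"])
a_share = share_reaching_target(programme_a["scores"], PASS_MARK)
b_share = share_reaching_target(programme_b["scores"], PASS_MARK)
```

In this invented case programme A delivers teaching hours more cheaply, yet far fewer of its students reach the pass mark: it is the more efficient of the two but the less effective.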
Evaluating the effectiveness of an intervention involves three steps:
1. Measuring change in the observed outcome, e.g. did the students learn something?
2. Attributing the change in the observed outcome to the intervention, e.g. did the students learn something because of the teaching?
3. Judging the value of the teaching to the learning, e.g. by using comparisons such as targets, benchmarks, similar interventions, the assessments of teachers, students, others, etc.
Interventions have no control per se over outcomes; at best, a programme strives to produce those outputs that have the greatest likelihood of producing the intended outcomes.
As such, an intervention’s effectiveness is driven primarily by two things: its design and its implementation, i.e. its management.
Attribution, i.e. internal validity, is typically the central challenge in assessing the effectiveness of interventions: how, and to what extent, can observed changes in outcomes be validly attributed to the intervention?
Where the internal validity of the programme is well established, e.g. an immunisation programme, attribution of outcomes, e.g. beneficiaries protected from disease because they have been vaccinated against that disease, can be fairly straightforward.
However, in the case of many development interventions, internal validity is not well established and attribution can become a significant challenge. For example, attributing validly a change in the incidence of human rights violations in a country to an intervention or set of interventions might be difficult for most evaluations.
Generally, the larger and more complex the evaluand, i.e. what is being evaluated, the more difficult attribution is likely to be.
The reality of methodological and resource constraints in carrying out practical evaluation means that often attribution is expressed in terms of likelihood rather than proof, and that ultimately the test of validity is credibility.
Other challenges to assessing effectiveness are typically:
- Non-existent or poorly defined objectives, e.g. intended outcomes are not stated as measurable change over time in target groups
- Unrealistic and/or conflicting objectives
- Lack of targets or measures of success.
Impact is a measure of all significant effects of the development intervention, positive or negative, expected or unforeseen, on its beneficiaries and other affected parties.
Whereas effectiveness focuses on the intended outcomes of an intervention, impact is a measure of the broader consequences of the intervention such as economic, social, political, technical or environmental effects; locally, regionally, or at the national level; on the target group and other directly or indirectly affected parties.
For example, an HIV/AIDS prevention and treatment programme targeting vulnerable groups could have broader effects, both positive, such as a reduction in the incidence of tuberculosis in other groups, and negative, such as a reduction of funding to malaria prevention. Effects may also be economic in nature, e.g. size of the workforce, political, e.g. state budget allocation, and so on.
A broad assessment of impact is essential in a comprehensive evaluation. However, there are two central challenges to assessing impact: boundary judgment, i.e. deciding what effects to select for consideration, and attribution, i.e. what effect is due to what.
Because, on the one hand, effects can be numerous and varied, and on the other they are typically the result of complex interactions, assessing impact is difficult in most circumstances.
As is the case for effectiveness, the assessment of impact poses a particular challenge with regard to attribution; in most cases, it is difficult to attribute rigorously broad effects on different groups and at different levels over time to a specific intervention or set of interventions.
Systems theory approaches typically provide more appropriate and useful tools for dealing with complex adaptive systems, e.g. societies, than simple linear causal approaches.
As well, a useful principle for dealing pragmatically with the selection of effects, levels and groups for the evaluation is to make choices consistent with the intended use of the evaluative information.
In the final analysis, evaluations usually, and at best, estimate impact through probability-based inferences derived from assumptions of simplified cause-and-effect relationships.
In the case of an impact evaluation, i.e. one in which measurement and assessment of impact is given priority, one must estimate the counterfactual, that is, what would have happened if the intervention had not taken place. This can be done by choosing a control or comparison group: a group of individuals, households, etc., that is identical to the project group except for participation in the project or programme.
As well, baseline data, i.e. information about the state of groups before the intervention, is useful in order to measure differences after the intervention has taken place.
These methodological requirements pose particular challenges to the conduct of impact evaluation.
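One common way to combine a comparison group with baseline data is a difference-in-differences estimate. These guidelines do not prescribe this technique; it is shown here only as a minimal sketch of the counterfactual logic described above. The function name and all figures are hypothetical. The sketch assumes that, without the intervention, the project group would have changed by the same amount as the comparison group, and it includes no test of statistical significance.

```python
from statistics import mean


def difference_in_differences(project_before, project_after,
                              comparison_before, comparison_after):
    """Estimate impact as the change in the project group minus the change
    in the comparison group.

    The comparison group's change stands in for the counterfactual, i.e. what
    would have happened to the project group without the intervention.
    """
    project_change = mean(project_after) - mean(project_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return project_change - comparison_change


# Illustrative household incomes only; not real survey data.
project_before = [100, 120, 110]      # baseline, participating households
project_after = [150, 160, 155]       # follow-up, participating households
comparison_before = [105, 115, 110]   # baseline, comparison households
comparison_after = [125, 135, 130]    # follow-up, comparison households

estimated_impact = difference_in_differences(
    project_before, project_after, comparison_before, comparison_after
)
```

In this invented example the project group's average rises by 45 and the comparison group's by 20, so the estimated impact is 25. Without baseline data only the two follow-up averages could be compared, and any difference between the groups that existed before the intervention would be wrongly counted as impact.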
Sustainability is a measure of whether the benefits of a development intervention are likely to continue after external support has been completed.
While the four preceding criteria concern specific development interventions, the assessment of sustainability addresses the effects of the development process itself over the long term.
For example, in a road construction project, sustainability can be measured in terms of whether the road is likely to be maintained, the extent to which it will be used and provide benefits in the future, etc. In a sector programme to support agriculture, it could be measured in terms of financial and economic viability of the agricultural production and the supporting institutions, the extent to which economic surplus is reinvested productively by farmers, etc.
Sustainability is in many ways a higher level test of whether or not the development intervention has been a success. Far too many development initiatives tend to fail once the implementation phase is over, because either the target group or the responsible parties do not have the means or sufficient motivation to provide the resources needed for the activities to go further.
Sustainability is becoming an increasingly central theme in evaluation work since many development agencies are putting greater emphasis on long-term perspectives and on lasting improvements.
As a result, capacity-development of communities and organisations is a common objective of development interventions, consistent with the overall goal of promoting increased autonomy and self-reliance of partner countries for the provision of public services.
Useful questions for assessing sustainability address the extent to which capacity has been successfully developed, e.g. through participation, empowerment and ownership, whether local resources are available, and whether sustained political support exists.
As well, the sustainability of development interventions depends to a large extent on whether the positive impacts justify the required investments and whether the community values the benefits sufficiently to devote scarce resources to generating them.
Because sustainability is concerned with what happens after development activities are completed, it is ideally measured some years afterwards. It is difficult to provide a reliable assessment of sustainability while activities are still underway, or immediately afterwards. In such cases, the assessment is based on projections of future developments, drawing on available knowledge about the intervention and the capacity of the involved parties to deal with changing contexts. It requires an analysis of the contextual setting, its capabilities and constraints, and of future scenarios.
The experience of donor agencies indicates that a development intervention’s sustainability hinges mainly on seven areas, also called sustainability factors.
These seven factors should be taken into account all along the implementation cycle, and should be used as a checklist during the evaluation to identify relevant questions:
- Policy support measures: Policies, priorities, and specific commitments of the recipient supporting the chances of success.
- Choice of technology: Choice and adaptation of technology appropriate to existing conditions.
- Environmental matters: Exploitation, management and development of resources. Protection of the environment.
- Socio-cultural aspects: Socio-cultural integration. Impact on various groups (gender, ethnic, religious, etc.).
- Institutional aspects: Institutional and organisational capacity and distribution of responsibilities between existing bodies.
- Economic and financial aspects: Economic viability and financial sustainability.
- External factors: Political stability, economic crises and shocks, overall level of development, balance of payments status and natural disasters.
Policy support measures
The recipient’s commitment is one of the most commonly identified factors affecting success of development interventions. Commitment is expressed in terms of agreement on objectives, the scope of support to responsible organisations and the willingness to provide financial and personnel resources. Country commitment is also shaped by perceptions of mutuality of interests versus perceptions of predominantly donor-driven interests.
Choice of technology
The partner country’s financial and institutional capabilities are crucial determinants of the technology chosen and of whether that technology is accepted and supported by mechanisms for its maintenance and renewal. Evaluation teams should consider the effects of technology in society and the costs of providing and maintaining the technology versus the benefits generated.
Environmental matters
The importance of environmental considerations is now widely recognised. Although environmental effects may be negligible seen in a narrow, short-term perspective, the broader effects may be significant in a long-term perspective. Evaluations will frequently have to look specifically at environmental policy, incentives and regulatory measures, the interests of different stakeholders, and the effects of development interventions.
Socio-cultural aspects
Social and cultural factors influence the adaptability and relevance of various development activities. They also influence motivation among the target group members and hence whether individuals and groups will participate and accept responsibilities in the development process.
Development interventions that are consistent with local traditions or do not assume major changes in behaviour patterns stand a better chance of success. Danida requires special attention to the roles of both women and men in implementing development interventions, particularly their access to means of production and support services, as well as their rights and benefits.
Institutional aspects
The strength of institutions and the capacity of organisations are the most important factors in the success of development interventions. Current trends to change the division of roles between public and private organisations may raise the issue of how development interventions affect the co-operation and co-ordination of participating bodies. At the more detailed level, assessment may include considerations of managerial leadership, administrative systems and the involvement of beneficiaries.
Economic and financial aspects
Evaluations should focus essentially on three aspects of economic and financial performance. Firstly, the cost effectiveness of the intervention strategy. Secondly, the economic and financial benefits of investments as compared with the funds and resources spent. Finally, the financial sustainability of operations in the future to explore whether funds are or will be sufficient to cover future operations, maintenance and depreciation of investments.
External factors
Development assistance takes place in the context of political, economic and cultural environments that are beyond its control yet can influence it significantly. Factors such as political stability, economic crises and shocks, overall level of development, balance of payments status and natural disasters can have a determining influence on the sustainability of development interventions.
Additional criteria for evaluating humanitarian action
(Adapted from Beck, T. 2006)
Evaluation of humanitarian action, in response to natural disasters and to conflicts, has been the subject of some attention over the last few years with a view to improving its quality. References for evaluators and for evaluation managers, such as “Evaluating Humanitarian Assistance Programmes in Complex Emergencies”,10 “Guidance for Evaluating Humanitarian Assistance in Complex Emergencies”11 and “Evaluating Humanitarian Action using the OECD/DAC Criteria”, are key to furthering the understanding and the quality of evaluation practice in this area.
Because of some of the distinct characteristics of humanitarian action, ALNAP proposes three additional criteria (to OECD/DAC’s five): connectedness, coherence and coverage.
Evaluation of humanitarian action (EHA) is defined by ALNAP as “a systematic and impartial examination of humanitarian action intended to draw lessons to improve policy and practice and enhance accountability.”
The fact that they are usually undertaken during severe disruptions, often prolonged in the case of complex emergencies, gives EHAs some distinct characteristics that can make access to data and information difficult, for example:
- Polarisation of perspectives in conflict situations that reduces the “space” for fair and balanced views;
- High turnover of staff working in humanitarian action that affects “organisational memory”;
- Reactive and quick implementation of humanitarian action that affects planning and the identification of performance measures;
- Disordered contexts of humanitarian action, with rapid changes in circumstances that invalidate assumptions about usual social and physical conditions.
Connectedness is defined as “The need to ensure that activities of a short-term emergency nature are carried out in a context that takes longer-term and interconnected problems into account.”
Connectedness is adapted from the concept of sustainability and is a measure of the relationship between humanitarian interventions and longer term goals, in particular the linkages between humanitarian action, recovery and development.
Some of the issues to consider when addressing connectedness are relative expenditure on relief and recovery, the nature of partnerships, in particular between international and national NGOs, and the extent to which local capacity is supported and developed.
Coherence is defined as “The need to assess security, developmental, trade and military policies as well as humanitarian policies, to ensure that there is consistency and, in particular, that all policies take into account humanitarian and human rights considerations.”
The focus of this criterion is on the extent to which the policies of different actors, e.g. military, civilian, political, were complementary or contradictory. Policies may be of any type such as promoting gender equality, participation or environmental protection.
Coherence may be the most difficult criterion to evaluate, in part because it is often confused with coordination. The evaluation of coherence focuses mainly on the policy level while that of coordination more on operational issues.
Evaluation of coherence is important where there are many actors and increased risk of conflicting mandates and interests. Questions on the degree of coherence observed, or otherwise, are also important and useful to address.
Coverage is defined as “The need to reach major population groups facing life-threatening suffering wherever they are.” The key questions this criterion generates are who was supported by humanitarian action and why.
Evaluation of coverage usually takes place at three levels: international, national or regional, and local, and the evaluation should consider whether assistance was provided proportionally according to the need at each level.
Whether protection needs have been met is an important question to address and so are issues of inclusion and exclusion bias at regional and local levels.
Political factors often determine coverage so that analysing them is key to understanding the nature and extent of coverage of groups, including issues of protection and humanitarian space.
Finally, equity questions are central to the assessment of coverage and are addressed through geographical analysis and the organisation of data by socioeconomic categories such as gender, socioeconomic groupings, ethnicity, age and ability.
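The question of whether assistance was provided in proportion to need can be illustrated by comparing each group's share of assistance with its share of assessed need. This is a minimal sketch under stated assumptions, not a method taken from the ALNAP guide: the function name `coverage_ratios`, the region names and all figures are hypothetical, and in practice both need and assistance are hard to measure reliably.

```python
def coverage_ratios(need, assistance):
    """Return {group: share of assistance / share of need}.

    need and assistance map a group (a region, an age band, etc.) to a count
    or amount. A ratio of 1.0 means assistance was proportional to need;
    below 1.0 the group was under-served relative to its need.
    """
    total_need = sum(need.values())
    total_assistance = sum(assistance.values())
    if total_need <= 0 or total_assistance <= 0:
        raise ValueError("totals must be positive")
    ratios = {}
    for group, group_need in need.items():
        if group_need <= 0:
            continue  # no assessed need, so no meaningful ratio
        need_share = group_need / total_need
        assistance_share = assistance.get(group, 0) / total_assistance
        ratios[group] = assistance_share / need_share
    return ratios


# Illustrative figures only: people in need and people assisted, by region.
need = {"north": 5_000, "south": 3_000, "east": 2_000}
assistance = {"north": 600, "south": 300, "east": 100}

ratios = coverage_ratios(need, assistance)
under_served = sorted(g for g, r in ratios.items() if r < 1.0)
```

With these invented figures the east receives half the share of assistance that its share of need would suggest. The same comparison can be repeated for the socioeconomic categories mentioned above, for example by gender or age group, where such data exist.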
9 Beck, T. (2006). Evaluating humanitarian action using the OECD-DAC criteria: An ALNAP guide for humanitarian agencies. London, UK: Overseas Development Institute.
10 Hallam, A. (1998). Evaluating Humanitarian Assistance Programmes in Complex Emergencies. Good Practice Review 7. London, UK: Overseas Development Institute.
11 OECD (1999). Guidance for Evaluating Humanitarian Assistance in Complex Emergencies. Paris.
This page forms part of the publication 'Evaluation Guidelines' as chapter 5 of 9
Publication may be found at the address http://www.netpublikationer.dk/um/7571/index.htm