La valutazione da vicino


“Evaluation is an objective process of understanding how a policy or other intervention was implemented, what effects it had, for whom, how and why.”

1 Planning and Designing Useful Evaluations

1.1. Choosing the policies, programmes and projects (PPP) to evaluate

The choice is generally left to a political body. Nevertheless, the analyst can steer it.

Evaluability Assessment: an assessment of the extent to which an intervention can be evaluated in a reliable and credible fashion.

Criteria: the evaluation must be
• useful (useful and actually used)
• comprehensive
• proportionate
• practicable


Time and resources for analysis are limited. Criteria for setting priorities:
• materiality, or the size of the program;
• risk to clients, stakeholders, the agency and Government;
• alignment with agency and government priorities;
• complexity of delivery or uncertainty about program results; (..)
• external requirements for review (such as programs subject to a Sunset Clause); (..)


1.2. Criteria for designing the evaluation: Rigour, Utility, Feasibility and Ethics in Program Evaluation

Once the PPP to evaluate have been selected, the criteria most often cited for the design of the evaluation are:
• ‘Rigour’ in evaluation refers to the quality of the evidence, and the validity and certainty around the findings. For results-driven evaluations in particular, rigour includes assessing the extent to which observed results were due to the program.
• ‘Utility’ refers to the scope for evaluation users to actually use the findings, particularly when information is needed at certain times to inform decisions.
• ‘Feasibility’ refers to the practicalities of collecting evidence in relation to the maturity of the program, and to the availability of time, skills and relevant data.
• ‘Ethics’ refers to reducing the risk of harm from the evaluation, and also to doing the evaluation in a culturally appropriate way.


1.3. Prerequisite for a good evaluation: Building Evaluation into the Design of Programs

OMB Director Peter Orszag: new initiatives should ideally have “evaluation standards built into their DNA” (OMB 2009b)

Legislation can encourage stronger and more cost-effective evaluation in many ways. One is through language that recognizes the importance of conducting rigorous evaluations. Another is by making sure already-collected program data are made available for such statistical and analytical purposes.


For a policy to be evaluated ex post, the initial phases must yield:

1. a clear ‘theory of change’, a policy theory: that is, a precise definition of which mechanisms are supposed to be activated to produce the expected results, and why

2. SMART results (see the following slides)

3. OVI (Objectively Verifiable Indicators). Measurement matters: “If you cannot measure it, you cannot improve it” (Lord Kelvin, first president of the International Electrotechnical Commission (1906) and inventor of the Kelvin scale)

The initial design must therefore already provide for the indicators to be used in the ex-post evaluation phase.


‘SMART’ results: a describable and measurable change that is derived from a cause-and-effect relationship. ‘SMART’ results are the same as outcomes and are defined as Specific, Measurable, Attainable, Relevant and Time-bound.

• Specific – Clear and well defined.
• Measurable – The need for concrete criteria for measuring progress and to know when it has been achieved.
• Attainable – Is there a realistic path to achievement? Neither out of reach nor below standard performance.
• Relevant – Choosing results that matter within the constraints of resources, knowledge and time. That is, results that will drive the program forward.
• Time-bound – Reasonable timeframe to achieve the goal. A time-bound result is intended to establish a sense of urgency. For example, can data be collected to ensure that it aligns with the required reporting timelines?

Wikipedia 2013, ‘SMART’ Criteria http://en.wikipedia.org/wiki/SMART_criteria


Indicators: the RACER criteria

• Relevant – Is there a clear link between the indicator and the objective to be reached?
• Accepted – Have they been discussed with staff, and their shortcomings and interpretation agreed?
• Credible – Are they credible for reporting purposes? Easy to interpret?
• Easy – Is data/info cheaply available and easy to monitor?
• Robust – Are indicators safe against manipulation?


The choice of methods:
• quantitative
• qualitative
• mixed

Mixed methods: A SINGLE METHOD IS NO LONGER SUFFICIENT

Public policies, programmes and service delivery operate in increasingly complex and ever-changing social, economic, ecological and political contexts. No single M&E methodology can adequately describe and analyze the interactions among all of these different factors.

Mixed methods allow for triangulation – or comparative analysis – which is better suited to capture complex realities and to provide different perspectives on the effect of public policies, programmes or service delivery.


The principals of evaluation: three different clients

Who should formulate the evaluation questions?
Who should design and implement the evaluation research process and take responsibility for its outputs?

Three alternative models of evaluation:
1. External evaluation
2. Internal evaluation
3. Participatory evaluation

1. External and internal evaluation
Both are based on a sharp distinction between the evaluators and the evaluated, and both rely on conventional social science research methods.


External evaluation
Independent evaluations are based on a clear demarcation between those who conduct the evaluation and those who are the object of evaluation.
• The evaluators stand outside the evaluated activities and have no stake in the outcome of the evaluation.
• Stakeholder participation tends to be limited.
In a stronger sense, an evaluation is independent when the organisation that formulates the evaluation questions and recruits the evaluators is also independent of the evaluated activity.


Internal evaluations
The evaluators are organisationally attached to the evaluated activities. In many cases there are institutional safeguards protecting the independence and integrity of internal evaluators.
Advantages:
• internal evaluators tend to have a better understanding of the organisation to be evaluated.
• they are in a better position to facilitate processes of use, learning, and follow-up.
Disadvantages:
• they have less credibility in relation to external audiences. When accountability is the purpose of the evaluation they cannot replace external evaluators.


Participatory evaluation
The distinction between expert and layperson, researcher and researched, is de-emphasised and redefined. Participatory evaluations are led by professionals: they are mainly facilitators and instructors helping others to make the assessment.

Participation means putting ordinary people first, and redefining the roles of experts and laypersons:
1. People have a voice in matters that affect their interests.
2. Participation helps mobilise local knowledge.


Participatory evaluation:
• serves as an instrument of downward accountability and popular empowerment.
• increases the effectiveness of efforts by mobilising popular knowledge.
• strengthens the participants’ sense of ownership with regard to the evaluated activities.
As it allows participants to engage in open and disciplined reflection on questions concerning the public good, it is seen as a kind of self-education in democratic governance. (See policy inquiry: analysis as reflexivity.)

Distinction between:
• the general concept of stakeholder participation
• a narrower concept of popular participation.


The best way to promote popular participation is to strengthen the element of participation in the preceding stages of the intervention process.

Necessary conditions for the successful use of participatory approaches:

• Shared understanding among beneficiaries, programme staff and other stakeholders of programme goals, objectives and methods.

• Willingness among programme partners to allocate sufficient time and resources to participatory monitoring and evaluation.

• A bottom-up methodology: adopting a participatory approach can be difficult if the intervention has been planned and managed in a top-down fashion.

• A reasonably open and egalitarian social structure: where local populations are internally divided by distinctions of class, power and status, a participatory approach is not likely to be successful.


2 Approaches to Evaluation and Key Questions

Types of evaluation (see US, 2014):

1. “Process” evaluation analyzes the effectiveness of how programs deliver services relative to program design, professional standards, or regulatory requirements. (..) Process evaluations help ensure that programs are running as intended, but in general these evaluations do not directly examine whether programs are achieving their outcome goals (GAO 2011).

2. “Performance” measurement is a broader category that encompasses “the ongoing monitoring and reporting of program accomplishments, particularly progress toward pre-established goals” (GAO 2011). Typically, performance measures provide a descriptive picture of how a program is functioning and how participants are faring on various “intermediate” outcomes, but do not attempt to rigorously identify the causal effects of the program. For instance, performance measures for a job training program might capture how many individuals are served, what fraction complete the training, and what fraction are employed a year later. But these measures will not answer the question of how much higher these individuals’ employment rates are as a result of having completed the training. Nonetheless, performance measures serve as important indicators of program accomplishments and can help establish that a program is producing apparently promising (or troubling) outcomes.

3. “Impact” evaluation aims to measure the causal effect of a program or intervention on important program outcomes.


From analysis to evaluation: the logic model

[Diagram: INPUTS → PROCESS → OUTPUTS → OUTCOMES → IMPACT, with EXTERNAL VARIABLES acting on the chain. Economy (economicità) refers to inputs, efficiency to outputs, effectiveness to outcomes; the first segments fall under the organisation’s internal control, while the evaluated policy spans the whole chain.]

The aspects under evaluation fall into the following categories:

• Input – the capacity to acquire the necessary resources according to criteria of economy: financial resources, human resources, ICT, knowledge resources, procurement of goods and services (direct, outsourcing..), communication..

• Process – the capacity to manage complexity: interactions with stakeholders, allocative conflicts, the unforeseen events of implementation..

• Outputs – the capacity to deliver the required products according to criteria of efficiency: goods, services, capacities, information..

• Outcomes – the capacity to provide solutions to problems according to criteria of effectiveness: adequacy, timeliness, beneficiary satisfaction.

• Impact – the capacity to change the context or the beneficiaries’ living conditions in a lasting way: effects that last over time, effects that withstand adverse conditions...

Given the importance of the logic model, here is a second visualisation, to make the underlying scheme clear.

[Diagram, second view of the logic model:
• input – financial resources, human resources, the formal clarity of the mandate, the modalities for procuring goods and services externally, ICT investments (internal performance evaluations; efficiency, output; short time horizon)
• process – leadership, consensus, coordination
• output – goods, services, regulations, maintenance.. timeliness...
• outcome – for the beneficiaries (programme evaluation; effectiveness, results, impact; long time horizon)
• impact – long-term, ‘net’ of external dynamics: for the administration’s capacities, for the socio-economic context, for the physical environment..]

2. Performance measurement
This exercise can also be carried out from analytical perspectives other than that of PPP evaluation:

• the budget perspective – performance budgeting; the “Pay for Success” approach: a form of budgeting that relates the funds allocated to measurable results

• the management perspective – organizational performance: methods to increase the organisation’s alignment with the objectives it has set itself

2. Performance measurement: risks
Goodhart’s law is named after the banker who originated it, Charles Goodhart: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes” (Goodhart’s original 1975 formulation, reprinted on p. 116 in Goodhart 1981).
Its most popular formulation is: “When a measure becomes a target, it ceases to be a good measure.” The law is implicit in the economic idea of rational expectations (Wikipedia).

Risks, for example: the method used to fund universities.

[Diagram (slide from Lori Beaman, Impact Evaluation: An Overview): the government/program production function runs from INPUTS to OUTPUTS, where users meet service delivery, and on to OUTCOMES and IMPACTS. Program impacts are confounded by local, national and global effects; hence the difficulty of showing causality.]

The most difficult evaluation is the final, decisive one: Impact Evaluation.

Can we really attribute the changes observed among the beneficiaries to our intervention, or should these effects instead be attributed to factors external to the project?

Impact Evaluation = a form of stable outcomes evaluation that assesses the net effect of a program by comparing program outcomes with an estimate of what would have happened in the absence of the program.

S. Kay Rockwell, A Hierarchy for Targeting Outcomes and Evaluating Their Achievement http://deal.unl.edu/TOP/

Impact evaluation
This part of the evaluation tries to answer precisely the most difficult question: are we justified in attributing to the policy the changes observed after its implementation?

It is wrong to infer the success of the policy from changes in the outcome: no post hoc, ergo propter hoc.

Counterfactual logic tries to avoid this error: the effect of an intervention is the difference between what is observed in the presence of the intervention and what would have been observed in its absence (the counterfactual).
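In symbols, using the standard potential-outcomes notation (ours; the source states the definition only in words): for each unit $i$, let $Y_i(1)$ be the outcome with the intervention and $Y_i(0)$ the outcome without it. Then

\[
\tau_i = Y_i(1) - Y_i(0)
\]

is the effect of the intervention on unit $i$; for any given unit, only one of the two terms is ever observed.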


Estimation of Causal Effects of a Program or Intervention
The metaphor of the ‘treatment’
Consider a treatment delivered at the individual level: either the individual received the treatment, or did not. The difference between the potential outcome if the individual received the treatment and the potential outcome if the individual did not is the effect of the treatment on the individual.


Estimation of Causal Effects of a Program or Intervention
The problem of not observing the counterfactual outcome
The challenge of estimating this treatment effect stems from the fact that any given individual either receives the treatment or does not (for example, a child either does or does not attend preschool). Thus, for any given person only one of two potential outcomes can be observed. The fact that we cannot directly observe the counterfactual outcome (for example, the earnings a person who went to preschool would have had if they had, in fact, not gone to preschool) implies that we cannot directly measure the causal effect. Randomization provides a solution.
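To see why randomization solves the problem, here is a minimal simulation sketch (ours, not from the source; all figures are invented for illustration). Each simulated unit has both potential outcomes, but the "experiment" only ever observes one of them:

```python
# Minimal simulation of why randomization identifies the average
# treatment effect (ATE). Illustrative sketch; all figures are invented.
import random

random.seed(42)
N = 10_000

# Potential outcomes per unit: y0 if untreated, y1 if treated.
units = []
for _ in range(N):
    y0 = random.gauss(100, 15)       # outcome without the program
    effect = random.gauss(5, 2)      # heterogeneous individual effect
    units.append((y0, y0 + effect))  # (y0, y1)

true_ate = sum(y1 - y0 for y0, y1 in units) / N  # knowable only in simulation

# Random assignment: for each unit we then observe only ONE potential outcome.
is_treated = [False] * N
for i in random.sample(range(N), N // 2):
    is_treated[i] = True

obs_treated = [units[i][1] for i in range(N) if is_treated[i]]
obs_control = [units[i][0] for i in range(N) if not is_treated[i]]

# With random assignment the two groups are comparable, so the difference
# in group means is an unbiased estimate of the ATE.
estimate = sum(obs_treated) / len(obs_treated) - sum(obs_control) / len(obs_control)
print(f"true ATE: {true_ate:.2f}  randomized estimate: {estimate:.2f}")
```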


Estimation of Causal Effects of a Program or Intervention
Randomized experiments
Benefits and costs:
- it is difficult to select adequate control groups
- operational and regulatory limits
Quasi-experiments
Because randomized experiments can be expensive or infeasible, researchers have also developed methods to use as-if random variation in what is known as a quasi-experiment. The necessary condition for a high-quality quasi-experimental design is that people are assigned to a treatment or control group in a way that mimics randomness. This can be done by forming treatment and control groups whose individuals have similar observable characteristics.
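The text mentions forming treatment and control groups whose individuals have similar observable characteristics; a toy nearest-neighbour matching sketch (our illustration of that idea, not a method prescribed by the source; the data are invented) could look like this:

```python
# Toy nearest-neighbour matching on a single observable characteristic (age).
# Real quasi-experimental designs use many covariates, common-support checks
# and sensitivity analyses; this only illustrates the basic idea.

# Each record: (age, observed outcome). 'treated' units joined the program.
treated = [(25, 1300.0), (32, 1500.0), (41, 1700.0)]
untreated_pool = [(24, 1250.0), (27, 1280.0), (33, 1420.0), (40, 1600.0), (55, 1900.0)]

def nearest_control(age, pool):
    """Return the untreated record whose age is closest to `age`."""
    return min(pool, key=lambda record: abs(record[0] - age))

# The matched control's outcome stands in for the treated unit's
# unobservable counterfactual outcome.
gaps = []
for age, y_treated in treated:
    _, y_control = nearest_control(age, untreated_pool)
    gaps.append(y_treated - y_control)

att = sum(gaps) / len(gaps)  # average effect on the treated, under matching
print(f"matched estimate of the effect on the treated: {att:.1f}")
```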


Counterfactual analysis

[Figure: outcomes (risultati) plotted over time from t=0 to t=1. One curve shows the observed results, a second curve the counterfactual; the observed effect is the raw change in results, while the impact is the gap between the observed and counterfactual curves at t=1.]

An example. A local government allocates 10 million euros in grants to firms that convert their project-based collaborators into permanent employees (5,000 euros per conversion).
Within a few weeks all the funds have been requested and disbursed, for 2,000 actual conversions. It looks like a great success.
But a researcher proposes a check against the number of conversions that would probably have occurred in the absence of any intervention: the number of conversions in excess of the trend falls to 700.

Another, even more curious researcher compares these with the conversions that took place over the same period in economic contexts comparable to that of the intervention: the conversions attributable to the intervention fall to 200.

Each conversion attributable to the intervention therefore cost 50,000 euros (10 million/200). In all likelihood, the money could have been spent better.
(Adapted from Valut-AZIONE – CAPIRe, December 2012, Gli incentivi dati alle imprese riescono a ridurre il precariato?, from a study by Alberto Martini and Luca Mo Costabella, http://www.capire.org/capireinforma/scaffale/valut-azione6122012.pdf)
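The arithmetic of the example, restated as a small script (the figures are the ones given in the source):

```python
# The subsidy example's arithmetic, restated (figures from the source).
budget = 10_000_000            # euros allocated
grant_per_conversion = 5_000   # euros paid per permanent conversion

funded_conversions = budget // grant_per_conversion  # 2,000 conversions funded
above_trend = 700        # conversions exceeding the pre-existing trend
attributable = 200       # remaining after comparison with similar contexts

naive_cost = budget / funded_conversions  # 5,000 euros: looks like a bargain
real_cost = budget / attributable         # 50,000 euros per conversion actually caused

print(f"funded conversions: {funded_conversions}")
print(f"naive cost per conversion: {naive_cost:,.0f} euros")
print(f"cost per attributable conversion: {real_cost:,.0f} euros")
```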


3 Recent Trends

1. Social Problem Solving
The criteria of 7. Participation and 8. Collaboration, in particular, deserve special attention.
Since the second half of the 1980s, problem-solving models in the field of public policy have also paid great attention to the assessments made by the beneficiaries: individual people, families, firms, associations.
Citizens, as users and experimenters of public policies, accumulate extraordinary expertise about
- the nature of problems
- their causes
- the possible solutions
- how implementation works
- the assessment of results.
Their experiences can be worth thousands of experimental observations made in the field.

2. A more careful evaluation of latent results
On the methodological level, correctives have been formulated to compensate for the tendency, implicit in some applications of AVP, to return a negative balance for the programmes examined.
Appreciative evaluation, for example, brings latent positive results to light, so as to strengthen trust within the organisation, foster the diffusion of good practices, and increase credibility towards the outside.
In this logic, a third function is added to the traditional functions of evaluation, learning and accountability: the creation of trust and social capital.

THE MICRO-NARRATIVE

What is it?
• the collection and aggregation of thousands of short stories from citizens, using special algorithms to gain insight into real-time issues and changes in society

Why is it innovative?
• information collected in the shape of stories is interpreted by the person who has told the story, therefore removing the need for – and the potential bias of – a third party to interpret the data; this meets a core challenge for M&E by reducing or eliminating potential biases of monitoring staff and evaluators (=process improvement)
• by using a large number of stories, this approach turns previously mostly qualitative data (e.g., in the form of a limited number of non-representative case studies included in an evaluation) into aggregated statistical data; the approach has the potential to replace traditional monitoring tools like surveys and focus groups (=catalytic)
• pattern detection software for analyzing micro-narratives exists, and the approach is already implemented in a number of countries and projects (=concrete)

How and when best to use it?
• when real-time quantitative information from a large number of beneficiaries is required that cannot otherwise be collected

4. Reporting and Dissemination

General quality criteria:

CLEAR STATEMENT OF THE EVALUATION QUESTIONS
The readers can understand how the information in the report should be interpreted.

CLEAR PRESENTATION OF CRITERIA AND STANDARDS OF PERFORMANCE
The grounds for value judgements made in the report should be explicitly stated.

TRANSPARENT ACCOUNT OF RESEARCH METHODS
The report should include an account of sources of data and methods of data collection.

JUSTIFIED CONCLUSIONS
It should be possible for readers to follow each step of the argument leading from question to answer. Supporting evidence should be clearly presented, and alternative explanations of findings explicitly considered and eliminated.

IMPARTIAL REPORTING
The perspectives of all major stakeholder groups should be impartially reflected in the report. It must cover both strengths and weaknesses, and should not be written in a manner that suggests that it is totally unbiased and represents the final truth.

CLEAR STATEMENT OF LIMITATIONS
All studies have limitations. Therefore, an account of major limitations should normally be included in reports.


Reporting format
From Sida’s Evaluation Manual (Swedish International Development Cooperation Agency, SIDA).

Recommended Outline:
• EXECUTIVE SUMMARY
• INTRODUCTION
• THE EVALUATED INTERVENTION
• FINDINGS
• EVALUATIVE CONCLUSIONS
• LESSONS LEARNED
• RECOMMENDATIONS
• ANNEXES

EXECUTIVE SUMMARY
Summary of the evaluation, with particular emphasis on
• main findings,
• conclusions,
• lessons learned and
• recommendations.
Should be short!!!

INTRODUCTION

Presentation of the evaluation’s purpose, questions and main findings.

FINDINGS

Factual evidence, data and observations that are relevant to the specific questions asked by the evaluation.

EVALUATIVE CONCLUSIONS

Assessment of the intervention and its results against
• given evaluation criteria,
• standards of performance,
• and policy issues.

LESSONS LEARNED

General conclusions that are likely to have a potential for wider application and use.

RECOMMENDATIONS

Actionable proposals to the evaluation’s users for improved intervention cycle management and policy.

ANNEXES

• Terms of reference,
• methodology for data gathering and analysis,
• references, etc.


Some questions to be asked

• Was there a specific objective for the evaluation – also to be found in the Terms of Reference (ToR)?
• Were the ToR attached to the evaluation?
• Were the qualifications of the evaluators explicitly stated?
• Were there OVIs (Objectively Verifiable Indicators)?
• Were there any specific references to Guidelines, Manuals, Methods in the ToR and in the Evaluation itself?
• Is it clear from the document when, where and by whom the evaluation was made?
• Was a base-line study needed? If so, was it carried out?
• Has the issue of cost-effectiveness been dealt with? Is there any discussion of costs and benefits in the evaluation? (...)

Bibliography

Associazione per lo Sviluppo della Valutazione e l’Analisi delle Politiche Pubbliche (ASVAPP) (2012), Counterfactual impact evaluation of cohesion policy: Impact and cost-effectiveness of investment subsidies in Italy, http://www.prova.org/studi-e-analisi/ASVAPP%20CIE%20WP1%20FINAL%20REPORT.pdf

Beaman L., Impact Evaluation: An Overview, UC Berkeley, http://cega.berkeley.edu/assets/cega_events/31/Impact_Evaluation_Overview.ppt

Hallam A. (2011), Harnessing the Power of Evaluation in Humanitarian Action: An initiative to improve understanding and use of evaluation, The Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP), http://www.alnap.org/resource/6123.aspx

International Federation of Red Cross and Red Crescent Societies (2011), IFRC Framework for Evaluation, http://www.ifrc.org/Global/Publications/monitoring/IFRC-Framework-for-Evaluation.pdf

Médecins Sans Frontières (MSF) (2013), Evaluation Manual: A handbook for initiating, managing and conducting evaluations in MSF, http://evaluation.msf.at/fileadmin/evaluation/files/documents/resources_MSF/Evaluation_Manual_April_2013_online.pdf

Molund S. and Schill G. (2004), Looking Back, Moving Forward: Sida Evaluation Manual, http://gametlibrary.worldbank.org/FILES/244_Evaluation%20Manual%20for%20Evaluation%20Managers%20-%20SIDA.pdf