Building a culture of evaluation in the public sector

14 May 2024

News and media
High quality evaluation is increasingly being recognised as a vital part of improving public policy design and implementation.

In line with this objective, ANZSOG, together with the Australian Centre for Evaluation, the APS Academy and the Treasury, sponsored Dr Dan Levy, Senior Lecturer at the Harvard Kennedy School, for a program of events earlier this year in which Dr Levy shared his expertise with senior federal public servants.

He spoke of the need to ignore gut feelings and received wisdom in favour of the findings of rigorous evaluation, the value of choosing appropriate evaluation methods for particular programs, and why public servants should always focus on the people they are trying to help, not become wedded to specific programs.

The central event was a large-scale hybrid workshop on impact evaluation, hosted by the APS Academy in Old Parliament House and attended by over 470 public servants. The interactive workshop discussed international examples of rigorous impact evaluations and the practical challenges evaluators face.

The session was recorded and is now available through the APS Academy’s resource library.

Dr Levy spoke of his 27 years of evaluating programs all over the world, in areas such as education, health and welfare, and the lightbulb moment that shaped his career.

“About 27 years ago I was assigned to be a research assistant of a project that was evaluating the impact of giving textbooks to kids that live in rural villages in Kenya, at a time when kids in rural villages in Kenya did not have access to textbooks,” he said.

“I remember when I was told that I was going to work on this project, saying, why in the world would we want to know the impact of textbooks on learning? What are more basic inputs to learning?”

“A few months later, we do the statistical analysis and we find that – on average – the impact of textbooks in Kenya was zero. We gave the textbooks and kids didn’t seem to have learned.”

“When we looked closer there was a group of children that did seem to learn with the textbooks, and those were the children who were more advanced to begin with. That led us to do some qualitative work where we realised that perhaps part of the reason this effect was zero is because the kids that were at the bottom of the distribution did not know how to read.”

“From that moment on, I stopped trusting my gut about what works and doesn’t work, and I started looking for evidence as to what actually works.”

He said that choosing comparison groups, much as in a randomised medical trial, was a very important part of the evaluation process, because it allows evaluators to show that a measured impact was due to the program being evaluated rather than to other factors.

“When you’re trying to measure impact, what you are trying to do is to see how participants fare in the program that is introduced. So that’s the yellow line [on the chart]. I’m going to call that the factual.”

“You are trying to compare that with the blue line, which is what would have happened to this participant in the absence of your program. So, to measure impact, it’s not enough to compare the before and after. You have to think about what would have happened to your participants if they hadn’t been in the program.”

“That’s actually very important because rarely is the case that participants are in a world where nothing is happening. Most of the time, things are happening. They might not happen as well as the program that you are evaluating, but they are happening.”
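The comparison-group logic Dr Levy describes can be sketched in a few lines of Python. Everything here is illustrative: the test scores, group sizes and the three-point program effect are invented for the example, not drawn from any study he cited.

```python
import random
import statistics

random.seed(0)

# Hypothetical simulated test scores for 1,000 students drawn from
# the same underlying population, as in a randomised trial.
baseline = [random.gauss(50, 10) for _ in range(1000)]

# Half are randomly assigned to the program (assumed to add ~3 points);
# the other half form the comparison group, which approximates the
# counterfactual: what participants would have scored without it.
treatment = [score + 3 for score in baseline[:500]]
comparison = baseline[500:]

# A naive before/after comparison would credit the program with
# everything else that changed over the same period; comparing against
# the comparison group isolates the program's own impact.
impact = statistics.mean(treatment) - statistics.mean(comparison)
print(round(impact, 1))
```

The estimate lands near the simulated three-point effect, with some sampling noise; in a real evaluation the comparison group comes from random assignment or a carefully chosen quasi-experimental design, not from splitting one list.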

He said that evaluation was a resource-intensive process, and often evaluations of programs were being asked to answer several different questions at once.

“Always ask what decisions you are seeking to inform before you design your evaluation, because the decision you seek to inform is going to help you determine the method that you are using to inform that decision.”

“Ask ‘would I do something different if the results of the study came this way or if they came that way?’ If your answer to that question is ‘no, I would just do the same’, the evaluation is unlikely to inform the decision.”

He said that public managers also needed to be aware of the difference between statistical significance, of interest to academics, and the practical significance of creating a measurable effect large enough to justify continuing with a program ahead of other options.
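This distinction can be made concrete with a small simulation; it is a hypothetical illustration rather than anything presented at the workshop. With a large enough sample, even a half-point average gain on a 100-point test clears the conventional significance threshold while remaining far too small, on its own, to justify a program.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data: a half-point average gain on a 100-point test,
# measured across very large treatment and control groups.
control = [random.gauss(60.0, 10) for _ in range(40000)]
treated = [random.gauss(60.5, 10) for _ in range(40000)]

# Two-sample t statistic: difference in means over its standard error.
diff = statistics.mean(treated) - statistics.mean(control)
se = math.sqrt(statistics.variance(treated) / len(treated)
               + statistics.variance(control) / len(control))
t_stat = diff / se

# |t| > 1.96 is "statistically significant" at the 5% level, yet the
# effect itself is well under one point: a public manager must still
# ask whether that gain justifies the program ahead of other options.
print(round(diff, 2), round(t_stat, 1))
```

The statistic comes out far above 1.96, so the effect is statistically significant, but the half-point difference may have no practical significance at all.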

He finished with another example he was involved in of evaluation of a school construction program funded by the US government.

“In one of the studies, we found basically little to no effect of these new schools – which were materially better than the ones they replaced – on school enrolment, on attendance, and especially on work.”

“So, suppose you were literally working for the agency that built the schools. What would you do with this result? I might think about what my assumptions were before the program started. What is going to motivate these students to attend more, and does that fit with what I’m actually providing?”

“But I had a private conversation with one person working at the agency that builds schools who just said: ‘we’re in the business of building schools’. They didn’t think we were in the business of increasing school involvement and attendance and learning.”

“My point is: be an advocate for the people you’re trying to help, not the program that you’re using to help those people.”

Eleanor Williams, Managing Director of the Australian Centre for Evaluation, said that the visit supported the current APS reform effort and its emphasis on building internal capacity to deliver high quality evidence for government decision making.

This event was followed by a roundtable hosted by the Behavioural Economics Team of the Australian Government (BETA) at the Department of the Prime Minister and Cabinet, attended by senior executives from a range of agencies, with a focus on how to improve the supply, demand and productive use of evidence for policy-making.

Key themes from this discussion included the value of:

  • both technical and soft skills within agencies to generate, translate and communicate high quality research and analysis to inform decisions;
  • curiosity and a ‘scout mindset’ from system leaders in understanding whether policies and programs are working and why; and
  • synthesising a range of perspectives and developing a good understanding of context to determine how to continually improve policies and programs (and stop them when required).

Dr Levy’s visit was rounded out by a workshop with the Australian Centre for Evaluation and Assistant Minister Andrew Leigh which emphasised the importance of having a clear, collective mission of improving the quality, volume and use of evaluation, and the need for champions and advocates at all levels and across portfolios in order to achieve system level change in the APS.