
4. Evidence standards

Understanding how evidence standards apply to your intervention.

In our guide to Understanding impact, we explore how to use your theory of change to build a measurement and evaluation framework. In this closer look, we examine how evidence standards can strengthen your approach.


Several organisations have produced ‘standards’ of evidence, including the Social Research Unit’s standards of evidence, Nesta’s adaptation of the SRU’s standards, and the Maryland Scale of Scientific Rigour. These offer a way to think through how an evaluation’s design and methods can influence the credibility of its evidence. However, it’s important to understand that most of them focus exclusively on formal evaluation or ‘impact measurement’ (the ‘is anyone better off?’ question), which may not be appropriate or feasible for your service.

Standards generally place experimental approaches, such as randomised controlled trials (RCTs) or systematic reviews that summarise a number of RCTs, at the higher levels of their hierarchies. These approaches are favoured because they are seen as the most robust way of answering ‘what works?’ questions, which focus on causality and on attributing impact to a particular intervention. Comparing results from ‘treatment groups’ (that receive the intervention) with ‘control groups’ (that don’t) can provide strong evidence that the intervention was responsible for any observed changes.
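The core comparison in this kind of design can be sketched in a few lines. The example below is a minimal Python sketch using hypothetical wellbeing scores and a hypothetical `estimated_effect` helper: if people were randomly assigned to the two groups, the difference between the treatment group’s average outcome and the control group’s average outcome estimates the intervention’s average effect. A real trial would also report the uncertainty around this estimate, for example with a significance test or confidence interval.

```python
from statistics import mean


def estimated_effect(treatment_outcomes, control_outcomes):
    """Difference in mean outcomes between treatment and control groups.

    If people were randomly assigned to the two groups, the groups should
    differ, on average, only in whether they received the intervention, so
    this difference estimates the intervention's average effect.
    """
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical wellbeing scores (0-10) recorded after the programme ended.
treatment = [7, 6, 8, 7, 7]  # people who received the intervention
control = [5, 6, 4, 5, 5]    # people who did not

print(f"Estimated effect: {estimated_effect(treatment, control)} points")
```

Without the control group, the same treatment scores on their own would say nothing about how much of the change the intervention caused.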

These standards share some similarities, with more complex comparative approaches at the top of their hierarchies. However, they also differ in some key ways:

  • The Nesta Standards of Evidence aim to draw together different approaches to evaluation (experimental and theoretical), while also thinking about what evidence is useful to help organisations and investors assess the potential for scale.
  • The Education Endowment Foundation (EEF) scale is much more focused on experimental approaches and differentiates between them in greater detail. This reflects the EEF’s focused role in developing the scientific evidence base around educational interventions.
  • NPC’s own scale reflects our aim to assess charities’ evaluations, recognising that the high-cost research designs at the top of other hierarchies are often out of reach for individual voluntary sector organisations. Our scale therefore differentiates in more detail between approaches that would sit at levels 2 and 3 of the Nesta Standards.

The successful implementation of a programme is not just about the evidence behind it. The Realising Ambition Programme’s Evidence-Confidence Framework usefully goes beyond evidence standards to assess the scope for replicating and scaling interventions, taking in factors such as whether the service is tightly defined and whether it is delivered effectively and faithfully.

Standards can help us think about what kinds of evidence different research designs can produce, and how different stakeholders will view the credibility of that evidence in terms of demonstrating impact. However, because they focus on particular evaluative questions (i.e. ‘what works’), they are not designed to be a comprehensive guide for organisations deciding which evaluation approach to take.

For more on how to decide which design is best, see our paper on proportionate evaluation.


Understanding quality in evaluation

Evidence hierarchies rarely cover how to judge the quality of a study; they tend to focus exclusively on the extent to which causality can be established. Establishing causality will often be out of reach for charities, which makes the question of quality far more relevant and pressing. Quality includes:

  • Methodological quality: How well executed was the research?
  • Appropriateness of methods: Does the method match the research questions and purpose of the study? Are samples representative?
  • Quality in reporting: Is the research presented in a way that can be appraised and used by others?
  • Relevance to policy and practice: Does the research address important policy and practice questions in a way that is both useful and useable?

With these criteria in mind, a poorly designed RCT could produce less credible and less useful results than a rigorously applied before-and-after statistical analysis.

Quality is therefore about adhering to good social research practice and understanding the context in which you are working. For example, Bond’s evidence principles focus on quality in the context of international development, providing a checklist for organisations to think through when reviewing or designing evaluations:

  • Voice and inclusion: The perspectives of people living in poverty, including the most marginalised, are included in the evidence, and a clear picture is provided of who is affected and how.
  • Appropriateness: The evidence is generated through methods that are justifiable given the nature of the purpose of the assessment.
  • Triangulation: Conclusions about the intervention’s effects use a mix of methods, data sources and perspectives.
  • Contribution: The evidence explores how change happens, the contribution of the intervention and factors outside the intervention in explaining change.
  • Transparency: The evidence discloses the details of the data sources and the methods used, the results achieved, and any limitations in the data or conclusions.



The validity of your results or measurement framework can be assessed in two ways:

  • Whether your measurement tools accurately capture the phenomenon you’re trying to measure: construct validity is an example of this. Are your measures actually capturing the thing you’re trying to measure, or something else?
  • How far your findings can be trusted and generalised: internal and external validity are examples of this. Internal validity asks whether your evaluation is designed and carried out well enough to give an approximately true picture of what is happening within your evaluation’s context. External validity asks whether the findings apply only in your own context, or could also be applied outside it.


Economic analysis

Success is not only about the effectiveness of a service, but also about its cost-effectiveness. A high-impact, high-cost service may be less cost-effective than a low-impact, low-cost service. The latter could provide ‘more bang for your buck’.
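One simple way to compare cost-effectiveness is cost per outcome achieved. The Python sketch below uses entirely hypothetical figures and a hypothetical `Service` record. It assumes the two services aim at the same, comparable outcome; if they don’t, cost per outcome is not a fair comparison. Here the lower-impact service achieves fewer outcomes in total but spends less on each one.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    total_cost: float       # total delivery cost in pounds
    outcomes_achieved: int  # number of people achieving the target outcome

    def cost_per_outcome(self) -> float:
        """Cost-effectiveness ratio: pounds spent per outcome achieved."""
        return self.total_cost / self.outcomes_achieved


# Hypothetical figures for illustration only.
high_impact = Service("High-impact, high-cost", total_cost=200_000, outcomes_achieved=100)
low_cost = Service("Low-impact, low-cost", total_cost=30_000, outcomes_achieved=30)

for service in (high_impact, low_cost):
    print(f"{service.name}: £{service.cost_per_outcome():,.0f} per outcome")
```

A lower cost per outcome means more outcomes for each pound spent, even though the service with the higher total impact reaches more people overall.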

There is strong interest in identifying, measuring and reporting on cost savings and efficiency gains, and in converting social impact into monetary terms. Approaches that do this, such as cost-benefit analysis and social return on investment, can be very useful, but if done poorly they can produce misleading results.

Estimating the financial or economic value of a charity’s impact typically requires first collecting robust impact data, then converting some or all of it into monetary values. A theory of change and a measurement framework can both be seen as first steps in this process.
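To show how these two steps fit together, here is a deliberately simplified Python sketch of a benefit-cost ratio. Everything in it is an assumption for illustration: the helper names `present_value` and `benefit_cost_ratio`, the monetary value (financial proxy) assigned to each outcome, the ‘deadweight’ share (outcomes that would have happened without the service) and the 3.5% discount rate. It is not a full cost-benefit analysis or social return on investment calculation, which would also consider factors such as attribution and drop-off over time.

```python
def present_value(annual_values, discount_rate):
    """Discount a list of yearly values (year 1, year 2, ...) to today's money."""
    return sum(
        value / (1 + discount_rate) ** year
        for year, value in enumerate(annual_values, start=1)
    )


def benefit_cost_ratio(outcomes_per_year, value_per_outcome, deadweight,
                       discount_rate, total_cost):
    """Monetise measured outcomes and compare their present value with cost.

    outcomes_per_year: outcomes measured in each year (the impact data)
    value_per_outcome: assumed monetary value of one outcome (a financial proxy)
    deadweight: assumed share of outcomes (0 to 1) that would have happened anyway
    """
    annual_benefits = [
        n * value_per_outcome * (1 - deadweight) for n in outcomes_per_year
    ]
    return present_value(annual_benefits, discount_rate) / total_cost


# Hypothetical inputs: 50 outcomes a year for two years, each valued at £1,000,
# 20% deadweight, a 3.5% discount rate and a total cost of £40,000.
ratio = benefit_cost_ratio(
    outcomes_per_year=[50, 50],
    value_per_outcome=1_000,
    deadweight=0.2,
    discount_rate=0.035,
    total_cost=40_000,
)
print(f"About £{ratio:.2f} of estimated benefit per £1 spent")
```

Changing the assumed value per outcome or the deadweight share moves the ratio directly, which is one reason poorly grounded monetisation can produce misleading results.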


Read our full guide to using your theory of change to build a measurement and evaluation framework.
