When we kicked off our research programme into faith-based charities last year, it is fair to say we didn’t appreciate the complexities ahead. While we might have expected some debate about the changing role of faith in our society or the increased diversity of the population, we didn’t anticipate that the data itself would be such a challenge. In the interests of transparency and sharing learning, I want to reflect on some of the trials and tribulations of the last eight months and explain why I still have faith in data.

The first step in our research was simply to identify the population of faith-based charities, to provide a clear picture of the size and characteristics of the group. This was not as simple as it sounds. There are a number of existing methods for classifying charities as faith-based: the Charity Commission annual return, which has a ‘religious activities’ category; a charity’s objects or statement of purpose, which can include the ‘advancement of religion’; and the International Classification of Nonprofit Organisations (ICNPO), an adaptation of which is used by NCVO for the Almanac, which has a ‘religion’ group. However, we found that these all underrepresented the true number of faith-based charities for a variety of reasons (see our methodology for more details), so we chose to develop our own approach.

Getting the data into a usable format was the next challenge. While the Charity Commission has an open data set available to download, it requires the use of a language such as Python or SQL to convert it into an accessible format which can be read in a program such as Microsoft Excel. We then had to create a relational database from scratch to conduct the in-depth analysis required. At NPC we believe in the power of data and have decent data analysis skills in house, but we still worked with an expert (Jay Liu of Digital-Dandelion.com) to carry out this work. This illustrates a barrier we have highlighted before for charities trying to make the most of data: even when data is accessible, it often still requires specialist capabilities and technology to use.
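To give a flavour of what building a relational database from a converted extract can involve, here is a minimal sketch in Python using the standard-library `sqlite3` and `csv` modules. It is not the code we or Jay used. The table name `charity`, the column names `regno`, `name` and `objects`, and the two sample rows are all made up for illustration and do not reflect the layout of the Charity Commission extract.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a converted extract.
# The column names and rows are assumptions for illustration only.
SAMPLE = """regno,name,objects
1,Example Parish Church,Advancement of religion
2,Example Sports Club,Promotion of amateur sport
"""


def load_charities(conn, csv_file):
    """Create a simple charity table and load rows from a CSV file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charity ("
        "regno INTEGER PRIMARY KEY, name TEXT NOT NULL, objects TEXT)"
    )
    rows = [
        (int(r["regno"]), r["name"], r["objects"])
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany(
        "INSERT INTO charity (regno, name, objects) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
count = load_charities(conn, io.StringIO(SAMPLE))
```

Once the rows are in a table like this, questions such as "which charities mention religion in their objects?" become ordinary SQL queries rather than spreadsheet filters.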

Our approach combined the three classifications mentioned above with a text analysis to create a more comprehensive dataset. We created keyword lists for the seven faiths that the research covers, plus a list of more general faith words, and checked these against charity names and objects. We then checked the results to see how well the analysis was working, as one of the limitations of text analysis is that it can’t look at context or meaning. We found some words with two or more meanings, which meant that some charities were being classified as faith-based when they were not. We used this manual checking process to filter and narrow our keywords, then ran the text analysis again and refined the numbers.
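For readers curious about what this kind of keyword check looks like in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not our actual analysis: the keyword lists, the labels and the function name `match_labels` are all hypothetical, and the real lists are not reproduced here. The sketch matches whole words only, so that "church" does not match "Churchill", and it also shows the limitation described above: a word like "temple" can match a street name just as easily as a place of worship.

```python
import re

# Hypothetical keyword lists for illustration only; these are NOT the
# lists used in the research.
KEYWORDS = {
    "christian": ["church", "christian", "parish"],
    "jewish": ["synagogue", "jewish"],
    "general": ["faith", "worship", "temple"],
}


def compile_patterns(keywords):
    """Build one case-insensitive, whole-word pattern per label."""
    return {
        label: re.compile(
            r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
            re.IGNORECASE,
        )
        for label, words in keywords.items()
    }


PATTERNS = compile_patterns(KEYWORDS)


def match_labels(name, objects):
    """Return the sorted labels whose keywords appear in the name or objects."""
    text = f"{name} {objects}"
    return sorted(
        label for label, pattern in PATTERNS.items() if pattern.search(text)
    )


# A genuine match on the charity name.
hit = match_labels("St Mary's Parish Church", "Advancement of religion")
# Whole-word matching avoids a false positive on "Churchill".
miss = match_labels("Churchill Gardens Residents Association", "Housing support")
# A false positive: "Temple" here is a street name, not a place of worship.
ambiguous = match_labels("Temple Street Youth Club", "Recreation for young people")
```

The last case is exactly the sort of result that only manual checking catches, which is why the keyword lists had to be narrowed and the analysis run again.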

Once we had completed the text analysis we developed a set of ‘logics’ or rules to categorise the charities further. You can read more about this here, but I want to highlight how iterative the approach was. Our team spent a lot of time checking our data to refine our logic, and a significant amount of manual testing took place, which is not always the most enjoyable task on a database of over 180,000 charities! But it was very rewarding to be generating new data and we believe this is the most advanced and up-to-date analysis of faith-based charities in Great Britain. This painstaking work also reminded us of the value of initiatives like the Justice Data Lab, which provide organisations with access to complex data analysis capacity in a cost-effective way.

Check out our initial results here and watch this space for further analysis of our database. Despite the challenges, I look forward to understanding even more about faith-based charities through the power of data.
