The importance of data in the fight against COVID-19

  • Published Date: 06 November 2020

ANZSOG has provided up-to-date information to public sector leaders on the COVID-19 crisis, through initiatives such as the Leading in a Crisis series, and the upcoming Future public sector leaders’ series, which brings together leading practitioners and academics to discuss key post-COVID issues facing public sector leaders.

This article is written by Matthew James, the Deputy Chief Executive Officer, The Australian Institute of Health and Welfare, and focuses on the importance of good, timely data in fighting COVID-19 and the challenges agencies face in trying to collect it.

The COVID-19 pandemic has raised new challenges for data collection, in particular the need for timely data – data from even a few months ago can often tell us little about the situation now. Agencies are trying to find a balance between timeliness and accuracy, in an environment where there is widespread interest in daily data and reports due to the profound impact the pandemic has had on the community.

Commencing with the first confirmed case of COVID-19 in Australia in late January 2020, the Australian Government’s Department of Health became central to managing the outbreak as a health emergency, providing relevant and timely data to the Prime Minister, the Minister for Health, and the states and territories across Australia. The Australian Institute of Health and Welfare (AIHW), along with other government agencies, has provided practical assistance and expertise to help the government meet its immediate data needs.

As an example, the AIHW has been compiling data each week on the use of mental health services, and from the various crisis lines, to help inform government. The Australian Bureau of Statistics (ABS) has also been particularly active during the pandemic, providing timely and reliable survey data and collating timely administrative data from, for example, the Single Touch Payroll system.

In some cases, we are seeing major changes from one week to the next, so you need to know which week the data refer to in order to interpret them in context. Timely data is so critical because things are changing so rapidly as a result of, among other things, lockdowns designed to stem the spread of the virus. To give one example, the number of people who are employed fell by 607,000 in April and by a further 264,000 in May. The decline in April was by far the biggest monthly fall in employment since the current labour force series commenced in February 1978 - the next largest fall was 74,900 in November 1992. This is unprecedented in Australian history. Lenin once observed that ‘there are decades when nothing happens and there are weeks when decades happen’.

The rapid pace of change means that the traditional trade-off between data quality and timeliness has changed. With large changes in a short period of time, even a data source that may not normally be seen as the most reliable is likely to pick up the change. The relationship between signal and noise has also changed: when changes are this large and this rapid, almost any data source is likely to pick up an actual signal and not just noise.
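The point about signal and noise can be illustrated with a minimal sketch. It compares the size of a change with the typical random error of a data source. The 607,000 figure is the April fall in employment cited earlier in this article; the noise level of 50,000 and the "ordinary month" change of 20,000 are purely hypothetical values chosen for illustration, not properties of any real collection.

```python
def signal_to_noise(observed_change: float, noise_sd: float) -> float:
    """Ratio of an observed change to the typical size of random error."""
    if noise_sd <= 0:
        raise ValueError("noise_sd must be positive")
    return abs(observed_change) / noise_sd


# Hypothetical assumption: the source's random error is about 50,000 people.
ASSUMED_NOISE_SD = 50_000

# Hypothetical quiet month: a change of 20,000 is smaller than the noise.
ordinary_month = signal_to_noise(20_000, ASSUMED_NOISE_SD)

# April 2020 fall in employment cited in the article.
april_2020 = signal_to_noise(-607_000, ASSUMED_NOISE_SD)

print(f"Ordinary month: change is {ordinary_month:.1f} times the noise")
print(f"April 2020:     change is {april_2020:.1f} times the noise")
```

Under these assumed numbers, a quiet month's change is lost in the noise (a ratio below 1), while the April fall is many times larger than the noise, so even a fairly imprecise source would register it.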

The need for data quality

While the need for timely data is without question, the issue of data quality cannot be ignored. The need for quality data is highlighted in Carl Bergstrom and Jevin West’s recent book, “Calling Bullshit: The Art of Scepticism in a Data-Driven World”. As the authors note, it does not matter how sophisticated the analysis is if the data that go into it are flawed. And it does not matter how ‘big’ the data are. If the data themselves are flawed, then any resulting analysis is also likely to be flawed. And while machine learning is valuable for preparing and analysing data, it is not a panacea. If the data that the machine learning algorithm uses are flawed, then any resulting conclusions are also likely to be flawed.

Many surveys have been conducted since the onset of COVID-19 but some of them should be treated with caution. Surveys that are not based on probability sampling should not be used to make claims about the general population. Surveys where any member of the public is invited to participate are unlikely to provide reliable information as respondents are unlikely to be representative of the population. Having unreliable data is not particularly helpful no matter how timely it is. To be fair, surveys that are not based on probability samples can still provide some useful information on the relationship between factors that influence outcomes for survey respondents.
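A small simulation can illustrate why opt-in surveys may mislead about the general population. This is a sketch under stated assumptions, not a model of any real survey: the population size, the 20% true rate, and the assumption that people experiencing distress are three times as likely to respond to an open invitation are all hypothetical.

```python
import random

random.seed(2020)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 people, exactly 20% experiencing distress.
population = [True] * 20_000 + [False] * 80_000
true_rate = sum(population) / len(population)

# Probability sample: every person has the same known chance of selection.
probability_sample = random.sample(population, 2_000)
probability_estimate = sum(probability_sample) / len(probability_sample)


def responds(distressed: bool) -> bool:
    """Hypothetical opt-in behaviour: distressed people respond 3x as often."""
    return random.random() < (0.03 if distressed else 0.01)


# Opt-in survey: anyone may take part, but response depends on the outcome.
opt_in_sample = [person for person in population if responds(person)]
opt_in_estimate = sum(opt_in_sample) / len(opt_in_sample)

print(f"True rate:            {true_rate:.1%}")
print(f"Probability sample:   {probability_estimate:.1%}")
print(f"Opt-in survey sample: {opt_in_estimate:.1%}")
```

Under these assumptions the probability sample should land close to the true 20%, while the opt-in estimate should be roughly double that, because the people who choose to respond differ systematically from those who do not.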

It is possible to provide timely data using probability sampling, as evidenced by the rapid surveys that have been conducted by the ABS. The AIHW recently collaborated with the Centre for Social Research and Methods at the Australian National University to include questions on loneliness and the level of psychological distress using the Life in Australia Panel, managed by the Social Research Centre. Importantly, this panel survey exclusively used random probability-based sampling methods and covered online and offline populations (that is, people who do and do not have access to the internet). In addition, because it is a panel, it is possible to obtain longitudinal data, including from the same respondents prior to the spread of COVID-19, which provides richer information than a series of cross-sectional snapshots. Data on psychological distress were collected in April, May and August, with further data collections planned for November 2020.

While COVID-19 has highlighted the need for timely data, there is still a need for longer-term data, and it is worth remembering that not all things change rapidly even in a pandemic. For example, the level of obesity is unlikely to change much, if at all, during the pandemic. In addition, some critical health data are collected through ABS surveys that cannot be conducted at the moment using the normal face-to-face approach, given the need for physical distancing.

Long-term versus short-term data collection

Every two years the AIHW is required under its legislation to provide a report on Australia’s health that, inter alia, ‘provides statistics and related information concerning the health of the people of Australia’. We released the latest edition of Australia’s health on 23 July 2020. In preparing the report we were conscious of the need to provide material on COVID-19 but this was challenging, because anything we said in July 2020 would quickly become quite dated. Shortly after the report was released the situation in Victoria worsened considerably.

In Australia’s health 2020, we focussed on COVID-19 through a special article on what we knew four months into the pandemic. But we could not change the whole report to focus on COVID-19, as a short-term assessment of events during the pandemic would not have served as a report card on the health of Australians. By necessity, most of the data in the report related to the situation prior to the onset of the pandemic.

The impact of the pandemic on Australia’s health will take years to assess and will be covered in detail when the next report on Australia’s health is released in 2022. Importantly the ‘snapshots’ that are part of Australia’s health will be updated regularly prior to 2022, and in addition the AIHW will move to release data on a timelier basis.

It is sometimes argued that evaluations and some types of data analysis are useless because they may only tell you that a program was not effective years after it was implemented. While these comments can be valid, they are often misplaced. To give an example, the best international evidence from longitudinal data suggests the main benefits from preschool education are evident when young people reach their teenage years. This sort of finding would never have eventuated if decision makers had demanded almost instant data on program effectiveness. In reality, the important outcomes are long-term outcomes, not short-term changes that are not sustained. That said, some impacts are evident at the time events occur, while others only emerge over the longer term. Analysis of longitudinal data for people who experienced the Black Saturday bushfires suggests that for some people adverse impacts on mental health were still evident after five years.

The pandemic has also highlighted the value of comparable data and the importance of taking context into account. With consistent data based on clear metadata, it is possible to make comparisons over time and across areas. When different data definitions are used, comparisons become harder. International comparisons of even simple things like deaths associated with COVID-19 have been vexed because of the different scope of data used across countries.

The use and analysis of data is likely to be permanently changed as a result of COVID-19. The demand for timely data will remain. To give an example, data on deaths by suicide in Australia are normally released more than nine months after the end of the relevant calendar year. This means that data for January of that year are at least 21 months old by the time they are released. During the pandemic, 2020 data on suspected suicides has been published by the suicide registers in Victoria and Queensland. The AIHW is working with the jurisdictions that do not currently have suicide registers to help them set up registers so that timely data on suspected suicides is available in all jurisdictions. Timely data from suicide registers does not replace the data published by the ABS each year; it complements it.
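The 21-month figure follows from simple date arithmetic, sketched below. The reference year of 2019 and the release date of 1 October in the following year are hypothetical, chosen only to represent a release nine months after the end of the calendar year; actual release dates may differ.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from the month of `start` to the month of `end`."""
    return (end.year - start.year) * 12 + (end.month - start.month)


reference_year = 2019  # hypothetical reference year for the annual data

# Assumed release: nine months after the end of the reference year.
release = date(reference_year + 1, 10, 1)

january_lag = months_between(date(reference_year, 1, 1), release)
december_lag = months_between(date(reference_year, 12, 1), release)

print(f"January data are {january_lag} months old at release")
print(f"December data are {december_lag} months old at release")
```

On these assumptions, even the most recent month in an annual release is already about ten months old, which is the gap that more timely register data can help fill.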

The pandemic has also highlighted some weaknesses in existing data systems and will provide a strong impetus for data improvement. Timeliness is important but it is not the only issue that should be addressed. There are major data gaps in several areas, including aged care, and much can be achieved by linking various datasets while maintaining privacy. Timeliness is critical but so is data quality.

For more information on the AIHW’s response to the COVID-19 pandemic, please visit the AIHW website.