New air pollution detection model promises more accurate picture
Air pollutants that have previously escaped monitoring can now be detected through a new method developed at the University of East Anglia (UEA), promising a more complete picture for people and policy development.
Currently, air quality assessments carry high levels of uncertainty, which may lead to flawed policy decisions and poor health outcomes. The daily air quality index (DAQI), used to inform people of the amount of air pollution present each day, is calculated from the concentrations of measured pollutants only, so it may not reflect the actual air pollution.
Researchers from UEA's School of Computing Sciences and School of Environmental Sciences analyzed measurements taken at air pollution monitoring stations around the UK to design statistical models that calculate missing pollutant concentrations.
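As a rough illustration of what estimating a missing concentration can look like, the sketch below fills gaps in one station's hourly series with the mean of readings from other stations assumed to behave similarly. This is a deliberately simple stand-in, not the researchers' model: the function name `impute_from_cluster`, the grouping of stations, and all values are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# One hourly series per station; None marks an hour with no measurement.
Series = List[Optional[float]]


def impute_from_cluster(target: Series, neighbours: List[Series]) -> Series:
    """Fill gaps in `target` with the mean of same-hour neighbour readings.

    Assumption: `neighbours` are stations already judged to behave like the
    target station (for example, members of the same cluster). How that
    grouping is built is outside the scope of this sketch.
    """
    filled: Series = []
    for hour, value in enumerate(target):
        if value is not None:
            # Observed values are kept untouched.
            filled.append(value)
            continue
        observed = [s[hour] for s in neighbours if s[hour] is not None]
        # If no neighbour measured this hour either, the gap remains.
        filled.append(mean(observed) if observed else None)
    return filled


# Hypothetical ozone readings for three hours at three stations.
station_a = [10.0, None, None]
station_b = [12.0, 20.0, None]
station_c = [8.0, 30.0, None]

estimated = impute_from_cluster(station_a, [station_b, station_c])
```

In this made-up example the second hour at the first station is estimated from the two neighbouring readings, while the third hour stays missing because no station recorded it.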
Lead researcher, Dr. Wedad Alahamade, said the current DAQI "may miss some significant episodes where some pollutants are not measured at all stations.
"If the stations in a geographical area do not record all pollutants, they may miss the presence of large amounts of a particular pollutant, hence health warnings may not be accurate.
"Our proposed model enables a more complete picture of air quality and has enabled us to discover some episodes of air pollution that were registered in some stations but missed in others because data was not being collected in those stations.
"We can estimate data values where they have not been measured, so that we may capture such pollution events. It may also give us some understanding of where further measurements are required."
The DAQI provides recommended actions and health advice in relation to air pollution. For example, it can be used by at-risk individuals to decide whether they should be doing strenuous activities outdoors.
There are 285 air quality monitoring sites across the UK, which are part of several types of networks with different objectives and coverage.
The UEA study used data collected between 2015 and 2018 from the Automatic Urban and Rural Network (AURN) of monitoring stations. The instruments used in this network are automated and produce hourly pollutant concentrations. These data are collected, stored and made directly available online. The 169 stations in this network are categorized as rural, urban, suburban background, roadside, or industrial.
In the UK, the main pollutants used to assess the quality of the air are ozone (O3), nitrogen dioxide (NO2), particulate matter less than 2.5µm in diameter (PM2.5) or less than 10µm in diameter (PM10), and sulfur dioxide (SO2).
The study focuses on the first four of these pollutants and leaves out sulfur dioxide (SO2). That is because the UK met the current emission ceiling for sulfur dioxide from 2010 to 2019, due to the closure of coal plants and the restrictions on the sulfur content of fuels.
These pollutants are measured at monitoring stations and the concentrations of each pollutant become a time series (TS) requiring further transformation and analysis to produce air quality assessments.
Air quality is quantified using the DAQI, which is calculated using the concentrations of NO2, O3, PM2.5, and PM10. This index runs from 1 to 10 and is divided into four bands: 'low' (1–3); 'moderate' (4–6); 'high' (7–9); and 'very high' (10). An index value is initially assigned for each pollutant depending on its measured concentration.
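The banding described above can be sketched in a few lines of Python. The band boundaries come from the index description; taking the overall index as the highest of the per-pollutant index values follows the published DAQI convention rather than anything stated in this article, and the function names and sample values are hypothetical.

```python
from typing import Dict


def daqi_band(index: int) -> str:
    """Map a DAQI value (1 to 10) to its band name."""
    if not 1 <= index <= 10:
        raise ValueError("DAQI index must be between 1 and 10")
    if index <= 3:
        return "low"
    if index <= 6:
        return "moderate"
    if index <= 9:
        return "high"
    return "very high"


def overall_daqi(pollutant_indices: Dict[str, int]) -> int:
    """Overall index, assumed here to be the highest per-pollutant index."""
    return max(pollutant_indices.values())


# Hypothetical per-pollutant index values at one station.
all_measured = {"NO2": 2, "O3": 5, "PM2.5": 3, "PM10": 1}
# The same station if its ozone instrument were not reporting.
ozone_missing = {k: v for k, v in all_measured.items() if k != "O3"}

band_full = daqi_band(overall_daqi(all_measured))
band_partial = daqi_band(overall_daqi(ozone_missing))
```

With these made-up values the full set of measurements gives a 'moderate' band, while dropping the ozone reading gives 'low', which is the kind of understatement an index built from observed data only can produce.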
But Dr. Alahamade said, "Not all the stations report all the pollutants and even if a station does, it may not measure a particular pollutant all the time due to instrument down-time. Together this results in high levels of missing data.
"What makes the air pollution data analysis more complex is that pollutants have different behaviors and seasonal variation. Adding to that, pollution can be emitted from various sources and be involved in different chemical reactions and so their concentrations exhibit different temporal and spatial distributions.
"We aim to provide a DAQI that is more realistic. As DAQI is calculated from observed data only, it may give a false representation of the air quality—for example, if there were high concentrations of an air pollutant that was not being measured, the air quality may be worse than indicated by the DAQI.
"The tools that we have used in this study, from the data science domain, prove that data science is advancing to provide very interesting answers to multi-disciplinary problems."
The study, "A Multi-variate Time Series clustering approach based on Intermediate Fusion: A case study in air pollution data imputation," is published in the journal Neurocomputing.