Air quality is a critical component of a healthy environment, and understanding it requires good data. But a report by the world’s largest aggregator of open air quality data, OpenAQ, finds that half of the world’s governments do not even generate such statistics. And of those that do, only 38 per cent provide them in real time. India is among those that do.
As of February 2021, real-time data from 276 continuous ambient air quality monitoring stations (CAAQMS) in 145 Indian cities are hosted on a public dashboard maintained by the Central Pollution Control Board (CPCB), the nation’s apex pollution regulating agency. However, accessing the data through CPCB’s dashboard comes with its own set of challenges.
Over the last year, we have archived both real-time data shared by OpenAQ and historical data available on the CPCB’s CAAQM dashboard. In this blog, we run readers through challenges that we, as air quality researchers and data analysts, face while accessing the data and making sense of them. The task is likely harder for non-experts who are interested in this information. So we also suggest ways in which the CPCB can open this dashboard up to audiences with varying levels of computer literacy and understanding of air quality data.
What are the challenges in accessing air quality data from CPCB?
The website is often unresponsive
A quick Google search on Delhi’s air quality reveals that a number of platforms (aqicn.org, IQAir, Air Matters, Accuweather, AirPollutionApi, etc.) are retrieving data from the CPCB website. The website was not built for the volume of web traffic it now receives and tends to become unresponsive, so the same query often has to be made several times to get an output. High-frequency data (such as hourly data) for an extended period are harder still to retrieve, as the server takes longer to respond. This sluggishness of the CPCB portal has recently hampered the data acquisition process of communities like OpenAQ. This is concerning because entities like Urbanemissions.info, a site providing air quality forecasts for all of India, rely on OpenAQ as their mothership to pull data from regulatory monitors and deliver air quality information to their user base.
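For readers who pull data programmatically, the usual client-side response to a server that fails intermittently is to retry with exponential backoff rather than resend the same query immediately. The following is a minimal sketch under stated assumptions: `fetch` is a hypothetical callable standing in for whatever request the user makes, and no actual CPCB endpoint or response format is implied.

```python
import time


def fetch_with_retry(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `fetch` until it succeeds, waiting longer after each failure.

    `fetch` is a hypothetical zero-argument callable that raises an
    exception on failure. The wait doubles each time: 1s, 2s, 4s, ...
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # a real client would catch only timeouts/5xx errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Spacing retries out in this way also reduces the load that repeated queries place on an already strained server.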
The current way to access data is somewhat prohibitive
To access historical data for a single location through CPCB’s portal, users have to apply a minimum of eight filters for each query. To understand what this is like, imagine ordering pizza on an app that takes you to a new window to select each topping you want, a process nearly cumbersome enough to make you discard your order. In addition, users can compare pollutant levels only across stations within the same city at any given time; the current data retrieval page does not allow air quality levels to be compared across cities.
The data is not without inconsistencies
We have observed that the live/real-time data, updated every 15 minutes, do not go through any screening process. Archived data, by contrast, go through QA/QC. Unfortunately, since the CPCB does not provide details about these quality checks on its website, attempts to replicate its analyses of air quality trends could lead to different results depending on whether live or archived data are used.
What could CPCB do to make data access seamless?
Make data accessible to a wide range of people
We suggest that the data access portal's current interface be upgraded to include functionalities that support the needs of a wide range of users, from concerned citizens to experts and data junkies. Integrating quick analytics like inter-city and inter-year comparisons, or the number of unhealthy air quality days in a city in the previous year, will help people assess the air quality in their city and know whether their local administration is doing enough to improve it.
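As an illustration of how light such quick analytics can be, the sketch below counts the days in each city and year on which the daily mean PM2.5 exceeded a threshold. It is a sketch under stated assumptions, not the CPCB's method: the input layout, the function name and the definition of an "unhealthy" day are all hypothetical, and the threshold is taken here as 60 µg/m³, the 24-hour PM2.5 limit in India's national ambient air quality standards.

```python
from collections import defaultdict
from datetime import datetime

PM25_DAILY_LIMIT = 60.0  # assumed threshold in µg/m³; adjust as needed


def count_exceedance_days(readings, limit=PM25_DAILY_LIMIT):
    """Count days per (city, year) whose mean PM2.5 exceeds `limit`.

    `readings` is an iterable of (city, timestamp, pm25) tuples, a
    hypothetical layout; pm25 may be None for missing values.
    """
    daily = defaultdict(list)
    for city, timestamp, value in readings:
        if value is not None:
            daily[(city, timestamp.date())].append(value)

    counts = defaultdict(int)
    for (city, day), values in daily.items():
        if sum(values) / len(values) > limit:
            counts[(city, day.year)] += 1
    return dict(counts)
```

Cities with no exceedance days do not appear in the result. A production version would also need a rule for days with too few valid readings.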
Make live data available on a cloud directory or through an API
The community of researchers, modelers and innovators who use this data has grown tremendously over the years. They analyse pollution trends across cities, calibrate air quality forecasting models, and assess the performance of low-cost air quality sensors. To allow programmatic data access for such users, we suggest two steps. First, the live data could be made available through a robust API or a cloud directory backed by computing infrastructure that matches the data demand. Second, the archived data could be stored as files on a cloud directory partitioned by a logical schema such as dates and/or locations.
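To make the second suggestion concrete, here is one way archive files partitioned by location and date could be laid out. This is a hypothetical schema for illustration only: the directory names, the station identifier and the one-file-per-day granularity are assumptions, not an existing CPCB layout.

```python
from datetime import date, timedelta


def partition_key(city, station_id, day):
    """Build the path of one station's file for one day (hypothetical schema)."""
    return (
        f"city={city}/station={station_id}/"
        f"year={day:%Y}/month={day:%m}/{day:%Y-%m-%d}.csv"
    )


def keys_for_range(city, station_id, start, end):
    """List the file paths covering every day from `start` to `end`, inclusive."""
    n_days = (end - start).days + 1
    return [
        partition_key(city, station_id, start + timedelta(days=offset))
        for offset in range(n_days)
    ]
```

With a layout like this, a user who needs one station for one month downloads only the files under that prefix, and no query has to be run on the server at all.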
Provide a description of QA/QC protocols followed in data analysis
Multiple independent agencies are now providing data from regulatory monitors on their websites and phone apps. To ensure that data across platforms is consistent, we suggest that the CPCB lay down QA/QC protocols that independent data aggregators could use at their end. If needed, the CPCB could further refine these protocols through discussions with the atmospheric science and data community.
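A published protocol could be as simple as a short list of rules that any aggregator can apply identically. The sketch below shows what two such rules might look like in code: a plausible-range check and a check for a sensor stuck on one value. Both rules, their limits and the flag names are illustrative assumptions; they are not the CPCB's actual QA/QC checks, which are not publicly described.

```python
def qc_flags(values, lower=0.0, upper=1000.0, max_repeat=4):
    """Return one flag per reading: 'ok', 'missing', 'out_of_range' or 'stuck'.

    All limits are hypothetical defaults. A reading is 'stuck' when it is
    in range but extends a run of identical consecutive values beyond
    `max_repeat` readings.
    """
    flags = []
    run_length = 0
    previous = None
    for value in values:
        if value is None:
            flags.append("missing")
            run_length, previous = 0, None
            continue
        run_length = run_length + 1 if value == previous else 1
        previous = value
        if value < lower or value > upper:
            flags.append("out_of_range")
        elif run_length > max_repeat:
            flags.append("stuck")
        else:
            flags.append("ok")
    return flags
```

If rules of this kind were published with their parameters, two platforms starting from the same raw readings would arrive at the same cleaned dataset.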
Having a dedicated platform for viewing government air quality data is a privilege enjoyed by very few countries in the world. The CPCB should now work to make this platform accessible to a broader audience. Democratising air quality data will help build awareness, support research on air quality, ignite curiosity and foster innovation.
Tanushree Ganguly is a Programme Associate and L S Kurinji is a Research Analyst at the Council on Energy, Environment and Water. Gautam Pradhan is the Co-founder of Earthmetry.