Satellite imagery and other unconventional data sources may enable fast and inexpensive estimation of the country’s socioeconomic indicators.
This is according to Thinking Machines Data Science Inc. founder and CEO Stephanie Sy, who presented at the webinar “Smart Systems for Agile Governance under the New Normal” organized by state think tank Philippine Institute for Development Studies (PIDS).
The Philippine Statistics Authority conducts the National Demographic and Health Survey (NDHS) every four to five years to “provide up-to-date estimates of basic demographic and health indicators”. These data can be used by policymakers and other experts in “designing and evaluating programs and strategies for improving the health of the country’s population”.
According to Sy, her team “combined cost-efficient machine learning with freely accessible geospatial information as a fast, low-cost, and scalable means of providing poverty estimates” in a series of studies they conducted.
Specifically, they used free and openly available satellite images and other datasets from websites such as Google Earth Engine, Facebook, and OpenStreetMap to estimate poverty indicators derived from the 2017 NDHS. These sources, she said, provided “faster, cheaper, and more granular reconstruction of poverty measures”.
“We use technology to infer useful data for areas where surveys are not feasible,” Sy explained.
She said that they used satellite imagery, or images of the Earth captured by satellites, to predict the country’s wealth.
For instance, they used nighttime lights as a proxy for economic development, with light intensity serving as an indicator of the household wealth index. Sy said that areas that appear brighter at night are assumed to be wealthier.
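The idea of using nighttime lights as a proxy can be illustrated with a minimal Python sketch. It is not the team's actual pipeline: the cluster values below are invented for illustration, and the helper names (`pearson`, `fit_line`, `predict_wealth`) are hypothetical. The sketch only shows how a positive relationship between light intensity and a wealth index could be measured and then used to estimate wealth for an unsurveyed area.

```python
from math import sqrt

# Hypothetical survey clusters: (mean nighttime light intensity, wealth index).
# These values are invented for illustration and are not from the NDHS.
clusters = [
    (0.5, -1.2),
    (1.0, -0.8),
    (2.5, -0.1),
    (4.0, 0.6),
    (6.5, 1.5),
]

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

lights = [light for light, _ in clusters]
wealth = [index for _, index in clusters]

r = pearson(lights, wealth)
slope, intercept = fit_line(lights, wealth)

def predict_wealth(light):
    """Estimate the wealth index of an area from its light intensity."""
    return slope * light + intercept

print(f"correlation: {r:.2f}")
print(f"estimated wealth index at intensity 5.0: {predict_wealth(5.0):.2f}")
```

A brighter area receives a higher estimated wealth index only because the fitted slope is positive in this toy data; real nighttime-light data would be noisier.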
Aside from satellite imagery, they also relied on unconventional digital datasets such as Facebook marketing data, OpenStreetMap data, and CheckMySchool data to determine the wealth index. They used these public datasets to evaluate different machine learning models and identify the ones that best predict socioeconomic indicators.
Sy said that Random Forest regression performed best at predicting wealth. However, she pointed out that while this model can predict the household wealth index, exploratory analysis results showed that “it does not generalize well with other socioeconomic indicators,” such as educational attainment.
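The article does not say how the models were scored. One common approach is to hold out part of the data and measure how well predictions match it, for example with the R² score. The Python sketch below assumes leave-one-out cross-validation and uses a simple one-feature linear model as a stand-in (not Random Forest). All numbers are invented, and the names `fit_line` and `loo_r2` are hypothetical. It shows how the same feature can score well for one indicator and poorly for another.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

def loo_r2(xs, ys):
    """Leave-one-out cross-validated R^2 for a one-feature linear model."""
    predictions = []
    for i in range(len(xs)):
        # Fit on every point except the i-th, then predict the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented example data: one feature and two target indicators.
lights = [0.5, 1.0, 2.5, 3.0, 4.0, 6.5]
wealth_index = [-1.2, -0.8, -0.1, 0.1, 0.6, 1.5]   # tracks the feature closely
years_of_schooling = [9.0, 6.0, 10.0, 7.0, 8.0, 8.5]  # little relation to it

r2_wealth = loo_r2(lights, wealth_index)
r2_education = loo_r2(lights, years_of_schooling)

print(f"held-out R^2, wealth index: {r2_wealth:.2f}")
print(f"held-out R^2, years of schooling: {r2_education:.2f}")
```

A score near 1 means the held-out values are predicted closely, while a score near or below 0 means the model does no better than guessing the average.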
Further, she noted that data scarcity limits these methods, as they require “many labeled training data for an end-to-end deep learning approach”.
Sy also mentioned that unconventional data sources could be used to estimate socioeconomic indicators in support of ground-truth studies but “will never replace demographic and health surveys”.
“Machine-learning models and ground-truth surveys should be complementary. They should never be treated as replacements for each other,” she concluded.