New tool created by UChicago Data Science Institute sheds light on palm oil production

PalmWatch tracks deforestation by palm oil mills and connections to major multinational brands

Palm oil is used in a plethora of products, from food items like packaged pastries and chips to cosmetics, soaps, and even biofuels. But most palm oil is produced on mono-crop plantations that cover huge tracts of land that were once tropical rainforests and other biodiverse ecosystems.

Mapping the links between palm oil mills, multinational corporations, and future deforestation risk is a difficult data science problem to solve, but the University of Chicago Data Science Institute and Inclusive Development International (IDI) have created a new tool to help fill gaps in understanding the problem.

The DSI and the IDI, with support from the 11th Hour Project, launched a new tool called PalmWatch on Feb. 22. Using rigorous data science and advanced, low-cost data visualization methods, PalmWatch traces palm oil supplies from the ground level, where the environmental and social impacts of palm oil cultivation occur, to the consumer brands that use the oil in their products.

“This launch of the PalmWatch tool has been a long time coming,” said David Uminsky, executive director of the Data Science Institute at the University of Chicago. “This has all the hallmarks of a great data science problem.”

“I’m very excited that this dashboard will be owned by local communities and nonprofits working in the space,” said Launa Greer, a software engineer at the DSI. “Previously, investigating the effects of palm oil supply chains was a laborious process; now groups will have analytics at their fingertips.”

Connecting data sources

To increase transparency, multinational brands now report the palm oil mills from which they source their material. However, building a repository that sorts and organizes mills across the world requires collecting and standardizing these disclosures. And even with this information in hand, additional computational methods are needed to understand how each mill affects local deforestation risk.

The PalmWatch project began as part of the Data Science Clinic, an experiential project-based course where students work as data scientists under the supervision of DSI staff and faculty.

To build the tool, a DSI team led by Open Spatial Lab technical lead Dylan Halpern first had to scrape public disclosures from thirteen multinational consumer brands showing which mills each brand sources from.

This information then had to be standardized, and the palm oil mills geolocated on a searchable map. The data scientists also had to collect information about each mill, such as which companies own and operate it, which consumer brands it is affiliated with, and its certification status with the Roundtable on Sustainable Palm Oil (RSPO), an industry standard for sustainable palm oil production.

Collecting the information was a challenge. “Disclosures were typically located in obscure corners of the websites and difficult to scrape for information due to wildly varying PDF layouts,” said Greer. “We hope that making a clean, consolidated, and machine-readable dataset of mills available to the public will accelerate similar supply-chain research efforts.”

Built with future-proofing in mind

Making PalmWatch cheap to maintain and easy to update was a vital part of the design, ensuring the website can remain a useful investigative tool. The site was built to avoid heavy computation, which can drive up web hosting costs over time.

“Ongoing funding for community-centered data science projects is not always guaranteed, so it’s important to architect software that is cheap to own in the long term,” said Halpern. “It’s tragic to see fantastic software engineering and community-engaged data science fade away from public view due simply to a server bill.”

Full data files are available for public download, and the site is designed to incorporate local knowledge. “We realized early on that palm oil production impacts each part of the world in a unique way, so we integrated a collaborative content management system that lets local advocates add critical context, news, legal briefings, and other local knowledge to PalmWatch at every level: mill, country, consumer brand, and everything in between,” said Halpern.

The development team plans additional updates, including a GitHub repository for the data pipeline, a guide for contributing disclosures, and hands-on training for social impact organizations and journalists who want to dig deeper into specific data questions.

Adapted from an article first published by the Data Science Institute.