Data science aims to find next El Niño

Multi-university collaboration will use climate data analysis to improve regional forecasts

The El Niño/La Niña pattern in the Pacific Ocean is notorious for its long-distance effects on weather as far away as Africa and the Midwestern United States. But climate experts also know of several other such patterns, known as “teleconnections,” and believe that there are many more to be discovered.

The new TRIPODS+Climate project, a collaboration among the University of Chicago, the University of Wisconsin-Madison and the University of California-Irvine, will develop novel data science tools to sniff out these hidden patterns, improving weather forecasts and scientific understanding of global climate. Researchers will apply data science methods such as machine learning, network analysis and predictive modeling to the growing flood of climate data.

“There are fundamental challenges pervasive in data science that are epitomized in the climate science setting, making this collaboration a nice opportunity for advances on a number of fronts,” said Rebecca Willett, professor of computer science and statistics at UChicago. “The question really is: Can we find some middle ground that's going to allow us to harness climate data as fully as possible without ignoring existing physical models of climate?”

While El Niño, formally known as the El Niño–Southern Oscillation, is the best-known climate teleconnection, scientists have found many similar patterns in the Pacific and Atlantic oceans. For example, TRIPODS+Climate co-investigators at UC-Irvine, led by Prof. Efi Foufoula-Georgiou, recently found that sea temperature changes near the coast of New Zealand strongly predict precipitation changes three months later and thousands of miles away in the southwestern United States.
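To make the idea of a lagged teleconnection concrete, the following minimal Python sketch correlates one monthly series with another shifted by several months. It is an illustration only, not the UC-Irvine team's method: the data are synthetic, a three-month delay is planted deliberately to mirror the lag described above, and names such as `lagged_correlation`, `sst` and `precip` are hypothetical.

```python
import numpy as np


def lagged_correlation(driver, response, lag):
    """Pearson correlation between driver[t] and response[t + lag]."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(driver) - lag
    return float(np.corrcoef(driver[:n], response[lag:lag + n])[0, 1])


# Synthetic monthly anomalies (assumption: not real climate data).
rng = np.random.default_rng(0)
months = 480
sst = rng.standard_normal(months)  # stand-in for sea temperature anomalies

# Stand-in precipitation series that responds to sst three months later.
precip = np.empty(months)
precip[:3] = rng.standard_normal(3)
precip[3:] = 0.8 * sst[:-3] + 0.6 * rng.standard_normal(months - 3)

# Scan lags of 0 to 6 months and keep the one with the strongest correlation.
correlations = {lag: lagged_correlation(sst, precip, lag) for lag in range(7)}
best_lag = max(correlations, key=lambda k: abs(correlations[k]))

print(best_lag, round(correlations[best_lag], 2))
```

On real observations, the same scan would need to account for autocorrelation, seasonality and the number of lags and locations tested before any peak could be treated as meaningful.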

But despite an unprecedented increase in the volume and resolution of climate observations, these phenomena are difficult to detect in the data. Researchers working with high-dimensional and noisy data must spot complex relationships across geography and time while ruling out spurious correlations and other false positives. Enter data science—the modern intersection of mathematics, statistics and computer science.

“Interrogating observations and climate model outputs to discover, characterize and understand climate modes of variability and change is fundamental for improving seasonal to sub-seasonal forecasts,” said Foufoula-Georgiou. “However, the large internal variability of the climate system, non-stationarities and space-time dependencies make it hard to discern causal predictive relationships.”

New climate models

TRIPODS+Climate will create new methodologies in machine learning and network estimation that reveal the structure of the Earth’s climate system and its regional hydroclimatic impacts. Machine learning, in which statistical algorithms use large datasets to detect patterns and make predictions, can be used to find teleconnections previously hidden from human observation. Network estimation methods represent global climate mathematically as a network of interconnected nodes, so that scientists can better quantify and understand complex influences across geography and time.
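One simple way to picture climate as a network of nodes is to treat each location as a node and link two nodes when their time series are strongly correlated. The Python sketch below does this with plain correlation thresholding, a basic stand-in and not the methodology the project will develop. The data are synthetic, and the function name `correlation_network` and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def correlation_network(series, threshold):
    """Link nodes whose absolute correlation exceeds the threshold.

    series has shape (n_nodes, n_times); returns a boolean adjacency matrix.
    """
    corr = np.corrcoef(series)
    adjacency = np.abs(corr) > threshold
    np.fill_diagonal(adjacency, False)  # a node is not linked to itself
    return adjacency


# Synthetic example (assumption): six locations, 600 time steps.
rng = np.random.default_rng(1)
n_times = 600
shared = rng.standard_normal(n_times)
series = rng.standard_normal((6, n_times))
series[:3] += 2.0 * shared  # locations 0-2 share a common signal

adjacency = correlation_network(series, threshold=0.5)
degree = adjacency.sum(axis=1)  # number of links per location

print(degree.tolist())
```

Thresholding raw correlations cannot separate direct influences from indirect ones and is prone to the spurious links the article warns about, which is part of why more careful network estimation methods are needed.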

“Data science techniques are especially useful for sifting through massive troves of data to discover unexpected relationships between events,” said Stephen Wright, professor of computer sciences at UW-Madison. “We have seen examples of this phenomenon in the relationships between genetics, environment and disease. Climate science is an area in which very large collections of data are ready and waiting to be analyzed.”

These tools will then be used to build new computational climate models and create new platforms for climate diagnostics and prognostics, improving seasonal and sub-seasonal forecasts. More accurate predictions will help scientists and policymakers understand and prepare for climate change, extreme weather events and water allocation under conditions of high or low precipitation.

Like the other TRIPODS+X programs announced today by the NSF, TRIPODS+Climate will also strengthen the broader data science community by training students and post-docs at the interface of data and climate science.

“This project will help spread the influence of modern data science through the climate community, and put young data science researchers in touch with a critical area of research that is a rich source of data analysis problems,” Willett said.

The collaboration is an expansion of the National Science Foundation TRIPODS program, which funded several research centers in 2017 to explore the fundamentals of data science, including Wright and Willett’s Institute for Foundations of Data Science.