ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Read original: arXiv:2402.00712 - Published 9/30/2024 by Juan Nathaniel, Yongquan Qu, Tung Nguyen, Sungduk Yu, Julius Busecke, Aditya Grover, Pierre Gentine

Overview

Accurate climate prediction beyond the weather timescale is crucial for disaster preparedness and decision-making, but it is challenging because forecast skill depends on more than initial conditions, including boundary interactions and the butterfly effect.
Existing benchmarks tend to cover forecasting ranges of up to 15 days, omit a wide range of operational baselines, and lack physics-based constraints for explainability.
The paper proposes "ChaosBench", a benchmark to extend the predictability range of data-driven weather emulators to the subseasonal-to-seasonal (S2S) timescale.

Plain English Explanation

Predicting the climate beyond the range of short-term weather forecasts matters for preparing for natural disasters and for making informed decisions under climate change. It is also very difficult: weather and climate depend not only on initial conditions but also on how parts of the Earth system, such as the oceans, ice, and land, interact.

Existing benchmarks and evaluations of climate prediction models have typically focused on shorter timescales of up to 15 days. They also haven't included a wide range of different types of forecasting models as baselines, and haven't done a good job of ensuring the models produce physically realistic and explainable results.

To address these limitations, the researchers propose a new benchmark called "ChaosBench". This benchmark includes a much broader range of Earth system data, spanning over 45 years, to allow models to learn how the full climate system behaves. It also includes physics-based metrics, in addition to standard statistical measures, to ensure the models are producing physically consistent and explainable predictions, rather than just correlating with past data.

The researchers evaluate several state-of-the-art data-driven weather and climate models on this new benchmark, and find that methods developed for short-term weather forecasting struggle to maintain their performance when extended to the longer subseasonal-to-seasonal timescale. However, the researchers outline some promising strategies, like using ensemble predictions and better controlling error propagation, that could help extend the predictability range of these models.

Technical Explanation

The ChaosBench benchmark proposed in this paper aims to push the boundaries of data-driven weather and climate modeling beyond the typical 15-day forecasting range. It includes a much broader set of Earth system variables, spanning over 45 years, to allow models to learn the complex interactions between the atmosphere, oceans, ice, and land.

Crucially, ChaosBench includes not just deterministic and probabilistic forecast evaluation metrics, but also physics-based metrics. These are meant to check that predictions are physically consistent and explainable, rather than merely correlated with past data. The paper evaluates several state-of-the-art data-driven models, including ViT/ClimaX, PanguWeather, GraphCast, and FourCastNetV2, against a diverse set of physics-based forecasts from four national weather agencies.
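
To make the distinction between metric families concrete, here is a minimal NumPy sketch. The summary does not spell out the benchmark's exact metric definitions, so everything below is an assumption for illustration: the function names are hypothetical, the deterministic scores are the commonly used latitude-weighted RMSE and anomaly correlation coefficient, and the "physics-based" check is a stand-in that compares how power is distributed across spatial scales in the forecast and in the truth.

```python
import numpy as np

def latitude_weights(lats_deg):
    """Cosine-latitude weights, normalized to have mean 1."""
    w = np.cos(np.deg2rad(lats_deg))
    return w / w.mean()

def weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE over a (lat, lon) grid."""
    w = latitude_weights(lats_deg)[:, None]
    return float(np.sqrt(np.mean(w * (forecast - truth) ** 2)))

def weighted_acc(forecast, truth, climatology, lats_deg):
    """Anomaly correlation coefficient relative to a climatology field."""
    w = latitude_weights(lats_deg)[:, None]
    fa = forecast - climatology   # forecast anomaly
    ta = truth - climatology      # observed anomaly
    num = np.sum(w * fa * ta)
    den = np.sqrt(np.sum(w * fa ** 2) * np.sum(w * ta ** 2))
    return float(num / den)

def zonal_power_spectrum(field):
    """Mean power per zonal wavenumber (k >= 1), normalized to sum to 1."""
    coeffs = np.fft.rfft(field, axis=-1)
    power = np.mean(np.abs(coeffs) ** 2, axis=0)[1:]  # drop the zonal mean
    return power / power.sum()

def spectral_divergence(forecast, truth, eps=1e-12):
    """KL divergence from the truth spectrum to the forecast spectrum.

    Illustrative stand-in for a physics-based score: it is zero when both
    fields distribute power across scales identically, and grows when the
    forecast loses (or gains) small-scale structure.
    """
    p = zonal_power_spectrum(truth) + eps
    q = zonal_power_spectrum(forecast) + eps
    return float(np.sum(p * np.log(p / q)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lats = np.linspace(-87.5, 87.5, 36)
    truth = rng.standard_normal((36, 64))
    # A smoothed forecast: a three-point running mean along longitude.
    blurry = (np.roll(truth, 1, axis=-1) + truth + np.roll(truth, -1, axis=-1)) / 3.0
    print("RMSE:", weighted_rmse(blurry, truth, lats))
    print("ACC:", weighted_acc(blurry, truth, np.zeros_like(truth), lats))
    print("Spectral divergence:", spectral_divergence(blurry, truth))
```

The point of pairing the two kinds of score is that a smoothed forecast can look acceptable on RMSE while its spectrum shows that fine-scale structure has been lost, which is the sort of physical inconsistency a purely statistical metric can miss.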

The results show that methods developed for short-term weather forecasting struggle to maintain their performance when extended to the longer subseasonal-to-seasonal timescale: their performance collapses to an unskilled climatology, meaning the forecasts are no more informative than the long-term average. However, the researchers outline and demonstrate several strategies that could extend the predictability range of these weather emulators, including the use of ensembles, robust control of error propagation, and physics-informed models.
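
The interplay between error growth, climatology, and ensembles can be illustrated with a toy experiment. The sketch below is not the paper's setup: it assumes the logistic map as a stand-in chaotic system, uses a perfect model with slightly perturbed initial conditions, and all function names and parameter values are hypothetical choices for illustration.

```python
import numpy as np

def step(x):
    """One step of the logistic map x -> 4x(1-x), a standard toy chaotic system."""
    return 4.0 * x * (1.0 - x)

def rollout(x0, n_steps):
    """Iterate the map autoregressively; output has shape (n_steps + 1, ...)."""
    states = [x0]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return np.stack(states)

def rmse(pred, truth):
    """RMSE over the last axis (independent cases): one value per lead time."""
    return np.sqrt(np.mean((pred - truth) ** 2, axis=-1))

def run_experiment(n_cases=2000, n_members=20, n_steps=40, noise=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Spin up so the initial states follow the system's long-run distribution.
    x0 = rollout(rng.uniform(0.01, 0.99, n_cases), 100)[-1]
    truth = rollout(x0, n_steps)                            # (lead, case)

    # Each ensemble member starts from a slightly perturbed initial condition.
    perturbed = x0 + noise * rng.standard_normal((n_members, n_cases))
    perturbed = np.clip(perturbed, 1e-12, 1.0 - 1e-12)
    members = rollout(perturbed, n_steps)                   # (lead, member, case)

    # Climatology baseline: always predict the long-run mean.
    # (Computed in-sample from the truth here, which is a simplification.)
    climatology = np.full_like(truth, truth.mean())

    return {
        "single": rmse(members[:, 0], truth),
        "ensemble_mean": rmse(members.mean(axis=1), truth),
        "climatology": rmse(climatology, truth),
    }

if __name__ == "__main__":
    scores = run_experiment()
    for lead in (1, 10, 20, 40):
        print(lead, {name: round(float(s[lead]), 4) for name, s in scores.items()})
```

In this toy setting a single deterministic rollout starts far more accurate than climatology, but its error grows with every autoregressive step until it ends up worse than simply predicting the mean. The ensemble mean degrades more gracefully at long lead times, which is one intuition for why ensembles are among the strategies listed above.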

Critical Analysis

The ChaosBench benchmark presented in this paper is a valuable contribution to the field of climate and weather forecasting. By including a broader range of Earth system variables and physics-based evaluation metrics, it pushes the limits of what data-driven models can achieve beyond the typical weather forecasting timescale.

That said, the paper acknowledges several limitations and areas for future work. For example, the benchmark is still limited to reanalysis data, which may not fully capture the complexity of the real-world climate system. Additionally, the physics-based metrics used in the evaluation, while an important step forward, may not be sufficient to fully capture all the relevant physical constraints.

Furthermore, while the researchers outline some promising strategies for improving the performance of data-driven weather emulators on the subseasonal-to-seasonal timescale, more research is needed to fully realize the potential of these approaches. Hybrid models that combine data-driven and physics-based components, as explored in ClimODE, could be a fruitful direction for further investigation.

Overall, the ChaosBench benchmark represents an important step forward in pushing the boundaries of climate and weather forecasting, but there is still much work to be done to achieve accurate and reliable predictions at the subseasonal-to-seasonal timescale.

Conclusion

The paper proposes the ChaosBench benchmark as a way to extend the predictability range of data-driven weather emulators to the subseasonal-to-seasonal timescale. By including a broader range of Earth system variables and physics-based evaluation metrics, ChaosBench aims to push the limits of what these models can achieve.

The results show that existing methods developed for short-term weather forecasting struggle to maintain their performance on the longer-range task, highlighting the need for new approaches. The researchers outline several promising strategies, such as the use of ensembles and better control of error propagation, that could potentially help extend the predictability range of these models.

Overall, the ChaosBench benchmark represents an important step forward in the field of climate and weather forecasting, but more research is needed to fully realize the potential of data-driven approaches for predicting the climate beyond the typical weather timescale.

Related Papers

ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Juan Nathaniel, Yongquan Qu, Tung Nguyen, Sungduk Yu, Julius Busecke, Aditya Grover, Pierre Gentine

Accurate prediction of climate at the subseasonal-to-seasonal scale is crucial for disaster preparedness and robust decision making amidst climate change. Yet, forecasting beyond the weather timescale is challenging because it deals with problems other than initial conditions, including boundary interaction, the butterfly effect, and our inherent lack of physical understanding. At present, existing benchmarks tend to have a shorter forecasting range of up to 15 days, do not include a wide range of operational baselines, and lack physics-based constraints for explainability. Thus, we propose ChaosBench, a challenging benchmark to extend the predictability range of data-driven weather emulators to the S2S timescale. First, ChaosBench comprises variables beyond the typical surface-atmospheric ERA5 to also include ocean, ice, and land reanalysis products that span over 45 years to allow for full Earth system emulation that respects boundary conditions. We also propose physics-based metrics, in addition to deterministic and probabilistic ones, to ensure a physically consistent ensemble that accounts for the butterfly effect. Furthermore, we evaluate on a diverse set of physics-based forecasts from four national weather agencies as baselines to our data-driven counterparts such as ViT/ClimaX, PanguWeather, GraphCast, and FourCastNetV2. Overall, we find methods originally developed for weather-scale applications fail on the S2S task: their performance simply collapses to an unskilled climatology. Nonetheless, we outline and demonstrate several strategies that can extend the predictability range of existing weather emulators, including the use of ensembles, robust control of error propagation, and the use of physics-informed models. Our benchmark, datasets, and instructions are available at https://leap-stc.github.io/ChaosBench.

9/30/2024

FuXi-S2S: A machine learning model that outperforms conventional global subseasonal forecast models

Lei Chen, Xiaohui Zhong, Hao Li, Jie Wu, Bo Lu, Deliang Chen, Shangping Xie, Qingchen Chao, Chensen Lin, Zixin Hu, Yuan Qi

Skillful subseasonal forecasts are crucial for various sectors of society but pose a grand scientific challenge. Recently, machine learning based weather forecasting models outperform the most successful numerical weather predictions generated by the European Centre for Medium-Range Weather Forecasts (ECMWF), but have not yet surpassed conventional models at subseasonal timescales. This paper introduces FuXi Subseasonal-to-Seasonal (FuXi-S2S), a machine learning model that provides global daily mean forecasts up to 42 days, encompassing five upper-air atmospheric variables at 13 pressure levels and 11 surface variables. FuXi-S2S, trained on 72 years of daily statistics from ECMWF ERA5 reanalysis data, outperforms the ECMWF's state-of-the-art Subseasonal-to-Seasonal model in ensemble mean and ensemble forecasts for total precipitation and outgoing longwave radiation, notably enhancing global precipitation forecast. The improved performance of FuXi-S2S can be primarily attributed to its superior capability to capture forecast uncertainty and accurately predict the Madden-Julian Oscillation (MJO), extending the skillful MJO prediction from 30 days to 36 days. Moreover, FuXi-S2S not only captures realistic teleconnections associated with the MJO, but also emerges as a valuable tool for discovering precursor signals, offering researchers insights and potentially establishing a new paradigm in Earth system science research.

7/8/2024

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin A. Cash, Rebecca Willett

Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and two-meter temperature two weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multi-model approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

9/17/2024

DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation

Wuxin Wang, Weicheng Ni, Tao Han, Lei Bai, Boheng Duan, Kaijun Ren

Recent advancements in deep learning (DL) have led to the development of several Large Weather Models (LWMs) that rival state-of-the-art (SOTA) numerical weather prediction (NWP) systems. Up to now, these models still rely on traditional NWP-generated analysis fields as input and are far from being an autonomous system. While researchers are exploring data-driven data assimilation (DA) models to generate accurate initial fields for LWMs, the lack of a standard benchmark impedes the fair evaluation among different data-driven DA algorithms. Here, we introduce DABench, a benchmark dataset utilizing ERA5 data as ground truth to guide the development of end-to-end data-driven weather prediction systems. DABench contributes four standard features: (1) sparse and noisy simulated observations under the guidance of the observing system simulation experiment method; (2) a skillful pre-trained weather prediction model to generate background fields while fairly evaluating the impact of assimilation outcomes on predictions; (3) standardized evaluation metrics for model comparison; (4) a strong baseline called the DA Transformer (DaT). DaT integrates the four-dimensional variational DA prior knowledge into the Transformer model and outperforms the SOTA in physical state reconstruction, named 4DVarNet. Furthermore, we exemplify the development of an end-to-end data-driven weather prediction system by integrating DaT with the prediction model. Researchers can leverage DABench to develop their models and compare performance against established baselines, which will benefit the future advancements of data-driven weather prediction systems. The code is available on this GitHub repository and the dataset is available on Baidu Drive.

8/22/2024