ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Read original: arXiv:2405.14527 - Published 7/4/2024 by Guillaume Couairon, Christian Lessig, Anastase Charantonis, Claire Monteleoni

Overview

Designing effective AI-based weather forecasting systems requires embedding physical constraints as inductive priors in the neural network architecture.
One popular prior is locality, where the atmospheric data is processed with local neural interactions like 3D convolutions or 3D local attention windows as in Pangu-Weather.
However, some works have achieved great success in weather forecasting without this locality principle, at the cost of a much higher parameter count.
This paper presents ArchesWeather, a transformer model that combines 2D attention with a column-wise attention-based feature interaction module, demonstrating improved forecasting skill over the locality-based Pangu-Weather approach.

Plain English Explanation

Weather forecasting is a complex task that relies on understanding the physical processes governing the atmosphere. One way to design effective AI-based weather forecasting systems is to incorporate these physical constraints as inductive priors into the neural network architecture. A common prior is the idea of locality, where the atmospheric data is processed using local neural interactions, such as 3D convolutions or 3D local attention windows, as seen in the Pangu-Weather model.

However, the authors of this paper found that this 3D local processing in Pangu-Weather is computationally sub-optimal. Instead, they developed a new model called ArchesWeather, which uses a transformer architecture that combines 2D attention with a column-wise attention-based feature interaction module. This design, they argue, improves the forecasting skill compared to the locality-based approach.

ArchesWeather is trained with a budget of a few GPU-days and has a lower inference cost than competing methods. According to the paper's abstract, an ensemble of four ArchesWeather models achieves better RMSE scores than the IFS HRES model and is competitive with the 1.4-degree, 50-member NeuralGCM ensemble for forecasts one to three days ahead.

Technical Explanation

The paper presents ArchesWeather, a transformer-based model that aims to improve upon the locality-based approach used in the Pangu-Weather model. The authors argue that the 3D local processing in Pangu-Weather is computationally sub-optimal and instead design a new architecture that combines 2D attention with a column-wise attention-based feature interaction module.
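To make the column-wise idea concrete, here is a minimal NumPy sketch of self-attention applied only along the vertical axis. It is an illustration under stated assumptions, not the authors' implementation: the tensor layout (levels, latitude, longitude, channels), the single attention head, the random projection matrices, and the function name column_attention are all hypothetical. In the architecture described above, a step like this would be paired with 2D attention within each horizontal level.

```python
import numpy as np


def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def column_attention(x, wq, wk, wv):
    """Single-head self-attention along the vertical axis (hypothetical sketch).

    x has shape (Z, H, W, C): vertical levels, latitude, longitude, channels.
    Each (h, w) column of Z levels attends only to the levels of that column.
    """
    z, h, w, c = x.shape
    cols = x.transpose(1, 2, 0, 3).reshape(h * w, z, c)   # (H*W, Z, C)
    q, k, v = cols @ wq, cols @ wk, cols @ wv             # (H*W, Z, C) each
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)        # (H*W, Z, Z)
    out = softmax(scores, axis=-1) @ v                    # (H*W, Z, C)
    return out.reshape(h, w, z, c).transpose(2, 0, 1, 3)  # back to (Z, H, W, C)


# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
Z, H, W, C = 4, 6, 8, 16
x = rng.normal(size=(Z, H, W, C))
wq, wk, wv = (rng.normal(size=(C, C)) / np.sqrt(C) for _ in range(3))
y = column_attention(x, wq, wk, wv)
print(y.shape)
```

Because the attention matrix for each column has only Z x Z entries, the vertical mixing cost grows with the number of levels rather than with the size of a 3D window, which is the intuition behind separating horizontal and vertical interactions.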

ArchesWeather is trained at 1.5-degree resolution and a 24-hour lead time, with a training budget of a few GPU-days and a lower inference cost than competing methods. According to the paper's abstract, an ensemble of four ArchesWeather models achieves better RMSE scores than the IFS HRES model and is competitive with the 1.4-degree, 50-member NeuralGCM ensemble for forecasts one to three days ahead.
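The evaluation described above can be sketched as follows. Two assumptions are made here that the summary does not state: the ensemble forecast is taken to be the plain average of the member forecasts, and RMSE is weighted by the cosine of latitude, as is common in global forecast benchmarks. The function names are hypothetical, and the data is synthetic noise, not output from any model in the paper.

```python
import numpy as np


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a (lat, lon) grid with cos(latitude) area weights."""
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()                   # normalize so the weights average to 1
    sq_err = (forecast - truth) ** 2   # shape (lat, lon)
    return float(np.sqrt((w[:, None] * sq_err).mean()))


def ensemble_mean(members):
    """Average a list of member forecasts (assumed ensembling rule)."""
    return np.mean(np.stack(members), axis=0)


# Synthetic example on a 1.5-degree grid (121 latitudes x 240 longitudes).
rng = np.random.default_rng(0)
lats = np.linspace(-90.0, 90.0, 121)
truth = rng.normal(size=(121, 240))
members = [truth + rng.normal(size=truth.shape) for _ in range(4)]

member_scores = [lat_weighted_rmse(m, truth, lats) for m in members]
ensemble_score = lat_weighted_rmse(ensemble_mean(members), truth, lats)
print(member_scores, ensemble_score)
```

With independent member errors, averaging cancels part of the noise, so the ensemble RMSE in this toy setup is lower than that of any single member. Real model errors are correlated, so the gain in practice is smaller.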

The authors make their code and models publicly available at https://github.com/gcouairon/ArchesWeather.

Critical Analysis

The paper presents a compelling approach to improving weather forecasting by rethinking the architectural design of the neural network. The authors' observation that the 3D local processing in Pangu-Weather is computationally sub-optimal, and their alternative transformer architecture with column-wise attention-based feature interaction, are valuable contributions to the field.

However, the paper does not delve into the potential limitations or caveats of the ArchesWeather approach. For example, it would be interesting to understand how the model's performance scales with different resolutions or lead times, or how it might handle extreme weather events that require more complex physical reasoning.

Additionally, the authors' claims about the computational efficiency of ArchesWeather compared to competing methods could be further explored and substantiated, perhaps by providing more detailed analysis or comparisons of inference times and memory usage.

Overall, the paper presents a solid technical contribution, but there may be opportunities for the authors to dig deeper into the nuances and potential limitations of their approach, which could lead to even more valuable insights for the weather forecasting research community.

Conclusion

This paper introduces ArchesWeather, a transformer-based model for weather forecasting that combines 2D attention with a column-wise attention-based feature interaction module. The authors demonstrate that this architecture can outperform the locality-based Pangu-Weather model, while maintaining a lower computational budget and inference cost.

The ArchesWeather approach is a useful step in the design of AI-based weather forecasting systems because it questions one specific inductive prior, 3D locality, rather than the use of physical priors in general. The authors' results suggest that handling horizontal and vertical interactions separately, with 2D attention plus column-wise attention, can improve forecasting skill at lower cost, opening up new avenues for architectural exploration.

By making their code and models publicly available, the authors are contributing to the broader research community and enabling further advancements in weather forecasting technology. As the field continues to evolve, the insights and approaches presented in this paper are likely to have a lasting impact on the development of more accurate, efficient, and robust AI-powered weather forecasting systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution

Guillaume Couairon, Christian Lessig, Anastase Charantonis, Claire Monteleoni

One of the guiding principles for designing AI-based weather forecasting systems is to embed physical constraints as inductive priors in the neural network architecture. A popular prior is locality, where the atmospheric data is processed with local neural interactions, like 3D convolutions or 3D local attention windows as in Pangu-Weather. On the other hand, some works have shown great success in weather forecasting without this locality principle, at the cost of a much higher parameter count. In this paper, we show that the 3D local processing in Pangu-Weather is computationally sub-optimal. We design ArchesWeather, a transformer model that combines 2D attention with a column-wise attention-based feature interaction module, and demonstrate that this design improves forecasting skill. ArchesWeather is trained at 1.5° resolution and 24h lead time, with a training budget of a few GPU-days and a lower inference cost than competing methods. An ensemble of four of our models shows better RMSE scores than the IFS HRES and is competitive with the 1.4° 50-members NeuralGCM ensemble for one to three days ahead forecasting. Our code and models are publicly available at https://github.com/gcouairon/ArchesWeather.

7/4/2024

CaFA: Global Weather Forecasting with Factorized Attention on Sphere

Zijie Li, Anthony Zhou, Saurabh Patil, Amir Barati Farimani

Accurate weather forecasting is crucial in various sectors, impacting decision-making processes and societal events. Data-driven approaches based on machine learning models have recently emerged as a promising alternative to numerical weather prediction models given their potential to capture physics of different scales from historical data and the significantly lower computational cost during the prediction stage. Renowned for its state-of-the-art performance across diverse domains, the Transformer model has also gained popularity in machine learning weather prediction. Yet applying Transformer architectures to weather forecasting, particularly on a global scale is computationally challenging due to the quadratic complexity of attention and the quadratic increase in spatial points as resolution increases. In this work, we propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. More specifically, it utilizes multi-dimensional factorized kernels that convolve over different axes where the computational complexity of the kernel is only quadratic to the axial resolution instead of overall resolution. The deterministic forecasting accuracy of the proposed model on $1.5^\circ$ and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models. We also showcase the proposed model holds great potential to push forward the Pareto front of accuracy-efficiency for Transformer weather models, where it can achieve better accuracy with less computational cost compared to Transformer based models with standard attention.

5/14/2024

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets prevent these models from forecasting at finer time scales. This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales beyond the training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale (e.g., 300 seconds) and use parallel neural networks with a learnable router for bias correction. Furthermore, we introduce a lead time-aware training framework to promote the generalization of the model at different lead times. The weight analysis of physics-AI modules indicates that physics conducts major evolution while AI performs corrections adaptively. Extensive experiments show that WeatherGFT, trained on an hourly dataset, achieves state-of-the-art performance across multiple lead times and exhibits the capability to generalize to 30-minute forecasts.

5/30/2024

Lightning-Fast Thunderstorm Warnings: Predicting Severe Convective Environments with Global Neural Weather Models

Monika Feldmann, Tom Beucler, Milton Gomez, Olivia Martius

Severe convective storms are among the most dangerous weather phenomena and accurate forecasts mitigate their impacts. The recently released suite of AI-based weather models produces medium-range forecasts within seconds, with a skill similar to state-of-the-art operational forecasts for variables on single levels. However, predicting severe thunderstorm environments requires accurate combinations of dynamic and thermodynamic variables and the vertical structure of the atmosphere. Advancing the assessment of AI-models towards process-based evaluations lays the foundation for hazard-driven applications. We assess the forecast skill of three top-performing AI-models for convective parameters at lead-times of up to 10 days against reanalysis and ECMWF's operational numerical weather prediction model IFS. In a case study and seasonal analyses, we see the best performance by GraphCast and Pangu-Weather: these models match or even exceed the performance of IFS for instability and shear. This opens opportunities for fast and inexpensive predictions of severe weather environments.

9/11/2024