Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

Published 6/17/2024 by Jian Chen, Peilin Zhou, Yining Hua, Dading Chong, Meng Cao, Yaowei Li, Zixuan Yuan, Bing Zhu, Junwei Liang

Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research:


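Overview

  • This paper explores the use of vision-language models (VLMs) for detecting extreme weather events in weather heatmaps, framing Extreme Weather Events Detection (EWED) as a Visual Question Answering (VQA) problem.
  • The researchers introduce ClimateIQA, a meteorological VQA dataset generated from climate reanalysis data, and SPOT, a technique that uses OpenCV and K-Means clustering to capture where each color region lies in a heatmap.
  • Models adapted to meteorology using ClimateIQA, collected as Climate-Zoo, substantially outperform state-of-the-art general VLMs, raising accuracy in EWED verification from 0% to over 90%.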

Plain English Explanation

The paper describes an AI approach that reads weather heatmaps (color-coded maps of a quantity such as wind gust speed) and answers questions about where extreme weather is occurring. Spotting dangerous conditions quickly matters to meteorologists and disaster response teams. Today, however, this work typically relies on fixed numerical thresholds and on people manually interpreting heatmaps in Geographic Information Systems (GIS), which can be slow and error-prone.

The key idea is to use a "vision-language model" (VLM): an AI system that combines computer vision (analyzing images) with natural language processing (understanding text). Given a heatmap and a plain-language question, such as whether a region is experiencing extreme wind gusts, the model answers directly. However, when the researchers tried general-purpose VLMs such as GPT-4-Vision, the results were poor. The models often could not tell similar colors apart, lacked meteorological knowledge, and frequently hallucinated answers.

To fix this, the researchers built ClimateIQA, a collection of wind gust heatmaps paired with questions and answers. They also built a technique called SPOT that traces the outline of each color region in a heatmap and records where it sits. This gives the model precise information about which areas fall into which color band, and therefore which wind intensity.

Using this dataset, the researchers adapted VLMs to meteorology, producing a family of models called Climate-Zoo. These models substantially outperformed state-of-the-art general-purpose VLMs, raising accuracy at verifying extreme weather events from 0% to over 90%. This suggests that, given domain-specific training data, the vision-language approach can be a powerful tool for automating extreme weather detection from weather heatmaps.

Technical Explanation

The paper reframes Extreme Weather Events Detection (EWED) as a Visual Question Answering (VQA) problem: given a weather heatmap and a natural-language question, a VLM must produce the correct answer. An initial assessment of general-purpose VLMs such as GPT-4-Vision on this task found low accuracy and frequent hallucinations. The authors attribute these failures to inadequate color differentiation and insufficient meteorological knowledge.
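
For contrast, the abstract describes the traditional approach as numerical threshold setting plus manual reading of heatmaps. The sketch below shows what a threshold rule can answer when it has the underlying numbers rather than a rendered image. The 20 m/s threshold, the toy grid, and the `Region` type are hypothetical illustrations, not values or code from the paper. A VLM working from the heatmap has to recover the same yes/no answer from colors alone, which is why poor color differentiation hurts it.

```python
from dataclasses import dataclass

# Hypothetical threshold for "extreme" gusts in m/s; the abstract gives no value.
EXTREME_GUST_MS = 20.0


@dataclass
class Region:
    """Rectangular block of grid cells with inclusive row/column bounds (hypothetical)."""
    name: str
    row_min: int
    row_max: int
    col_min: int
    col_max: int


def region_max(grid: list[list[float]], region: Region) -> float:
    """Largest gust value inside the region."""
    return max(
        grid[r][c]
        for r in range(region.row_min, region.row_max + 1)
        for c in range(region.col_min, region.col_max + 1)
    )


def has_extreme_gusts(grid: list[list[float]], region: Region,
                      threshold: float = EXTREME_GUST_MS) -> bool:
    """Threshold rule: flag the region if any cell reaches the threshold."""
    return region_max(grid, region) >= threshold


# Toy 3x4 wind gust field in m/s (made-up values, not reanalysis data).
gusts = [
    [5.0, 7.5, 12.0, 9.0],
    [6.0, 18.5, 24.0, 11.0],
    [4.0, 8.0, 10.0, 6.5],
]
east = Region("east", 0, 2, 2, 3)
west = Region("west", 0, 2, 0, 1)

for region in (east, west):
    answer = "Yes" if has_extreme_gusts(gusts, region) else "No"
    print(f"Are there extreme wind gusts in the {region.name}? {answer}")
```

Running it prints `Yes` for the east region, whose maximum is 24.0 m/s, and `No` for the west region, whose maximum is 18.5 m/s. Using the regional maximum is only one simple choice. An area fraction or a duration requirement would be equally plausible threshold rules.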

To address these problems, the authors build ClimateIQA, described as the first meteorological VQA dataset. It contains 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data.
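
The two dataset sizes quoted in the abstract divide evenly, which gives a quick consistency check on the numbers:

$$
\frac{254{,}040}{8{,}760} = 29, \qquad 8{,}760 = 365 \times 24
$$

ClimateIQA therefore averages exactly 29 question-answer pairs per heatmap. The heatmap count also equals the number of hours in a non-leap year, which would fit one heatmap per hour over a year. The abstract does not state the sampling interval or how the pairs split across the four question types, so both readings are inferences rather than reported facts.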

To supply more accurate information about where each color appears, the authors propose Sparse Position and Outline Tracking (SPOT). SPOT uses OpenCV and K-Means clustering to capture and depict the color contours in each heatmap, so that ClimateIQA carries more accurate spatial location information for each color.
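
The abstract says SPOT combines OpenCV and K-Means clustering to capture color contours but gives no further detail. The following is a minimal, dependency-free sketch of that general idea, not the authors' implementation. It does three things:

- clusters pixel colors with plain K-Means;
- marks each cluster's boundary cells as its outline;
- writes a sparse text description (bounding rows and columns plus every second outline cell) that could accompany a heatmap.

A hand-written 4-neighbour boundary test stands in for OpenCV so the example runs without extra packages. A production pipeline would plausibly use `cv2.kmeans` and `cv2.findContours` instead. The function names, the toy image, and the sampling step are hypothetical.

```python
import random

Pixel = tuple[int, int, int]
Cell = tuple[int, int]


def kmeans_colors(pixels: list[Pixel], k: int, iters: int = 20, seed: int = 0):
    """Plain Lloyd's K-Means on RGB values; returns (labels, centers)."""
    rng = random.Random(seed)
    # Start from k distinct colors present in the image (needs >= k distinct colors).
    centers = [list(p) for p in rng.sample(sorted(set(pixels)), k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance in RGB.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in pixels
        ]
        # Update step: move each center to the mean color of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centers[j] = [sum(ch) / len(members) for ch in zip(*members)]
    return labels, centers


def outline_cells(grid: list[list[int]]) -> dict[int, list[Cell]]:
    """Per cluster label, list cells that touch another label or the image edge."""
    h, w = len(grid), len(grid[0])
    outlines: dict[int, list[Cell]] = {}
    for r in range(h):
        for c in range(w):
            lab = grid[r][c]
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(not (0 <= rr < h and 0 <= cc < w) or grid[rr][cc] != lab
                   for rr, cc in neighbours):
                outlines.setdefault(lab, []).append((r, c))
    return outlines


def describe(image: list[list[Pixel]], k: int, step: int = 2) -> list[str]:
    """Cluster colors, trace outlines, and write one sparse description per cluster."""
    h, w = len(image), len(image[0])
    flat = [px for row in image for px in row]
    labels, centers = kmeans_colors(flat, k)
    grid = [labels[r * w:(r + 1) * w] for r in range(h)]
    lines = []
    for lab, cells in sorted(outline_cells(grid).items()):
        rows = [r for r, _ in cells]
        cols = [c for _, c in cells]
        rgb = tuple(round(v) for v in centers[lab])
        # Bounding rows/cols plus every `step`-th outline cell (the "sparse" part).
        lines.append(
            f"color {rgb}: rows {min(rows)}-{max(rows)}, cols {min(cols)}-{max(cols)}, "
            f"outline sample {cells[::step]}"
        )
    return lines


# Toy 5x6 "heatmap": blue background, yellow ring, red core (made-up colors).
B, Y, R = (30, 60, 200), (240, 220, 40), (220, 30, 30)
toy = [
    [B, B, B, B, B, B],
    [B, B, Y, Y, B, B],
    [B, Y, R, R, Y, B],
    [B, B, Y, Y, B, B],
    [B, B, B, B, B, B],
]
for line in describe(toy, k=3):
    print(line)
```

On the toy image, the red cluster is reported as `color (220, 30, 30): rows 2-2, cols 2-3, outline sample [(2, 2)]`, a compact, position-aware description of the most intense band. Keeping only every `step`-th outline cell makes the description sparse, at the cost of a coarser outline.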

Finally, the authors present Climate-Zoo, described as the first meteorological VLM collection, which adapts VLMs to meteorological applications using ClimateIQA. Experiments show that Climate-Zoo models substantially outperform state-of-the-art general VLMs, with accuracy in EWED verification rising from 0% to over 90%. The datasets and models are publicly available.
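
The abstract reports accuracy in EWED verification but not how answers are scored. Assuming verification items are yes/no questions, a minimal scorer could normalise each free-form reply to yes or no and count exact matches. `normalise`, `verification_accuracy`, and the rule of reading only the first word are hypothetical choices for illustration.

```python
from typing import Optional


def normalise(answer: str) -> Optional[str]:
    """Map a free-form reply to 'yes' or 'no' from its first word; None if neither."""
    words = answer.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,!?;:")
    return first if first in ("yes", "no") else None


def verification_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Share of items whose normalised prediction equals the gold label.

    Replies that are neither yes nor no are counted as wrong.
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    correct = sum(normalise(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Made-up replies and labels, for illustration only.
predictions = [
    "Yes, extreme gusts are present in the north.",
    "No.",
    "The region shows moderate winds.",
    "yes",
]
gold = ["yes", "yes", "no", "yes"]
print(verification_accuracy(predictions, gold))
```

For the made-up example it prints `0.5`. Two replies match; one says no when the label is yes; one never commits to yes or no. Counting non-committal replies as wrong is a strict choice: a model that hedges or answers off-format scores poorly even if its prose is partly right.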

Critical Analysis

The paper presents a compelling approach to extreme weather detection, but there are a few potential limitations and areas for further research:

  1. The models' performance still depends on the quality and quantity of the training data. ClimateIQA is built from wind gust heatmaps alone, so extending it to other variables and to rare or emerging extreme weather phenomena could improve generalization.

  2. While the models' answers help locate extreme weather within a heatmap, the paper does not explore how these outputs could be integrated into practical meteorological decision-support systems. Weatherproof, a related work, demonstrates the potential of language-guided weather analysis.

  3. The paper does not provide a detailed analysis of interpretability, that is, how the combined vision-language reasoning leads to a given answer about a heatmap. Making the models' inner workings more transparent could build trust and support further research and real-world use.

Overall, the research presented in this paper represents an important step forward in using advanced AI techniques to enhance extreme weather monitoring and response capabilities. Continued exploration of vision-language models in the meteorological domain holds promise for validating deep learning weather forecast models and other valuable applications.


Conclusion

This paper reframes extreme weather event detection as visual question answering over weather heatmaps. It makes three contributions: ClimateIQA, a meteorological VQA dataset generated from climate reanalysis data; SPOT, a technique that uses OpenCV and K-Means clustering to capture the position and outline of color regions in heatmaps; and Climate-Zoo, a collection of VLMs adapted to meteorology using ClimateIQA.

The key finding is that general-purpose VLMs handle this task poorly because they struggle to differentiate heatmap colors and lack meteorological knowledge. Domain-specific training data with explicit color-location information closes much of that gap.

Climate-Zoo models raise accuracy in EWED verification from 0% to over 90%, substantially outperforming state-of-the-art general VLMs. While the results highlight the potential of this approach, further research is needed to address limitations around data diversity, model interpretability, and real-world integration.

Overall, the research presented in this paper represents an important step forward in leveraging advanced AI techniques to enhance our understanding and monitoring of extreme weather events, with significant implications for improving meteorological forecasting and disaster preparedness.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


WeatherQA: Can Multimodal Language Models Reason about Severe Weather?

Chengqian Ma, Zhanxiang Hua, Alexandra Anderson-Frey, Vikram Iyer, Xin Liu, Lianhui Qin

Severe convective weather events, such as hail, tornadoes, and thunderstorms, often occur quickly yet cause significant damage, costing billions of dollars every year. This highlights the importance of forecasting severe weather threats hours in advance to better prepare meteorologists and residents in at-risk areas. Can modern large foundation models perform such forecasting? Existing weather benchmarks typically focus only on predicting time-series changes in certain weather parameters (e.g., temperature, moisture) with text-only features. In this work, we introduce WeatherQA, the first multimodal dataset designed for machines to reason about complex combinations of weather parameters (a.k.a., ingredients) and predict severe weather in real-world scenarios. The dataset includes over 8,000 (multi-images, text) pairs for diverse severe weather events. Each pair contains rich information crucial for forecasting -- the images describe the ingredients capturing environmental instability, surface observations, and radar reflectivity, and the text contains forecast analyses written by human experts. With WeatherQA, we evaluate state-of-the-art vision language models, including GPT4, Claude3.5, Gemini-1.5, and a fine-tuned Llama3-based VLM, by designing two challenging tasks: (1) multi-choice QA for predicting affected area and (2) classification of the development potential of severe convection. These tasks require deep understanding of domain knowledge (e.g., atmospheric dynamics) and complex reasoning over multimodal data (e.g., interactions between weather parameters). We show a substantial gap between the strongest VLM, GPT4o, and human reasoning. Our comprehensive case study with meteorologists further reveals the weaknesses of the models, suggesting that better training and data integration are necessary to bridge this gap. WeatherQA link:

Validating Deep-Learning Weather Forecast Models on Recent High-Impact Extreme Events

Olivier C. Pasche, Jonathan Wider, Zhongwei Zhang, Jakob Zscheischler, Sebastian Engelke

The forecast accuracy of deep-learning-based weather prediction models is improving rapidly, leading many to speak of a second revolution in weather forecasting. With numerous methods being developed, and limited physical guarantees offered by deep-learning models, there is a critical need for comprehensive evaluation of these emerging techniques. While this need has been partly fulfilled by benchmark datasets, they provide little information on rare and impactful extreme events, or on compound impact metrics, for which model accuracy might degrade due to misrepresented dependencies between variables. To address these issues, we compare deep-learning weather prediction models (GraphCast, PanguWeather, FourCastNet) and ECMWF's high-resolution forecast (HRES) system in three case studies: the 2021 Pacific Northwest heatwave, the 2023 South Asian humid heatwave, and the North American winter storm in 2021. We find evidence that machine learning (ML) weather prediction models can locally achieve similar accuracy to HRES on record-shattering events such as the 2021 Pacific Northwest heatwave and even forecast the compound 2021 North American winter storm substantially better. However, extrapolating to extreme conditions may impact machine learning models more severely than HRES, as evidenced by the comparable or superior spatially- and temporally-aggregated forecast accuracy of HRES for the two heatwaves studied. The ML forecasts also lack variables required to assess the health risks of events such as the 2023 South Asian humid heatwave. Generally, case-study-driven, impact-centric evaluation can complement existing research, increase public trust, and aid in developing reliable ML weather prediction models.

Save It for the Hot Day: An LLM-Empowered Visual Analytics System for Heat Risk Management

Haobo Li, Wong Kam-Kwai, Yan Luo, Juntong Chen, Chengzhong Liu, Yaxuan Zhang, Alexis Kai Hon Lau, Huamin Qu, Dongyu Liu

The escalating frequency and intensity of heat-related climate events, particularly heatwaves, emphasize the pressing need for advanced heat risk management strategies. Current approaches, primarily relying on numerical models, face challenges in spatial-temporal resolution and in capturing the dynamic interplay of environmental, social, and behavioral factors affecting heat risks. This has led to difficulties in translating risk assessments into effective mitigation actions. Recognizing these problems, we introduce a novel approach leveraging the burgeoning capabilities of Large Language Models (LLMs) to extract rich and contextual insights from news reports. We hence propose an LLM-empowered visual analytics system, Havior, that integrates the precise, data-driven insights of numerical models with nuanced news report information. This hybrid approach enables a more comprehensive assessment of heat risks and better identification, assessment, and mitigation of heat-related threats. The system incorporates novel visualization designs, such as thermoglyph and news glyph, enhancing intuitive understanding and analysis of heat risks. The integration of LLM-based techniques also enables advanced information retrieval and semantic knowledge extraction that can be guided by experts' analytics needs. Our case studies on two cities that faced significant heatwave events and interviews with five experts have demonstrated the usefulness of our system in providing in-depth and actionable insights for heat risk management.

ExtremeCast: Boosting Extreme Value Prediction for Global Weather Forecast

Wanghan Xu, Kang Chen, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

Data-driven weather forecast based on machine learning (ML) has experienced rapid development and demonstrated superior performance in the global medium-range forecast compared to traditional physics-based dynamical models. However, most of these ML models struggle with accurately predicting extreme weather, which is closely related to the extreme value prediction. Through mathematical analysis, we prove that the use of symmetric losses, such as the Mean Squared Error (MSE), leads to biased predictions and underestimation of extreme values. To address this issue, we introduce Exloss, a novel loss function that performs asymmetric optimization and highlights extreme values to obtain accurate extreme weather forecast. Furthermore, we introduce a training-free extreme value enhancement strategy named ExEnsemble, which increases the variance of pixel values and improves the forecast robustness. Combined with an advanced global weather forecast model, extensive experiments show that our solution can achieve state-of-the-art performance in extreme weather prediction, while maintaining the overall forecast accuracy comparable to the top medium-range forecast models.
