Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water Resources

Read original: arXiv:2308.09766 - Published 8/15/2024 by Jared D. Willard, Charuleka Varadharajan, Xiaowei Jia, Vipin Kumar

Overview

Predicting environmental variables in unmonitored areas is a long-standing challenge for water resource management.
Many regions lack adequate monitoring of critical variables like river flow and water quality.
The need for widespread predictions is increasingly urgent because climate and land use changes are affecting water resources.
Modern machine learning methods often outperform process-based and empirical models for hydrological time series prediction.

Plain English Explanation

Understanding the water resources in an area is crucial for managing them effectively. However, monitoring all the necessary environmental variables can be difficult, especially in remote or underdeveloped regions. This leaves many areas with incomplete data about things like river flow and water quality.

As the climate changes and land use patterns shift, accurate predictions of these environmental variables have become increasingly important. Newer machine learning techniques often outperform traditional process-based and empirical models because they can extract information from large, diverse datasets.

This paper reviews the state of the art in using machine learning to predict variables such as streamflow and water quality. It discusses opportunities to further improve these models by incorporating more information about watershed characteristics, transferring what models learn between sites, and blending machine learning with scientific process knowledge.

Technical Explanation

The paper examines the challenge of predicting critical environmental variables like river flow and water quality in areas that lack sufficient monitoring infrastructure. This is a longstanding problem for water resource management, made more urgent by climate change and shifting land use patterns.

The authors review recent advancements in using machine learning methods to forecast hydrological time series. They find that modern machine learning approaches often outperform traditional process-based and empirical models, thanks to their ability to extract insights from large, diverse datasets.

The paper discusses opportunities to further enhance machine learning for water resource predictions. This includes incorporating more information about watershed characteristics into deep learning models, leveraging transfer learning to apply models across sites, and blending machine learning with scientific process knowledge.
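As a minimal sketch of one common way to give a sequence model information about watershed characteristics, static site attributes can be repeated at every time step and concatenated with the dynamic drivers. The function name `build_inputs`, the array shapes, and the example values are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def build_inputs(dynamic, static):
    """Append static site attributes to every time step of the dynamic inputs.

    dynamic: array of shape (n_sites, n_steps, n_dynamic), e.g. daily weather drivers
    static:  array of shape (n_sites, n_static), e.g. drainage area, mean elevation
    returns: array of shape (n_sites, n_steps, n_dynamic + n_static)
    """
    dynamic = np.asarray(dynamic, dtype=float)
    static = np.asarray(static, dtype=float)
    if dynamic.ndim != 3 or static.ndim != 2:
        raise ValueError("expected dynamic to be 3-D and static to be 2-D")
    if dynamic.shape[0] != static.shape[0]:
        raise ValueError("dynamic and static must describe the same number of sites")

    n_sites, n_steps, _ = dynamic.shape
    # Repeat each site's attribute vector along the time axis.
    tiled = np.broadcast_to(static[:, None, :], (n_sites, n_steps, static.shape[1]))
    # Concatenate along the feature axis so each time step sees both input types.
    return np.concatenate([dynamic, tiled], axis=-1)


# Hypothetical example: 3 sites, 5 daily steps, 2 dynamic drivers, 2 static attributes.
example_dynamic = np.zeros((3, 5, 2))
example_static = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
example_inputs = build_inputs(example_dynamic, example_static)
print(example_inputs.shape)
```

The resulting array could be fed to any model that accepts a per-step feature vector. In practice the attributes would usually be normalized first, using statistics from the training sites only.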

The analysis suggests that most prior work has focused on deep learning frameworks trained on many sites to predict daily time series in the United States. However, the authors note a lack of thorough comparisons between different machine learning approaches. They identify several open questions, such as how to best incorporate dynamic inputs, site characteristics, mechanistic understanding, and explainable AI into machine learning models for predicting environmental variables in unmonitored areas.
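To make the unmonitored-site setting concrete, the sketch below holds out entire sites rather than individual time steps, so that every test site is unseen during training. The helper name `split_by_site`, the default test fraction, and the site labels are hypothetical and do not come from the paper.

```python
import numpy as np


def split_by_site(site_ids, test_fraction=0.2, seed=0):
    """Return boolean train/test masks in which no site appears in both sets.

    site_ids: 1-D array giving the site label of each sample
    """
    site_ids = np.asarray(site_ids)
    unique_sites = np.unique(site_ids)
    if len(unique_sites) < 2:
        raise ValueError("need at least two sites to hold one out")

    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_sites)
    # Hold out at least one site, and always keep at least one for training.
    n_test = max(1, int(round(test_fraction * len(unique_sites))))
    n_test = min(n_test, len(unique_sites) - 1)
    test_sites = shuffled[:n_test]

    test_mask = np.isin(site_ids, test_sites)
    return ~test_mask, test_mask, test_sites


# Hypothetical example: 5 sites with 4 samples each, 2 sites held out.
example_ids = np.repeat(["A", "B", "C", "D", "E"], 4)
train_mask, test_mask, held_out = split_by_site(example_ids, test_fraction=0.4)
print(int(train_mask.sum()), int(test_mask.sum()))
```

Splitting by site instead of by time means the evaluation measures how well a model generalizes to locations with no observations, which is the case the survey is concerned with.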

Critical Analysis

The paper provides a comprehensive review of the state-of-the-art in using machine learning for predicting environmental variables like streamflow and water quality in unmonitored regions. It identifies several promising directions for further research and development, such as incorporating more contextual information about watersheds and blending machine learning with scientific process knowledge.

One gap the authors note is the lack of thorough comparisons between different classes of machine learning methods. Most prior work has focused on deep learning frameworks, so it would be valuable to also evaluate other techniques, such as decision trees or random forests, on these prediction tasks.

Additionally, the paper primarily discusses research conducted in the United States. It would be helpful to see more analysis of how these machine learning methods perform in other regions with different climates, data availability, and watershed characteristics. Expanding the geographic scope could yield additional insights and identify new challenges or opportunities.

Overall, this paper provides a solid foundation for understanding the current state of machine learning in water resources science and highlights several promising avenues for future work.

Conclusion

Predicting environmental variables like river flow and water quality in unmonitored areas remains a significant challenge for water resource management. However, the authors of this paper find that modern machine learning techniques often outperform traditional modeling approaches in this domain.

The review identifies several opportunities to further improve machine learning for hydrological time series prediction, such as incorporating more contextual information about watersheds, leveraging transfer learning, and blending machine learning with scientific process knowledge. Addressing these open questions could lead to more robust and reliable predictions to support better management of water resources, especially in the face of climate and land use changes.

Although most of the work the paper reviews was conducted in the United States, its insights have broader implications for water resource science and management around the world. Continued advances in this field could improve decision-making and support the long-term sustainability of freshwater resources.

Related Papers

Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water Resources

Jared D. Willard, Charuleka Varadharajan, Xiaowei Jia, Vipin Kumar

Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world's freshwater resources have inadequate monitoring of critical environmental variables needed for management. Yet, the need to have widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades, and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction with their ability to extract information from large, diverse data sets. We review relevant state-of-the-art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics into deep learning models, transfer learning, and incorporating process knowledge into machine learning models. The analysis here suggests most prior efforts have been focused on deep learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites that include incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques in modern machine learning frameworks.

8/15/2024

Streamflow Prediction with Uncertainty Quantification for Water Management: A Constrained Reasoning and Learning Approach

Mohammed Amine Gharsallaoui, Bhupinderjeet Singh, Supriya Savalkar, Aryan Deshwal, Yan Yan, Ananth Kalyanaraman, Kirti Rajagopalan, Janardhan Rao Doppa

Predicting the spatiotemporal variation in streamflow along with uncertainty quantification enables decision-making for sustainable management of scarce water resources. Process-based hydrological models (aka physics-based models) are based on physical laws, but rely on simplifying assumptions, which can lead to poor accuracy. Data-driven approaches offer a powerful alternative, but they require large amounts of training data and tend to produce predictions that are inconsistent with physical laws. This paper studies a constrained reasoning and learning (CRL) approach where physical laws represented as logical constraints are integrated as a layer in the deep neural network. To address the small-data setting, we develop a theoretically-grounded training approach to improve the generalization accuracy of deep models. For uncertainty quantification, we combine the synergistic strengths of Gaussian processes (GPs) and deep temporal models (i.e., deep models for time-series forecasting) by passing the learned latent representation as input to a standard distance-based kernel. Experiments on multiple real-world datasets demonstrate the effectiveness of both CRL and GP with deep kernel approaches over strong baseline methods.

6/4/2024

Evaluation of deep learning models for Australian climate extremes: prediction of streamflow and floods

Siddharth Khedkar, R. Willem Vervoort, Rohitash Chandra

In recent years, climate extremes such as floods have created significant environmental and economic hazards for Australia, causing damage to the environment and economy and losses of human and animal lives. An efficient method of forecasting floods is crucial to limit this damage. Techniques for flood prediction are currently based on hydrological and hydrodynamic (physically-based) numerical models. Machine learning methods that include deep learning offer certain advantages over conventional physically based approaches, including flexibility and accuracy. Deep learning methods have been promising for predicting small to medium-sized climate extreme events over a short time horizon; however, large flooding events present a critical challenge. We present an ensemble-based machine learning approach that addresses large-scale extreme flooding challenges using a switching mechanism motivated by extreme-value theory for long short-term memory (LSTM) deep learning models. We use a multivariate and multi-step time-series prediction approach to predict streamflow for multiple days ahead in the major catchments of Australia. The ensemble framework also employs static information to enrich the time-series information, allowing for regional modelling across catchments. Our results demonstrate enhanced prediction of streamflow extremes, with notable efficacy for large flooding scenarios in the selected Australian catchments. Through comparative analysis, our methodology underscores the potential for deep learning models to revolutionise flood forecasting across diverse regions.

7/24/2024

Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge

Timothy Dai, Kate Maher, Zach Perzan

Process-based hydrologic models are invaluable tools for understanding the terrestrial water cycle and addressing modern water resources problems. However, many hydrologic models are computationally expensive and, depending on the resolution and scale, simulations can take on the order of hours to days to complete. While techniques such as uncertainty quantification and optimization have become valuable tools for supporting management decisions, these analyses typically require hundreds of model simulations, which are too computationally expensive to perform with a process-based hydrologic model. To address this gap, we propose a hybrid modeling workflow in which a process-based model is used to generate an initial set of simulations and a machine learning (ML) surrogate model is then trained to perform the remaining simulations required for downstream analysis. As a case study, we apply this workflow to simulations of variably saturated groundwater flow at a prospective managed aquifer recharge (MAR) site. We compare the accuracy and computational efficiency of several ML architectures, including deep convolutional networks, recurrent neural networks, vision transformers, and networks with Fourier transforms. Our results demonstrate that ML surrogate models can achieve under 10% mean absolute percentage error and yield order-of-magnitude runtime savings over process-based models. We also offer practical recommendations for training hydrologic surrogate models, including implementing data normalization to improve accuracy, using a normalized loss function to improve training stability, and downsampling input features to decrease memory requirements.

7/31/2024