Machine Learning Techniques for Data Reduction of CFD Applications

Read original: arXiv:2404.18063 - Published 4/30/2024 by Jaemoon Lee, Ki Sung Jung, Qian Gong, Xiao Li, Scott Klasky, Jacqueline Chen, Anand Rangarajan, Sanjay Ranka

Overview

Presents an approach called Guaranteed Block Autoencoder that leverages Tensor Correlations (GBATC) to reduce spatiotemporal data from computational fluid dynamics (CFD) and other scientific applications
Uses a multidimensional block of tensors (spanning space and time) for both input and output to capture spatiotemporal and interspecies relationships
Applies principal component analysis (PCA) to the residual between original and reconstructed data to guarantee an error bound

Plain English Explanation

The paper introduces GBATC, a Guaranteed Block Autoencoder that leverages Tensor Correlations, for compressing data generated by computational fluid dynamics (CFD) simulations and other scientific applications. CFD simulations produce large amounts of spatiotemporal data, which can be challenging to store and transmit.

GBATC addresses this challenge by treating the data as tensors: multidimensional arrays in which each grid point holds the values of several species from the simulation. The method takes a block of such tensors, spanning both space and time, as input and learns a compact representation from which the block is reconstructed.

To guarantee a bound on the reconstruction error, the method applies principal component analysis (PCA) to the residual, that is, the difference between the original and reconstructed data. PCA yields a basis onto which the residual of each instance is projected, and the resulting coefficients are retained so that the error can be corrected at reconstruction time, even with significant data reduction.

Experiments show that this approach can reduce data by two orders of magnitude while keeping the error in the primary data within scientifically acceptable bounds. Compared to reduction approaches based on SZ, GBATC delivers a substantially higher compression ratio for a given error bound, or a lower error for a given compression ratio.

Technical Explanation

The GBATC approach operates on a multidimensional block of tensors that capture both the spatial and temporal relationships in the CFD data, as well as the relationships between different chemical species represented in the simulation. This tensor structure allows the method to exploit correlations in the data across space, time, and variables.
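As an illustration of what a "multidimensional block of tensors" might look like in practice, the following sketch splits a simulation array into non-overlapping spatiotemporal blocks that keep all species together. The array layout (time, species, height, width), the two spatial dimensions, the block sizes, and the function names `extract_blocks` and `assemble_blocks` are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def extract_blocks(data, bt, bh, bw):
    """Split a (T, S, H, W) array into non-overlapping blocks of shape (bt, S, bh, bw).

    Hypothetical layout: T time steps, S species, H x W spatial grid.
    Every block keeps all S species so inter-species correlations stay together.
    """
    T, S, H, W = data.shape
    if T % bt or H % bh or W % bw:
        raise ValueError("block sizes must divide the time and space dimensions")
    # Split the time, height, and width axes into (number of blocks, block size).
    x = data.reshape(T // bt, bt, S, H // bh, bh, W // bw, bw)
    # Move the three block indices to the front; keep (bt, S, bh, bw) per block.
    x = x.transpose(0, 3, 5, 1, 2, 4, 6)
    return x.reshape(-1, bt, S, bh, bw)

def assemble_blocks(blocks, T, H, W):
    """Inverse of extract_blocks: rebuild the (T, S, H, W) array."""
    _, bt, S, bh, bw = blocks.shape
    x = blocks.reshape(T // bt, H // bh, W // bw, bt, S, bh, bw)
    # Undo the transpose so each block-size axis follows its block-index axis.
    x = x.transpose(0, 3, 4, 1, 5, 2, 6)
    return x.reshape(T, S, H, W)
```

Each extracted block would then serve as one input (and one reconstruction target) for the autoencoder, and `assemble_blocks` puts decoded blocks back in their original positions.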

The autoencoder architecture of GBATC learns a compressed representation of the input tensor block, which is then used to reconstruct the original data. To guarantee the error bound of the reconstructed data, the method applies principal component analysis (PCA) to the residual between the original and reconstructed tensors. This yields a basis matrix that captures the most important features in the residual, which is then used to project the residual of each individual instance. The resulting coefficients are retained to enable accurate reconstruction of the original data.
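The residual-correction step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes flattened blocks, an l2 error bound `tau`, a basis computed by SVD without mean-centering, and a greedy rule that keeps the largest coefficients first. It also omits any quantization or encoding of the retained coefficients. The function names are hypothetical.

```python
import numpy as np

def pca_basis(residuals):
    """Return orthonormal basis vectors (as rows) for the flattened residuals.

    residuals: array of shape (n_instances, d). Computed by SVD without
    mean-centering, so every training residual lies in the span of the basis.
    """
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt

def select_coefficients(residual, basis, tau):
    """Keep the largest-magnitude coefficients until the l2 error is <= tau."""
    coeffs = basis @ residual              # project the residual onto the basis
    order = np.argsort(-np.abs(coeffs))    # largest contributions first
    kept = np.zeros_like(coeffs)
    remaining = residual.copy()            # part of the residual not yet corrected
    for idx in order:
        if np.linalg.norm(remaining) <= tau:
            break
        kept[idx] = coeffs[idx]
        remaining = remaining - coeffs[idx] * basis[idx]
    return kept

def reconstruct(decoded, kept, basis):
    """Add the stored residual correction back to the autoencoder output."""
    return decoded + basis.T @ kept
```

Because the basis is orthonormal, the squared error left after correction equals the sum of the squared coefficients that were dropped, which is why adding coefficients in decreasing magnitude reaches the bound with few stored values. The bound is only met for residuals that lie in the span of the basis, which holds here for the residuals used to compute it.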

The experimental results show that GBATC reduces data by two orders of magnitude while keeping the errors of the primary data within scientifically acceptable bounds. Compared to reduction approaches based on SZ, GBATC achieves a substantially higher compression ratio for a given error bound, or a lower error for a given compression ratio.
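To make the two quantities in this comparison concrete, the sketch below computes a compression ratio and a normalized error. The choice of range-normalized RMSE (NRMSE) as the error metric is an assumption for illustration; the summary only states that errors stay within scientifically acceptable bounds. The function names are hypothetical.

```python
import numpy as np

def compression_ratio(original_bytes, compressed_bytes):
    """Size of the original data divided by the size of everything stored."""
    return original_bytes / compressed_bytes

def nrmse(original, reconstructed):
    """Root-mean-square error normalized by the value range of the original data."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    rmse = np.sqrt(np.mean((original - reconstructed) ** 2))
    return rmse / (original.max() - original.min())
```

A reduction of two orders of magnitude corresponds to a compression ratio of about 100. For a fair comparison, `compressed_bytes` should count every stored component, such as the latent representation and the retained PCA coefficients.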

Critical Analysis

The paper provides a comprehensive evaluation of the GBATC approach, including comparisons to other state-of-the-art techniques. However, it does not explicitly discuss the potential limitations or caveats of the method.

One area that could be explored further is the scalability of GBATC as the size and complexity of the input tensors increase. The paper focuses on relatively small CFD simulations, but it is unclear how well the method would perform on larger, more realistic scientific datasets.

Additionally, the paper does not address the computational complexity of the GBATC approach, particularly the PCA step used to guarantee the error bound. This could be an important consideration for real-world applications, where the time and resources required for compression and decompression may be a critical factor.

Finally, the paper could benefit from a more in-depth discussion of the broader implications of this research. While the focus is on CFD data, the GBATC approach may have applications in other domains that generate large spatiotemporal datasets, such as Earth system modeling or medical imaging. Exploring these potential use cases could help highlight the wider significance of the proposed technique.

Conclusion

GBATC, the Guaranteed Block Autoencoder that leverages Tensor Correlations, offers a promising approach for reducing the storage and transmission requirements of spatiotemporal data generated by computational fluid dynamics and other scientific applications. By exploiting the inherent structure of the data through tensor representations and applying principal component analysis to guarantee error bounds, GBATC can achieve substantial data compression while maintaining scientific accuracy.

While the paper provides a thorough evaluation of the method, further research is needed to address potential scalability and computational complexity concerns, as well as explore the broader applicability of the GBATC approach beyond the specific CFD use case. Nonetheless, this work represents an important contribution to the field of scientific data compression and could have significant implications for the efficient management and analysis of large-scale simulation data.

Related Papers

Machine Learning Techniques for Data Reduction of CFD Applications

Jaemoon Lee, Ki Sung Jung, Qian Gong, Xiao Li, Scott Klasky, Jacqueline Chen, Anand Rangarajan, Sanjay Ranka

We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning in space and time) for both input and output, capturing the spatiotemporal and interspecies relationship within a tensor. The tensor consists of species that represent different elements in a CFD simulation. To guarantee the error bound of the reconstructed data, principal component analysis (PCA) is applied to the residual between the original and reconstructed data. This yields a basis matrix, which is then used to project the residual of each instance. The resulting coefficients are retained to enable accurate reconstruction. Experimental results demonstrate that our approach can deliver two orders of magnitude in reduction while still keeping the errors of primary data under scientifically acceptable bounds. Compared to reduction-based approaches based on SZ, our method achieves a substantially higher compression ratio for a given error bound or a better error for a given compression ratio.

4/30/2024

Attention Based Machine Learning Methods for Data Reduction with Guaranteed Error Bounds

Xiao Li, Jaemoon Lee, Anand Rangarajan, Sanjay Ranka

Scientific applications in fields such as high energy physics, computational fluid dynamics, and climate science generate vast amounts of data at high velocities. This exponential growth in data production is surpassing the advancements in computing power, network capabilities, and storage capacities. To address this challenge, data compression or reduction techniques are crucial. These scientific datasets have underlying data structures that consist of structured and block-structured multidimensional meshes where each grid point corresponds to a tensor. It is important that data reduction techniques leverage strong spatial and temporal correlations that are ubiquitous in these applications. Additionally, applications such as CFD process tensors comprising a hundred or more species and their attributes at each grid point. Reduction techniques should be able to leverage interrelationships between the elements in each tensor. In this paper, we propose an attention-based hierarchical compression method utilizing a block-wise compression setup. We introduce an attention-based hyper-block autoencoder to capture inter-block correlations, followed by a block-wise encoder to capture block-specific information. A PCA-based post-processing step is employed to guarantee error bounds for each data block. Our method effectively captures both spatiotemporal and inter-variable correlations within and between data blocks. Compared to the state-of-the-art SZ3, our method achieves up to 8 times higher compression ratio on the multi-variable S3D dataset. When evaluated on single-variable setups using the E3SM and XGC datasets, our method still achieves up to 3 times and 2 times higher compression ratio, respectively.

9/10/2024

Machine Learning Techniques for Data Reduction of Climate Applications

Xiao Li, Qian Gong, Jaemoon Lee, Scott Klasky, Anand Rangarajan, Sanjay Ranka

Scientists conduct large-scale simulations to compute derived quantities-of-interest (QoI) from primary data. Often, QoI are linked to specific features, regions, or time intervals, such that data can be adaptively reduced without compromising the integrity of QoI. For many spatiotemporal applications, these QoI are binary in nature and represent presence or absence of a physical phenomenon. We present a pipelined compression approach that first uses neural-network-based techniques to derive regions where QoI are highly likely to be present. Then, we employ a Guaranteed Autoencoder (GAE) to compress data with differential error bounds. GAE uses QoI information to apply low-error compression to only these regions. This results in overall high compression ratios while still achieving downstream goals of simulation or data collections. Experimental results are presented for climate data generated from the E3SM Simulation model for downstream quantities such as tropical cyclone and atmospheric river detection and tracking. These results show that our approach is superior to comparable methods in the literature.

5/3/2024

Sparsifying dimensionality reduction of PDE solution data with Bregman learning

Tjeerd Jan Heeringa, Christoph Brune, Mengwu Guo

Classical model reduction techniques project the governing equations onto a linear subspace of the original state space. More recent data-driven techniques use neural networks to enable nonlinear projections. Whilst those often enable stronger compression, they may have redundant parameters and lead to suboptimal latent dimensionality. To overcome these, we propose a multistep algorithm that induces sparsity in the encoder-decoder networks for effective reduction in the number of parameters and additional compression of the latent space. This algorithm starts by sparsely initializing a network and training it using linearized Bregman iterations. These iterations have been very successful in computer vision and compressed sensing tasks, but have not yet been used for reduced-order modelling. After the training, we further compress the latent space dimensionality by using a form of proper orthogonal decomposition. Last, we use a bias propagation technique to change the induced sparsity into an effective reduction of parameters. We apply this algorithm to three representative PDE models: 1D diffusion, 1D advection, and 2D reaction-diffusion. Compared to conventional training methods like Adam, the proposed method achieves similar accuracy with 30% fewer parameters and a significantly smaller latent space.

6/19/2024