Approximately Invertible Neural Network for Learned Image Compression

Read original: arXiv:2408.17073 - Published 9/2/2024 by Yanbo Gao, Meng Fu, Shuai Li, Chong Lv, Xun Cai, Hui Yuan, Mao Ye

Overview

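The paper proposes an approximately invertible neural network (A-INN) for learned image compression
The encoder and decoder are built from invertible modules, so they are coupled rather than designed independently
The authors report that A-INN outperforms existing learned image compression methods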

Plain English Explanation

Learned image compression uses neural networks, rather than hand-designed transforms such as those in JPEG, to encode and decode images. This paper builds the encoder and decoder from a special type of network, called an approximately invertible neural network, so that the two are coupled rather than designed independently.

The key idea is that the neural network learns to represent the original image in a more compact form, called the "latent representation." This latent representation is quantized (rounded to a limited set of values), then transmitted or stored, and the image is reconstructed from it. The network is only "approximately invertible" because quantization adds noise to the latent representation, so the decoder cannot exactly undo the encoder; the reconstruction is close to the original rather than identical.

One of the main benefits of this approach is that it can potentially achieve better compression ratios compared to traditional methods, while still maintaining a high level of image quality. This could be particularly useful in applications where storage space or bandwidth is limited, such as in mobile devices or remote sensing applications.

Technical Explanation

The paper introduces an Approximately Invertible Neural Network (A-INN) framework for learned image compression. As in other learned codecs, an analysis transform (the encoder) converts the input image into a compact latent representation, which is quantized and entropy coded, and a synthesis transform (the decoder) reconstructs the image from the quantized representation.
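The pipeline described above is often written in the following form. This is the rate-distortion formulation commonly used in learned image compression, given here as an orientation sketch; it is not the paper's own A-INN formulation, and the symbols g_a, g_s, Q, d, and λ are notation assumed for illustration.

```latex
\begin{aligned}
y &= g_a(x) && \text{(analysis transform: image to latent)} \\
\hat{y} &= Q(y) && \text{(quantization of the latent)} \\
\hat{x} &= g_s(\hat{y}) && \text{(synthesis transform: latent to image)} \\
\mathcal{L} &= \underbrace{\mathbb{E}\left[-\log_2 p_{\hat{y}}(\hat{y})\right]}_{\text{rate } R}
  + \lambda \, \underbrace{\mathbb{E}\left[d(x, \hat{x})\right]}_{\text{distortion } D}
\end{aligned}
```

Here the rate term is the expected code length of the quantized latent under the entropy model, d is a distortion measure such as mean squared error, and λ sets the trade-off between bitrate and reconstruction quality.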

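The key innovation is building the analysis and synthesis transforms from invertible modules, so that they are coupled rather than designed independently. Because quantization noise invalidates exact inversion, the two transforms are only approximate inverses of each other, and the paper formulates rate-distortion optimization for this setting. Three modules address the resulting losses: a progressive denoising module (PDM) reduces quantization noise during decoding, a Cascaded Feature Recovery Module (CFRM) recovers high-dimensional features from low-dimensional ones, and a Frequency-enhanced Decomposition and Synthesis Module (FDSM) enhances high-frequency image components.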
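To make the idea of approximate invertibility concrete, here is a minimal sketch using an additive coupling layer, a standard invertible building block from the normalizing-flow literature. It is a toy illustration, not the A-INN architecture: the function names, the fixed `shift` function standing in for a learned sub-network, and the input values are all assumptions.

```python
# Toy illustration (hypothetical names and values): an additive coupling layer
# is exactly invertible, but rounding the latent breaks exact inversion.

def shift(values):
    """Stand-in for a learned sub-network; any deterministic function works."""
    return [0.5 * v * v + 0.25 for v in values]

def forward(x):
    """Encode: keep the first half, add shift(first half) to the second half."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [b + s for b, s in zip(x2, shift(x1))]
    return x1 + y2

def inverse(y):
    """Decode: subtract the same shift, recomputed from the first half."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [b - s for b, s in zip(y2, shift(y1))]
    return y1 + x2

def quantize(y):
    """Uniform scalar quantization: round each latent value to an integer."""
    return [float(round(v)) for v in y]

def max_abs_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

x = [0.3, -1.2, 2.7, 0.9]
exact = inverse(forward(x))            # no quantization between the transforms
lossy = inverse(quantize(forward(x)))  # quantization noise enters the latent

exact_error = max_abs_error(x, exact)
lossy_error = max_abs_error(x, lossy)
print("error without quantization:", exact_error)
print("error with quantization:", lossy_error)
```

Without quantization the round trip recovers the input up to floating-point precision. With quantization the decoder receives a perturbed latent, and the perturbation in the first half also changes the shift applied to the second half, so the inverse only approximates the input. Reducing this kind of decoding error is what the paper's denoising and recovery modules are described as targeting.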

The authors report extensive experiments comparing their approach with existing learned image compression methods. According to the paper, the proposed A-INN outperforms those methods, achieving better compression at comparable or better image quality.
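Comparisons of this kind are usually expressed as image quality at a given bitrate. The sketch below computes two commonly reported quantities, PSNR and bits per pixel; whether the paper uses exactly these metrics is an assumption, and the function names and sample values are hypothetical.

```python
import math

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images: distortion is zero
    return 10.0 * math.log10(max_value ** 2 / mse)

def bits_per_pixel(num_bytes, width, height):
    """Bitrate of a compressed file, normalized by the number of pixels."""
    return 8.0 * num_bytes / (width * height)

# Hypothetical values: four pixels of an image and their reconstruction.
quality_db = psnr([10, 20, 30, 40], [12, 18, 30, 41])

# Hypothetical values: a 768x512 image compressed to 24576 bytes.
rate_bpp = bits_per_pixel(24576, 768, 512)

print(f"PSNR: {quality_db:.2f} dB at {rate_bpp:.2f} bpp")
```

A method is better in rate-distortion terms when it reaches a higher PSNR at the same bits per pixel, or the same PSNR at fewer bits per pixel.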

Critical Analysis

The paper presents a promising approach to learned image compression, but it also acknowledges some limitations and areas for further research. For example, the approximately invertible nature of the network means that there are some inevitable losses in the encoding and decoding processes, which could limit the achievable compression ratios or image quality.

Additionally, the authors note that the performance of the method may be dependent on the specific image dataset and compression targets, and further work may be needed to optimize the approach for different applications.

Overall, the research represents an interesting step forward in the field of learned image compression, but there is still room for improvement and further exploration of the techniques and their practical implications.

Conclusion

The paper introduces an approximately invertible neural network for learned image compression, which aims to achieve better compression ratios and image quality than existing methods. The key innovation is coupling the encoder and decoder through invertible modules while explicitly accounting for the quantization noise that prevents exact inversion.

The results of the experiments are promising, suggesting that this approach could be useful in applications where storage space or bandwidth is limited. However, the authors also acknowledge some limitations and areas for further research, such as the dependency on the specific dataset and compression targets.

This work contributes to the ongoing efforts to develop more efficient and effective image compression techniques, which could have widespread applications in fields like mobile computing, remote sensing, and multimedia streaming.

Related Papers

Approximately Invertible Neural Network for Learned Image Compression

Yanbo Gao, Meng Fu, Shuai Li, Chong Lv, Xun Cai, Hui Yuan, Mao Ye

Learned image compression has attracted considerable interest in recent years. It typically comprises an analysis transform, a synthesis transform, quantization and an entropy coding model. The analysis transform and synthesis transform are used to encode an image to a latent feature and decode the quantized feature to reconstruct the image, and can be regarded as coupled transforms. However, the analysis transform and synthesis transform are designed independently in the existing methods, making them unreliable in high-quality image compression. Inspired by the invertible neural networks in generative modeling, invertible modules are used to construct the coupled analysis and synthesis transforms. Considering that the noise introduced in the feature quantization invalidates the invertible process, this paper proposes an Approximately Invertible Neural Network (A-INN) framework for learned image compression. It formulates the rate-distortion optimization in lossy image compression when using INN with quantization, which differs from using INN for generative modeling. Generally speaking, A-INN can be used as the theoretical foundation for any INN based lossy compression method. Based on this formulation, A-INN with a progressive denoising module (PDM) is developed to effectively reduce the quantization noise in the decoding. Moreover, a Cascaded Feature Recovery Module (CFRM) is designed to learn high-dimensional feature recovery from low-dimensional ones to further reduce the noise in feature channel compression. In addition, a Frequency-enhanced Decomposition and Synthesis Module (FDSM) is developed by explicitly enhancing the high-frequency components in an image to address the loss of high-frequency information inherent in neural network based image compression. Extensive experiments demonstrate that the proposed A-INN outperforms the existing learned image compression methods.

9/2/2024

Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network

Junhui Li, Xingsong Hou

Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge. To address this problem, we propose the invertible neural network-based remote sensing image compression (INN-RSIC) method. Specifically, we capture compression distortion from an existing image compression algorithm and encode it as a set of Gaussian-distributed latent variables via INN. This ensures that the compression distortion in the decoded image becomes independent of the ground truth. Therefore, by leveraging the inverse mapping of INN, we can input the decoded image along with a set of randomly resampled Gaussian distributed variables into the inverse network, effectively generating enhanced images with better perception quality. To effectively learn compression distortion, channel expansion, Haar transformation, and invertible blocks are employed to construct the INN. Additionally, we introduce a quantization module (QM) to mitigate the impact of format conversion, thus enhancing the framework's generalization and improving the perceptual quality of enhanced images. Extensive experiments demonstrate that our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.

8/27/2024

Multiscale Augmented Normalizing Flows for Image Compression

Marc Windsheimer, Fabian Brand, André Kaup

Most learning-based image compression methods lack efficiency for high image quality due to their non-invertible design. The decoding function of the frequently applied compressive autoencoder architecture is only an approximated inverse of the encoding transform. This issue can be resolved by using invertible latent variable models, which allow a perfect reconstruction if no quantization is performed. Furthermore, many traditional image and video coders apply dynamic block partitioning to vary the compression of certain image regions depending on their content. Inspired by this approach, hierarchical latent spaces have been applied to learning-based compression networks. In this paper, we present a novel concept, which adapts the hierarchical latent space for augmented normalizing flows, an invertible latent variable model. Our best performing model achieved average rate savings of more than 7% over comparable single-scale models.

5/24/2024

FusionINN: Invertible Image Fusion for Brain Tumor Monitoring

Nishant Kumar, Ziyan Tao, Jaikirat Singh, Yang Li, Peiwen Sun, Binghui Zhao, Stefan Gumhold

Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel decomposable image fusion framework, capable of efficiently generating fused images and also decomposing them back to the source images. FusionINN is designed to be bijective by including a latent image alongside the fused image, while ensuring minimal transfer of information from the source images to the latent representation. To the best of our knowledge, we are the first to investigate the decomposability of fused images, which is particularly crucial for life-sensitive applications such as medical image fusion compared to other tasks like multi-focus or multi-exposure image fusion. Our extensive experimentation validates FusionINN over existing discriminative and generative fusion methods, both subjectively and objectively. Moreover, compared to a recent denoising diffusion-based fusion model, our approach offers faster and qualitatively better fusion results.

6/11/2024