Efficient Learned Wavelet Image and Video Coding

Read original: arXiv:2405.12631 - Published 5/22/2024 by Anna Meyer, Srivatsa Prativadibhayankaram, André Kaup

Overview

This paper introduces a new approach to improve the decoding speed of the state-of-the-art iWave++ image and video compression model, while maintaining its strong compression performance.
iWave++ uses a wavelet-based latent space and an autoregressive context model, which results in high compression quality but slow decoding speed.
The authors propose a parallelized context model that can be integrated into the iWave++ framework, yielding a decoding speedup factor of over 350 for image and 240 for video compression while increasing the Bjøntegaard delta bitrate by only 1.5% and 1%, respectively.
The learned wavelet decomposition in iWave++ is also analyzed by visualizing its subband impulse responses.

Plain English Explanation

The paper focuses on improving the speed of iWave++, a state-of-the-art image and video compression model. iWave++ works by representing the image or video in a special "wavelet" space, which allows it to achieve very high compression rates while maintaining quality.

However, decoding with iWave++ is quite slow, which makes it impractical for real-world applications. The slowdown comes from its autoregressive context model, which processes values one after another. The authors of this paper speed up decoding by replacing it with a "parallel" context model that can handle many values at once.

This new approach decodes images over 350 times faster than the original iWave++, and video over 240 times faster, while only losing a small amount of compression performance (a bitrate increase of about 1.5% for images and 1% for video at the same quality). The authors also take a look under the hood of iWave++ to better understand how its wavelet-based encoding works.

Overall, this research represents an important step forward in making high-quality image and video compression more practical and accessible for real-world use cases, by addressing the key challenge of decoding speed.

Technical Explanation

The paper builds upon the iWave++ image and video compression model, which uses a wavelet-based latent space and an autoregressive context model to achieve state-of-the-art compression performance. However, the autoregressive context model in iWave++ results in slow decoding speeds.
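To make the cost of autoregressive decoding concrete, the following minimal sketch counts how many sequential decoding steps a plane of coefficients needs under two schedules. The raster-scan schedule decodes one position per step, as a strictly autoregressive model must. The two-pass checkerboard schedule is only a hypothetical stand-in for a parallelized context model; this summary does not specify which parallel scheme the authors use, and all function names here are illustrative.

```python
import numpy as np


def raster_scan_schedule(height, width):
    """Step index per position when positions are decoded one by one in raster order."""
    return np.arange(height * width).reshape(height, width)


def checkerboard_schedule(height, width):
    """Hypothetical two-pass schedule: even-parity positions in step 0, the rest in step 1."""
    rows, cols = np.indices((height, width))
    return (rows + cols) % 2


def sequential_steps(schedule):
    """Number of decoding steps that have to run one after another."""
    return int(schedule.max()) + 1


if __name__ == "__main__":
    h, w = 64, 64  # toy subband size, chosen only for illustration
    print("raster scan steps:  ", sequential_steps(raster_scan_schedule(h, w)))
    print("checkerboard steps: ", sequential_steps(checkerboard_schedule(h, w)))
```

The step count is only a proxy: it ignores the cost of each network evaluation and of entropy decoding, so its ratio should not be read as the measured speedup reported in the paper.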

To address this, the authors propose integrating a parallelized context model into the iWave++ framework. This allows for much faster decoding, with experimental results demonstrating a speedup factor of over 350 for image compression and 240 for video compression. At the same time, the rate-distortion performance, measured as Bjøntegaard delta bitrate, is only slightly worse: by 1.5% for image coding and 1% for video coding.
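For readers unfamiliar with the Bjøntegaard delta bitrate, here is a minimal sketch of the commonly used computation: fit the logarithm of the bitrate as a cubic polynomial of PSNR for each codec, average both fits over the shared PSNR range, and convert the difference into a percentage. The function name and the rate-distortion points are made up for illustration and are not taken from the paper; the authors' exact evaluation setup may differ.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (percent) of the test codec vs. the anchor at equal PSNR."""
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)
    log_rate_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_rate_t = np.log(np.asarray(rate_test, dtype=float))

    # Cubic fit of log-rate as a function of quality for each codec.
    poly_a = np.polyfit(psnr_a, log_rate_a, 3)
    poly_t = np.polyfit(psnr_t, log_rate_t, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # A positive result means the test codec needs more bits for the same quality.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


if __name__ == "__main__":
    # Synthetic points (bits per pixel, dB); the test codec spends 1.5% more bits everywhere.
    psnr = [30.0, 33.0, 36.0, 39.0]
    anchor_rate = [0.2, 0.4, 0.8, 1.6]
    test_rate = [r * 1.015 for r in anchor_rate]
    print(f"BD-rate: {bd_rate(anchor_rate, psnr, test_rate, psnr):.2f} %")
```

Because the synthetic test curve is a uniform 1.5% shift of the anchor, the function returns approximately 1.5 here; real curves rarely differ by a constant factor, which is why the averaging over the quality range is needed.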

The authors also analyze the learned wavelet decomposition in iWave++ by visualizing its subband impulse responses. This provides insights into the internal workings of the model's wavelet-based latent space representation.
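The idea behind a subband impulse response can be sketched with a fixed wavelet: set a single coefficient in one subband to 1, leave everything else at 0, and run the inverse transform. The resulting image shows what that subband contributes to the reconstruction. The example below uses a one-level orthonormal Haar transform as a stand-in, since the learned transform of iWave++ is not reproduced here; the subband naming convention (first letter for the filter along axis 0, second for axis 1) and all function names are assumptions made for this illustration.

```python
import numpy as np


def inverse_haar_1d(low, high, axis):
    """Undo one orthonormal Haar split along the given axis."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape)
    even_idx = [slice(None)] * out.ndim
    odd_idx = [slice(None)] * out.ndim
    even_idx[axis] = slice(0, None, 2)
    odd_idx[axis] = slice(1, None, 2)
    out[tuple(even_idx)] = even
    out[tuple(odd_idx)] = odd
    return out


def inverse_haar_2d(subbands):
    """Reconstruct an image from the four subbands LL, LH, HL, HH of one level."""
    low_rows = inverse_haar_1d(subbands["LL"], subbands["LH"], axis=1)
    high_rows = inverse_haar_1d(subbands["HL"], subbands["HH"], axis=1)
    return inverse_haar_1d(low_rows, high_rows, axis=0)


def impulse_response(name, size=8):
    """Image obtained from a single unit coefficient in the named subband."""
    half = size // 2
    subbands = {key: np.zeros((half, half)) for key in ("LL", "LH", "HL", "HH")}
    subbands[name][half // 2, half // 2] = 1.0
    return inverse_haar_2d(subbands)


if __name__ == "__main__":
    for name in ("LL", "LH", "HL", "HH"):
        response = impulse_response(name)
        # Print the 2x2 block that the unit coefficient maps to.
        print(name, response[4:6, 4:6].tolist())
```

For Haar, each response is a 2x2 pattern of values with magnitude 0.5. A learned transform would generally produce larger and less regular patterns, which is what makes visualizing them informative.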

Critical Analysis

The paper presents a promising approach to improving the decoding speed of the iWave++ compression model, which is a key limitation of the original work. By introducing a parallelized context model, the authors are able to achieve dramatic speedups without significantly sacrificing compression performance.

However, the paper does not delve into the potential tradeoffs or limitations of this parallelized context model. For example, it's unclear whether the model has any additional memory or computational requirements compared to the original autoregressive approach. Additionally, the authors do not explore how the parallelized model might scale to higher resolutions or more complex video sequences.

Furthermore, the analysis of the learned wavelet decomposition is relatively high-level, and it would be interesting to see a more in-depth examination of how the model's internal representations compare to traditional wavelet-based approaches. Comparing the model to other state-of-the-art approaches like WaveDH or HybridFlow could also provide valuable insights.

In short, the speedup makes learned wavelet coding considerably more practical, but further investigation into the tradeoffs and limitations of the proposed approach would help strengthen the findings and their potential impact.

Conclusion

This paper introduces a novel approach to improve the decoding speed of the state-of-the-art iWave++ image and video compression model, without significantly compromising its strong compression performance. By integrating a parallelized context model into the iWave++ framework, the authors achieve a remarkable speedup in decoding, making the model more practical for real-world applications.

The analysis of the learned wavelet decomposition also provides valuable insights into the inner workings of the iWave++ model, which could inform the development of future wavelet-based compression techniques. Overall, this research represents an important advancement in the field of image and video compression, paving the way for more efficient and accessible high-quality media delivery across a wide range of devices and applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Efficient Learned Wavelet Image and Video Coding

Anna Meyer, Srivatsa Prativadibhayankaram, André Kaup

Learned wavelet image and video coding approaches provide an explainable framework with a latent space corresponding to a wavelet decomposition. The wavelet image coder iWave++ achieves state-of-the-art performance and has been employed for various compression tasks, including lossy as well as lossless image, video, and medical data compression. However, the approaches suffer from slow decoding speed due to the autoregressive context model used in iWave++. In this paper, we show how a parallelized context model can be integrated into the iWave++ framework. Our experimental results demonstrate a speedup factor of over 350 and 240 for image and video compression, respectively. At the same time, the rate-distortion performance in terms of Bjøntegaard delta bitrate is slightly worse by 1.5% for image coding and 1% for video coding. In addition, we analyze the learned wavelet decomposition by visualizing its subband impulse responses.

5/22/2024

Learned Compression for Images and Point Clouds

Mateen Ulhaq

Over the last decade, deep learning has shown great success at performing computer vision tasks, including classification, super-resolution, and style transfer. Now, we apply it to data compression to help build the next generation of multimedia codecs. This thesis provides three primary contributions to this new field of learned compression. First, we present an efficient low-complexity entropy model that dynamically adapts the encoding distribution to a specific input by compressing and transmitting the encoding distribution itself as side information. Secondly, we propose a novel lightweight low-complexity point cloud codec that is highly specialized for classification, attaining significant reductions in bitrate compared to non-specialized codecs. Lastly, we explore how motion within the input domain between consecutive video frames is manifested in the corresponding convolutionally-derived latent space.

9/16/2024

Accelerating Learned Video Compression via Low-Resolution Representation Learning

Zidian Qiu, Zongyao He, Zhi Jin

In recent years, the field of learned video compression has witnessed rapid advancement, exemplified by the latest neural video codec DCVC-DC, which has outperformed the upcoming next-generation codec ECM in terms of compression ratio. Despite this, learned video compression frameworks often exhibit low encoding and decoding speeds primarily due to their increased computational complexity and unnecessary high-resolution spatial operations, which hugely hinder their applications in reality. In this work, we introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning, aiming to significantly enhance the encoding and decoding speeds. Firstly, we diminish the computational load by reducing the resolution of inter-frame propagated features obtained from reused features of decoded frames, including I-frames. We implement a joint training strategy for both the I-frame and P-frame models, further improving the compression ratio. Secondly, our approach efficiently leverages multi-frame priors for parameter prediction, minimizing computation at the decoding end. Thirdly, we revisit the application of the Online Encoder Update (OEU) strategy for high-resolution sequences, achieving notable improvements in compression ratio without compromising decoding efficiency. Our efficiency-optimized framework has significantly improved the balance between compression ratio and speed for learned video compression. In comparison to traditional codecs, our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM. Furthermore, when contrasted with DCVC-HEM, our approach delivers a comparable compression ratio while boosting encoding and decoding speeds by a factor of 3 and 7, respectively. On an RTX 2080Ti, our method can decode each 1080p frame in under 100 ms.

7/24/2024

WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration

Xinxing Cheng, Xi Jia, Wenqi Lu, Qiufu Li, Linlin Shen, Alexander Krull, Jinming Duan

Deep image registration has demonstrated exceptional accuracy and fast inference. Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner. However, due to the cascaded nature and repeated composition/warping operations on feature maps, these methods negatively increase memory usage during training and testing. Moreover, such approaches lack explicit constraints on the learning process of small deformations at different scales, thus lacking explainability. In this study, we introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales, utilizing the wavelet coefficients derived from the original input image pair. By exploiting the properties of the wavelet transform, these estimated coefficients facilitate the seamless reconstruction of a full-resolution displacement/velocity field via our devised inverse discrete wavelet transform (IDWT) layer. This approach avoids the complexities of cascading networks or composition operations, making our WiNet an explainable and efficient competitor with other coarse-to-fine methods. Extensive experimental results from two 3D datasets show that our WiNet is accurate and GPU efficient. The code is available at https://github.com/x-xc/WiNet .

7/19/2024