JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer

Read original: arXiv:2308.09110 - Published 5/6/2024 by Mingyu Ouyang, Zhenzhong Chen

Overview

JPEG compression uses quantization of Discrete Cosine Transform (DCT) coefficients to reduce file size, but this can lead to loss of important image details.
Recovering compressed JPEG images in the frequency domain has gained interest as a complement to pixel-domain restoration techniques.
Existing DCT domain methods have limitations in handling a wide range of compression quality factors or recovering sparse quantized coefficients and components across color spaces.

Plain English Explanation

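The JPEG image compression format uses a technique called quantization to reduce the file size of images. The image is first split into small blocks, and each block is converted with the Discrete Cosine Transform (DCT) into coefficients that describe how much of each spatial frequency the block contains. Quantization then divides each coefficient by a step size and rounds the result to an integer, so many coefficients, especially the high-frequency ones, become zero. The rounding cannot be undone, which is why this process can lead to a significant loss of important details in the original image.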
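The quantization step described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single 8×8 luminance block, assuming the example luminance table from Annex K of the JPEG standard; the function names (`dct_matrix`, `jpeg_block_roundtrip`) and the synthetic test block are hypothetical and not taken from the paper.

```python
import numpy as np

N = 8  # JPEG block size


def dct_matrix(n: int = N) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


# Example luminance quantization table (JPEG standard, Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])


def jpeg_block_roundtrip(block: np.ndarray, qtable: np.ndarray):
    """Quantize one 8x8 pixel block in the DCT domain and decode it again."""
    c = dct_matrix()
    coeffs = c @ (block - 128.0) @ c.T           # level shift, then 2-D DCT
    quantized = np.round(coeffs / qtable).astype(np.int32)  # the lossy step
    dequantized = quantized * qtable             # what a standard decoder sees
    restored = c.T @ dequantized @ c + 128.0     # inverse 2-D DCT
    return quantized, restored


# Synthetic block: a smooth gradient with mild noise (illustrative data only).
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, N)
block = 100.0 + 60.0 * np.outer(ramp, ramp) + rng.normal(0.0, 2.0, (N, N))

quantized, restored = jpeg_block_roundtrip(block, Q_LUMA)
print("non-zero quantized coefficients:", np.count_nonzero(quantized), "of", N * N)
print("max absolute pixel error:", float(np.abs(restored - block).max()))
```

The integers in `quantized` are what a JPEG file stores. A coefficient recovery method works from these sparse integers and the quantization table, trying to estimate the coefficients as they were before rounding.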

Recently, there has been growing interest in recovering the lost image information directly in the frequency domain, which is where the DCT coefficients live, rather than just working on the final pixel values. This complements the many existing techniques for restoring JPEG-compressed images in the pixel domain.

However, the current DCT domain methods have some limitations. They may not work well across a wide range of compression quality factors, or they may struggle to recover the sparse quantized coefficients and the components in different color spaces (like luminance and chrominance).
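To make "a wide range of compression quality factors" concrete, the sketch below shows how a quality factor is commonly turned into a quantization table. It assumes the scaling rule used by the IJG libjpeg encoder and the example luminance table from Annex K of the JPEG standard; other encoders may use different tables, and the function name `scale_qtable` is hypothetical.

```python
import numpy as np

# Example luminance quantization table (JPEG standard, Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
])


def scale_qtable(base: np.ndarray, quality: int) -> np.ndarray:
    """Scale a base table by a quality factor in 1..100 (IJG-style rule)."""
    quality = int(min(max(quality, 1), 100))
    # Below 50 the step sizes grow quickly; above 50 they shrink linearly.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    table = (base.astype(np.int64) * scale + 50) // 100
    # Baseline JPEG stores 8-bit step sizes, and a step of 0 is not allowed.
    return np.clip(table, 1, 255)


for quality in (10, 50, 90):
    table = scale_qtable(Q_LUMA, quality)
    print(f"quality {quality}: step sizes from {table.min()} to {table.max()}")
```

Because the step sizes vary this much with the quality factor, a model that never sees the table has to guess how strongly each coefficient was rounded, which is one reason a single network can struggle across quality factors.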

Technical Explanation

To address these challenges, the researchers propose a DCT domain spatial-frequential Transformer, called DCTransformer, for JPEG quantized coefficient recovery.

The key aspects of their approach are:

Dual-branch architecture: The model has two separate branches, one to capture the spatial correlations within the DCT coefficients and another to capture the frequential correlations.
Quantization matrix embedding: This allows the same model to handle a wide range of compression quality factors, rather than needing a separate model for each factor.
Luminance-chrominance alignment head: This produces a unified feature map that aligns the luminance and chrominance components, which differ in size.
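The three components above mostly come down to how the DCT coefficients are arranged before they reach the network. The following NumPy sketch only illustrates plausible data layouts and is not the authors' implementation: the token views, the nearest-neighbor chroma alignment, the linear table embedding, and every function name here are assumptions made for illustration. It assumes 8×8 blocks and 4:2:0 chroma subsampling.

```python
import numpy as np

B = 8  # JPEG block size


def to_block_tensor(coeff_plane: np.ndarray) -> np.ndarray:
    """Rearrange an (H, W) plane of 8x8 DCT blocks into (H/8, W/8, 64)."""
    h, w = coeff_plane.shape
    assert h % B == 0 and w % B == 0
    blocks = coeff_plane.reshape(h // B, B, w // B, B).transpose(0, 2, 1, 3)
    return blocks.reshape(h // B, w // B, B * B)


def tokens_per_block(t: np.ndarray) -> np.ndarray:
    """One token per block position, (num_blocks, 64): relates neighboring blocks."""
    return t.reshape(-1, t.shape[-1])


def tokens_per_frequency(t: np.ndarray) -> np.ndarray:
    """One token per DCT frequency, (64, num_blocks): relates frequency bands."""
    return t.reshape(-1, t.shape[-1]).T


def align_chroma(y_t: np.ndarray, c_t: np.ndarray) -> np.ndarray:
    """Assumed alignment: repeat chroma blocks to the luma grid, then concatenate."""
    fy = y_t.shape[0] // c_t.shape[0]
    fx = y_t.shape[1] // c_t.shape[1]
    c_up = np.repeat(np.repeat(c_t, fy, axis=0), fx, axis=1)
    return np.concatenate([y_t, c_up], axis=-1)


def embed_qtable(qtable: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Assumed embedding: flatten the 8x8 table and apply a linear projection."""
    return (qtable.reshape(-1) / 255.0) @ weight


rng = np.random.default_rng(0)
y_plane = rng.integers(-20, 20, size=(32, 32)).astype(np.float64)   # luma coefficients
cb_plane = rng.integers(-20, 20, size=(16, 16)).astype(np.float64)  # 4:2:0 chroma

y_t, cb_t = to_block_tensor(y_plane), to_block_tensor(cb_plane)
aligned = align_chroma(y_t, cb_t)
q_vec = embed_qtable(np.full((B, B), 16), rng.normal(size=(B * B, 16)))

print("luma block tensor:", y_t.shape, "chroma block tensor:", cb_t.shape)
print("per-block tokens:", tokens_per_block(y_t).shape)
print("per-frequency tokens:", tokens_per_frequency(y_t).shape)
print("aligned feature map:", aligned.shape, "table embedding:", q_vec.shape)
```

In a real model the two token views would feed separate attention branches and the table embedding would be a learned layer; here they are plain array operations so the shapes are easy to inspect.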

The researchers show that their DCTransformer outperforms the current state-of-the-art techniques for JPEG artifact removal in their experiments.

Critical Analysis

The paper presents a novel and effective approach for recovering JPEG-compressed images in the frequency domain. The authors address important limitations of prior DCT domain methods, namely difficulty handling a wide range of quality factors and difficulty aligning components across color spaces.

However, the paper does not discuss any potential caveats or limitations of the proposed DCTransformer. For example, it would be helpful to know how the method performs on low-quality or heavily compressed images, where the quantization artifacts may be more severe.

Additionally, the paper could have explored the computational complexity and inference time of the DCTransformer, as these factors are crucial for real-world image processing applications. Comparisons to other learned approaches, such as diffusion-based image compression, would also provide helpful context.

Overall, the DCTransformer proposed in this paper represents a significant advancement in JPEG artifact removal, but further research could explore its limitations and practical implications more thoroughly.

Conclusion

The researchers have developed a novel DCT domain Transformer, called DCTransformer, that can effectively recover JPEG-compressed images by capturing both spatial and frequential correlations within the DCT coefficients. The model's ability to handle a wide range of compression quality factors and align components across color spaces makes it a promising advancement in the field of JPEG artifact removal. With further exploration of its limitations and practical considerations, the DCTransformer could have valuable applications in image processing and compression.

Related Papers

JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer

Mingyu Ouyang, Zhenzhong Chen

JPEG compression adopts the quantization of Discrete Cosine Transform (DCT) coefficients for effective bit-rate reduction, whilst the quantization could lead to a significant loss of important image details. Recovering compressed JPEG images in the frequency domain has recently garnered increasing interest, complementing the multitude of restoration techniques established in the pixel domain. However, existing DCT domain methods typically suffer from limited effectiveness in handling a wide range of compression quality factors or fall short in recovering sparse quantized coefficients and the components across different colorspaces. To address these challenges, we propose a DCT domain spatial-frequential Transformer, namely DCTransformer, for JPEG quantized coefficient recovery. Specifically, a dual-branch architecture is designed to capture both spatial and frequential correlations within the collocated DCT coefficients. Moreover, we incorporate the operation of quantization matrix embedding, which effectively allows our single model to handle a wide range of quality factors, and a luminance-chrominance alignment head that produces a unified feature map to align different-sized luminance and chrominance components. Our proposed DCTransformer outperforms the current state-of-the-art JPEG artifact removal techniques, as demonstrated by our extensive experiments.

5/6/2024

JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Woo Kyoung Han, Sunghoon Im, Jaedeok Kim, Kyong Hwan Jin

We propose a practical approach to JPEG image decoding, utilizing a local implicit neural representation with continuous cosine formulation. The JPEG algorithm significantly quantizes discrete cosine transform (DCT) spectra to achieve a high compression rate, inevitably resulting in quality degradation while encoding an image. We have designed a continuous cosine spectrum estimator to address the quality degradation issue that restores the distorted spectrum. By leveraging local DCT formulations, our network has the privilege to exploit dequantization and upsampling simultaneously. Our proposed model enables decoding compressed images directly across different quality factors using a single pre-trained model without relying on a conventional JPEG decoder. As a result, our proposed network achieves state-of-the-art performance in flexible color image JPEG artifact removal tasks. Our source code is available at https://github.com/WooKyoungHan/JDEC.

4/9/2024

Approximate DCT and Quantization Techniques for Energy-Constrained Image Sensors

Ming-Che Li, Archisman Ghosh, Shreyas Sen

Recent expansions in multimedia devices gather enormous amounts of real-time images for processing and inference. The images are first compressed using compression schemes, like JPEG, to reduce storage costs and power for transmitting the captured data. Due to inherent error resilience and imperceptibility in images, JPEG can be approximated to reduce the required computation power and area. This work demonstrates the first end-to-end approximation computing-based optimization of JPEG hardware using i) an approximate division realized using bit-shift operators to reduce the complexity of the quantization block, ii) loop perforation, and iii) precision scaling on top of a multiplier-less fast DCT architecture to achieve an extremely energy-efficient JPEG compression unit which will be a perfect fit for power/bandwidth-limited scenario. Furthermore, a gradient descent-based heuristic composed of two conventional approximation strategies, i.e., Precision Scaling and Loop Perforation, is implemented for tuning the degree of approximation to trade off energy consumption with the quality degradation of the decoded image. The entire RTL design is coded in Verilog HDL, synthesized, mapped to TSMC 65nm CMOS technology, and simulated using Cadence Spectre Simulator under 25°C, TT corner. The approximate division approach achieved around 28% reduction in the active design area. The heuristic-based approximation technique combined with accelerator optimization achieves a significant energy reduction of 36% for a minimal image quality degradation of 2% SAD. Simulation results also show that the proposed architecture consumes 15uW at the DCT and quantization stages to compress a colored 480p image at 6fps.

6/25/2024

Bi-Level Spatial and Channel-aware Transformer for Learned Image Compression

Hamidreza Soltani, Erfan Ghasemi

Recent advancements in learned image compression (LIC) methods have demonstrated superior performance over traditional hand-crafted codecs. These learning-based methods often employ convolutional neural networks (CNNs) or Transformer-based architectures. However, these nonlinear approaches frequently overlook the frequency characteristics of images, which limits their compression efficiency. To address this issue, we propose a novel Transformer-based image compression method that enhances the transformation stage by considering frequency components within the feature map. Our method integrates a novel Hybrid Spatial-Channel Attention Transformer Block (HSCATB), where a spatial-based branch independently handles high and low frequencies at the attention layer, and a Channel-aware Self-Attention (CaSA) module captures information across channels, significantly improving compression performance. Additionally, we introduce a Mixed Local-Global Feed Forward Network (MLGFFN) within the Transformer block to enhance the extraction of diverse and rich information, which is crucial for effective compression. These innovations collectively improve the transformation's ability to project data into a more decorrelated latent space, thereby boosting overall compression efficiency. Experimental results demonstrate that our framework surpasses state-of-the-art LIC methods in rate-distortion performance.

8/9/2024