Taming Lookup Tables for Efficient Image Retouching

Read original: arXiv:2403.19238 - Published 7/16/2024 by Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

Overview

This paper presents a method for efficiently storing and applying lookup tables (LUTs) for image retouching tasks.
The authors develop a novel approach to compress LUTs, allowing for their use in real-time applications without sacrificing quality.
The proposed technique leverages tensor decomposition to dramatically reduce the memory footprint of LUTs, making them more practical for deployment.

Plain English Explanation

Image editing and retouching often rely on lookup tables (LUTs): precomputed tables that map input pixel values to desired output values. LUTs can be used to apply various effects, such as color correction, tone mapping, and stylization.
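
To make the idea of a LUT concrete, here is a minimal sketch of applying a 3D color LUT to an image with trilinear interpolation. It is a generic illustration, not code from the paper: the function names `identity_lut` and `apply_lut`, the grid size of 17, and the square-root tone curve are all assumptions chosen for the example.

```python
import numpy as np

def identity_lut(size: int) -> np.ndarray:
    """Return a (size, size, size, 3) LUT that maps every color to itself."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def apply_lut(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply a 3D LUT to an (H, W, 3) float image in [0, 1].

    Each pixel is located inside a grid cell and the 8 surrounding
    LUT entries are blended with trilinear weights.
    """
    size = lut.shape[0]
    pos = np.clip(image, 0.0, 1.0) * (size - 1)             # continuous grid coordinates
    lo = np.minimum(np.floor(pos).astype(int), size - 2)    # lower corner of the cell
    frac = pos - lo                                          # offset inside the cell
    out = np.zeros(image.shape, dtype=float)
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                # Weight of this corner: product of per-axis linear weights.
                w = ((frac[..., 0] if dr else 1 - frac[..., 0])
                     * (frac[..., 1] if dg else 1 - frac[..., 1])
                     * (frac[..., 2] if db else 1 - frac[..., 2]))
                corner = lut[lo[..., 0] + dr, lo[..., 1] + dg, lo[..., 2] + db]
                out += w[..., None] * corner
    return out

# Hypothetical "retouching" LUT: a square-root curve that brightens midtones.
lut = identity_lut(17) ** 0.5
image = np.random.default_rng(0).random((2, 2, 3))
retouched = apply_lut(image, lut)
print(retouched.shape)
```

Because the lookup replaces whatever computation produced the table, the per-pixel cost stays the same no matter how elaborate the original color transform was.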

However, traditional LUTs can be quite large, often requiring hundreds of kilobytes or even megabytes of storage. This can make them impractical for use in real-time applications, such as video processing or mobile apps, where memory and computational resources are limited.

The researchers in this paper develop a new way to compress LUTs, reducing their size by up to 90% without sacrificing visual quality. Their approach uses a technique called tensor decomposition to factorize the LUT into a smaller, more efficient representation. This allows the LUT to be stored more compactly and applied more quickly, making it suitable for a wider range of applications.

Technical Explanation

The authors propose a method for compressing lookup tables (LUTs) used in image retouching tasks. Traditionally, LUTs can be quite large, often requiring hundreds of kilobytes or even megabytes of storage. This can make them impractical for real-time applications, such as video processing or mobile apps, where memory and computational resources are limited.
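
A quick back-of-the-envelope calculation shows where sizes in that range come from. The sketch below assumes a dense 3D LUT with three output channels stored as 32-bit floats; the grid sizes 17, 33, and 65 are common choices in color-grading tools and are assumptions here, not figures from the paper. The helper name `lut_bytes` is hypothetical.

```python
def lut_bytes(grid_size: int, channels: int = 3, bytes_per_value: int = 4) -> int:
    """Storage in bytes for a dense 3D LUT with grid_size^3 entries."""
    return grid_size ** 3 * channels * bytes_per_value

for n in (17, 33, 65):
    print(f"{n}^3 grid: {lut_bytes(n) / 1024:.1f} KiB")
# 17^3 grid: 57.6 KiB
# 33^3 grid: 421.1 KiB
# 65^3 grid: 3218.3 KiB
```

Storage grows with the cube of the grid resolution, so doubling the resolution multiplies the footprint by roughly eight.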

To address this issue, the researchers develop a novel approach that leverages tensor decomposition to dramatically reduce the memory footprint of LUTs. Specifically, they factorize the LUT into a set of smaller tensors, which can be stored and applied much more efficiently. This allows the LUT to be used in real-time applications without sacrificing visual quality.
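
This summary does not say which decomposition is used, so the following is only a generic illustration of factorizing a LUT into smaller tensors: a Tucker-style truncated higher-order SVD written with NumPy. It should not be read as the authors' method. The helper names (`unfold`, `hosvd`, `reconstruct`, `gamma_lut`), the grid size, and the ranks are all assumptions made for the example.

```python
import numpy as np

def unfold(t: np.ndarray, mode: int) -> np.ndarray:
    """Flatten tensor t into a matrix whose rows index the given mode."""
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def hosvd(t: np.ndarray, ranks):
    """Truncated higher-order SVD. Returns (core, factors)."""
    factors = []
    for mode, r in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = t
    for mode, u in enumerate(factors):
        # Project mode `mode` onto the retained basis (multiply by u^T).
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, mode)), 0, mode)
    return core, factors

def reconstruct(core: np.ndarray, factors) -> np.ndarray:
    """Expand the small core back to a full-size LUT."""
    t = core
    for mode, u in enumerate(factors):
        t = np.moveaxis(np.tensordot(u, t, axes=(1, mode)), 0, mode)
    return t

def gamma_lut(size: int, gamma: float = 0.5) -> np.ndarray:
    """Hypothetical demo LUT: a per-channel power curve on a size^3 grid."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([r, g, b], axis=-1) ** gamma

lut = gamma_lut(33)
core, factors = hosvd(lut, ranks=(2, 2, 2, 3))
stored = core.size + sum(u.size for u in factors)
error = np.abs(reconstruct(core, factors) - lut).max()
print(f"dense values: {lut.size}, stored values: {stored}, max error: {error:.2e}")
```

The demo LUT is a per-channel curve, which is why very small ranks reproduce it almost exactly. LUTs that mix channels generally need larger ranks, and the achievable trade-off between size and accuracy depends on the LUT being compressed.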

The authors evaluate their method on a range of image retouching tasks, including color correction, tone mapping, and style transfer. They demonstrate that their compressed LUTs can achieve comparable or even better performance than traditional LUTs, while requiring significantly less memory and computational resources.

Critical Analysis

The authors provide a thorough evaluation of their method, including comparisons to state-of-the-art LUT compression techniques and real-world applications. However, they do not address some potential limitations of their approach.

For example, the tensor decomposition process may not work as well for LUTs with more complex, non-linear structures. Additionally, the compressed LUTs may not be as flexible or customizable as their full-size counterparts, which could limit their usefulness in certain applications.

Further research could explore ways to address these limitations, such as developing more advanced tensor decomposition algorithms or exploring hybrid approaches that combine compressed LUTs with other image processing techniques.

Conclusion

This paper presents a novel method for compressing lookup tables (LUTs) used in image retouching tasks. By leveraging tensor decomposition, the researchers are able to dramatically reduce the memory footprint of LUTs, making them more practical for real-time applications without sacrificing visual quality.

The proposed technique has the potential to significantly improve the efficiency and widespread adoption of LUT-based image processing, with applications in areas such as video editing, mobile photography, and computational photography. As the demands for high-quality, real-time image and video processing continue to grow, this research represents an important step towards more efficient and scalable solutions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Taming Lookup Tables for Efficient Image Retouching

Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose Image Color Enhancement Lookup Table (ICELUT) that adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, upkeeping the performance even with a heavily downsampled 32x32 input image. These enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order faster than any CNN solution. Codes are available at https://github.com/Stephen0808/ICELUT.

7/16/2024

Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution

Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong

Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely elude CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates an uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.

5/9/2024

WB LUTs: Contrastive Learning for White Balancing Lookup Tables

Sai Kumar Reddy Manne, Michael Wan

Automatic white balancing (AWB), one of the first steps in an integrated signal processing (ISP) pipeline, aims to correct the color cast induced by the scene illuminant. An incorrect white balance (WB) setting or AWB failure can lead to an undesired blue or red tint in the rendered sRGB image. To address this, recent methods pose the post-capture WB correction problem as an image-to-image translation task and train deep neural networks to learn the necessary color adjustments at a lower resolution. These low resolution outputs are post-processed to generate high resolution WB corrected images, forming a bottleneck in the end-to-end run time. In this paper we present a 3D Lookup Table (LUT) based WB correction model called WB LUTs that can generate high resolution outputs in real time. We introduce a contrastive learning framework with a novel hard sample mining strategy, which improves the WB correction quality of baseline 3D LUTs by 25.5%. Experimental results demonstrate that the proposed WB LUTs perform competitively against state-of-the-art models on two benchmark datasets while being 300 times faster using 12.7 times less memory. Our model and code are available at https://github.com/skrmanne/3DLUT_sRGB_WB.

4/17/2024

In-Loop Filtering via Trained Look-Up Tables

Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time and computational complexity, and high demands of high-performance hardware, which is challenging to apply to the general uses of coding scene. To address this limitation, inspired by explorations in image restoration, we propose an efficient and practical in-loop filtering scheme by adopting the Look-up Table (LUT). We train the DNN of in-loop filtering within a fixed filtering reference range, and cache the output values of the DNN into a LUT via traversing all possible inputs. At testing time in the coding process, the filtered pixel is generated by locating input pixels (to-be-filtered pixel with reference pixels) and interpolating cached filtered pixel values. To further enable the large filtering reference range with the limited storage cost of LUT, we introduce the enhanced indexing mechanism in the filtering process, and clipping/finetuning mechanism in the training. The proposed method is implemented into the Versatile Video Coding (VVC) reference software, VTM-11.0. Experimental results show that the ultrafast, very fast, and fast mode of the proposed method achieves on average 0.13%/0.34%/0.51%, and 0.10%/0.27%/0.39% BD-rate reduction, under the all intra (AI) and random access (RA) configurations. Especially, our method has friendly time and computational complexity, only 101%/102%-104%/108% time increase with 0.13-0.93 kMACs/pixel, and only 164-1148 KB storage cost for a single model. Our solution may shed light on the journey of practical neural network-based coding tool evolution.

9/12/2024