Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

Read original: arXiv:2404.10700 - Published 7/16/2024 by Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte

Overview

The research paper "Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs" presents Rawformer, an unsupervised, Transformer-based encoder-decoder model that translates raw images captured by one camera into the raw space of another camera.
The key idea is to learn this raw-to-raw translation from unpaired data, so that a learnable image signal processor (ISP) trained for one camera can be applied to new, unseen cameras without collecting a large paired dataset for each model.
The paper evaluates Rawformer on real camera datasets and reports higher accuracy than previous state-of-the-art techniques, along with a more robust correlation between the original and translated raw images.

Plain English Explanation

The paper describes a new way to process images captured by digital cameras. Typically, cameras have an "image signal processor" (ISP) that takes the raw image data and applies various adjustments to produce the final image you see. This ISP is usually hard-coded and not very flexible.

The researchers focus on learned ISPs, which replace the hand-designed pipeline with a neural network trained on examples. The catch is that raw data looks different on every camera model, so a learned ISP normally needs a large set of paired training images for each new camera. "Rawformer" is a transformer, a type of artificial intelligence model, that addresses this by converting raw images from one camera so that they resemble raw images from another.

Additionally, Rawformer learns this conversion from unpaired data. It does not need the same scene photographed by both cameras; a collection of raw images from each camera is enough, which makes the approach easier to apply to new camera hardware.

The paper shows that Rawformer translates raw images more accurately than previous state-of-the-art methods on real camera datasets. This suggests that a learned ISP built for one camera could be reused on other cameras with far less data collection.

Technical Explanation

The paper introduces a transformer-based model called Rawformer that can perform unpaired raw-to-raw image translation for learnable camera ISPs. Unlike traditional ISPs that use fixed, hand-crafted algorithms, Rawformer is a data-driven approach that can be trained on examples to learn optimal image processing.
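For contrast, a fixed, hand-crafted pipeline can be sketched in a few lines. The toy example below is not taken from the paper: it assumes an RGGB Bayer layout and 10-bit data, and the function name `simple_isp`, the black and white levels, the white-balance gains, and the gamma value are all illustrative placeholders. It collapses each 2x2 Bayer block into one RGB pixel rather than performing real demosaicing.

```python
# Toy hand-crafted ISP: fixed steps with fixed parameters.
# All names and default values here are illustrative assumptions.

def simple_isp(bayer, black_level=64, white_level=1023,
               wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Convert an RGGB Bayer mosaic (list of rows) to half-resolution RGB."""
    height, width = len(bayer), len(bayer[0])
    scale = white_level - black_level

    def normalize(value):
        # Subtract the black level and clamp the result to [0, 1].
        return min(max((value - black_level) / scale, 0.0), 1.0)

    rgb = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            # One 2x2 RGGB block becomes one RGB pixel (greens are averaged).
            r = normalize(bayer[y][x])
            g = 0.5 * (normalize(bayer[y][x + 1]) + normalize(bayer[y + 1][x]))
            b = normalize(bayer[y + 1][x + 1])
            pixel = []
            for value, gain in zip((r, g, b), wb_gains):
                balanced = min(value * gain, 1.0)        # white balance, then clip
                pixel.append(balanced ** (1.0 / gamma))  # display gamma
            row.append(tuple(pixel))
        rgb.append(row)
    return rgb


# A single block with saturated green sites and dark red/blue sites.
print(simple_isp([[64, 1023], [1023, 64]]))  # [[(0.0, 1.0, 0.0)]]
```

Every constant in this sketch would have to be re-tuned for a different sensor, which is the per-camera effort that learned approaches aim to reduce.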

The key innovation is the use of a Transformer-based encoder-decoder architecture, a family of models that has shown great success in various computer vision tasks. Rawformer takes a raw image captured by a source camera as input and outputs a raw image that matches the characteristics of a target camera. A learnable ISP trained on the target camera can then process the translated image, so the ISP does not have to be retrained for each new device.

Importantly, Rawformer is trained in an unpaired manner, meaning it does not require the same scenes to be captured by both the source and target cameras. This makes it more flexible and easier to apply to new camera hardware than approaches that rely on paired training data.
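Unpaired image translation is often trained with a cycle-consistency objective: translating from camera A to camera B and back should return the starting image, which gives a training signal without matched pairs. The text here does not say which objective Rawformer uses, so the following is only a sketch of that general idea. The toy per-channel gain functions stand in for the two translation networks, and every name in it is hypothetical.

```python
# Cycle-consistency sketch for unpaired translation between two cameras.
# The "translators" are toy per-channel gains, not neural networks, and
# nothing here is claimed to be Rawformer's actual training objective.

def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def make_gain_translator(gains):
    """Return a toy translator that scales packed RGGB values by channel."""
    def translate(pixels):
        return [p * gains[i % len(gains)] for i, p in enumerate(pixels)]
    return translate


def cycle_consistency_loss(raw_a, raw_b, a_to_b, b_to_a):
    """A -> B -> A and B -> A -> B should each reproduce the input."""
    loss_a = l1_distance(b_to_a(a_to_b(raw_a)), raw_a)
    loss_b = l1_distance(a_to_b(b_to_a(raw_b)), raw_b)
    return loss_a + loss_b


# Two unrelated scenes, one from each camera: no pairing is needed.
raw_a = [0.2, 0.4, 0.4, 0.1]
raw_b = [0.6, 0.3, 0.3, 0.8]

a_to_b = make_gain_translator((2.0, 1.0, 1.0, 0.5))
b_to_a = make_gain_translator((0.5, 1.0, 1.0, 2.0))  # exact inverse of a_to_b
identity = make_gain_translator((1.0, 1.0, 1.0, 1.0))

print(cycle_consistency_loss(raw_a, raw_b, a_to_b, b_to_a))    # 0.0
print(cycle_consistency_loss(raw_a, raw_b, a_to_b, identity))  # greater than 0
```

The loss is zero only when the two mappings invert each other. In practice such a term is usually combined with adversarial losses that push translated images toward the target camera's distribution; those are omitted here.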

The paper evaluates Rawformer on real camera datasets. The authors report that it achieves higher accuracy than previous state-of-the-art techniques and preserves a more robust correlation between the original and translated raw images.

Critical Analysis

The paper presents a compelling approach to making learnable camera ISPs more flexible using the Transformer-based Rawformer model. The key strength is the ability to perform unpaired raw-to-raw image translation, which removes the need to collect a large paired dataset for each camera model, a central limitation of previous learnable ISP methods.

One potential limitation is the computational complexity of the transformer architecture, which may pose challenges for real-time deployment on resource-constrained devices like smartphones. The paper acknowledges this and suggests exploring more efficient transformer variants or hybrid architectures as future work.
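A rough sense of where that cost comes from: plain global self-attention over N tokens builds an N x N attention matrix, so the work grows with the square of the token count. The sketch below assumes non-overlapping square patches and full global attention. The helper name `attention_cost` and the patch size of 16 are illustrative choices, not details of Rawformer's architecture.

```python
# Back-of-the-envelope cost of global self-attention over image patches.
# Assumes non-overlapping patches; the patch size is an illustrative choice.

def attention_cost(height, width, patch_size):
    """Return (token count, attention-matrix entries per head) for one image."""
    tokens = (height // patch_size) * (width // patch_size)
    return tokens, tokens * tokens


print(attention_cost(256, 256, 16))  # (256, 65536)
print(attention_cost(512, 512, 16))  # (1024, 1048576)
```

Doubling each image dimension quadruples the number of tokens and multiplies the attention matrix size by sixteen, which is why windowed or otherwise restricted attention is a common way to keep high-resolution inputs tractable.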

Additionally, the paper evaluates Rawformer on real camera datasets, but it would be interesting to see how the model handles a wider range of camera sensors and capture settings, or how it integrates with other computational photography techniques like multi-spectral imaging.

Overall, the Rawformer approach represents a promising step towards more flexible and learnable camera ISPs, which could lead to significant improvements in image quality and computational efficiency for a wide range of applications.

Conclusion

The "Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs" paper introduces a Transformer-based encoder-decoder model that enables unpaired raw-to-raw image translation between cameras for learnable image signal processing (ISP). By leveraging the flexibility and representational power of transformers, Rawformer achieves higher accuracy than previous state-of-the-art techniques on real camera datasets while preserving a more robust correlation between the original and translated raw images.

The key advantage of Rawformer is its ability to learn the translation directly from unpaired raw images, without requiring paired captures from the source and target cameras. This makes learnable ISPs more adaptable to new camera hardware, opening up new opportunities for computational photography and imaging applications. While the computational complexity of transformers remains a consideration, the promising results in this paper suggest that Rawformer and similar approaches could make learnable ISPs practical across many more camera models in the future.

Related Papers

Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte

Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with the recent learning-based ISPs is the urge to collect large paired datasets for each distinct camera model due to the influence of intrinsic camera characteristics on the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques, and preserving a more robust correlation between the original and translated raw images. The codes and the pretrained models are available at https://github.com/gosha20777/rawformer.

7/16/2024

Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In this paper, we propose a novel pipeline, Uni-ISP, which unifies the learning of ISPs from multiple cameras, offering an accurate and versatile processor to multiple camera models. The core of Uni-ISP is leveraging device-aware embeddings through learning inverse/forward ISPs and its special training scheme. By doing so, Uni-ISP not only improves the performance of inverse/forward ISPs but also unlocks a variety of new applications inaccessible to existing learned ISPs. Moreover, since there is no dataset synchronously captured by multiple cameras for training, we construct a real-world 4K dataset, FiveCam, comprising more than 2,400 pairs of sRGB-RAW images synchronously captured by five smartphones. We conducted extensive experiments demonstrating Uni-ISP's accuracy in inverse/forward ISPs (with improvements of +1.5dB/2.4dB PSNR), its versatility in enabling new applications, and its adaptability to new camera models.

6/4/2024

ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

RAW images are rarely shared mainly due to its excessive data size compared to their sRGB counterparts obtained by camera ISPs. Learning the forward and inverse processes of camera ISPs has been recently demonstrated, enabling physically-meaningful RAW-level image processing on input sRGB images. However, existing learning-based ISP methods fail to handle the large variations in the ISP processes with respect to camera parameters such as ISO and exposure time, and have limitations when used for various applications. In this paper, we propose ParamISP, a learning-based method for forward and inverse conversion between sRGB and RAW images, that adopts a novel neural-network module to utilize camera parameters, which is dubbed as ParamNet. Given the camera parameters provided in the EXIF data, ParamNet converts them into a feature vector to control the ISP networks. Extensive experiments demonstrate that ParamISP achieve superior RAW and sRGB reconstruction results compared to previous methods and it can be effectively used for a variety of applications such as deblurring dataset synthesis, raw deblurring, HDR reconstruction, and camera-to-camera transfer.

4/16/2024

RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images

Ziteng Cui, Tatsuya Harada

sRGB images are now the predominant choice for pre-training visual models in computer vision research, owing to their ease of acquisition and efficient storage. Meanwhile, the advantage of RAW images lies in their rich physical information under variable real-world challenging lighting conditions. For computer vision tasks directly based on camera RAW data, most existing studies adopt methods of integrating image signal processor (ISP) with backend networks, yet often overlook the interaction capabilities between the ISP stages and subsequent networks. Drawing inspiration from ongoing adapter research in NLP and CV areas, we introduce RAW-Adapter, a novel approach aimed at adapting sRGB pre-trained models to camera RAW data. RAW-Adapter comprises input-level adapters that employ learnable ISP stages to adjust RAW inputs, as well as model-level adapters to build connections between ISP stages and subsequent high-level networks. Additionally, RAW-Adapter is a general framework that could be used in various computer vision frameworks. Abundant experiments under different lighting conditions have shown our algorithm's state-of-the-art (SOTA) performance, demonstrating its effectiveness and efficiency across a range of real-world and synthetic datasets.

8/28/2024