BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Read original: arXiv:2407.03535 - Published 7/30/2024 by Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull
Total Score

0

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Sign in to get full access

or

If you already have an account, we'll log you in

Overview

  • The paper introduces a new dataset called BVI-RLV for low-light video enhancement.
  • The dataset contains registered low-light videos and their corresponding high-quality reference frames.
  • The paper also provides benchmarks and evaluation metrics for low-light video enhancement algorithms.

Plain English Explanation

The paper introduces a new dataset called BVI-RLV that can be used to train and evaluate algorithms for enhancing low-light videos. Low-light videos are often blurry, noisy, and hard to see clearly. The BVI-RLV dataset provides registered low-light videos along with high-quality reference frames that show what the videos would look like if they were captured in good lighting conditions.

By having this dataset, researchers and developers can use it to train AI models to take low-light videos and make them look much clearer and brighter, without losing important details. The paper also provides standard benchmarks and evaluation metrics that can be used to compare the performance of different low-light video enhancement algorithms.

This is important because being able to see clearly in low-light conditions has many real-world applications, such as security cameras, self-driving cars, and night-time photography. The BVI-RLV dataset and benchmarks provide a valuable resource for advancing the state-of-the-art in this area of computer vision and image processing.

Technical Explanation

The BVI-RLV dataset contains 200 low-light video sequences captured using a high-quality professional camera. Each low-light video is registered and aligned with a corresponding high-quality reference frame, which represents what the video would look like in good lighting conditions.

The authors also provide several benchmark tasks and evaluation metrics for assessing the performance of low-light video enhancement algorithms. These include:

  • Subjective Evaluation: Measuring human perception of enhanced video quality.
  • Objective Evaluation: Calculating metrics like PSNR, SSIM, and LPIPS to quantify the similarity between enhanced videos and reference frames.
  • Consistency Evaluation: Evaluating the temporal consistency of enhanced videos.

The paper presents a new spatio-temporal aligned SUNet model that leverages the BVI-RLV dataset to perform low-light video enhancement. The model takes a low-light video as input and outputs an enhanced version that is closer to the high-quality reference.

Critical Analysis

The BVI-RLV dataset and benchmarks represent a valuable contribution to the field of low-light video enhancement. By providing a standardized dataset and evaluation protocols, the paper enables more rigorous and comparable research in this area.

However, one potential limitation of the dataset is that it only contains 200 video sequences, which may not be sufficient to capture the full diversity of low-light conditions encountered in real-world scenarios. Additionally, the dataset is focused on videos captured with a professional camera, and the performance of algorithms on consumer-grade cameras may differ.

The paper also does not explore the robustness of the proposed SUNet model to factors like camera motion, object occlusion, or scene complexity. Further research is needed to understand the generalization capabilities of low-light video enhancement algorithms.

Conclusion

The BVI-RLV dataset and benchmarks provide a valuable resource for advancing the state-of-the-art in low-light video enhancement. By enabling more rigorous and comparable research in this area, the paper has the potential to drive significant improvements in the ability to see clearly in challenging lighting conditions. This has important implications for a wide range of real-world applications, from security and surveillance to autonomous vehicles and night-time photography.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement
Total Score

0

BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement

Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull

Low-light videos often exhibit spatiotemporal incoherent noise, compromising visibility and performance in computer vision applications. One significant challenge in enhancing such content using deep learning is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels. We provide benchmarks based on four different technologies: convolutional neural networks, transformers, diffusion models, and state space models (mamba). Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets. Our dataset and links to benchmarks are publicly available at https://doi.org/10.21227/mzny-8c77.

Read more

7/30/2024

BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement
Total Score

0

BVI-Lowlight: Fully Registered Benchmark Dataset for Low-Light Video Enhancement

Nantheera Anantrasirichai, Ruirui Lin, Alexandra Malyugina, David Bull

Low-light videos often exhibit spatiotemporal incoherent noise, leading to poor visibility and compromised performance across various computer vision applications. One significant challenge in enhancing such content using modern technologies is the scarcity of training data. This paper introduces a novel low-light video dataset, consisting of 40 scenes captured in various motion scenarios under two distinct low-lighting conditions, incorporating genuine noise and temporal artifacts. We provide fully registered ground truth data captured in normal light using a programmable motorized dolly, and subsequently, refine them via image-based post-processing to ensure the pixel-wise alignment of frames in different light levels. This paper also presents an exhaustive analysis of the low-light dataset, and demonstrates the extensive and representative nature of our dataset in the context of supervised learning. Our experimental results demonstrate the significance of fully registered video pairs in the development of low-light video enhancement methods and the need for comprehensive evaluation. Our dataset is available at DOI:10.21227/mzny-8c77.

Read more

5/28/2024

EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More
Total Score

0

EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More

Kanghao Chen, Guoqiang Liang, Hangyu Li, Yunfan Lu, Lin Wang

Event cameras offer significant advantages for low-light video enhancement, primarily due to their high dynamic range. Current research, however, is severely limited by the absence of large-scale, real-world, and spatio-temporally aligned event-video datasets. To address this, we introduce a large-scale dataset with over 30,000 pairs of frames and events captured under varying illumination. This dataset was curated using a robotic arm that traces a consistent non-linear trajectory, achieving spatial alignment precision under 0.03mm and temporal alignment with errors under 0.01s for 90% of the dataset. Based on the dataset, we propose textbf{EvLight++}, a novel event-guided low-light video enhancement approach designed for robust performance in real-world scenarios. Firstly, we design a multi-scale holistic fusion branch to integrate structural and textural information from both images and events. To counteract variations in regional illumination and noise, we introduce Signal-to-Noise Ratio (SNR)-guided regional feature selection, enhancing features from high SNR regions and augmenting those from low SNR regions by extracting structural information from events. To incorporate temporal information and ensure temporal coherence, we further introduce a recurrent module and temporal loss in the whole pipeline. Extensive experiments on our and the synthetic SDSD dataset demonstrate that EvLight++ significantly outperforms both single image- and video-based methods by 1.37 dB and 3.71 dB, respectively. To further explore its potential in downstream tasks like semantic segmentation and monocular depth estimation, we extend our datasets by adding pseudo segmentation and depth labels via meticulous annotation efforts with foundation models. Experiments under diverse low-light scenes show that the enhanced results achieve a 15.97% improvement in mIoU for semantic segmentation.

Read more

8/30/2024

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement
Total Score

0

A Spatio-temporal Aligned SUNet Model for Low-light Video Enhancement

Ruirui Lin, Nantheera Anantrasirichai, Alexandra Malyugina, David Bull

Distortions caused by low-light conditions are not only visually unpleasant but also degrade the performance of computer vision tasks. The restoration and enhancement have proven to be highly beneficial. However, there are only a limited number of enhancement methods explicitly designed for videos acquired in low-light conditions. We propose a Spatio-Temporal Aligned SUNet (STA-SUNet) model using a Swin Transformer as a backbone to capture low light video features and exploit their spatio-temporal correlations. The STA-SUNet model is trained on a novel, fully registered dataset (BVI), which comprises dynamic scenes captured under varying light conditions. It is further analysed comparatively against various other models over three test datasets. The model demonstrates superior adaptivity across all datasets, obtaining the highest PSNR and SSIM values. It is particularly effective in extreme low-light conditions, yielding fairly good visualisation results.

Read more

7/15/2024