Standard compliant video coding using low complexity, switchable neural wrappers

Read original: arXiv:2407.07395 - Published 7/11/2024 by Yueyu Hu, Chenhao Zhang, Onur G. Guleryuz, Debargha Mukherjee, Yao Wang

Overview

This paper proposes a novel approach to video coding that leverages neural networks while maintaining compliance with standard video codecs.
The key innovations are a set of jointly optimized neural pre- and post-processors that "wrap" an unmodified standard codec, and a low-complexity post-processor that can switch between different upsampling ratios, so each sequence can be coded at the resolution that gives the best rate-distortion tradeoff.
The goal is to achieve better compression efficiency than the standard codec alone while keeping the coded bitstream compliant with the standard, so existing decoders and video infrastructure can still handle it.

Plain English Explanation

The researchers have developed a new way to compress video that uses neural networks but still produces the standard video formats that are widely used today. Conventional codecs, like H.264 or VP9, follow a fixed set of rules for encoding video data. The researchers wrap such a codec with two small neural networks: one shrinks the video to a lower resolution before it is encoded, and the other restores full resolution after it is decoded. Because high-resolution video contains a lot of redundant detail, coding it at a lower resolution and restoring it afterwards can deliver the same quality at a smaller file size.

The "switchable" part refers to resolution. For each video and each target bitrate, the system picks how much to shrink the video and tells the decoder, and a single small post-processing network can restore the video from any of these sizes. The standard codec in the middle is unchanged, so the compressed file remains a valid standard bitstream. The researchers report that the post-processor needs only 516 multiply-accumulate operations per pixel, which they argue makes the approach practical to deploy, unlike many earlier attempts to bring neural networks into video coding.

Technical Explanation

The paper introduces a "switchable neural wrapper" that can be integrated with existing video codecs to enhance their compression performance using neural networks, while maintaining compliance with the original video coding standards.
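To make the data flow concrete, here is a toy sketch of the wrapper structure on a single row of samples. Every stage is a hypothetical stand-in, not the paper's method: average pooling replaces the learned pre-processor, uniform quantization replaces the standard encoder and decoder, and sample repetition replaces the learned post-processor. Only integer ratios are handled. The point is the shape of the pipeline: the neural stages sit outside the codec, so the codec stage itself is untouched.

```python
from __future__ import annotations


def pre_process(row: list[float], ratio: int) -> list[float]:
    """Stand-in for the learned pre-processor: average-pool by an integer ratio."""
    out = []
    for i in range(0, len(row), ratio):
        block = row[i:i + ratio]
        out.append(sum(block) / len(block))
    return out


def standard_codec(row: list[float], step: float) -> list[float]:
    """Stand-in for an unmodified standard encoder + decoder: uniform quantization.

    In the real system this is where a standard-compliant bitstream is produced.
    """
    return [round(x / step) * step for x in row]


def post_process(row: list[float], ratio: int, out_len: int) -> list[float]:
    """Stand-in for the learned post-processor: repeat samples up to out_len."""
    return [row[i // ratio] for i in range(out_len)]


def wrapped_codec(row: list[float], ratio: int, step: float) -> list[float]:
    low_res = pre_process(row, ratio)              # encoder side only
    decoded = standard_codec(low_res, step)        # codec itself is untouched
    return post_process(decoded, ratio, len(row))  # decoder side only


print(wrapped_codec([10, 20, 30, 40], 2, 1.0))  # [15.0, 15.0, 35.0, 35.0]
```

With a ratio of 2, the stand-in codec only ever sees half as many samples as the input row, which is the mechanism the paper uses to save bits on high-resolution content.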

The key components of the proposed approach include:

Jointly optimized neural pre- and post-processors that wrap an unmodified standard codec (VVC in the reported results) and are trained end-to-end with a codec proxy standing in for the real codec.
Coding at reduced resolution: the rate-distortion optimal downsampling ratio is chosen per sequence for each target rate and signaled to the decoder.
A low-complexity neural post-processor (516 MACs per pixel) whose single architecture handles all of the supported upsampling ratios.
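The per-sequence ratio choice can be sketched as a search over trial encodes. This is an assumed procedure for illustration only; the abstract says just that the rate-distortion optimal downsampling ratio is signaled per sequence for each target rate. Here, for a given target bitrate, the sketch keeps the trial encodes that do not exceed it and picks the ratio with the highest PSNR after post-processing. `RDPoint`, `select_ratio`, and the sample numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RDPoint:
    """Result of one hypothetical trial encode of a sequence."""
    ratio: float         # downsampling ratio used by the pre-processor
    bitrate_kbps: float  # bitrate of the standard-codec bitstream
    psnr_db: float       # quality after the post-processor restores full resolution


def select_ratio(points: list[RDPoint], target_kbps: float) -> float:
    """Pick the ratio with the best PSNR among encodes that fit the target rate."""
    feasible = [p for p in points if p.bitrate_kbps <= target_kbps]
    if not feasible:
        raise ValueError("no trial encode fits within the target bitrate")
    return max(feasible, key=lambda p: p.psnr_db).ratio


# Made-up trial encodes for one sequence at two rate levels.
trials = [
    RDPoint(1.0, 2000.0, 36.0),
    RDPoint(1.5, 1950.0, 36.4),
    RDPoint(2.0, 1900.0, 35.8),
    RDPoint(1.0, 1000.0, 33.1),
    RDPoint(1.5, 980.0, 33.9),
    RDPoint(2.0, 990.0, 34.2),
]

print(select_ratio(trials, 2000.0))  # 1.5
print(select_ratio(trials, 1000.0))  # 2.0
```

With these invented numbers the lower target picks the stronger downsampling, and only the chosen ratio would need to be signaled once per sequence. Minimizing a Lagrangian cost D + λR over all trial encodes is a common alternative to the hard rate constraint used here.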

The authors demonstrate the effectiveness of their approach by comparing a standard codec with and without the neural wrapper. According to the abstract, the light-weight post-processor achieves a 9.3% Bjontegaard-Delta rate (BD-rate) reduction over VVC on the UVG dataset and 6.4% on AOM CTC Class A1, meaning it needs that much less bitrate on average to reach the same quality.
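BD-rate is the headline metric here. As a sketch of the classic Bjontegaard calculation (not the authors' evaluation tooling; some common test conditions use a piecewise-cubic interpolation variant instead), the code fits log-bitrate as a cubic polynomial of PSNR for each codec, integrates both fits over the overlapping PSNR range, and converts the average log-rate gap into a percentage. Negative numbers are savings, so a 9.3% BD-rate reduction corresponds to a BD-rate of -9.3%. It is pure Python and needs at least four RD points per curve with distinct PSNR values.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _cubic_fit(xs, ys):
    """Least-squares fit y ~ c0 + c1*u + c2*u^2 + c3*u^3 with u = x - mean(x)."""
    mean = sum(xs) / len(xs)  # centring keeps the normal equations well conditioned
    us = [x - mean for x in xs]
    ata = [[sum(u ** (i + j) for u in us) for j in range(4)] for i in range(4)]
    aty = [sum((u ** i) * y for u, y in zip(us, ys)) for i in range(4)]
    return _solve(ata, aty), mean


def _integral(coeffs, mean, lo, hi):
    """Integrate the centred cubic from x = lo to x = hi."""
    def antideriv(x):
        u = x - mean
        return sum(c * u ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antideriv(hi) - antideriv(lo)


def bd_rate(ref_rates, ref_psnr, test_rates, test_psnr):
    """Average bitrate difference (%) of test vs reference at equal PSNR."""
    c_ref, m_ref = _cubic_fit(ref_psnr, [math.log(r) for r in ref_rates])
    c_test, m_test = _cubic_fit(test_psnr, [math.log(r) for r in test_rates])
    lo = max(min(ref_psnr), min(test_psnr))
    hi = min(max(ref_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")
    gap = _integral(c_test, m_test, lo, hi) - _integral(c_ref, m_ref, lo, hi)
    return (math.exp(gap / (hi - lo)) - 1.0) * 100.0
```

For example, if a test codec needs exactly 90% of the reference bitrate at every quality point, the two fits differ only by a constant and `bd_rate` returns -10.0, up to floating-point rounding.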

Critical Analysis

The proposed neural wrapper approach addresses an important challenge in video coding - how to leverage the power of deep learning without breaking compatibility with existing video infrastructure. By maintaining standard codec compliance, the researchers have made their solution more practical and easier to deploy in real-world scenarios.

However, the paper does not appear to analyze the tradeoffs of its switching mechanism in depth. The downsampling ratio is chosen per sequence for each target rate by rate-distortion optimization, but it is not clear how expensive that search is for the encoder or how sensitive the results are to the set of candidate ratios. Additionally, the post-processor's complexity is given as 516 MACs per pixel, but benchmarking its runtime against alternative neural video coding approaches would help quantify the low-complexity claim.
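To put the abstract's reported 516 MACs per pixel in perspective, here is a quick back-of-the-envelope calculation. It assumes the figure counts full-resolution output pixels, which the abstract does not state, and it ignores memory traffic as well as the pre-processor, which runs only at the encoder. `post_processor_gmacs` is a hypothetical helper.

```python
from __future__ import annotations


def post_processor_gmacs(width: int, height: int, fps: float,
                         macs_per_pixel: float = 516.0) -> tuple[float, float]:
    """Return (GMAC per frame, GMAC per second) for real-time post-processing.

    Assumes macs_per_pixel is counted per full-resolution output pixel.
    """
    per_frame = width * height * macs_per_pixel / 1e9
    return per_frame, per_frame * fps


for name, (w, h) in {"1080p": (1920, 1080), "2160p": (3840, 2160)}.items():
    frame, second = post_processor_gmacs(w, h, 30.0)
    print(f"{name} @ 30 fps: {frame:.2f} GMAC/frame, {second:.1f} GMAC/s")
```

Under that assumption this prints about 1.07 GMAC per 1080p frame (32.1 GMAC/s at 30 fps) and 4.28 GMAC per 2160p frame (128.4 GMAC/s), since the cost scales linearly with pixel count.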

While the experimental results are promising, further research is needed to evaluate the performance of the neural wrapper on a wider range of video content and coding standards. Comparison to state-of-the-art neural video coding methods would also provide valuable insights into the relative strengths and weaknesses of the proposed approach.

Conclusion

This paper presents a novel technique for integrating neural networks with standard video codecs to improve compression efficiency while preserving compatibility with existing video infrastructure. The key innovation is the low-complexity, switchable neural wrapper, which codes each sequence at its rate-distortion optimal resolution and restores full resolution with a single lightweight post-processor.

If successfully deployed, this technology could enable significant advancements in video compression, leading to reduced bandwidth requirements and storage costs, as well as potentially improved video experiences for end-users. The ability to seamlessly integrate with existing video codecs and platforms is a crucial step towards practical adoption of AI-powered video coding in real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Standard compliant video coding using low complexity, switchable neural wrappers

Yueyu Hu, Chenhao Zhang, Onur G. Guleryuz, Debargha Mukherjee, Yao Wang

The proliferation of high resolution videos puts great storage and bandwidth pressure on cloud video services, driving the development of next-generation video codecs. Despite great progress made in neural video coding, existing approaches are still far from economical deployment considering the complexity and rate-distortion performance tradeoff. To clear the roadblocks for neural video coding, in this paper we propose a new framework featuring standard compatibility, high performance, and low decoding complexity. We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions. The rate-distortion optimal downsampling ratio is signaled to the decoder at the per-sequence level for each target rate. We design a low complexity neural post-processor architecture that can handle different upsampling ratios. The change of resolution exploits the spatial redundancy in high-resolution videos, while the neural wrapper further achieves rate-distortion performance improvement through end-to-end optimization with a codec proxy. Our light-weight post-processor architecture has a complexity of 516 MACs / pixel, and achieves 9.3% BD-Rate reduction over VVC on the UVG dataset, and 6.4% on AOM CTC Class A1. Our approach has the potential to further advance the performance of the latest video coding standards using neural processing with minimal added complexity.

7/11/2024

Accelerating Learned Video Compression via Low-Resolution Representation Learning

Zidian Qiu, Zongyao He, Zhi Jin

In recent years, the field of learned video compression has witnessed rapid advancement, exemplified by the latest neural video codec DCVC-DC, which has outperformed the upcoming next-generation codec ECM in terms of compression ratio. Despite this, learned video compression frameworks often exhibit low encoding and decoding speeds, primarily due to their increased computational complexity and unnecessary high-resolution spatial operations, which greatly hinder their real-world applications. In this work, we introduce an efficiency-optimized framework for learned video compression that focuses on low-resolution representation learning, aiming to significantly enhance the encoding and decoding speeds. Firstly, we diminish the computational load by reducing the resolution of inter-frame propagated features obtained from reused features of decoded frames, including I-frames. We implement a joint training strategy for both the I-frame and P-frame models, further improving the compression ratio. Secondly, our approach efficiently leverages multi-frame priors for parameter prediction, minimizing computation at the decoding end. Thirdly, we revisit the application of the Online Encoder Update (OEU) strategy for high-resolution sequences, achieving notable improvements in compression ratio without compromising decoding efficiency. Our efficiency-optimized framework has significantly improved the balance between compression ratio and speed for learned video compression. In comparison to traditional codecs, our method achieves performance levels on par with the low-delay P configuration of the H.266 reference software VTM. Furthermore, when contrasted with DCVC-HEM, our approach delivers a comparable compression ratio while boosting encoding and decoding speeds by a factor of 3 and 7, respectively. On RTX 2080Ti, our method can decode each 1080p frame in under 100 ms.

7/24/2024

PNVC: Towards Practical INR-based Video Compression

Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.

9/4/2024

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.

4/19/2024