PNVC: Towards Practical INR-based Video Compression

Read original: arXiv:2409.00953 - Published 9/4/2024 by Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

Overview

This paper presents a practical approach to video compression using implicit neural representations (INRs).
INRs can represent video frames compactly, which can enable more efficient compression than traditional video codecs.
The proposed method, PNVC, reports roughly 35% BD-rate savings over HEVC HM 18.0 in the low-delay configuration while decoding 1080p content at 20+ FPS.

Plain English Explanation

PNVC is a new way to compress video files that uses implicit neural representations (INRs). Traditional video codecs can struggle to efficiently represent the complex patterns and details in video frames. In contrast, INRs can capture this information very compactly.

The key idea behind PNVC is to represent each video frame as an INR, which is a mathematical function that can generate the full frame from just a few parameters. This allows the video to be encoded using much less data than traditional methods. PNVC also employs other techniques, such as leveraging temporal redundancy between frames, to further improve compression.

Importantly, PNVC is designed to be practical and scalable, overcoming limitations of previous INR-based video codecs. The researchers show that PNVC can achieve high compression ratios while preserving good visual quality, making it a promising approach for real-world video applications.

Technical Explanation

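The core of PNVC is the use of implicit neural representations (INRs) to encode video frames. An INR is a neural network that maps coordinates (for example, pixel position and frame index) to signal values, so a full image or video frame can be regenerated from the network's parameters alone. When the network has far fewer parameters than the frame has pixels, storing the parameters instead of the pixels can give a much more compact encoding than traditional pixel-based approaches.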
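To make the idea of an INR concrete, here is a minimal sketch in Python with NumPy. It is not PNVC's architecture: the layer sizes, the sin/cos positional encoding, and the function names (positional_encoding, init_inr, render_frame) are all assumptions chosen for illustration, and the weights are random rather than fitted to a real video.

```python
import numpy as np

def positional_encoding(coords, num_freqs=4):
    """Map coordinates in [0, 1] to sin/cos features at octave-spaced frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (F,)
    angles = coords[..., None] * freqs                      # (N, D, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)               # (N, D * 2F)

def init_inr(in_dim, hidden=32, out_dim=3, seed=0):
    """Hypothetical two-layer MLP; a real codec would fit these weights to the video."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def render_frame(params, t, height, width, num_freqs=4):
    """Evaluate the network at every (x, y, t) coordinate to produce one RGB frame."""
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel(),
                       np.full(height * width, t)], axis=-1)  # (H*W, 3)
    h = positional_encoding(coords, num_freqs)
    h = np.maximum(h @ params["W1"] + params["b1"], 0.0)      # ReLU
    rgb = 1.0 / (1.0 + np.exp(-(h @ params["W2"] + params["b2"])))  # sigmoid
    return rgb.reshape(height, width, 3)

num_freqs = 4
params = init_inr(in_dim=3 * 2 * num_freqs)
frame = render_frame(params, t=0.5, height=36, width=64, num_freqs=num_freqs)
n_params = sum(p.size for p in params.values())
print(frame.shape, n_params, frame.size)
```

In this toy setup the network has fewer parameters than a single 64x36 RGB frame has values, and the same parameters can be evaluated at any frame index t. Whether such a small network can reproduce real content at acceptable quality depends on how it is fitted, which the sketch does not cover.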

PNVC first encodes a keyframe using an INR. It then models the differences between subsequent frames as displacements from the keyframe, also represented compactly using INRs. This leverages temporal redundancy to further reduce the bitrate required to encode the video.
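The keyframe-plus-displacement idea described above can be illustrated with a small NumPy sketch. This is only an assumption about how such a prediction step might look: the displacement field here is written by hand rather than produced by an INR, and warp_with_displacement is a hypothetical helper using nearest-neighbour sampling.

```python
import numpy as np

def warp_with_displacement(keyframe, flow):
    """Backward warp: output[y, x] = keyframe[y + flow[y, x, 1], x + flow[y, x, 0]].

    Source positions are rounded to the nearest pixel and clamped to the frame.
    """
    h, w = keyframe.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return keyframe[src_y, src_x]

# Toy 4x4 grayscale keyframe.
keyframe = np.arange(16, dtype=float).reshape(4, 4)

# Every pixel samples from its left neighbour, so content shifts right by one.
flow = np.zeros((4, 4, 2))
flow[..., 0] = -1.0
predicted = warp_with_displacement(keyframe, flow)

# Suppose the true next frame matches the prediction except for newly
# revealed content in the first column; only that residual needs coding.
actual = predicted.copy()
actual[:, 0] = 9.0
residual = actual - predicted
print(np.count_nonzero(residual), "of", residual.size, "residual values are nonzero")
```

The point of the sketch is that when motion explains most of the change between frames, the remaining residual is sparse and therefore cheap to encode.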

The authors also introduce several practical techniques to make PNVC scalable and effective:

Efficient INR architecture: A lightweight neural network is used to represent the INRs, minimizing the number of parameters required.
Adaptive quantization: The quantization of INR parameters is dynamically adjusted based on perceptual importance to balance bitrate and quality.
Multi-scale coding: The video is encoded at multiple spatial resolutions to adapt to bandwidth constraints.
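The bitrate and quality trade-off behind parameter quantization can be sketched as follows. This is a simplified illustration, not the scheme used in PNVC: it assumes plain uniform quantization with a single step size per tensor, synthetic Gaussian weights, and an empirical-entropy estimate of the coded size in place of a real entropy coder.

```python
import numpy as np

def quantize(weights, step):
    """Uniform scalar quantization: return integer symbols and dequantized weights."""
    symbols = np.rint(weights / step).astype(np.int64)
    return symbols, symbols * step

def empirical_bits(symbols):
    """Estimate coded size as N * H, where H is the empirical entropy in bits/symbol."""
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum() * symbols.size)

# Synthetic stand-in for a tensor of INR weights (assumption for illustration).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=10_000)

results = {}
for step in (0.01, 0.05):
    symbols, dequantized = quantize(weights, step)
    results[step] = {
        "bits": empirical_bits(symbols),
        "mse": float(np.mean((weights - dequantized) ** 2)),
        "max_err": float(np.max(np.abs(weights - dequantized))),
    }
    print(f"step={step}: ~{results[step]['bits']:.0f} bits, "
          f"MSE={results[step]['mse']:.2e}")
```

A larger step size lowers the estimated bit cost but increases the reconstruction error of the weights. An adaptive scheme would choose the step per layer or per parameter group so that bits are spent where they matter most for the decoded video.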

In their experiments, the authors report that PNVC outperforms existing INR-based codecs and achieves roughly 35% BD-rate savings against HM 18.0, the HEVC (H.265) reference software, in the low-delay configuration, while decoding 1080p content at more than 20 FPS. This makes PNVC a promising approach for practical video compression applications.
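Comparisons between codecs of this kind are usually reported as BD-rate (Bjøntegaard delta rate), the average bitrate difference at equal quality. The sketch below follows the common cubic-fit formulation; the rate and PSNR points are made up for illustration and are not results from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor.

    Fits log10(rate) as a cubic in PSNR for each codec, integrates both fits
    over the overlapping PSNR range, and converts the mean gap to a percentage.
    Negative values mean the test codec needs fewer bits at the same quality.
    """
    poly_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    poly_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return float((10.0 ** avg_log_diff - 1.0) * 100.0)

# Made-up rate points (kbps) and PSNR values (dB) for an anchor codec.
psnr = [32.0, 35.0, 38.0, 41.0]
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
# Hypothetical test codec that needs 35% fewer bits at every quality level.
test_rates = [0.65 * r for r in anchor_rates]

saving = bd_rate(anchor_rates, psnr, test_rates, psnr)
print(f"BD-rate: {saving:.2f}%")
```

With these synthetic points the test codec uses 65% of the anchor's bitrate at every PSNR level, so the function returns a BD-rate of approximately -35%. Real measurements rarely differ by a constant factor, which is why the metric averages over the overlapping quality range.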

Critical Analysis

The authors provide a thorough evaluation of PNVC, including comparisons to leading video codecs on a range of test sequences. The results clearly demonstrate the advantages of the INR-based approach for video compression.

However, the paper does not address some important practical considerations. For example, it is unclear how PNVC would scale to high-resolution or high-framerate video, or how it would perform under real-world network conditions with variable bandwidth. The computational complexity of the encoding and decoding processes is also not fully characterized.

Additionally, the authors acknowledge that PNVC currently struggles with certain types of video content, such as fast-moving scenes with complex occlusions. Further research may be needed to improve the robustness of the approach.

Despite these limitations, PNVC represents a significant step towards practical INR-based video compression. The core ideas and techniques presented in this paper could inspire further innovations in this emerging field.

Conclusion

This paper introduces PNVC, a novel video compression method that leverages the power of implicit neural representations (INRs) to achieve high compression ratios with good visual quality. By encoding video frames and their temporal differences using compact INR representations, PNVC outperforms traditional video codecs while maintaining a practical and scalable design.

The researchers have demonstrated the potential of INR-based approaches for video compression, opening up new directions for further research and development. As video content continues to grow in importance across many domains, practical solutions like PNVC could have a significant impact by enabling more efficient storage, transmission, and distribution of video data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

PNVC: Towards Practical INR-based Video Compression

Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are however associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which innovatively combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35%+ BD-rate savings against HEVC HM 18.0 (LD) - almost 10% more compared to one of the state-of-the-art INR-based codecs, HiNeRV and 5% more over VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.

9/4/2024

NVRC: Neural Video Representation Compression

Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still outperformed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC, for the first time, is able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we also propose a new model compression framework for coding all the network, quantization and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec has achieved such performance. The implementation of NVRC will be released at www.github.com.

9/12/2024

Parameter-Efficient Instance-Adaptive Neural Video Compression

Hyunmo Yang, Seungjun Oh, Eunbyung Park

Learning-based Neural Video Codecs (NVCs) have emerged as a compelling alternative to the standard video codecs, demonstrating promising performance and simple, easily maintainable pipelines. However, NVCs often fall short in compression performance and occasionally exhibit poor generalization capability due to their inference-only compression scheme and their dependence on training data. Instance-adaptive video compression techniques have recently been suggested as a viable solution, fine-tuning the encoder or decoder networks for a particular test instance video. However, fine-tuning all the model parameters incurs high computational costs, increases the bitrates, and often leads to unstable training. In this work, we propose a parameter-efficient instance-adaptive video compression framework. Inspired by the remarkable success of parameter-efficient fine-tuning on large-scale neural network models, we propose to use a lightweight adapter module that can be easily attached to the pretrained NVCs and fine-tuned for test video sequences. The resulting algorithm significantly improves compression performance and reduces the encoding time compared to the existing instance-adaptive video compression algorithms. Furthermore, the suggested fine-tuning method enhances the robustness of the training process, allowing the proposed method to be widely used in many practical settings. We conducted extensive experiments on various standard benchmark datasets, including UVG, MCL-JCV, and HEVC sequences, and the experimental results have shown a significant improvement in rate-distortion (RD) curves (up to 5 dB PSNR improvements) and BD rates compared to the baseline NVCs. Our code is available at https://github.com/ohsngjun/PEVC.

6/12/2024

Hierarchical B-frame Video Coding for Long Group of Pictures

Ivan Kirillov, Denis Parkhomenko, Kirill Chernyshev, Alexander Pletnev, Yibo Shi, Kai Lin, Dmitry Babin

Learned video compression methods already outperform VVC in the low-delay (LD) case, but the random-access (RA) scenario remains challenging. Most works on learned RA video compression either use HEVC as an anchor or compare it to VVC in specific test conditions, using RGB-PSNR metric instead of Y-PSNR and avoiding comprehensive evaluation. Here, we present an end-to-end learned video codec for random access that combines training on long sequences of frames, rate allocation designed for hierarchical coding and content adaptation on inference. We show that under common test conditions (JVET-CTC), it achieves results comparable to VTM (VVC reference software) in terms of YUV-PSNR BD-Rate on some classes of videos, and outperforms it on almost all test sets in terms of VMAF BD-Rate. On average it surpasses open LD and RA end-to-end solutions in terms of VMAF and YUV BD-Rates.

6/26/2024