Causal Contextual Prediction for Learned Image Compression
Over the past several years, learned image compression has made impressive progress. Recent learned image codecs are commonly based on autoencoders that first encode an image into a low-dimensional latent representation and then decode it for reconstruction. To capture spatial dependencies in the latent space, prior works use a hyperprior and a spatial context model to build an entropy model, which estimates the bit rate for end-to-end rate-distortion optimization. However, such an entropy model is suboptimal in two respects: (1) it fails to capture spatially global correlations among the latents, and (2) cross-channel relationships among the latents remain underexplored. In this paper, we propose the concept of separate entropy coding, which leverages a serial decoding process for causal contextual entropy prediction in the latent space. We propose a causal context model that separates the latents across channels and uses cross-channel relationships to generate highly informative contexts. Furthermore, we propose a causal global prediction model that finds global reference points for accurate prediction of unknown points. Both models facilitate entropy estimation without transmitting any overhead. In addition, we adopt a new separate attention module to build more powerful transform networks. Experimental results demonstrate that our full image compression model outperforms the standard VVC/H.266 codec on the Kodak dataset in terms of both PSNR and MS-SSIM, yielding state-of-the-art rate-distortion performance.
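To make the role of a causal cross-channel context concrete, the following is a toy sketch and not the model proposed in the paper. It assumes a Gaussian entropy model integrated over unit-width quantization bins, two synthetic channel groups, and a hand-written predictor in place of a learned network. All names (`estimated_bits`, `group_a`, `group_b`) are hypothetical. It compares the estimated bit cost of the second group when it is coded independently with the cost when its mean is predicted from the already decoded first group.

```python
import math
import numpy as np

# Element-wise error function built from the standard library (no SciPy needed).
_erf = np.vectorize(math.erf)

def gaussian_cdf(x):
    """Standard normal CDF, applied element-wise."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def estimated_bits(y, mu, sigma):
    """Estimated bits for integer latents y under N(mu, sigma^2).

    The probability of each integer is the Gaussian mass on [y - 0.5, y + 0.5].
    The Gaussian form is an assumption of this sketch.
    """
    upper = gaussian_cdf((y + 0.5 - mu) / sigma)
    lower = gaussian_cdf((y - 0.5 - mu) / sigma)
    p = np.maximum(upper - lower, 1e-9)  # avoid log(0)
    return float(-np.log2(p).sum())

rng = np.random.default_rng(0)

# Synthetic quantized latents: group B is strongly correlated with group A.
group_a = np.round(rng.normal(0.0, 4.0, size=(8, 8)))
group_b = np.round(group_a + rng.normal(0.0, 1.0, size=(8, 8)))

# Group A is decoded first, under a fixed prior.
bits_a = estimated_bits(group_a, mu=0.0, sigma=4.0)

# Group B without cross-channel context: only its marginal statistics are used
# (variance 4^2 + 1^2 = 17 before rounding).
bits_b_independent = estimated_bits(group_b, mu=0.0, sigma=math.sqrt(17.0))

# Group B with a causal cross-channel context: the mean is predicted from the
# already decoded group A, so the decoder can reproduce it without side information.
bits_b_contextual = estimated_bits(group_b, mu=group_a, sigma=1.0)

print(f"group B, independent: {bits_b_independent:.1f} bits")
print(f"group B, contextual:  {bits_b_contextual:.1f} bits")
```

Because the prediction depends only on latents the decoder has already reconstructed, the same context is available at both ends and no extra bits need to be transmitted for it, which is the property the abstract refers to as entropy estimation without overhead.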