
Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing

Convolutional Neural Networks with 3D kernels (3D CNNs) currently achieve state-of-the-art results in video recognition tasks due to their supremacy in extracting spatiotemporal features within video frames. Many successful 3D CNN architectures have successively surpassed the state of the art. However, nearly all of them are designed to operate offline, which creates several serious handicaps during online operation. Firstly, conventional 3D CNNs are not dynamic, since their output features represent the complete input clip rather than the most recent frame in the clip. Secondly, they do not preserve temporal resolution due to their inherent temporal downsampling. Lastly, 3D CNNs are constrained to a fixed temporal input size, limiting their flexibility. To address these drawbacks, we propose dissected 3D CNNs, where the intermediate volumes of the network are dissected and propagated over the depth (time) dimension for future calculations, substantially reducing the number of computations at online operation. For action classification, the dissected version of ResNet models performs 74-90% fewer computations at online operation while achieving ~5% better classification accuracy on the Kinetics-600 dataset than conventional 3D ResNet models. Moreover, the advantages of dissected 3D CNNs are demonstrated by deploying our approach onto several vision tasks, where it consistently improves performance.
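To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of how an intermediate volume can be cached and propagated along the time dimension: the layer receives one frame per step, keeps a FIFO cache of the last (kernel_t - 1) feature slices as temporal context, and convolves only over the newest frame, so computation per step is roughly that of a single frame and temporal resolution is preserved. All class and variable names here are illustrative assumptions.

import torch
import torch.nn as nn


class DissectedConv3d(nn.Module):
    """Sketch of a 3D convolution operated frame-by-frame.

    Instead of receiving a whole clip, the layer is fed one frame at a
    time. The last (kernel_t - 1) intermediate feature slices are cached
    and reused as temporal context (a temporal skip over time), so only
    the newest frame's features are computed at each online step.
    """

    def __init__(self, in_ch, out_ch, kernel_t=3, kernel_s=3):
        super().__init__()
        self.kernel_t = kernel_t
        pad_s = kernel_s // 2
        # No temporal padding: temporal context comes from the cache,
        # so the stream's temporal resolution is not downsampled.
        self.conv = nn.Conv3d(in_ch, out_ch,
                              (kernel_t, kernel_s, kernel_s),
                              padding=(0, pad_s, pad_s))
        self.cache = None  # holds (N, C, kernel_t - 1, H, W)

    def forward(self, frame):
        # frame: (N, C, H, W) -> add a singleton time dimension
        x = frame.unsqueeze(2)
        if self.cache is None:
            # Cold start: replicate the first frame as temporal context.
            self.cache = x.repeat(1, 1, self.kernel_t - 1, 1, 1)
        clip = torch.cat([self.cache, x], dim=2)   # (N, C, kernel_t, H, W)
        # Slide the cache forward by one frame for the next call.
        self.cache = clip[:, :, 1:].detach()
        return self.conv(clip).squeeze(2)          # (N, out_ch, H, W)


# Usage: feed frames one at a time instead of fixed-size clips.
layer = DissectedConv3d(in_ch=3, out_ch=8)
for t in range(5):
    feat = layer(torch.randn(1, 3, 32, 32))        # one frame per step
    print(feat.shape)                              # torch.Size([1, 8, 32, 32])

Because the cached slices are reused rather than recomputed for every new frame, the per-step cost stays close to that of a single frame, which is the intuition behind the reported reduction in online computations; the actual architecture in the paper applies this dissection throughout the network's intermediate volumes.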

