Rotate to Attend: Convolutional Triplet Attention Module
Benefiting from the capability of building inter-dependencies among channels or spatial locations, attention mechanisms have recently been extensively studied and broadly used in a variety of computer vision tasks. In this paper, we investigate lightweight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies via rotation operations followed by residual transformations, and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple and efficient, and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks, including image classification on ImageNet-1k and object detection on the MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition on the importance of capturing dependencies across dimensions when computing attention weights. Code for this paper can be publicly accessed at https://github.com/LandskapeAI/triplet-attention
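To make the three-branch structure concrete, the following is a minimal NumPy sketch of the idea the abstract describes: each branch rotates (permutes) the input tensor, compresses the leading dimension with Z-pooling (stacked max- and mean-pooling), applies a k×k convolution and a sigmoid to produce an attention gate, then rotates back, and the three branch outputs are averaged. This is an illustration under stated assumptions, not the authors' implementation: the convolution kernels here are random stand-ins for learned weights, batch normalization is omitted, and the `z_pool`, `branch`, and `triplet_attention` names are our own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def z_pool(x):
    """Z-pool: stack max- and mean-pooling over the first axis.
    x: (D, A, B) -> (2, A, B)."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])

def conv2d_same(x, w):
    """Naive 2-channel-in, 1-channel-out 2D convolution with 'same'
    zero padding. x: (2, A, B), w: (2, k, k) -> (A, B)."""
    _, A, B = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty((A, B))
    for i in range(A):
        for j in range(B):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * w)
    return out

def branch(x, w, perm):
    """One attention branch: rotate (permute axes), Z-pool, convolve,
    gate with a sigmoid, then rotate back."""
    xr = np.transpose(x, perm)
    gate = sigmoid(conv2d_same(z_pool(xr), w))  # (A, B) attention map
    return np.transpose(xr * gate, np.argsort(perm))

def triplet_attention(x, k=7, seed=0):
    """x: (C, H, W). The conv kernels are random stand-ins for
    learned weights (illustration only)."""
    rng = np.random.default_rng(seed)
    # Each branch pools a different dimension, so the three gates
    # capture (C, W), (H, C), and (H, W) interactions respectively.
    perms = [(1, 0, 2), (2, 1, 0), (0, 1, 2)]
    out = np.zeros_like(x)
    for perm in perms:
        w = rng.standard_normal((2, k, k)) * 0.05
        out += branch(x, w, perm)
    return out / 3.0  # average the three branches
```

Because each branch is a permutation followed by an element-wise gate and the inverse permutation, the module preserves the input shape and can be dropped after any convolutional block, which is what makes it attractive as an add-on to existing backbones.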