TDAF: Top-Down Attention Framework for Vision Tasks
Human attention mechanisms often work in a top-down manner, yet this has not been well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention; it can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural, along which bottom-up spatial features and top-down attention features are extracted, respectively. These spatial and attention features are deeply nested, so the proposed framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that TDAF captures effective stratified attention information and boosts performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, performance improves by 2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. For action recognition, a 3D-ResNet adopting TDAF improves accuracy by 1.7%.