
SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Uploaded 2021-01-24 09:09:11 · PDF, 2.10 MB · 18 views


Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). Encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that the encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search. Using similar building blocks, SpineNet models outperform ResNet-FPN models by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, SpineNet-190 achieves 52.5% AP with a Mask R-CNN detector and 52.1% AP with a RetinaNet detector on COCO for a single model without test-time augmentation, significantly outperforming prior state-of-the-art detectors. SpineNet can also transfer to classification tasks, achieving a 5% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset. Code is at: https://github.com/tensorflow/tpu/tree/master/models/official/detection.
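To make the core idea concrete, here is a minimal sketch of what "scale-permuted with cross-scale connections" means, independent of the actual searched SpineNet layouts. The block specification and helper names below are hypothetical illustrations, not the paper's implementation: block levels (feature resolutions) may appear in any order rather than monotonically decreasing, and each block merges parent features that are first resampled to its target level.

```python
import numpy as np

def resample(x, src_level, dst_level):
    """Nearest-neighbor resize of an (H, W, C) feature map between pyramid levels.

    Level L corresponds to spatial stride 2**L, so moving to a higher level
    halves the resolution and moving to a lower level doubles it.
    """
    factor = 2 ** (src_level - dst_level)  # > 1 upsamples, < 1 downsamples
    if factor >= 1:
        return x.repeat(int(factor), axis=0).repeat(int(factor), axis=1)
    step = int(1 / factor)
    return x[::step, ::step]

def build_scale_permuted(stem, block_specs):
    """Build features in a permuted order of scales.

    block_specs: list of (level, parent_indices) where parents index
    previously built features; each parent is resampled to the block's
    level before merging (a real block would then apply convolutions).
    """
    feats = [(1, stem)]  # stem output at pyramid level 1
    for level, parents in block_specs:
        merged = sum(resample(feats[p][1], feats[p][0], level) for p in parents)
        feats.append((level, merged))
    return feats

stem = np.ones((64, 64, 1))  # toy stem feature map
# Levels jump up and down (2 -> 4 -> 3 -> 5) instead of decreasing
# monotonically, with cross-scale connections to non-adjacent blocks.
specs = [(2, (0, 0)), (4, (0, 1)), (3, (1, 2)), (5, (2, 3))]
feats = build_scale_permuted(stem, specs)
```

In a scale-decreased backbone, each block could only see its immediate predecessor at an equal or finer scale; here block 3 (level 3) merges a coarser level-4 parent with a finer level-2 parent, which is the kind of connectivity the architecture search in the paper explores.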

