Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning
Learning image representations without human supervision is an important and active research field. Several recent approaches have successfully leveraged the idea of making such a representation invariant under different types of perturbations, especially via contrastive-based instance discrimination training. Although effective visual representations should indeed exhibit such invariances, there are other important characteristics, such as encoding contextual reasoning skills, for which alternative reconstruction-based approaches might be better suited. With this in mind, we propose a teacher-student scheme to learn representations by training a convnet to reconstruct a bag-of-visual-words (BoW) representation of an image, given as input a perturbed version of that same image. Our strategy performs online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations), along with an online update of the visual-words vocabulary (used for the BoW targets). This idea effectively enables fully online BoW-guided unsupervised learning. Extensive experiments demonstrate the value of our BoW-based strategy, which surpasses previous state-of-the-art methods (including contrastive-based ones) in several applications. For instance, in downstream tasks such as Pascal object detection, Pascal classification and Places205 classification, our method improves over all prior unsupervised approaches, thus establishing new state-of-the-art results that are also significantly better than even those of supervised pre-training. We provide the implementation code at https://github.com/valeoai/obow.
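To make the teacher-student scheme described above more concrete, the following is a minimal NumPy sketch of its three moving parts: building a BoW target from teacher features, scoring the student's prediction against that target, and updating the teacher and the vocabulary online. The abstract does not specify any of these rules, so everything here is an assumption made for illustration: hard nearest-word assignment, a cross-entropy loss, an exponential-moving-average teacher update, and a fixed-size first-in-first-out vocabulary. The function names (`bow_target`, `bow_loss`, `ema_update`, `update_vocabulary`) are hypothetical and are not taken from the released code.

```python
import numpy as np


def bow_target(features, vocabulary):
    """Build a BoW target from local teacher features.

    features:   (N, D) array of local feature vectors (assumed teacher output).
    vocabulary: (K, D) array of visual words.
    Returns a (K,) histogram normalized to sum to 1.
    Assumption: hard assignment of each feature to its nearest word.
    """
    # Squared Euclidean distance between every feature and every word: (N, K).
    diff = features[:, None, :] - vocabulary[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    # Index of the closest visual word for each local feature.
    words = dist2.argmin(axis=1)
    hist = np.bincount(words, minlength=vocabulary.shape[0]).astype(np.float64)
    return hist / hist.sum()


def bow_loss(student_logits, target):
    """Cross-entropy between softmax(student_logits) and the BoW target."""
    shifted = student_logits - student_logits.max()  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_prob).sum())


def ema_update(teacher_params, student_params, momentum=0.99):
    """Assumed teacher update: exponential moving average of student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


def update_vocabulary(vocabulary, new_words):
    """Assumed vocabulary update: FIFO queue that keeps the K newest words."""
    k = vocabulary.shape[0]
    return np.concatenate([vocabulary, new_words], axis=0)[-k:]
```

In a training loop under these assumptions, the teacher would process the original image to produce `features`, the student would process the perturbed image to produce `student_logits`, and `ema_update` and `update_vocabulary` would run after each student gradient step.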