Densely Guided Knowledge Distillation using Multiple Teacher Assistants
With the success of deep neural networks, knowledge distillation, which guides the learning of a small student network from a large teacher network, is being actively studied for model compression and transfer learning. However, few studies have addressed the poor learning of the student network when the student and teacher model sizes differ significantly. In this paper, we propose densely guided knowledge distillation using multiple teacher assistants that gradually decrease in model size to efficiently bridge the gap between the teacher and student networks. To stimulate more efficient learning of the student network, each teacher assistant guides every smaller teacher assistant, step by step. Specifically, when teaching a smaller teacher assistant at the next step, the larger teacher assistants from the previous steps are used together with the teacher network to increase the learning efficiency. Moreover, we design stochastic teaching in which, for each mini-batch during training, a teacher or a teacher assistant is randomly dropped. This acts as a regularizer, similar to dropout, and improves the accuracy of the student network. Thus, the student can always learn rich distilled knowledge from multiple sources, ranging from the teacher to the multiple teacher assistants. We verified the effectiveness of the proposed method on a classification task using CIFAR-10, CIFAR-100, and Tiny ImageNet. We also achieved significant performance improvements with various backbone architectures, such as a simple stacked convolutional neural network, ResNet, and WideResNet.
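The abstract describes a distillation loss that combines soft targets from the teacher and all larger teacher assistants, with guides randomly dropped per mini-batch. Below is a minimal PyTorch sketch of such a loss, assuming already-trained guide networks whose logits are passed in; the function name `dgkd_loss`, the temperature, the loss weight, and the drop probability are illustrative assumptions, not values taken from the paper.

```python
# Sketch only: densely guided distillation with stochastic dropping of guides.
# All hyperparameters below are assumed for illustration.
import random
import torch
import torch.nn.functional as F

def dgkd_loss(student_logits, guide_logits_list, labels,
              temperature=4.0, alpha=0.7, drop_prob=0.5):
    """Cross-entropy on hard labels plus KD terms from the teacher and every
    larger teacher assistant; each guide may be randomly dropped per batch."""
    # Hard-label loss for the student.
    ce = F.cross_entropy(student_logits, labels)

    # Stochastic teaching: randomly keep a subset of guides for this mini-batch.
    kept = [g for g in guide_logits_list if random.random() > drop_prob]
    if not kept:                      # ensure at least one guide remains
        kept = [random.choice(guide_logits_list)]

    # Soft-label (KL) loss averaged over the kept guides.
    log_p_s = F.log_softmax(student_logits / temperature, dim=1)
    kd = 0.0
    for g in kept:
        p_t = F.softmax(g / temperature, dim=1)
        kd = kd + F.kl_div(log_p_s, p_t, reduction="batchmean") * temperature ** 2
    kd = kd / len(kept)

    return alpha * kd + (1.0 - alpha) * ce


if __name__ == "__main__":
    # Toy usage: one teacher and two teacher assistants guiding a student.
    batch, num_classes = 8, 10
    student_logits = torch.randn(batch, num_classes, requires_grad=True)
    guides = [torch.randn(batch, num_classes) for _ in range(3)]
    labels = torch.randint(0, num_classes, (batch,))
    loss = dgkd_loss(student_logits, guides, labels)
    loss.backward()
    print(loss.item())
```

In this sketch, the same loss form would be applied at every step of the chain (teacher assistants and finally the student), with the guide list growing as smaller assistants are trained.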