
Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition

We study in this paper how to initialize the parameters of multinomial logistic regression (a fully connected layer followed by softmax and cross-entropy loss), which is widely used in deep neural network (DNN) models for classification problems. As logistic regression is widely known to have no closed-form solution, it is usually randomly initialized, leading to several deficiencies, especially in transfer learning where all the layers except for the last task-specific layer are initialized using a pre-trained model. The deficiencies include slow convergence speed, the possibility of getting stuck in a local minimum, and the risk of over-fitting. To address those deficiencies, we first study the properties of logistic regression and propose a closed-form approximate solution named the regularized Gaussian classifier (RGC). We then adopt this approximate solution to initialize the task-specific linear layer and demonstrate superior performance over random initialization in terms of both accuracy and convergence speed on various tasks and datasets. For example, for image classification, our approach can reduce the training time by 10 times and achieve a 3.2% gain in accuracy for Flickr-style classification. For object detection, our approach can also be 10 times faster in training for the same accuracy, or 5% better in terms of mAP on VOC 2007 with slightly longer training.
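The abstract does not spell out how the RGC solution is constructed. The snippet below is only a minimal sketch under the assumption that it behaves like a shared-covariance Gaussian (LDA-style) classifier fit on features extracted with the frozen pre-trained backbone, with a shrinkage term playing the role of the regularizer; the function name rgc_init and the reg parameter are illustrative and not the paper's notation.

```python
import numpy as np

def rgc_init(features, labels, num_classes, reg=1e-3):
    """Closed-form initialization of the last linear layer from pre-trained features.

    features: [N, D] penultimate-layer activations from the frozen backbone
    labels:   [N] integer class labels
    Returns W of shape [C, D] and b of shape [C] for the task-specific layer.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n, d = features.shape

    # Per-class means and empirical class priors.
    means = np.zeros((num_classes, d))
    priors = np.zeros(num_classes)
    for c in range(num_classes):
        fc = features[labels == c]
        means[c] = fc.mean(axis=0)
        priors[c] = len(fc) / n

    # Shared within-class covariance, shrunk toward a scaled identity so the
    # inverse is well conditioned (the "regularized" part of the classifier).
    centered = features - means[labels]
    cov = centered.T @ centered / n
    cov += reg * (np.trace(cov) / d) * np.eye(d)

    # A Gaussian classifier with shared covariance is linear in the features:
    # score_c(x) = mu_c^T Sigma^{-1} x - 0.5 * mu_c^T Sigma^{-1} mu_c + log prior_c
    cov_inv = np.linalg.inv(cov)
    W = means @ cov_inv
    b = -0.5 * np.einsum('cd,cd->c', W, means) + np.log(priors)
    return W, b
```

The returned W and b would then replace the random initialization of the task-specific layer, for example by copying them into the final nn.Linear of a PyTorch model (weight.data.copy_(torch.from_numpy(W)) and bias.data.copy_(torch.from_numpy(b))) before fine-tuning.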

