Predicting Training Time Without Training

Name: Predicting Training Time Without Training
Rating: 4.5 (52 reviews)
Author: hospitable_26882

Uploaded by: hospitable_26882 | Uploaded: 2021-01-24 08:55:49 | File type: PDF | Size: 2.02 MB | Popularity: 52

We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function. To do so, we leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model. This allows us to approximate the training loss and accuracy at any point during training by solving a low-dimensional Stochastic Differential Equation (SDE) in function space. Using this result, we are able to predict the time it takes for Stochastic Gradient Descent (SGD) to fine-tune a model to a given loss without having to perform any training. In our experiments, we are able to predict the training time of a ResNet within a 20% error margin on a variety of datasets and hyper-parameters, at a 30- to 45-fold reduction in cost compared to actual training. We also discuss how to further reduce the computational and memory cost of our method, and in particular we show that by exploiting the spectral properties of the gradients' matrix it is possible to predict training time on a large dataset while processing only a subset of the samples.
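
The linearization idea in the abstract can be illustrated with a much simpler case than the one the paper treats. The sketch below is not the authors' method: it assumes full-batch gradient descent (no SGD noise, so no SDE) and a squared loss L = ||f - y||^2 / (2N). Under a linearized model f(w) ≈ f(w0) + J (w - w0), the residual r = f - y then evolves as r_{t+1} = (I - (lr / N) J J^T) r_t, so the loss at every step follows from the eigen-decomposition of the Gram matrix J J^T without running any training. The function names and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def predicted_loss_curve(jacobian, residual0, lr, max_steps):
    """Predicted loss at steps 0..max_steps for a linearized model.

    Assumptions (illustrative, not taken from the paper): full-batch
    gradient descent and loss ||r||^2 / (2N), where r = f(w) - y.
    jacobian is the N x P matrix of output gradients at initialization,
    residual0 is the length-N initial residual.
    """
    n = residual0.shape[0]
    gram = jacobian @ jacobian.T                   # N x N Gram matrix J J^T
    eigvals, eigvecs = np.linalg.eigh(gram)        # symmetric eigen-decomposition
    coeffs = eigvecs.T @ residual0                 # residual in the eigenbasis
    decay = 1.0 - lr * eigvals / n                 # per-step factor of each mode
    steps = np.arange(max_steps + 1)[:, None]      # column vector of step indices
    modes = coeffs[None, :] * decay[None, :] ** steps
    return 0.5 * np.sum(modes ** 2, axis=1) / n    # loss at each step


def predicted_steps_to_loss(jacobian, residual0, lr, target_loss, max_steps=10_000):
    """First step at which the predicted loss is <= target_loss, else None."""
    losses = predicted_loss_curve(jacobian, residual0, lr, max_steps)
    reached = np.nonzero(losses <= target_loss)[0]
    return int(reached[0]) if reached.size else None


if __name__ == "__main__":
    # Synthetic stand-in for a network's Jacobian and targets.
    rng = np.random.default_rng(0)
    J = rng.normal(size=(20, 50))
    y = J @ rng.normal(size=50)
    r0 = -y                                        # residual when f(w0) = 0
    print(predicted_steps_to_loss(J, r0, lr=0.1, target_loss=1e-3))
```

For an exactly linear model this prediction matches a step-by-step gradient descent run; for a real network it is only as good as the linear approximation, and the SGD case described in the abstract additionally requires modeling the gradient noise.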
