
Making Coherence Out of Nothing At All: Measuring Evolution of Gradient Alignment

Uploaded 2021-01-22 04:06:29 · PDF, 1.19 MB · Views: 7

We propose a new metric ($m$-coherence) to experimentally study the alignment of per-example gradients during training. Intuitively, given a sample of size $m$, $m$-coherence is the number of examples in the sample that, on average, benefit from a small step along the gradient of any one example. We show that compared to other commonly used metrics, $m$-coherence is more interpretable, cheaper to compute ($O(m)$ instead of $O(m^2)$), and mathematically cleaner. (We note that $m$-coherence is closely connected to gradient diversity, a quantity previously used in some theoretical bounds.) Using $m$-coherence, we study the evolution of alignment of per-example gradients in ResNet and EfficientNet models on ImageNet and several variants with label noise, particularly from the perspective of the recently proposed Coherent Gradients (CG) theory, which provides a simple, unified explanation for memorization and generalization [Chatterjee, ICLR 20]. Although we have several interesting takeaways, our most surprising result concerns memorization. Naively, one might expect that when training with completely random labels, each example is fitted independently, and so $m$-coherence should be close to 1. However, this is not the case: $m$-coherence reaches moderately high values during training (though still much smaller than with real labels), indicating that over-parameterized neural networks find common patterns even in scenarios where generalization is not possible. A detailed analysis of this phenomenon provides a deeper confirmation of CG, but at the same time puts into sharp relief what is missing from the theory in order to provide a complete explanation of generalization in neural networks.
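To make the intuition behind the metric concrete, the sketch below shows one way a coherence-style quantity can be estimated from per-example gradients. It is a minimal sketch, not the paper's implementation: it assumes the formulation $\|\sum_i g_i\|^2 / \sum_i \|g_i\|^2$, which equals $m$ when all gradients point the same way, is close to 1 when they are mutually orthogonal, and can be accumulated in a single $O(m)$ pass; the paper's exact estimator and normalization may differ, and the gradients used here are synthetic placeholders rather than gradients of a trained model.

```python
import numpy as np

def coherence_estimate(per_example_grads):
    """Estimate how aligned a set of per-example gradients is.

    per_example_grads: array of shape (m, d), one flattened gradient per example.

    Uses the ratio ||sum_i g_i||^2 / sum_i ||g_i||^2, which is m when all
    gradients are identical and ~1 when they are mutually orthogonal. Both
    terms are accumulated in one pass over the m gradients, i.e. O(m)
    gradient-sized operations instead of O(m^2) pairwise dot products.
    (Hedged reading of the metric described in the abstract; the paper's
    exact estimator may differ.)
    """
    g = np.asarray(per_example_grads, dtype=np.float64)
    grad_sum = g.sum(axis=0)          # sum_i g_i
    sum_sq_norms = np.sum(g * g)      # sum_i ||g_i||^2
    return float(grad_sum @ grad_sum / sum_sq_norms)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d = 64, 1000

    aligned = np.tile(rng.normal(size=d), (m, 1))   # identical gradients
    random = rng.normal(size=(m, d))                # nearly orthogonal in high dimension

    print(coherence_estimate(aligned))  # ~64 (= m): every example benefits from any step
    print(coherence_estimate(random))   # ~1: each example mostly only helps itself
```

In practice the per-example gradients would be collected from the model being trained (one backward pass per example, or a vectorized per-example-gradient routine) and the estimate tracked over the course of training, as the abstract describes.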
