
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

Uploaded 2021-01-24 05:04:39 · PDF file, 7.65 MB · Popularity: 10


The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of Linear-Softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank increasing capabilities of such monotonic functions. Empirically, our method improves in two different quality metrics over the traditional Linear-Softmax layer in synthetic and real language model experiments, adding little time or memory overhead, while being comparable to the more computationally expensive mixture of Softmaxes.
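The abstract's central idea, inserting a learnable monotone pointwise transform between the final linear layer's logits and the Softmax, can be illustrated with a short sketch. The parameterization below (a positive linear term plus positively weighted tanh components) and the names `MonotonicPointwise` and `MonotonicSoftmaxHead` are illustrative assumptions; the paper's actual parametric family of monotonic functions may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicPointwise(nn.Module):
    """Learnable monotone non-linearity applied element-wise to the logits.

    Sketch parameterization: g(z) = softplus(a) * z + sum_k softplus(w_k) * tanh(z - b_k).
    Every summand is non-decreasing in z, so g is monotone increasing by construction.
    """

    def __init__(self, num_components: int = 8):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(1))                              # slope of the linear term
        self.w = nn.Parameter(torch.zeros(num_components))                 # component weights
        self.b = nn.Parameter(torch.linspace(-3.0, 3.0, num_components))   # component shifts

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (..., vocab_size) raw logits from the final linear layer
        lin = F.softplus(self.a) * z
        comps = F.softplus(self.w) * torch.tanh(z.unsqueeze(-1) - self.b)
        return lin + comps.sum(dim=-1)


class MonotonicSoftmaxHead(nn.Module):
    """Linear projection -> learnable monotone transform of the logits -> Softmax."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.monotone = MonotonicPointwise()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        logits = self.monotone(self.proj(hidden))
        return F.log_softmax(logits, dim=-1)


# Usage: swap a standard Linear-Softmax output layer for the monotone variant.
head = MonotonicSoftmaxHead(hidden_size=512, vocab_size=10000)
hidden_states = torch.randn(4, 512)       # e.g. a batch of decoder states
log_probs = head(hidden_states)           # (4, 10000) log-probabilities
```

The non-linearity adds only a handful of scalar parameters and one element-wise pass over the logits, which is consistent with the abstract's claim of little time or memory overhead compared with a mixture of Softmaxes.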

