
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities

Uploaded 2021-01-24 05:04:39 · PDF file, 7.65 MB · Popularity: 10


The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over large output vocabularies. Recently, this has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of Linear-Softmax for quantities of practical interest such as cross entropy or mode estimation, a direction that we explore here. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. We theoretically investigate the rank increasing capabilities of such monotonic functions. Empirically, our method improves in two different quality metrics over the traditional Linear-Softmax layer in synthetic and real language model experiments, adding little time or memory overhead, while being comparable to the more computationally expensive mixture of Softmaxes.
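The abstract's central idea, inserting a learnable monotone pointwise transform between the final linear layer's logits and the Softmax, can be illustrated with a short sketch. The parameterization below (a positive linear term plus positively weighted tanh components) and the names `MonotonicPointwise` and `MonotonicSoftmaxHead` are illustrative assumptions; the paper's actual parametric family of monotonic functions may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MonotonicPointwise(nn.Module):
    """Learnable monotone non-linearity applied element-wise to the logits.

    Sketch parameterization: g(z) = softplus(a) * z + sum_k softplus(w_k) * tanh(z - b_k).
    Every summand is non-decreasing in z, so g is monotone increasing by construction.
    """

    def __init__(self, num_components: int = 8):
        super().__init__()
        self.a = nn.Parameter(torch.zeros(1))                              # slope of the linear term
        self.w = nn.Parameter(torch.zeros(num_components))                 # component weights
        self.b = nn.Parameter(torch.linspace(-3.0, 3.0, num_components))   # component shifts

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (..., vocab_size) raw logits from the final linear layer
        lin = F.softplus(self.a) * z
        comps = F.softplus(self.w) * torch.tanh(z.unsqueeze(-1) - self.b)
        return lin + comps.sum(dim=-1)


class MonotonicSoftmaxHead(nn.Module):
    """Linear projection -> learnable monotone transform of the logits -> Softmax."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)
        self.monotone = MonotonicPointwise()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        logits = self.monotone(self.proj(hidden))
        return F.log_softmax(logits, dim=-1)


# Usage: swap a standard Linear-Softmax output layer for the monotone variant.
head = MonotonicSoftmaxHead(hidden_size=512, vocab_size=10000)
hidden_states = torch.randn(4, 512)       # e.g. a batch of decoder states
log_probs = head(hidden_states)           # (4, 10000) log-probabilities
```

The non-linearity adds only a handful of scalar parameters and one element-wise pass over the logits, which is consistent with the abstract's claim of little time or memory overhead compared with a mixture of Softmaxes.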

