SMYRF: Efficient Attention using Asymmetric Clustering
We propose a novel type of balanced clustering algorithm to approximate attention. Attention complexity is reduced from $O(N^2)$ to $O(N \log N)$, where $N$ is the sequence length. Our algorithm, SMYRF, uses Locality Sensitive Hashing (LSH) in a novel way by defining new Asymmetric transformations and an adaptive scheme that produces balanced clusters. The biggest advantage of SMYRF is that it can be used as a drop-in replacement for dense attention layers without any retraining. In contrast, prior fast attention methods impose constraints (e.g. queries and keys share the same vector representations) and require re-training from scratch. We apply our method to pre-trained state-of-the-art Natural Language Processing and Computer Vision models and report significant memory and speed benefits. Notably, SMYRF-BERT slightly outperforms BERT on GLUE while using $50\%$ less memory. We also show that SMYRF can be used interchangeably with dense attention before and after training. Finally, we use SMYRF to train GANs with attention at high resolutions: using a single TPU, we were able to scale attention to $128\times128$ (16k tokens) and $256\times256$ (65k tokens) for BigGAN on CelebA-HQ.
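To make the clustered-attention idea concrete, here is a minimal sketch (not the authors' implementation) of LSH-bucketed attention with balanced clusters. It assumes a single head, uses plain random-hyperplane hashing as a stand-in for SMYRF's Asymmetric transformations and adaptive scheme, and introduces a hypothetical `cluster_size` parameter: queries and keys are hashed, sorted by hash code, cut into equal-size clusters, and attention is computed only within each cluster.

```python
# Sketch of balanced LSH attention (illustrative only, not SMYRF's exact algorithm).
import torch
import torch.nn.functional as F

def balanced_lsh_attention(q, k, v, cluster_size=64, n_bits=8, seed=0):
    """q, k, v: (N, d) tensors. Returns an (N, d) approximation of softmax(q k^T / sqrt(d)) v."""
    N, d = q.shape
    assert N % cluster_size == 0, "pad the sequence so N is a multiple of cluster_size"

    # Random-hyperplane LSH: project onto random directions and use the sign
    # pattern as an integer hash code (stand-in for asymmetric transformations).
    g = torch.Generator().manual_seed(seed)
    planes = torch.randn(d, n_bits, generator=g)
    powers = 2 ** torch.arange(n_bits)
    q_code = (((q @ planes) > 0).long() * powers).sum(-1)
    k_code = (((k @ planes) > 0).long() * powers).sum(-1)

    # Sort by hash code and cut into equal-size clusters: every cluster has
    # exactly `cluster_size` queries and keys by construction (the "balanced" part).
    q_idx = torch.argsort(q_code)
    k_idx = torch.argsort(k_code)
    n_clusters = N // cluster_size
    qc = q[q_idx].view(n_clusters, cluster_size, d)
    kc = k[k_idx].view(n_clusters, cluster_size, d)
    vc = v[k_idx].view(n_clusters, cluster_size, d)

    # Dense attention inside each cluster only: O(N * cluster_size) instead of O(N^2).
    scores = torch.einsum("cqd,ckd->cqk", qc, kc) / d ** 0.5
    out_c = torch.einsum("cqk,ckd->cqd", F.softmax(scores, dim=-1), vc)

    # Scatter results back to the original query order.
    out = torch.empty_like(q)
    out[q_idx] = out_c.reshape(N, d)
    return out

# Toy usage: compare against dense attention.
q, k, v = torch.randn(3, 256, 32).unbind(0)
approx = balanced_lsh_attention(q, k, v, cluster_size=64)
dense = F.softmax(q @ k.T / 32 ** 0.5, dim=-1) @ v
print((approx - dense).abs().mean())
```

Because every cluster has the same size, the work is perfectly load-balanced across clusters, which is what allows the memory and speed savings reported in the abstract; the quality of the approximation depends on how well the hashing co-locates queries with their high-attention keys.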