Objective-Based Hierarchical Clustering of Deep Embedding Vectors

Uploader: expansive3736 | Uploaded 2021-01-24 06:05:25 | PDF file | 446.64 KB | Popularity: 57

We initiate a comprehensive experimental study of objective-based hierarchical clustering methods on massive datasets consisting of deep embedding vectors from computer vision and NLP applications. This includes a large variety of image embedding (ImageNet, ImageNetV2, NaBirds), word embedding (Twitter, Wikipedia), and sentence embedding (SST-2) vectors from several popular recent models (e.g., ResNet, ResNext, Inception V3, SBERT). Our study includes datasets with up to $4.5$ million entries and embedding dimensions up to $2048$. To address the challenge of scaling hierarchical clustering to such large datasets, we propose a new practical hierarchical clustering algorithm, B++&C. It gives a 5%/20% improvement on average for the popular Moseley-Wang (MW) / Cohen-Addad et al. (CKMM) objectives (normalized) compared to a wide range of classic methods and recent heuristics. We also introduce a theoretical algorithm, B2SAT&C, which achieves a $0.74$-approximation for the CKMM objective in polynomial time. This is the first substantial improvement over the trivial $2/3$-approximation achieved by a random binary tree. Prior to this work, the best poly-time approximation of $\approx 2/3 + 0.0004$ was due to Charikar et al. (SODA'19).
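To make the objective mentioned in the abstract concrete, here is a minimal Python sketch. The abstract itself does not define the CKMM objective, so the code assumes the form usually attributed to Cohen-Addad et al. for dissimilarity data: the sum over all pairs of $d(i,j)$ times the number of leaves under the lowest common ancestor of $i$ and $j$, to be maximized. The nested-tuple tree representation and the names `leaves`, `ckmm_objective`, and `random_tree` are hypothetical, chosen only for illustration. This is not the B++&C or B2SAT&C algorithm.

```python
import random

def leaves(tree):
    """Return the leaf ids under a binary tree given as nested 2-tuples."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def ckmm_objective(tree, d):
    """Assumed CKMM form: sum of d[i][j] * |leaves(lca(i, j))| over pairs."""
    if isinstance(tree, int):
        return 0.0
    left, right = tree
    l, r = leaves(left), leaves(right)
    # Pairs separated at this node have this node as their LCA.
    cut = sum(d[i][j] for i in l for j in r)
    return (len(l) + len(r)) * cut + ckmm_objective(left, d) + ckmm_objective(right, d)

def random_tree(items, rng):
    """Build a random binary tree by recursive random splits (illustrative)."""
    if len(items) == 1:
        return items[0]
    items = items[:]
    rng.shuffle(items)
    k = rng.randint(1, len(items) - 1)  # both sides stay non-empty
    return (random_tree(items[:k], rng), random_tree(items[k:], rng))

# Toy symmetric dissimilarity matrix on three points (made-up values).
d = [[0, 1, 4],
     [1, 0, 4],
     [4, 4, 0]]
total = sum(d[i][j] for i in range(3) for j in range(i + 1, 3))

good = ckmm_objective(((0, 1), 2), d)   # merges the two closest points first
rand = ckmm_objective(random_tree([0, 1, 2], random.Random(0)), d)
# n * total is an upper bound, since no LCA can have more than n leaves.
print(good / (3 * total), rand / (3 * total))
```

Dividing by `n * total` gives a normalized score between 0 and 1 under this assumed definition, which is one way to compare trees across datasets of different sizes.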

More downloads

A potential based clustering method with hierarchical optimization

A potential-based clustering method with hierarchi...

Size: 1.64MB | 2021-02-27 05:47:41

Multi modal Deep Embedding via Hierarchical Grounded Compositional Semantics

Multi-modal Deep Embedding via Hierarchical Ground...

Size: 2MB | 2021-02-22 08:05:55

An improved Software Refactoring Method Based on Hierarchical Clustering Algorit

An improved Software Refactoring Method Based on H...

Size: 240KB | 2021-02-09 09:21:57

A negative selection algorithm based on hierarchical clustering of self set

A negative selection algorithm based on hierarchic...

Size: 1.13MB | 2021-02-23 22:05:13

Kmeans and Hierarchical Clustering

Andrew W. Moore's lecture slides: an introduction to K-means and Hierarchical Clustering...

Size: 0B | 2019-09-15 00:54:14

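Paper research: Density Clustering Pruning Method Based on Reconstructed Support Vectors for

A density-clustering pruning and sparsification algorithm based on reconstructed support vectors, by Si Gangquan and Shi Jianquan. The least squares support vector machine works by solving a system of linear equations...

Size: 0B | 2020-04-18 14:40:06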

sparse manifold clustering and embedding

The paper proposes a dimensionality reduction method for data lying on multiple nonlinear manifolds.

Size: 0B | 2019-07-23 11:28:46

Hierarchical clustering analysis.m

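Uses standardized Euclidean distance on principal components with the shortest-distance (single-linkage) method, plots the clustering dendrogram, and adds sample labels.

Size: 0B | 2019-06-21 13:22:41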

Hierarchical clustering (hierarchical-clustering)

Size: 0B | 2019-01-15 20:52:07

Knowledge Graph Embedding with Hierarchical Relation Structure

Size: 0B | 2019-01-02 12:31:23

Single Channel Speech Separation Based on Deep Clustering with Local Optimizatio

Single-Channel Speech Separation Based on Deep Clu...

Size: 512KB | 2021-03-25 09:21:05

FaceNet A Unified Embedding for Face Recognition and Clustering

A unified embedding algorithm for face recognition and clustering.

Size: 14.12MB | 2020-12-17 15:21:16

Hierarchical Clustering for Unstructured Volumetric Scalar Fields

Size: 0B | 2018-12-07 22:02:02

Machine learning: hierarchical clustering

Basic steps of hierarchical clustering: 1. Treat each sample as its own cluster and compute...

Size: 0B | 2019-09-26 05:58:24

A Link Clustering Based Approach for Clustering Categorical Data

Link-clustering-based clustering of categorical attributes, by He Zengyou and Xu Xiaofei. Categorical data cluster...

Size: 0B | 2020-05-30 19:25:41