python - Clustering with t-SNE dimensionality reduction

Question

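The question is which should come first: a) clustering or b) the dimensionality reduction algorithm? In other words, can I apply a pseudo dimensionality reduction method like t-SNE (pseudo because it is not a true one) and then use a clustering algorithm to extract the clusters, or should I cluster in the original high-dimensional space and use the result only to color the nodes? Is the code below a good way to start, or have I got this completely wrong?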

# imports inferred from the names used below
import numpy as np
from sklearn import manifold
from sklearn import cluster as clusterAlgs

# g is assumed to be an igraph Graph built elsewhere (a really large graph)
adjMat = g.get_adjacency(attribute='weight')  # weighted adjacency matrix
adjMat = np.array(adjMat.data)
adjMat = adjMat.T  # use the incoming interaction vectors

# initialise the t-SNE algorithm
tsne = manifold.TSNE()  # dimensionality reduction algorithm
manifoldCoords = tsne.fit_transform(adjMat)

# initialise the clustering algorithm
clusteralgorithm = clusterAlgs.KMeans()  # clustering algorithm
linear_clusters = clusteralgorithm.fit_predict(manifoldCoords)  # extract clusters

score 3 · Accepted Answer

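It is always better to perform dimensionality reduction first and then cluster.

The reason is that distances behave strangely in high-dimensional spaces. One interesting phenomenon is that the ratio between the distance to the nearest point and the distance to the farthest point approaches 1, so "near" and "far" become hard to tell apart.

I suggest reading this question. It asks about Euclidean distance specifically, but it contains a lot of interesting information on the topic in general.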
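The nearest-to-farthest ratio mentioned here can be checked with a small experiment. The following is a minimal sketch using only the standard library; the uniform random data, the point counts, and the helper name `nearest_farthest_ratio` are assumptions made for illustration, not something from the original answer.

```python
import math
import random


def nearest_farthest_ratio(n_points, n_dims, seed=0):
    """Ratio of the nearest to the farthest Euclidean distance from a random query point."""
    rng = random.Random(seed)
    # Query point and data points are drawn uniformly from the unit hypercube (assumption).
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return min(dists) / max(dists)


# Print the ratio for increasing dimensionality.
for dims in (2, 10, 100, 1000):
    print(dims, round(nearest_farthest_ratio(500, dims), 3))
```

On data like this the ratio should be small in 2 dimensions and move toward 1 as the number of dimensions grows, which is why distance-based clustering loses contrast in the raw high-dimensional space.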

score 2 · Accepted Answer

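It is very common to reduce dimensionality first and then cluster, simply because clustering high-dimensional data is hard and dimensionality reduction makes it more "tractable".

As long as you don't forget that clustering is inherently unreliable (so don't trust the results, but do study them), you should be fine.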
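One way to "study" the results rather than trust them is to check how stable the clusters are across random seeds. The sketch below assumes scikit-learn is available and uses synthetic `make_blobs` data as a stand-in for the real adjacency matrix; the helper `embed_and_cluster`, the choice of 3 clusters, and the two seeds are illustrative assumptions only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in for the real data (assumption): 150 points, 50 features, 3 groups.
X, _ = make_blobs(n_samples=150, n_features=50, centers=3, random_state=0)


def embed_and_cluster(data, seed, n_clusters=3):
    """Run t-SNE, then KMeans on the 2-D embedding, both with the given seed."""
    coords = TSNE(n_components=2, random_state=seed).fit_transform(data)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(coords)


labels_a = embed_and_cluster(X, seed=0)
labels_b = embed_and_cluster(X, seed=1)

# Agreement between the two runs: 1.0 means identical partitions, near 0 means chance level.
stability = adjusted_rand_score(labels_a, labels_b)
# Cohesion measured in the original space, not in the t-SNE embedding.
quality = silhouette_score(X, labels_a)
print(f"ARI between runs: {stability:.2f}, silhouette in original space: {quality:.2f}")
```

If the partitions disagree strongly between seeds, or the silhouette in the original space is poor, the clusters are more likely an artifact of the embedding than a property of the data.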
