I have the following dataset:

firm_id firm_id_
1         2
1         4
1         5
2         1
2         3
3         2
3         6
4         1
4         5
4         6
5         4
5         7
6         3
...

This data says, for example, that firm_id = 1 is directly connected to firm_id = 2, 4, and 5, and indirectly connected (within two edges) to firm_id = 3, 6, and 7. I can use a Python package like networkx to build the network of firm connectivity. Now I want to use Spectral Clustering (I guess this is the correct methodology) to form clusters based on distance (the number of edges separating each pair of firms) and see how these clusters are connected to each other.
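As a minimal sketch of that first step, the listed pairs can be loaded into an undirected networkx graph and the hop distances from firm 1 read off. Only the rows shown above are used (the real dataset continues past the "..."), and the variable names are just illustrative.

```python
import networkx as nx

# Only the rows shown in the question; the full dataset is truncated there.
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]

# Undirected graph: (1, 2) and (2, 1) collapse into a single edge.
G = nx.Graph()
G.add_edges_from(pairs)

# Number of edges separating firm 1 from every reachable firm.
hops_from_1 = nx.single_source_shortest_path_length(G, 1)
direct = sorted(n for n, d in hops_from_1.items() if d == 1)
two_hops = sorted(n for n, d in hops_from_1.items() if d == 2)
print(direct, two_hops)
```

On these rows, the direct neighbours and the two-hop firms should match the ones named in the paragraph above.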

I would first define an adjacency matrix W from the above data. I would then apply a kernel (the formula was an image that is not preserved here) to each element of W, where dist is the distance between firm i and firm j and c is a scale parameter, and then compute the Laplacian matrix (see here for example).
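A rough sketch of that step follows. Since the kernel formula is not available, a Gaussian kernel exp(-dist^2 / c) is assumed here purely for illustration, and c = 2.0 is an arbitrary value; the unnormalized Laplacian L = D - W is used.

```python
import numpy as np
import networkx as nx

# Only the rows shown in the question; the full dataset is truncated there.
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]
G = nx.Graph()
G.add_edges_from(pairs)
nodes = sorted(G.nodes())

# Hop-distance matrix (the graph built from these rows is connected).
lengths = dict(nx.all_pairs_shortest_path_length(G))
D = np.array([[lengths[u][v] for v in nodes] for u in nodes], dtype=float)

# ASSUMPTION: Gaussian kernel on hop distance; c is a hypothetical scale.
c = 2.0
W = np.exp(-D**2 / c)
np.fill_diagonal(W, 0.0)  # no self-similarity

# Unnormalized graph Laplacian: degree matrix minus similarity matrix.
degree = np.diag(W.sum(axis=1))
L = degree - W
```

With a different kernel only the line defining W changes; the Laplacian construction stays the same.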

Now my question is: can Spectral Clustering give me the links between the clusters and how far apart the clusters are (how many edges separate them)? I was thinking of using scikit-learn in Python, but I have no idea how to generate the links between clusters using sklearn.cluster.
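For what it is worth, sklearn.cluster only returns a label per node; the links between clusters have to be derived afterwards from the original graph. A minimal sketch, assuming the 0/1 adjacency matrix is used as the precomputed affinity and that n_clusters=2 is an arbitrary choice (cluster_distance is a hypothetical helper, not part of either library):

```python
from collections import Counter

import networkx as nx
from sklearn.cluster import SpectralClustering

# Only the rows shown in the question; the full dataset is truncated there.
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]
G = nx.Graph()
G.add_edges_from(pairs)
nodes = sorted(G.nodes())

# ASSUMPTION: adjacency matrix as affinity; a kernelized distance
# matrix could be passed to fit_predict instead.
A = nx.to_numpy_array(G, nodelist=nodes)
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(A)
cluster_of = dict(zip(nodes, labels.tolist()))

# Count the edges of G that fall within or between clusters.
links = Counter()
for u, v in G.edges():
    links[tuple(sorted((cluster_of[u], cluster_of[v])))] += 1

# Hypothetical helper: fewest edges separating any member of cluster a
# from any member of cluster b (both clusters must be non-empty).
hops = dict(nx.all_pairs_shortest_path_length(G))
def cluster_distance(a, b):
    return min(hops[u][v] for u in nodes for v in nodes
               if cluster_of[u] == a and cluster_of[v] == b)

print(dict(links))
```

Keys such as (0, 1) in links count the edges crossing between two clusters, while (0, 0) and (1, 1) count edges inside a cluster.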

2 Answers

Community detection on networks is what I needed:

http://perso.crans.org/aynaud/communities/
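A minimal sketch of this approach, using the Louvain implementation that ships with networkx (networkx 2.8 or later is assumed) rather than the package linked above. The quotient graph then gives one node per community and an edge wherever two communities are linked in the original graph.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Only the rows shown in the question; the full dataset is truncated there.
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]
G = nx.Graph()
G.add_edges_from(pairs)

# Louvain community detection; returns a list of node sets.
communities = louvain_communities(G, seed=0)

# One node per community (relabelled 0..k-1 in list order); two
# communities are adjacent if any edge of G runs between them.
Q = nx.quotient_graph(G, communities, relabel=True)

# Community-level hops between every pair of communities.
community_hops = dict(nx.all_pairs_shortest_path_length(Q))

for i, members in enumerate(communities):
    print(i, sorted(members))
print(sorted(Q.edges()))
```

The edges of Q are the links between clusters, and community_hops says how many community-level steps separate any two of them.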

Answered 2014-04-17T12:38:10.553

For spectral clustering and similar methods to work properly, you need to have a similarity measure.

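Your data seems to be just a graph, i.e. edges connecting instances. You should look at graph partitioning, and maybe also community detection algorithms, which work on the graph structure alone and do not assume you have a continuous similarity measure.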
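As one possible illustration of graph partitioning on the bare edge list, here is a sketch using the Kernighan-Lin bisection from networkx. The choice of algorithm and the seed are assumptions for the example, not a recommendation for this particular dataset.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Only the rows shown in the question; the full dataset is truncated there.
pairs = [(1, 2), (1, 4), (1, 5), (2, 1), (2, 3), (3, 2), (3, 6),
         (4, 1), (4, 5), (4, 6), (5, 4), (5, 7), (6, 3)]
G = nx.Graph()
G.add_edges_from(pairs)

# Split the nodes into two balanced halves, trying to keep the number
# of edges between the halves small. No similarity measure is needed.
part_a, part_b = kernighan_lin_bisection(G, seed=0)

# Number of edges of G running between the two halves.
cut = nx.cut_size(G, part_a, part_b)
print(sorted(part_a), sorted(part_b), cut)
```

The cut size is the number of links between the two parts, which is the quantity the question asks about.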

Answered 2014-04-18T09:02:55.833