When using k-means in Weka, you can call getAssignments() on the trained model to get the cluster assignment of each instance it was built on. Here is a (truncated) Jython example:

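>>> import weka.clusterers.SimpleKMeans as SimpleKMeans
>>> kmeans = SimpleKMeans()
>>> kmeans.setPreserveInstancesOrder(True)  # getAssignments() requires the instance order to be preserved
>>> kmeans.buildClusterer(data)
>>> assignments = kmeans.getAssignments()
>>> assignments
array('i', [14, 16, 0, 0, 0, 0, 16, ...])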

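Each position in the array corresponds to an instance, and the value at that position is that instance's cluster number. So instance 0 is in cluster 14, instance 1 is in cluster 16, and so on.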
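To make the index-to-instance mapping concrete, here is a small pure-Python sketch (written so it should also run under Jython) that inverts such an assignments array into a mapping from cluster number to the indices of the instances in that cluster. The helper name `group_by_cluster` is hypothetical and is not part of Weka; the sample list is just the first seven values shown above.

```python
from collections import defaultdict


def group_by_cluster(assignments):
    """Map each cluster number to the list of instance indices assigned to it.

    `assignments` is any sequence where position i holds the cluster
    number of instance i, e.g. the array returned by getAssignments().
    """
    clusters = defaultdict(list)
    for instance_index, cluster_num in enumerate(assignments):
        clusters[cluster_num].append(instance_index)
    return dict(clusters)


# The first seven entries of the assignments array shown above
sample = [14, 16, 0, 0, 0, 0, 16]
print(group_by_cluster(sample))
```

Indices within each cluster stay in ascending order because the array is walked from instance 0 upward.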

My question is: does XMeans have anything similar? I looked through the entire API and didn't see anything like it.

Here is the reply to my question from the Weka listserv:

 "Not as such. But all clusterers have a clusterInstance() method. You can 
 pass each training instance through the trained clustering model to 
 obtain the cluster index for each."

Here is my Jython implementation of that suggestion:

 >>> import java.io.BufferedReader as BufferedReader
 >>> import java.io.FileReader as FileReader
 >>> import weka.core.Instances as Instances
 >>> import weka.clusterers.XMeans as XMeans
 >>> reader = BufferedReader(FileReader("some arff file"))  # load the ARFF file once
 >>> data = Instances(reader)
 >>> reader.close()
 >>> xmeans = XMeans()  # keep the class name and the model variable distinct
 >>> xmeans.setMaxNumClusters(100)
 >>> xmeans.setMinNumClusters(2)
 >>> xmeans.buildClusterer(data)  # here's our model
 >>> for index in range(data.numInstances()):  # walk the training instances in order
 ...     instance = data.instance(index)
 ...     cluster_num = xmeans.clusterInstance(instance)  # pass each instance through the model
 ...     print "instance #", index, "is in cluster", cluster_num  # pretty-print the results
 ...
 instance # 0 is in cluster 1
 instance # 1 is in cluster 1
 instance # 2 is in cluster 0
 instance # 3 is in cluster 0

I'm posting all of this for reference, since the same approach can be used to get cluster assignments from the results of any Weka clusterer.
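Since every clusterer has a clusterInstance() method, the loop above can be wrapped in a reusable helper that returns the same kind of list getAssignments() gives for SimpleKMeans. The sketch below is plain Python; `assignments_for` and `ToyCentroidClusterer` are hypothetical names, and the toy class only stands in for a trained Weka model (nearest fixed centroid) so the example runs without Weka. Under Jython you would instead pass the trained model and `[data.instance(i) for i in range(data.numInstances())]`.

```python
def assignments_for(clusterer, instances):
    """Return a list whose i-th entry is the cluster index of instances[i].

    Works with any object exposing a clusterInstance(instance) method,
    as the listserv reply says all Weka clusterers do.
    """
    return [clusterer.clusterInstance(instance) for instance in instances]


class ToyCentroidClusterer(object):
    """Hypothetical stand-in for a trained model: assigns each point
    to its nearest fixed centroid by squared Euclidean distance."""

    def __init__(self, centroids):
        self.centroids = centroids

    def clusterInstance(self, point):
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in self.centroids
        ]
        # Index of the closest centroid is the cluster number
        return distances.index(min(distances))


model = ToyCentroidClusterer([(0.0, 0.0), (10.0, 10.0)])
points = [(9.0, 11.0), (1.0, 0.5), (0.0, -1.0), (10.5, 9.5)]
print(assignments_for(model, points))  # prints [1, 0, 0, 1]
```

Keeping the helper independent of any particular clusterer class is what lets the same call work for XMeans, SimpleKMeans, or any other model that exposes clusterInstance().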

answered 2012-09-23T16:58:58.660