python - Python: determining the most likely centroid array for k-means clustering (scipy)

Question

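I am using the scipy implementation of k-means and need a way to run it n times in a loop, record the centroid array output by each run, and determine the most likely output. Edit: I set k = 4, so every solution has 4 elements. I need to determine the most frequently occurring centroid array (i.e. the set of 4 elements).

My centroid arrays look like this: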

[[ 75]
 [115]
 [163]
 [ 16]]

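When I ran the code by hand, 4-6 different solutions came up because of the random nature of k-means. Essentially, I want to count the occurrences of each array over n runs and return the most likely one.

Edit for clarification, based on Jblasco's interpretation of the question.

Each time the algorithm runs, it returns a centroid array similar to the one above. Running the algorithm 3 times, I would get something like: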

[[ 75]  [[ 73]  [[ 75]
 [115]   [112]   [115]
 [163]   [167]   [163]
 [ 16]], [ 14]], [ 16]]

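I want to accomplish two main things:

1) Write the code that loops over the runs producing these centroids
2) Determine the most likely (most frequent, most common) solution, which in this case is: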

[[ 75]
 [115]
 [163]
 [ 16]]

score 0 · Accepted Answer

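If I understand correctly, you get an array like the one you show n times, and you want to count how many times 75 appears in the arrays, how many times 115 appears, and so on. If that is right, I would think of something like: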

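keep_count = {}
for attempt in range(n):
    array = get_one_of_those_arrays()  # placeholder: one k-means run returning the centroid array
    for result in array.ravel():  # flatten so each centroid is a hashable scalar
        if result in keep_count:  # dict.has_key() no longer exists in Python 3
            keep_count[result] += 1
        else:
            keep_count[result] = 1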

Now keep_count holds the number of occurrences of every resulting value. A slightly more elegant way uses defaultdict:

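    import collections
    keep_count = collections.defaultdict(int)  # missing keys start at 0, so no if/else is needed
    for attempt in range(n):
        array = get_one_of_those_arrays()  # placeholder: one k-means run returning the centroid array
        for result in array.ravel():  # flatten so each centroid is a hashable scalar
            keep_count[result] += 1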
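
The same per-value count can also be sketched with collections.Counter. The three arrays below are the example arrays from the question, hard-coded here in place of real k-means output; the names runs and value_counts are only illustrative.

```python
from collections import Counter

import numpy as np

# The three example centroid arrays from the question, standing in for real k-means output.
runs = [
    np.array([[75], [115], [163], [16]]),
    np.array([[73], [112], [167], [14]]),
    np.array([[75], [115], [163], [16]]),
]

value_counts = Counter()
for array in runs:
    # ravel() flattens the (4, 1) array; tolist() yields plain hashable Python ints.
    value_counts.update(array.ravel().tolist())

print(value_counts[75])   # 2
print(value_counts[112])  # 1
```

Note that this counts individual centroid values, not whole centroid arrays.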

score 0 · Accepted Answer

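OK, now I see what you mean, so let's start over, shall we? A similar line of thinking for a similar problem. I admit this is not the most elegant option, but I think it works and is intuitive: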

keep_count = {}
for attempt in range(n):
    array = get_one_of_those_arrays()  # placeholder: one k-means run returning the centroid array
    array = tuple(array.reshape(4))  # flatten into a hashable tuple of the 4 centroids
    if array in keep_count:  # dict.has_key() no longer exists in Python 3
        keep_count[array] += 1
    else:
        keep_count[array] = 1

Now we just need to find the position of the maximum:

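max_value = max(keep_count.values())  # highest occurrence count
max_pos = list(keep_count.values()).index(max_value)  # dict views are not indexable in Python 3
most_frequent = list(keep_count.keys())[max_pos]  # centroid tuple with that count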
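
Putting the pieces together, here is a minimal sketch of the whole loop. It assumes the k-means function in question is scipy.cluster.vq.kmeans, which returns a (codebook, distortion) pair, and that the data is one-dimensional as in the question. The function name most_frequent_centroids, its parameters, and the sample data are made up for illustration. Two details go beyond the snippet above: the centroids are rounded, because floating-point centroids from different runs rarely match exactly, and they are sorted, because k-means may return the same centroids in a different order.

```python
from collections import Counter

import numpy as np
from scipy.cluster.vq import kmeans


def most_frequent_centroids(data, k=4, n_runs=50, decimals=0):
    """Run k-means n_runs times; return the most common centroid tuple and its count."""
    counts = Counter()
    for _ in range(n_runs):
        centroids, _distortion = kmeans(data, k)
        # Round and sort so that runs differing only by float noise
        # or by centroid order are counted as the same solution.
        key = tuple(sorted(np.round(centroids.ravel(), decimals).tolist()))
        counts[key] += 1
    return counts.most_common(1)[0]


# Made-up 1-D data with four well-separated groups, for illustration only.
rng = np.random.default_rng(0)
data = np.concatenate(
    [rng.normal(center, 1.0, 50) for center in (16, 75, 115, 163)]
).reshape(-1, 1)

best, count = most_frequent_centroids(data)
print(best, count)
```

The decimals argument controls how close two solutions must be to count as identical; with real data it may need tuning, since too fine a rounding makes every run look unique.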
