python - 从给定的 n 个点中选择最远的 k 个点

Question

我在维度d中有一组由n个点组成的 S，如果需要，我可以计算所有成对的距离。我需要在这个集合中选择k个点，以使它们的成对距离之和最大。换句话说，稍微更多的数学单词，我希望 S 中的 p1, ..., pk 使得 sum(i,j < k) dist(pi, pj) 是最大的。

我知道这个问题与这个问题有关（这与我的问题基本相同，但 k=2），也可能与这个问题有关（用“最远”而不是“最近”）。

我对此不太确定，但也许所有可能的解决方案在凸包中都有它们的所有点？

任何合理的近似/启发式都可以。

虚拟奖励点#1 适用于任何函数，该函数在四个点中给出分数（其中一个可能是平方距离之和的平方根）。

如果解决方案很容易在 python + numpy/scipy 中实现，则虚拟奖励点 #2。

score 6 · Accepted Answer

Your problem seemed similar to the weighted minimum vertex cover problem (which is NP-complete). Thanks to @Gareth Rees for the comments below clarifying that I was incorrect in understanding a vertex cover's relationship to the set you're looking for. But you may still investigate the vertex cover problem and literature because your problem might be discussed alongside it, as they still do share some features.

If you're willing to work with diameter instead of summed graph weight, you could use the approach for the minimal diameter set that you linked in your question. If your current distance measure is called d (the one for which you want the points furthest from each other) then just define d' = 1/d and solve the minimum distance problem with d'.

There might also be a relationship between some form of graph cutting algorithm, like say normalized cut, and the subset you seek. If your distance measure is used as the graph weight or affinity between nodes, you might be able to modify an existing graph cutting objective function to match your objective function (looking for the group of k nodes that have maximum summed weight).

This seems like a combinatorially difficult problem. You might consider something simple like simulated annealing. The proposal function could just choose at point that's currently in the k-subset at random and replace it randomly with a point not currently in the k-subset.

You would need a good cooling schedule for the temperature term and may need to use reheating as a function of cost. But this sort of this is really simple to program. As long as n is reasonably small, you can then just constantly randomly select k-subsets and anneal towards a k-subset with very large total distance.

This would only give you an approximation, but even deterministic methods probably will solve this approximately.

Below is a first hack at what the simulated annealing code might be. Note that I'm not making guarantees about this. It could be an inefficient solution if calculating distance is too hard or the problem instance size grows too large. I'm using very naive geometric cooling with a fixed cooling rate, and you may also want to tinker with a fancier proposal than just randomly swapping around nodes.

all_nodes = np.asarray(...) # Set of nodes
all_dists = np.asarray(...) # Pairwise distances

N = len(all_nodes)
k = 10 # Or however many you want.

def calculate_distance(node_subset, distances):
    # A function you write to determine sum of distances
    # among a particular subset of nodes.    

# Initial random subset of k elements
shuffle = np.random.shuffle(all_nodes) 
current_subset = shuffle[0:k]
current_outsiders = shuffle[k:]

# Simulated annealing parameters.
temp = 100.0
cooling_rate = 0.95
num_iters = 10000

# Simulated annealing loop.
for ii in range(num_iters):
    proposed_subset = current_subset.copy()
    proposed_outsiders =  current_outsiders.copy()

    index_to_swap = np.random.randint(k)
    outsider_to_swap = np.random.randint(N - k)

    tmp = current_subset[index_to_swap]
    proposed_subset[index_to_swap] = current_outsiders[outsider_to_swap]
    proposed_outsiders[outsider_to_swap] = tmp

    potential_change = np.exp((-1.0/temp)*
        calculate_distance(proposed_subset,all_dists)/
        calculate_distance(current_subset, all_dists)) 

    if potential_change > 1 or potential_change >= np.random.rand():
         current_subset = proposed_subset
         current_outsiders = proposed_outsiders

    temp = cooling_rate * temp

score 6 · Accepted Answer

这个贪心算法怎么样：

将 S 中距离最大的 2 个点添加到解中
直到你得到一个大小为 k 的解，向解中添加从它到解中已经存在的所有点的距离总和最大的点。

让我们称任何 2 点之间的最大距离为 D，对于我们添加到解中的每个点，由于三角形不等式，我们至少添加 D。因此解决方案将至少为 (k-1)*D，而任何解决方案都将具有 (k-1)^2 距离，但没有一个超过 D，因此在最坏的情况下，您会得到 k 倍最优解。

我不确定这是可以证明这个启发式的最严格的界限。

score 2 · Accepted Answer

第 1 步：预先计算所有对 pi,pj 的 dist(pi, pj)

第 2 步：构造一个完整的图 V={p1,...,pn}，边权重 w_ij = dist(pi, pj)

第 3 步：解决最大边缘加权团 (MEC) 问题。

MEC 绝对是 NP 完全的，它与二次规划密切相关。因此，如果 n 很大（甚至中等大小），您可能会专注于启发式算法。

请注意，通过预先计算边权重，距离函数没有限制

score 0 · Accepted Answer

这是一个小 n 的工作（蛮力）实现，如果没有别的，它显示了生成器理解的表现力：

selection = max(
    itertools.combinations(points, k),
    key=lambda candidate: sum(
        dist(p, q) for p, q in itertools.combinations(candidate, 2)
    )
)

虽然这最终会调用dist很多：

(k! / 2! / (k-2)!) * (n! / k! / (n-k)! == n! /(2(k-2)!(n-k)!)

python - 从给定的 n 个点中选择最远的 k 个点

4 回答 4

Related

Reference