
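Given a set of locations and a single location, find the location in the set that is closest to the single location. This is not about finding a path through nodes; it is about straight-line (bird's-eye) distance.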

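The locations are properties of "nodes" (this is for a finite-element software extension). The problem is that this takes a very long time, and I'm looking for something faster. A user has to call this function up to 500 times (each time with a different single location) on a set of 1 million locations (the set stays the same).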

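I would rather not restrict the set before doing this calculation. I don't have to query a database or anything, and I feel this simple arithmetic should finish in a few milliseconds anyway. I don't understand why it takes so long.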
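A rough count using only the numbers given above shows the scale of a brute-force scan: every query touches every node, and each squared distance needs 3 subtractions, 3 squarings and 2 additions. This counts arithmetic alone; in a pure-Python loop, per-iteration interpreter overhead (dict iteration, tuple indexing, creating float objects) comes on top of it.

```latex
\begin{aligned}
\underbrace{500}_{\text{queries}} \times \underbrace{10^{6}}_{\text{nodes}} &= 5 \times 10^{8}\ \text{squared-distance evaluations} \\
5 \times 10^{8} \times \underbrace{(3 + 3 + 2)}_{\text{subtract, square, add}} &= 4 \times 10^{9}\ \text{arithmetic operations}
\end{aligned}
```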

```python
# Excerpt of what node_location_by_nodeId looks like. 40k keys is a small model; it can contain up to a million keys.
node_location_by_nodeId = {43815: (3.2835714285714266, -1.8875000000000068, 0.23571428571420952), 43816: (3.227857142857142, -1.8875000000000068, 0.23571428571421035)}
location_in_space = (1, 3, 7)

def node_closest_to_location_in_space(location_in_space):
    global node_location_by_nodeId
    distances = {}
    for NodeId in node_location_by_nodeId:
        NodeLocation = node_location_by_nodeId[NodeId]
        # Parentheses let the expression span several lines.
        distances[NodeId] = ((NodeLocation[0] - location_in_space[0])**2 +
                             (NodeLocation[1] - location_in_space[1])**2 +
                             (NodeLocation[2] - location_in_space[2])**2)
    return min(distances, key=distances.get)  # I don't really get this statement, I got it from here. Maybe this one is slow?

node_closest_to_location_in_space(location_in_space)
```
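Since the comment above asks what `min(distances, key=distances.get)` does, here is a tiny example with made-up squared distances (the ids and values are hypothetical). Iterating a dict yields its keys, and `key=distances.get` makes `min` rank each key by its value. It is a single linear pass, so on its own it costs roughly the same order of work as the loop that fills `distances`.

```python
# Hypothetical squared distances keyed by node id.
distances = {43815: 26.5, 43816: 25.9, 43817: 30.1}

# min() iterates over the keys and compares them by distances.get(key),
# returning the key whose value is smallest.
closest = min(distances, key=distances.get)
print(closest)  # 43816
```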

Edit: The solution taken from the answers below reduced the runtime to 35% of the original on the large dataset (400 calls on a set of 1.2 million locations).

```python
def node_closest_to_location_in_space(location_in_space):
    closest_node = None
    closest_distance = 1e100  # An arbitrary, HUGE, value
    x, y, z = location_in_space[:3]
    for NodeId, NodeLocation in node_location_by_nodeId.items():  # .iteritems() in Python 2
        distance = (NodeLocation[0] - x)**2 + (NodeLocation[1] - y)**2 + (NodeLocation[2] - z)**2
        if distance < closest_distance:
            closest_distance = distance
            closest_node = NodeId
    return closest_node
```
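To check a speed-up like the one reported above on your own machine, a `timeit` comparison along these lines may help. It is a sketch that uses synthetic random points (hypothetical data, not the asker's model), the helper names are illustrative, and no particular timings are claimed here.

```python
import random
import timeit


def closest_via_dict(nodes, point):
    # Original approach: build a dict of every squared distance, then take the min.
    x, y, z = point
    distances = {}
    for node_id, (nx, ny, nz) in nodes.items():
        distances[node_id] = (nx - x) ** 2 + (ny - y) ** 2 + (nz - z) ** 2
    return min(distances, key=distances.get)


def closest_running_min(nodes, point):
    # Edited approach: keep only the best candidate seen so far.
    x, y, z = point
    closest_node = None
    closest_distance = float("inf")
    for node_id, (nx, ny, nz) in nodes.items():
        d = (nx - x) ** 2 + (ny - y) ** 2 + (nz - z) ** 2
        if d < closest_distance:
            closest_distance = d
            closest_node = node_id
    return closest_node


def make_nodes(n, seed=0):
    # Hypothetical test data: n random points in a 10 x 10 x 10 box.
    rng = random.Random(seed)
    return {i: (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-5, 5))
            for i in range(n)}


if __name__ == "__main__":
    nodes = make_nodes(100_000)
    point = (1, 3, 7)
    # Both versions should agree before their speed is compared.
    assert closest_via_dict(nodes, point) == closest_running_min(nodes, point)
    for fn in (closest_via_dict, closest_running_min):
        seconds = timeit.timeit(lambda: fn(nodes, point), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 queries")
```

Scaling `make_nodes` up towards the real model size gives a more representative comparison.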

3 Answers


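You can't run a simple linear search over an unsorted dictionary and expect it to be fast (at least not very fast). There are many algorithms that solve this problem far more efficiently.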

The suggested R-tree is an ideal data structure for storing your locations.

You can also look for solutions on this Wikipedia page: Nearest neighbor search.
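The answer recommends an R-tree. As a sketch of the same general idea (build a spatial index once, then answer many queries without scanning every node), here is a minimal pure-Python k-d tree. Note that a k-d tree is a different structure from an R-tree, swapped in here because it is short to write and a common choice for nearest-neighbour search over a static point set. The names `KDNode`, `build_kdtree` and `nearest` are hypothetical, and it assumes every location is an (x, y, z) tuple. For real workloads, a compiled implementation such as SciPy's `scipy.spatial.cKDTree` (with its `query` method) would likely be much faster than this version.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


class KDNode:
    """One node of a 3-D k-d tree: a stored point plus its two subtrees."""
    __slots__ = ("node_id", "point", "axis", "left", "right")

    def __init__(self, node_id: int, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]) -> None:
        self.node_id = node_id
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build_kdtree(items: List[Tuple[int, Point]], depth: int = 0) -> Optional[KDNode]:
    """Build once: split on x, y, z in turn, using the median point as the pivot."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    node_id, point = items[mid]
    # Left subtree: coordinates <= pivot on this axis; right subtree: >= pivot.
    return KDNode(node_id, point, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def nearest(root: Optional[KDNode], target: Point) -> Tuple[Optional[int], float]:
    """Return (node_id, squared distance) of the stored point closest to target."""
    best_id: Optional[int] = None
    best_dist = float("inf")

    def search(node: Optional[KDNode]) -> None:
        nonlocal best_id, best_dist
        if node is None:
            return
        px, py, pz = node.point
        d = (px - target[0]) ** 2 + (py - target[1]) ** 2 + (pz - target[2]) ** 2
        if d < best_dist:
            best_id, best_dist = node.node_id, d
        diff = target[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # Cross the splitting plane only if it is closer than the best match so far.
        if diff * diff < best_dist:
            search(far)

    search(root)
    return best_id, best_dist


if __name__ == "__main__":
    nodes = {43815: (3.2835714285714266, -1.8875000000000068, 0.23571428571420952),
             43816: (3.227857142857142, -1.8875000000000068, 0.23571428571421035)}
    tree = build_kdtree(list(nodes.items()))  # built once; the node set doesn't change
    node_id, sq_dist = nearest(tree, (1, 3, 7))  # repeat for each query location
    print(node_id)  # 43816
```

The design trade-off is that building the tree costs more than one linear scan, but with a fixed set of nodes and hundreds of queries that cost is paid once, and each query can typically skip most of the tree.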

answered 2013-06-07T09:29:37.763

You are creating and destroying a dictionary with a million items (`distances`) every time you run this function, and that isn't even necessary. Try this:

```python
def node_closest_to_location_in_space(location_in_space):
    global node_location_by_nodeId
    closest_node = None
    closest_distance = 1e100  # An arbitrary, HUGE, value
    for NodeId, NodeLocation in node_location_by_nodeId.items():  # .iteritems() in Python 2
        distance = ((NodeLocation[0] - location_in_space[0])**2 +
                    (NodeLocation[1] - location_in_space[1])**2 +
                    (NodeLocation[2] - location_in_space[2])**2)
        if distance <= closest_distance:
            closest_distance = distance
            closest_node = NodeId
    return (closest_node, closest_distance)
```

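I believe the overhead of creating and tearing down that `distances` dict every time the function is called is what's slowing it down. If so, this version should be faster.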
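The same "don't build the dict" idea can also be written with the built-in `min` over a generator expression, which keeps only the current best pair. This is a sketch: the name `closest_node_with_min` is hypothetical, it takes the dict as a parameter instead of using a global, and it assumes each location is an (x, y, z) tuple. Whether it beats the explicit loop is something to measure rather than assume.

```python
def closest_node_with_min(node_location_by_node_id, location_in_space):
    """Return (node_id, squared_distance) for the node closest to location_in_space."""
    x, y, z = location_in_space[:3]
    # min() pulls one (distance, node_id) pair at a time from the generator,
    # so no million-entry dict is built. Tuples compare by distance first;
    # on an exact tie the smaller node_id wins. Raises ValueError if the dict is empty.
    best_distance, best_node = min(
        ((nx - x) ** 2 + (ny - y) ** 2 + (nz - z) ** 2, node_id)
        for node_id, (nx, ny, nz) in node_location_by_node_id.items()
    )
    return best_node, best_distance


if __name__ == "__main__":
    nodes = {43815: (3.2835714285714266, -1.8875000000000068, 0.23571428571420952),
             43816: (3.227857142857142, -1.8875000000000068, 0.23571428571421035)}
    print(closest_node_with_min(nodes, (1, 3, 7))[0])  # 43816
```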

answered 2013-06-07T08:52:32.883

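Indexing into your location argument takes time, and that location doesn't change across the millions of nodes, so move these invariants out of the for loop: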

```python
closest_node = None
closest_distance = 1e100  # An arbitrary, HUGE, value
for NodeId, NodeLocation in node_location_by_nodeId.items():  # .iteritems() in Python 2
    distance = ((NodeLocation[0] - location_in_space[0])**2 +
                (NodeLocation[1] - location_in_space[1])**2 +
                (NodeLocation[2] - location_in_space[2])**2)
    if distance <= closest_distance:
        closest_distance = distance
        closest_node = NodeId
```

becomes:

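```python
x, y, z = location_in_space  # unpacked once, outside the loop
closest_node = None
closest_distance = 1e100  # An arbitrary, HUGE, value
for NodeId, NodeLocation in node_location_by_nodeId.items():  # .iteritems() in Python 2
    distance = ((NodeLocation[0] - x)**2 +
                (NodeLocation[1] - y)**2 +
                (NodeLocation[2] - z)**2)
    if distance <= closest_distance:
        closest_distance = distance
        closest_node = NodeId
```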

Now these become simple (and faster) local variable references.
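The same hoisting idea can be pushed one step further by unpacking each node's location tuple directly in the `for` statement, so the loop body only reads local names. This is a sketch under the assumption that every location is exactly an (x, y, z) tuple; the function name `closest_node_unpacked` is hypothetical, and replacing `** 2` with a multiplication is a micro-optimisation that may or may not help on a given Python version, so time it before relying on it.

```python
def closest_node_unpacked(node_location_by_node_id, location_in_space):
    """Return (node_id, squared_distance) for the node closest to location_in_space."""
    # Hoist the query coordinates into locals once per call.
    x, y, z = location_in_space[:3]
    closest_node_id = None
    closest_distance = float("inf")
    # Unpacking (nx, ny, nz) here replaces three NodeLocation[i] subscripts per node.
    for node_id, (nx, ny, nz) in node_location_by_node_id.items():
        dx = nx - x
        dy = ny - y
        dz = nz - z
        distance = dx * dx + dy * dy + dz * dz  # squared distance; no sqrt needed to compare
        if distance < closest_distance:
            closest_distance = distance
            closest_node_id = node_id
    return closest_node_id, closest_distance
```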

You can also try replacing the distance calculation with a call to `math.hypot`, which is implemented in fast C code:

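```python
from math import hypot

x, y, z = location_in_space
closest_node = None
closest_distance = 1e100  # An arbitrary, HUGE, value
for NodeId, NodeLocation in node_location_by_nodeId.items():  # .iteritems() in Python 2
    # hypot returns the true (not squared) distance; the comparison still works.
    distance = hypot(hypot(NodeLocation[0] - x, NodeLocation[1] - y), NodeLocation[2] - z)
    if distance <= closest_distance:
        closest_distance = distance
        closest_node = NodeId
```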

(`hypot` is written to do only 2D distance calculations, so for 3D you have to call `hypot(hypot(xdist, ydist), zdist)`.)
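Since Python 3.8, `math.hypot` accepts any number of coordinates (so `hypot(xdist, ydist, zdist)` works), and `math.dist` computes the Euclidean distance between two points directly. A sketch using `math.dist`, assuming 3-D tuple locations; the name `closest_node_math_dist` is hypothetical. The `lambda` adds a Python-level call per node, so whether this is faster than the explicit loop is worth measuring.

```python
import math


def closest_node_math_dist(node_location_by_node_id, location_in_space):
    """Return the id of the node nearest to location_in_space (Python 3.8+)."""
    point = tuple(location_in_space[:3])
    # math.dist computes the Euclidean distance between two equal-length
    # points in C, replacing the nested hypot(hypot(...), ...) call.
    node_id, _ = min(node_location_by_node_id.items(),
                     key=lambda item: math.dist(item[1], point))
    return node_id


if __name__ == "__main__":
    nodes = {43815: (3.2835714285714266, -1.8875000000000068, 0.23571428571420952),
             43816: (3.227857142857142, -1.8875000000000068, 0.23571428571421035)}
    print(closest_node_math_dist(nodes, (1, 3, 7)))  # 43816
```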

answered 2013-06-07T12:11:38.117