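I have a list of intervals and need to return those that overlap with an interval passed in a query. The special thing is that in a typical query, about a third or even half of the intervals will overlap the query interval. In addition, the ratio of the shortest to the longest interval is no more than 1:5. I implemented my own interval tree (an augmented red-black tree); I did not want to use an existing implementation because I need support for closed intervals and some special features. I tested query speed with 6000 queries on a tree of 6000 intervals (so n=6000 and m=3000 (approx.)). It turns out that brute force is just as good as using the tree: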
Computation time - loop: 125.220461 s
Tree setup: 0.05064 s
Tree Queries: 123.167337 s
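For reference, the brute-force loop being compared against can be sketched roughly as follows. This is only a minimal sketch, not the code that was actually timed: intervals are assumed to be plain `(low, high)` tuples with closed endpoints, and `overlaps` and `brute_force_query` are hypothetical names.

```python
def overlaps(a_low, a_high, b_low, b_high):
    """Closed-interval overlap test: touching endpoints count as overlapping."""
    return a_low <= b_high and b_low <= a_high


def brute_force_query(intervals, low, high):
    """Return every (low, high) tuple in `intervals` that overlaps [low, high]."""
    return [iv for iv in intervals if overlaps(iv[0], iv[1], low, high)]


# Tiny illustrative data set (made up for this sketch).
intervals = [(1, 3), (2, 6), (7, 9), (10, 12)]
hits = brute_force_query(intervals, 3, 7)
print(hits)
```

Each query touches every interval exactly once, which is where the n*n term in the analysis comes from.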
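Let me use asymptotic analysis. n: number of queries; n: number of intervals (the two are equal here); approx. n/2: number of intervals returned per query:
Time complexity, brute force: n*n
Time complexity, tree: n*(log(n)+n/2) --> 1/2 n*n + n*log(n) --> n*n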
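Plugging the test sizes into these two expressions gives a feel for the numbers. The snippet below is plain arithmetic on idealized operation counts, not a measurement; the base-2 logarithm and the variable names are assumptions made for the illustration.

```python
import math

n = 6000      # number of queries and number of intervals (from the test above)
k = n // 2    # roughly how many intervals each query returns

brute_force_ops = n * n                # every query checks every interval
tree_ops = n * (math.log2(n) + k)      # per query: descend the tree, then report k hits

print(f"brute force: {brute_force_ops:,.0f}")
print(f"tree:        {tree_ops:,.0f}")
print(f"ratio:       {brute_force_ops / tree_ops:.2f}")
```

With these sizes the log(n) term is negligible next to n/2, so the idealized counts differ by a factor of just under two.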
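So the analysis says that for large n the two should be roughly the same. Still, given the constant 1/2 in front of n*n, one would somehow expect the tree to be noticeably faster. So I can imagine three possible reasons for the results I got:
a) My implementation is wrong. (Should I use BFS like below?)
b) My implementation is correct, but I made things cumbersome for Python, so it needs more time to handle the tree than to handle brute force.
c) Everything is fine; this is simply how things behave for large n.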
My query function looks like this:
from collections import deque

def query(self, low, high):
    result = []
    q = deque([self.root])  # this is the root node in the tree
    append_result = result.append
    append_q = q.append
    pop_left = q.popleft
    while q:
        node = pop_left()  # look at the next node
        if node.overlap(low, high):  # some overlap?
            append_result(node.interval)
        if node.low is not None and low <= node.get_low_max():  # en-q left node
            append_q(node.low)
        if node.high is not None and node.get_high_min() <= high:  # en-q right node
            append_q(node.high)
    return result  # hand the collected intervals back to the caller
I build the tree like this:
def build(self, intervals):
    """
    Function which is recursively called to build the tree.
    """
    if intervals is None:
        return None
    if len(intervals) > 2:  # intervals is always sorted in increasing order
        mid = len(intervals) // 2
        # split intervals into three parts:
        # central element (median)
        center = intervals[mid]
        # left half (<= median)
        new_low = intervals[:mid]
        # right half (>= median)
        new_high = intervals[mid+1:]
        # compute max on the lower side (left):
        max_low = max(iv.get_high() for iv in new_low)
        # store min on the higher side (right):
        min_high = new_high[0].get_low()
    elif len(intervals) == 2:
        center = intervals[1]
        new_low = [intervals[0]]
        new_high = None
        max_low = intervals[0].get_high()
        min_high = None
    elif len(intervals) == 1:
        center = intervals[0]
        new_low = None
        new_high = None
        max_low = None
        min_high = None
    else:
        raise Exception('The tree is not behaving as it should...')
    return Node(center, self.build(new_low), self.build(new_high),
                max_low, min_high)
Edit:
A node is represented as follows:
class Node:
    def __init__(self, interval, low, high, max_low, min_high):
        self.interval = interval  # pointer to corresponding interval object
        self.low = low            # pointer to node containing intervals to the left
        self.high = high          # pointer to node containing intervals to the right
        self.max_low = max_low    # maximum high endpoint on the left side
        self.min_high = min_high  # minimum low endpoint on the right side
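`query` calls `node.overlap`, `node.get_low_max` and `node.get_high_min`, none of which are shown above. A minimal sketch of what they might look like, assuming closed intervals and an interval object exposing `get_low()` and `get_high()` as used in `build`. The `Interval` class and all three method bodies are guesses for illustration, not the actual implementation:

```python
class Interval:
    """Hypothetical closed interval [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def get_low(self):
        return self.low

    def get_high(self):
        return self.high


class Node:
    def __init__(self, interval, low, high, max_low, min_high):
        self.interval = interval
        self.low = low
        self.high = high
        self.max_low = max_low
        self.min_high = min_high

    def overlap(self, low, high):
        """True if this node's interval shares at least one point with [low, high]."""
        return self.interval.get_low() <= high and low <= self.interval.get_high()

    def get_low_max(self):
        # Assumed to be a plain accessor for the value stored by build().
        return self.max_low

    def get_high_min(self):
        # Assumed to be a plain accessor for the value stored by build().
        return self.min_high


node = Node(Interval(2, 5), None, None, None, None)
print(node.overlap(5, 8))  # endpoints touch, so closed intervals overlap
print(node.overlap(6, 8))  # disjoint
```

If the real methods look like this, every visited node costs three Python method calls, which is relevant to possibility b) above.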
All nodes in a subtree can be obtained like this:
def subtree(current):
    node_list = []
    if current.low is not None:
        node_list += subtree(current.low)
    node_list.append(current)
    if current.high is not None:
        node_list += subtree(current.high)
    return node_list
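The recursive `subtree` copies each child's list into its parent's list at every level. One possible alternative is an in-order walk with an explicit stack, which appends each node exactly once. The sketch below is self-contained, so it uses a stripped-down stand-in class `SimpleNode` (hypothetical) that has only the `low` and `high` child pointers plus a `label`; whether this is actually faster here has not been measured.

```python
class SimpleNode:
    """Hypothetical stand-in for Node: only the fields the traversal needs."""

    def __init__(self, label, low=None, high=None):
        self.label = label
        self.low = low
        self.high = high


def subtree_iterative(current):
    """In-order list of all nodes under `current`, using an explicit stack."""
    node_list = []
    stack = []
    node = current
    while stack or node is not None:
        # Walk as far left as possible, remembering the path.
        while node is not None:
            stack.append(node)
            node = node.low
        node = stack.pop()
        node_list.append(node)  # visit after the whole left subtree
        node = node.high        # then continue with the right subtree
    return node_list


root = SimpleNode("b", SimpleNode("a"), SimpleNode("d", SimpleNode("c")))
labels = [n.label for n in subtree_iterative(root)]
print(labels)
```

The nodes come out in the same left, node, right order that the recursive version produces.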
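P.S. Note that by exploiting the fact that there is so much overlap and that all intervals have comparable lengths, I managed to implement a simple approach based on sorting and bisection that finishes in 80 seconds, but I would say this is overfitting... And by asymptotic analysis, I find it should have approx. the same running time as using the tree...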
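The sorting-and-bisection code itself is not shown, so the following is only a guess at how such an approach could work, not the implementation that took 80 seconds. It assumes closed `(low, high)` tuples and uses the bounded interval length: an interval overlapping `[low, high]` must start no later than `high` and no earlier than `low - max_len`. The class name `SortedIntervals` is hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedIntervals:
    """Hypothetical helper: closed (low, high) tuples kept sorted by low."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)
        self.lows = [low for low, _ in self.intervals]
        # Longest interval; bounds how far left an overlapping interval can start.
        self.max_len = max((high - low for low, high in self.intervals), default=0)

    def query(self, low, high):
        # An overlapping interval must start no later than `high` ...
        stop = bisect_right(self.lows, high)
        # ... and no earlier than `low - max_len`, or it would end before `low`.
        start = bisect_left(self.lows, low - self.max_len)
        # Candidates in between still need their high endpoint checked.
        return [iv for iv in self.intervals[start:stop] if iv[1] >= low]


index = SortedIntervals([(10, 12), (1, 3), (7, 9), (2, 6)])
print(index.query(3, 7))
```

Each query costs two binary searches plus a scan over the candidate slice, so when a third to half of all intervals are returned, the scan dominates just as the reporting step does in the tree.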