python - 两组区间之差

Question

我正在尝试编写一些代码来计算两组间隔 A - B 的差异，间隔端点是整数，但我正在努力寻找有效的解决方案，任何建议都会非常感谢示例：[(1, 4 ), (7, 9)] - [(3,5)] = [(1, 3), (7, 9)]

好吧，这是我迄今为止尝试过的最好的（两个列表已经排序）

class tp():
   def __repr__(self):
       return '(%.2f,%.2f)' % (self.start, self.end)
   def __init__(self,start,end): 
       self.start=start
       self.end=end



z=[tp(3,5)] #intervals to be subtracted
s=[tp(1, 4)),tp(7, 9), tp(3,4),tp(4,6)]

for x in s[:]:
   if z.end < x.start:
    break
   elif z.start < x.start and z.end > x.start and z.end < x.end:
    x.start=z.end
   elif z.start < x.start and z.end > x.end:
    s.remove(x)
   elif z.start > x.start and z.end < x.end:
    s.append(tp(x.start,z.start))
    s.append(tp(z.end,x.end))
    s.remove(x)
   elif z.start > x.start and z.start < x.end and z.end > x.end:
    x.end=z.start
   elif z.start > x.end:
    continue

score 13 · Accepted Answer

使操作高效的唯一方法是保持区间列表排序且不重叠（可以在中完成O(n log n)）。[参见下面的注释]。

由于列表都已排序且不重叠，任何集合操作（并集、交集、差集、对称差集）都可以通过简单的合并来执行。

合并操作很简单：按顺序同时遍历两个参数的端点。（请注意，每个区间列表的端点都是排序的，因为我们要求区间不重叠。）对于发现的每个端点，决定它是否在结果中。如果结果当前有奇数个端点并且新端点不在结果中，则将其添加到结果中；同样，如果结果当前具有偶数个端点并且新端点在结果中，则将其添加到结果中。在此操作结束时，结果是一个端点列表，在区间开始和区间结束之间交替。

这是在python中：

# If using python 3, uncomment the following:
# from functools import reduce

# In all of the following, the list of intervals must be sorted and 
# non-overlapping. We also assume that the intervals are half-open, so
# that x is in tp(start, end) iff start <= x and x < end.

def flatten(list_of_tps):
  """Convert a list of intervals to a list of endpoints"""
  return reduce(lambda ls, ival: ls + [ival.start, ival.end],
                list_of_tps,
                [])

def unflatten(list_of_endpoints):
  """Convert a list of endpoints, with an optional terminating sentinel,
     into a list of intervals"""
  return [tp(list_of_endpoints[i], list_of_endpoints[i + 1])
          for i in range(0, len(list_of_endpoints) - 1, 2)]

def merge(a_tps, b_tps, op):
  """Merge two lists of intervals according to the boolean function op"""
  a_endpoints = flatten(a_tps)
  b_endpoints = flatten(b_tps)

  sentinel = max(a_endpoints[-1], b_endpoints[-1]) + 1
  a_endpoints += [sentinel]
  b_endpoints += [sentinel]

  a_index = 0
  b_index = 0

  res = []

  scan = min(a_endpoints[0], b_endpoints[0])
  while scan < sentinel:
    in_a = not ((scan < a_endpoints[a_index]) ^ (a_index % 2))
    in_b = not ((scan < b_endpoints[b_index]) ^ (b_index % 2))
    in_res = op(in_a, in_b)

    if in_res ^ (len(res) % 2): res += [scan]
    if scan == a_endpoints[a_index]: a_index += 1
    if scan == b_endpoints[b_index]: b_index += 1
    scan = min(a_endpoints[a_index], b_endpoints[b_index])

  return unflatten(res)

def interval_diff(a, b):
  return merge(a, b, lambda in_a, in_b: in_a and not in_b)

def interval_union(a, b):
  return merge(a, b, lambda in_a, in_b: in_a or in_b)

def interval_intersect(a, b):
  return merge(a, b, lambda in_a, in_b: in_a and in_b)

笔记

间隔[a, b)和[b, c)不重叠，因为它们是不相交的；b只属于第二个。但是这两个区间的并集应该仍然是[a,c)。但是出于此答案中功能的目的，我们还应该要求间隔不相邻。扩展不重叠以包括间隔相邻的情况；否则，我们可能会发现不必要地包含在输出中的邻接点。（严格来说这并没有错，但如果输出是确定性的，那么测试函数会更容易。）

这是一个函数的示例实现，它将任意间隔列表规范化为排序的、非重叠的间隔。
```
def interval_normalise(a):
    rv = sorted(a, key = lambda x: x.start)
    out = 0
    for scan in range(1, len(rv)):
        if rv[scan].start > rv[out].end:
            if rv[out].end > rv[out].start: out += 1
            rv[out] = rv[scan]
        elif rv[scan].end > rv[out].end:
            rv[out] = tp(rv[out].start, rv[scan].end)
    if rv and rv[out].end > rv[out].start: out += 1
    return rv[:out]
```

score 2 · Accepted Answer

这可以通过扫描线算法来解决。这个想法是将两个集合的所有间隔起点保留在一个排序数组中，并将终点保留在另一个排序数组中，并用它们属于哪个集合的信息来标记它们。例如

       A              B
[(1, 4), (7, 9)] - [(3,5)]
A: start:[1,7] end:[4,9], B: start:[3]end:[5]
start:[(1,a),(3,b),(7,a)]
end: [(4,a),(5,b),(9,a)]

现在有两个指针指向每个数组的开头。在循环中递增一个，它指向以 a 开头的最低值添加间隔，直到它们以 b 或 aeg 结束，我们将按此顺序迭代点

(1,a) (3,b) (4,a) (5,b) (7,a) (9,a)
# and adding intervals where we have seen an start a and an end a or b
(1,3) (7,9)

这导致在区间数方面的线性解决方案。

python - 两组区间之差

2 回答 2

笔记

Related

Reference