
I have numbers in an Nx2 array that I'd like to reduce to the min and max of each overlapping group, as a smaller Nx2 array.

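Two pairs belong to the same group if a number on either side of one pair also appears in the other pair, and this extends transitively across all pairs. In all cases, each final pair covers only one run of directly adjacent numbers.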

import numpy as np
x = np.array([
       [ 45,  47], #group 1
       [ 46,  47], #group 1
       [ 53,  54], #group 2
       [ 63,  66], #group 3
       [ 64,  66], #group 3
       [ 65,  66], #group 3
       [ 66,  67], #group 3
       [ 68,  70], #group 4
       [ 69,  70], #group 4
       [ 70,  71], #group 4
       [ 70,  72], #group 4
       [ 80,  81], #group 5
       [ 92,  93], #group 6
       [ 94,  95], #group 7
       [ 94,  96], #group 7
       [ 94,  97], #group 7
       [ 94,  98], #group 7
       [103, 104]]) #group 8

Desired output:

array([
    [45, 47], #g1
    [53, 54], #g2
    [63, 67], #g3
    [68, 72], #g4
    [80, 81], #g5
    [92, 93], #g6
    [94, 98], #g7
    [103, 104]]) #g8

2 Answers


If using pandas is an option, you can group the rows by overlapping interval and aggregate the new start and end values for each group.

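import pandas as pd

df = pd.DataFrame(x, columns=['start', 'end'])
# A new group starts wherever the previous row's end is below the current start
group_id = (~df['end'].shift().ge(df['start'])).cumsum()
df.groupby(group_id).agg({'start': 'min', 'end': 'max'}).to_numpy()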

Out:

array([[ 45,  47],
       [ 53,  54],
       [ 63,  67],
       [ 68,  72],
       [ 80,  81],
       [ 92,  93],
       [ 94,  98],
       [103, 104]])
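
One caveat: the grouping key compares each start only with the end of the row directly before it, so it relies on the rows being sorted by start and on no interval being nested inside an earlier, longer one (for example `[1, 10], [2, 3], [4, 5]`). The sketch below is one possible variant that compares against the running maximum of all earlier ends instead. The function name `merge_intervals_pandas` and the explicit sort are assumptions added for illustration, not part of the approach as posted.

```python
import numpy as np
import pandas as pd

def merge_intervals_pandas(x):
    # Hypothetical helper (name is an assumption, not from the original answer).
    df = pd.DataFrame(x, columns=["start", "end"])
    # Sort by start so that overlapping rows become contiguous.
    df = df.sort_values("start", kind="stable").reset_index(drop=True)
    # Largest end seen in all earlier rows (NaN for the first row).
    prev_max_end = df["end"].cummax().shift()
    # A row opens a new group when it starts beyond every earlier end.
    group_id = (df["start"] > prev_max_end).cumsum()
    return df.groupby(group_id).agg({"start": "min", "end": "max"}).to_numpy()
```

For the sample array in the question this should give the same eight rows as above; the difference only shows up on nested or unsorted input.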
answered 2020-12-12T02:19:05.520

Assuming the regions are sorted by their start value...

def merge_regions(regions):
    """Merge overlapping [start, end] regions; assumes they are sorted by start."""
    if len(regions) == 0:
        return []
    # Seed the result with a copy of the first region
    final_regions = [list(regions[0])]
    for region in regions[1:]:
        last_region = final_regions[-1]
        if region[0] <= last_region[1]:
            # Regions overlap: extend the end of the last merged region
            last_region[1] = max(region[1], last_region[1])
        else:
            final_regions.append(list(region))
    return final_regions

Input:

[
       [ 45,  47], #group 1
       [ 46,  47], #group 1
       [ 53,  54], #group 2
       [ 63,  66], #group 3
       [ 64,  66], #group 3
       [ 65,  66], #group 3
       [ 66,  67], #group 3
       [ 68,  70], #group 4
       [ 69,  70], #group 4
       [ 70,  71], #group 4
       [ 70,  72], #group 4
       [ 80,  81], #group 5
       [ 92,  93], #group 6
       [ 94,  95], #group 7
       [ 94,  96], #group 7
       [ 94,  97], #group 7
       [ 94,  98], #group 7
       [103, 104]]

Output:

[[45, 47],
 [53, 54],
 [63, 67],
 [68, 72],
 [80, 81],
 [92, 93],
 [94, 98],
 [103, 104]]
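
Since the question starts from a NumPy array, the same sweep can probably also be written without a Python loop. The following is a minimal sketch under two assumptions: every row satisfies `start <= end`, and sorting the rows by start is acceptable. The name `merge_regions_numpy` is hypothetical and not from the answer above.

```python
import numpy as np

def merge_regions_numpy(regions):
    # Hypothetical vectorized variant; assumes start <= end in every row.
    regions = np.asarray(regions)
    if regions.size == 0:
        return regions.reshape(0, 2)
    # Sort by start so the "regions are sorted" assumption holds.
    regions = regions[np.argsort(regions[:, 0], kind="stable")]
    starts, ends = regions[:, 0], regions[:, 1]
    # Running maximum of all ends seen so far.
    running_end = np.maximum.accumulate(ends)
    # A row opens a new group when it starts after every earlier end.
    is_new = np.ones(len(regions), dtype=bool)
    is_new[1:] = starts[1:] > running_end[:-1]
    first = np.flatnonzero(is_new)
    # Each group ends on the row just before the next group's first row.
    last = np.append(first[1:] - 1, len(regions) - 1)
    return np.column_stack((starts[first], running_end[last]))
```

Calling `merge_regions_numpy(x)` on the array from the question returns a NumPy array rather than a list of lists, which may be more convenient if the result feeds back into array code.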
answered 2020-12-12T01:53:31.673