python - Metric for evaluating predicted bounding boxes from semantic segmentation at the object level, outside of training

Question

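Context

For simplicity, let us assume we are performing semantic segmentation on a series of one-pixel-high images of width w, with three channels (r, g, b) and n label classes.

In other words, a single image might look like: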

img = [
    [r1, r2, ..., rw], # channel r
    [g1, g2, ..., gw], # channel g
    [b1, b2, ..., bw], # channel b
]

and has dimensions [3, w].

Then, for a given image with w=10 and n=3, the ground truth for its labels might be:

# ground "truth"
target = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    1,    1,    1,    0,    0,    1,    1,    1,    1],    # class 1
  [0,    0,    0,    0,    1,    1,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])

and our model might predict as output:

# prediction
output = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86], # class 1
  [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21], # class 2
  [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13], # class 3
])

For further simplicity, let us transform the model's output by binarizing it with a cutoff of 0.9:

# binary mask with cutoff 0.9
b_mask = np.array([
  #0     1     2     3     4     5     6     7     8     9      # position
  [0,    0,    1,    1,    0,    0,    0,    0,    1,    0],    # class 1
  [0,    0,    0,    0,    1,    0,    1,    1,    0,    0],    # class 2
  [1,    0,    0,    0,    0,    0,    0,    0,    0,    0],    # class 3
])
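
The binarization step can be reproduced with a single comparison. This is a minimal sketch that assumes the cutoff is inclusive (`>=`); the question does not say, and since no value in this example equals 0.9 exactly, either choice gives the same mask here.

```python
import numpy as np

# Model output copied from the example above (rows are classes 1..3).
output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

cutoff = 0.9  # assumed inclusive; the question only says "cutoff 0.9"

# Element-wise comparison gives booleans; cast to 0/1 to match b_mask.
b_mask = (output >= cutoff).astype(int)
print(b_mask)
```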

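Then, if we look at the "objects" of each class, i.e. the bounding boxes (or in this case just the boundaries, the [start, stop] pixels), the objects predicted from the binary mask "introduce" an extra object: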

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
]

compared to the objects of the ground truth:

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
]
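
For reference, the [start, stop] spans above can be derived from a binary mask by scanning each row for runs of 1s. The sketch below is one way to do that under the question's convention of inclusive pixel indices; `mask_to_spans` is a hypothetical helper name, not something from the original post.

```python
def mask_to_spans(row):
    """Return inclusive [start, stop] spans for each run of 1s in a 1D binary row."""
    spans = []
    start = None  # index where the current run began, or None if outside a run
    for i, value in enumerate(row):
        if value and start is None:
            start = i                      # a run begins
        elif not value and start is not None:
            spans.append([start, i - 1])   # the run ended on the previous pixel
            start = None
    if start is not None:                  # a run reaching the last pixel
        spans.append([start, len(row) - 1])
    return spans


b_mask = [
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]
target = [
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
]

p_obj = [mask_to_spans(row) for row in b_mask]
t_obj = [mask_to_spans(row) for row in target]
print(p_obj)
print(t_obj)
```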

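Question

If I want a metric that describes the accuracy of the boundaries, on average, per object, what would be an appropriate metric?

I understand IOU in the context of training a model that predicts bounding boxes, i.e. it is an object-to-object comparison, but what should one do when one object may be split into several?

Goal

I would like a metric that gives me something like this per class: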

class 1: [-1, 2]  # bounding boxes for class one, on average start one
                  # pixel before they should and end two pixels after 
                  # they should

class 2: [ 0, 3]  # bounding boxes for class two, on average start 
                  # exactly where they should and end three pixels  
                  # after they should

class 3: [ 3, -1] # bounding boxes for class three, on average start 
                  # three pixels after where they begin and end one 
                  # pixels too soon

But I am not sure how best to handle this when one object is split into several...

score 0 · Accepted Answer

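Assumptions

You ask specifically about the 1D case, so we will solve the 1D case here, but the method is essentially the same in 2D.

Let us assume you have two ground truth bounding boxes: box 1 and box 2.

Further, let us assume our model is not that great and predicts more than 2 boxes (maybe it found something new, maybe it split one box into two).

For this demo, let us consider that this is what we are working with: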

# labels
# box 1: x----y 
# box 2: x++++y
# 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
#             x--------y        x+++++++++++++++++++++++++++++y     TRUTH
#             a-----------b                                         PRED 1, BOX 1
#                   a+++++++++++++++++b                             PRED 2, BOX 2
#                a++++++++++++++++++++++++++++++++b                 PRED 3, BOX 2

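Core problem

What you actually want is a score of how well your predictions align with your targets... but oh no! Which targets belong to which predictions?

Pick a distance function of your choice and pair each prediction with a target based on that function. In this case I will use a modified intersection over union (IOU) for the 1D case. I chose this function because I wanted both PRED 2 and PRED 3 in the diagram above to align with box 2.

Score each prediction against every target and pair it with the target that yields the best score.

Now that each prediction is paired with exactly one target, compute whatever you want.

Demo with the above assumptions

From the above assumptions: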

pred_boxes = [
    [4,  8],
    [6, 12],
    [5, 16]
]

true_boxes = [
    [4,   7],
    [10, 20]
]

A 1D version of intersection over union:

def iou_1d(predicted_boundary, target_boundary):
  '''Calculates the intersection over union (IOU) based on a span.

  Notes:
    boundaries are provided in the form of [start, stop].
    boundaries where start = stop are accepted
    boundaries are assumed to be only in range [0, int < inf)

  Args:
    predicted_boundary (list): the [start, stop] of the predicted boundary
    target_boundary (list): the ground truth [start, stop] for which to compare

  Returns:
    iou (float): the IOU bounded in [0, 1]
  '''

  p_lower, p_upper = predicted_boundary
  t_lower, t_upper = target_boundary

  # boundaries are in form [start, stop] and 0<= start <= stop
  assert 0<= p_lower <= p_upper
  assert 0<= t_lower <= t_upper

  # no overlap, pred is too far left or pred is too far right
  if p_upper < t_lower or p_lower > t_upper:
    return 0

  if predicted_boundary == target_boundary:
    return 1

  intersection_lower_bound = max(p_lower, t_lower)
  intersection_upper_bound = min(p_upper, t_upper)


  intersection = intersection_upper_bound - intersection_lower_bound
  union = max(t_upper, p_upper) - min(t_lower, p_lower)  
  union = union if union != 0 else 1  
  return min(intersection / union, 1)
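
To see how this score drives the pairing, the sketch below tabulates the score of every demo prediction against every target and picks the best target for each. `span_iou` is a hypothetical, condensed restatement of the length-based formula above (it omits the assertions and the special case for identical spans), so treat it as an illustration rather than a replacement for `iou_1d`.

```python
def span_iou(pred, target):
    """Length-based IOU of two [start, stop] spans (condensed illustration)."""
    intersection = min(pred[1], target[1]) - max(pred[0], target[0])
    if intersection <= 0:
        return 0.0  # disjoint, or touching in a single point
    union = max(pred[1], target[1]) - min(pred[0], target[0])
    return intersection / union


pred_boxes = [[4, 8], [6, 12], [5, 16]]
true_boxes = [[4, 7], [10, 20]]

# One row per prediction, one column per target.
scores = [[span_iou(p, t) for t in true_boxes] for p in pred_boxes]

# Index of the best-scoring target for each prediction.
best = [row.index(max(row)) for row in scores]

for p, row, idx in zip(pred_boxes, scores, best):
    print(p, [round(s, 3) for s in row], "-> box", idx + 1)
```

With these boxes, PRED 1 pairs with box 1, and PRED 2 and PRED 3 both pair with box 2, which is the alignment the diagram was meant to produce.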

Some simple helpers:

from math import sqrt
def euclidean(u, v):
  return sqrt((u[0]-v[0])**2 + (u[1]-v[1])**2)

def mean(arr):
  return sum(arr) / len(arr)

How we align the boundaries:

def align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn=iou_1d, take=max):
  '''Aligns predicted_boundary to the closest target_boundary based on the 
    alignment_scoring_fn

  Args:
    predicted_boundary (list): the predicted boundary in form of [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

  Returns:
    aligned_boundary (list): the aligned boundary in form [start, stop]
  '''
  scores = [
      alignment_scoring_fn(predicted_boundary, target_boundary) 
      for target_boundary in target_boundaries
  ]



  # boundary did not align to any boxes, use fallback scoring mechanism to break
  # tie
  if not any(scores):
    scores = [
      1 / euclidean(predicted_boundary, target_boundary)
      for target_boundary in target_boundaries
    ]

  aligned_index = scores.index(take(scores))
  aligned = target_boundaries[aligned_index]
  return aligned

How we compute the difference:

def diff(u, v):
  return [u[0] - v[0], u[1] - v[1]]

Putting it all together:

def aligned_distance_1d(predicted_boundaries, target_boundaries, alignment_scoring_fn=iou_1d, take=max, distance_fn=diff, aggregate_fn=mean):
  '''Returns the aggregated distance of predicted bounding boxes to their aligned bounding box based on alignment_scoring_fn and distance_fn

  Args:
    predicted_boundaries (list): a list of all predicted boundaries each 
      having form [start, stop]

    target_boundaries (list): a list of all valid target boundaries each having
      form [start, stop]

    alignment_scoring_fn (function): a function taking two arguments each of 
      which is a list of two elements, the first assumed to be the predicted
      boundary and the latter the target boundary. Should return a single number.

    take (function): should either be min or max. Selects either the highest or
      lowest score according to the alignment_scoring_fn

    distance_fn (function): a function taking two boundaries and returning a
      list with one difference per coordinate, e.g. [start_diff, stop_diff].

    aggregate_fn (function): a function taking a list of numbers (distances 
      calculated by distance_fn) and returns a single value (the aggregated 
      distance)

  Returns:
    aggregated_distance (list): the aggregated distance of the aligned
      predicted_boundaries, one value per coordinate (start, stop):

      [aggregate_fn(errors) for errors in zip(*[distance_fn(pred, aligned) for pred, aligned in paired])]
  '''


  paired = [
      (predicted_boundary, align_1d(predicted_boundary, target_boundaries, alignment_scoring_fn, take))
      for predicted_boundary in predicted_boundaries
  ]
  distances = [distance_fn(*pair) for pair in paired]
  aggregated = [aggregate_fn(error) for error in zip(*distances)]
  return aggregated

Run:

aligned_distance_1d(pred_boxes, true_boxes)

# [-3.0, -3.6666666666666665]

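Note that with many predictions and many targets there are many ways to optimize this code. Here I have broken it into its main functional blocks so that it is clear what is going on.

Now, does this make sense? Well, since I wanted pred 2 and pred 3 to align with box 2, yes: both start before the truth and both end too early.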
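
The arithmetic behind that result can be checked by hand. The sketch below hard-codes the pairing produced by the IOU alignment (PRED 1 with box 1, PRED 2 and PRED 3 with box 2) and averages the per-coordinate offsets; the variable names are illustrative only.

```python
# (prediction, aligned target) pairs for the demo boxes.
pairs = [
    ([4, 8], [4, 7]),     # PRED 1 -> box 1
    ([6, 12], [10, 20]),  # PRED 2 -> box 2
    ([5, 16], [10, 20]),  # PRED 3 -> box 2
]

# Offset = prediction - target, separately for start and stop.
offsets = [[p[0] - t[0], p[1] - t[1]] for p, t in pairs]
print(offsets)  # [[0, 1], [-4, -8], [-5, -4]]

# Average the start offsets and the stop offsets independently.
mean_offset = [sum(column) / len(column) for column in zip(*offsets)]
print(mean_offset)
```

PRED 1 actually ends one pixel late (+1), but the two predictions aligned to box 2 end 8 and 4 pixels early, so the averaged stop offset is still negative.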

Solution to the question

Copy-pasting your example:

# "detected" objects
p_obj = [
  [[2, 3], [8, 8]],  # class 1
  [[4, 4], [6, 7]],  # class 2
  [[0, 0]]           # class 3
] 

# true objects
t_obj = [
  [[1, 3], [6, 9]],  # class 1
  [[4, 7]],          # class 2
  [[0, 0]]           # class 3
]

Since you know the boxes for each class, this is easy:

[
    aligned_distance_1d(p_obj[cls_no], t_obj[cls_no])
    for cls_no in range(len(t_obj))
]


# [[1.5, -0.5], [1.0, -1.5], [0.0, 0.0]]

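Does this output make sense?

Starting with a sanity check, let us look at class 3. The average [start, stop] distances are both 0. Makes sense.

What about class 1? Both predictions start too late (2 > 1, 8 > 6), but only one ends too early (8 < 9). So that makes sense.

Now let us look at class 2, which seems to be the reason you asked the question (more predictions than targets).

If we were to draw what the score implies, it would be: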

#  0  1  2  3  4  5  6  7  8  9
#              ----------        # truth [4, 7]
#                 ++             # pred  [4 + 1, 7 - 1.5]

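It does not look great, but this is just an example...

Does this make sense? Yes and no. Yes, in terms of how we computed the metric: one prediction stops 3 values too early and the other starts 2 values too late. No, because neither of your predictions actually covers the value 5, yet the metric leads you to believe that it is covered...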
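
One way to make that blind spot visible is to report, next to the averaged offsets, how many predicted fragments overlap each target and which of its pixels they miss. The sketch below is an illustration only: `coverage_report` is a hypothetical helper that is not part of the code above, and it treats [start, stop] as inclusive pixel indices, as in the question.

```python
def coverage_report(predicted_spans, target_span):
    """Summarize how a set of predicted spans covers one target span."""
    t_start, t_stop = target_span
    target_pixels = set(range(t_start, t_stop + 1))  # inclusive stop
    covered = set()
    fragments = 0
    for p_start, p_stop in predicted_spans:
        overlap = target_pixels & set(range(p_start, p_stop + 1))
        if overlap:
            fragments += 1      # this prediction touches the target
            covered |= overlap
    return {
        "fragments": fragments,
        "coverage": len(covered) / len(target_pixels),
        "missed": sorted(target_pixels - covered),
    }


# Class 2 from the question: two predicted fragments, one true object.
report = coverage_report([[4, 4], [6, 7]], [4, 7])
print(report)
```

For class 2 this reports two fragments, three of four pixels covered, and pixel 5 missed, which is exactly the information the averaged offsets hide.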

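Conclusion

Is this a bad metric?

That depends on what you are using it for and what you are trying to show. However, since you use a binary mask to generate the predicted boundaries, that is a non-negligible root cause of this problem. Perhaps there is a better strategy for obtaining boundaries from the label probabilities.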
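
As one possible illustration of such a strategy (an assumption on my part, not something the question prescribes), fragments of the same class that are separated by only a small gap could be merged before evaluation. `merge_close_spans` and `max_gap` are hypothetical names, and the right gap size would depend on your data.

```python
def merge_close_spans(spans, max_gap=1):
    """Merge inclusive [start, stop] spans separated by at most max_gap pixels."""
    merged = []
    for start, stop in sorted(spans):
        # Number of missing pixels between the previous span and this one.
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1][1] = max(merged[-1][1], stop)  # extend the previous span
        else:
            merged.append([start, stop])              # start a new span
    return merged


print(merge_close_spans([[4, 4], [6, 7]]))  # class 2: gap of one pixel (5)
print(merge_close_spans([[2, 3], [8, 8]]))  # class 1: gap of four pixels
```

With `max_gap=1`, the two class 2 fragments collapse into a single span matching the true object, while the class 1 predictions, which belong to different true objects, stay separate.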
