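Context
For simplicity, let's assume we are performing semantic segmentation on a series of one-pixel-high images of width w, with three channels (r, g, b) and n label classes.
In other words, a single image might look like: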
img = [
    [r1, r2, ..., rw],  # channel r
    [g1, g2, ..., gw],  # channel g
    [b1, b2, ..., bw],  # channel b
]
and has dimensions [3, w].
Then, for a given image with w=10 and n=3 labels, the ground truth might be:
import numpy as np

# ground "truth"
target = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
and our model might predict as output:
# prediction
output = np.array([
    # 0     1     2     3     4     5     6     7     8     9     # position
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])
For further simplicity, let's transform the model's output by binarizing it with a cutoff of 0.9:
# binary mask with cutoff 0.9
b_mask = np.array([
    #0  1  2  3  4  5  6  7  8  9   # position
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
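The mask above can be reproduced from output with a plain threshold. A minimal sketch, assuming NumPy and a strict greater-than comparison (the choice of > over >= is my assumption; it makes no difference for these values):

```python
import numpy as np

# Model output copied from the example above: one row per class.
output = np.array([
    [0.11, 0.71, 0.98, 0.95, 0.20, 0.15, 0.81, 0.82, 0.95, 0.86],  # class 1
    [0.13, 0.17, 0.05, 0.42, 0.92, 0.89, 0.93, 0.93, 0.67, 0.21],  # class 2
    [0.99, 0.33, 0.20, 0.12, 0.15, 0.15, 0.20, 0.01, 0.02, 0.13],  # class 3
])

cutoff = 0.9  # cutoff used in the text

# Boolean comparison, then cast to 0/1 integers.
b_mask = (output > cutoff).astype(int)
print(b_mask)
```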
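Then, if we look at the "objects" of each class as bounding boxes (or, in this case, just boundaries, i.e. [start, stop] pixels), the objects predicted from the binary mask "introduce" an extra object: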
# "detected" objects
p_obj = [
    [[2, 3], [8, 8]],  # class 1
    [[4, 4], [6, 7]],  # class 2
    [[0, 0]],          # class 3
]
compared with the objects in the ground truth:
# true objects
t_obj = [
    [[1, 3], [6, 9]],  # class 1
    [[4, 7]],          # class 2
    [[0, 0]],          # class 3
]
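For reference, the [start, stop] intervals in p_obj and t_obj can be derived from the binary rows by locating runs of 1s. A sketch, where mask_to_intervals is a hypothetical helper name of my own and not part of any library:

```python
import numpy as np

def mask_to_intervals(row):
    """Return inclusive [start, stop] pairs, one per run of 1s in a 1-D binary row."""
    # Pad with zeros so runs touching either edge still produce a rise and a fall.
    padded = np.concatenate(([0], np.asarray(row, dtype=int), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)       # 0 -> 1 transitions
    stops = np.flatnonzero(diff == -1) - 1   # 1 -> 0 transitions, made inclusive
    return [[int(s), int(e)] for s, e in zip(starts, stops)]

# Arrays copied from the example above.
b_mask = np.array([
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],  # class 1
    [0, 0, 0, 0, 1, 0, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])
target = np.array([
    [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],  # class 1
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 0],  # class 2
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # class 3
])

p_obj = [mask_to_intervals(row) for row in b_mask]
t_obj = [mask_to_intervals(row) for row in target]
print(p_obj)
print(t_obj)
```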
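Question
If I wanted a metric that describes the accuracy of the boundaries, on average, per object, what would be an appropriate metric?
I understand IoU in the context of training models that predict bounding boxes, where it is an object-to-object comparison, but what should one do when a single object may be split into several?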
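To make the difficulty concrete, here is a minimal sketch of IoU for inclusive 1-D pixel intervals (interval_iou is a hypothetical helper of mine), applied to class 2, where the single true object [4, 7] is predicted as the two fragments [4, 4] and [6, 7]:

```python
def interval_iou(a, b):
    """IoU of two inclusive pixel intervals [start, stop]."""
    # Overlap length in pixels; clamped at zero when the intervals are disjoint.
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    len_a = a[1] - a[0] + 1
    len_b = b[1] - b[0] + 1
    union = len_a + len_b - inter
    return inter / union

true_obj = [4, 7]              # class 2 ground truth
fragments = [[4, 4], [6, 7]]   # class 2 predictions

ious = [interval_iou(frag, true_obj) for frag in fragments]
print(ious)
```

Each fragment is scored against the whole true object on its own, so neither score reflects that together the fragments cover three of its four pixels.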
Goal
I'd like a metric that gives me something like this for each class:
class 1: [-1,  2]  # bounding boxes for class one, on average, start one
                   # pixel before they should and end two pixels after
                   # they should
class 2: [ 0,  3]  # bounding boxes for class two, on average, start
                   # exactly where they should and end three pixels
                   # after they should
class 3: [ 3, -1]  # bounding boxes for class three, on average, start
                   # three pixels after where they should begin and end
                   # one pixel too soon
but I'm not sure how best to approach this when an object gets split into several...
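One possible way to handle the split case, offered as a sketch and not as an established metric: treat every predicted fragment that overlaps a true object as part of that object, merge the fragments into one span, and average the signed start and stop offsets (predicted minus true) per class. The names overlaps and mean_boundary_offsets are hypothetical, and skipping true objects with no overlapping prediction is an assumption of this sketch.

```python
def overlaps(a, b):
    """True if two inclusive [start, stop] intervals share at least one pixel."""
    return a[0] <= b[1] and b[0] <= a[1]

def mean_boundary_offsets(true_objs, pred_objs):
    """Mean [start, stop] offset (predicted minus true) over matched true objects."""
    offsets = []
    for t in true_objs:
        hits = [p for p in pred_objs if overlaps(p, t)]
        if not hits:
            continue  # assumption: completely missed objects are not counted
        # Merge all fragments of this object into a single span.
        start = min(p[0] for p in hits)
        stop = max(p[1] for p in hits)
        offsets.append((start - t[0], stop - t[1]))
    if not offsets:
        return None
    n = len(offsets)
    return [sum(o[0] for o in offsets) / n, sum(o[1] for o in offsets) / n]

# Objects copied from the example above.
p_obj = [[[2, 3], [8, 8]], [[4, 4], [6, 7]], [[0, 0]]]
t_obj = [[[1, 3], [6, 9]], [[4, 7]], [[0, 0]]]

result = [mean_boundary_offsets(t, p) for t, p in zip(t_obj, p_obj)]
for cls, off in enumerate(result, start=1):
    print(f"class {cls}: {off}")
```

With this sign convention a negative start means the box begins too early and a positive stop means it ends too late, matching the goal above. Merging hides the split itself, so the number of fragments per true object would probably need to be reported separately.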