python - Finding bounding boxes of RGB colors in an image

Question

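I'm working with a page segmentation algorithm. The output of the code is written to an image in which the pixels of each zone are assigned a unique color. I'd like to process the image to find the bounding boxes of the zones. I need to find all the colors, then find all the pixels of each color, then find their bounding box.

The following is a sample image.

[Image: sample output showing colored regions]

I'm currently starting with histograms of the R, G, B channels. The histograms tell me which values are present in each channel.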

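import numpy as np
from PIL import Image

img = Image.open(imgfilename)
img.load()
r, g, b = img.split()

ra, ga, ba = [np.asarray(p, dtype="uint8") for p in (r, g, b)]

# One bin per possible 8-bit value, so a bin index equals a channel value
rhist, edges = np.histogram(ra, bins=256, range=(0, 256))
ghist, edges = np.histogram(ga, bins=256, range=(0, 256))
bhist, edges = np.histogram(ba, bins=256, range=(0, 256))
print(np.nonzero(rhist))
print(np.nonzero(ghist))
print(np.nonzero(bhist))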

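Output:

(array([  0,   1, 128, 205, 255]),)
(array([  0,  20, 128, 186, 255]),)
(array([  0, 128, 147, 150, 255]),)

At this point I'm a bit stuck. By visual inspection, I have the colors (0,0,0), (1,0,0), (0,20,0), (128,128,128), etc. How should I combine the nonzero outputs into pixel values to pass to np.where()?

I'm considering flattening the 3,row,col ndarray into a 2-D plane of 24-bit packed RGB values (r<<16|g<<8|b) and searching that array. That seems brute-force and inelegant. Is there a better way in NumPy to find the bounding boxes of a color value?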

score 4 · Accepted Answer

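There is no reason to treat this as an RGB color image; it is just a visualization of a segmentation that someone else produced. You can simply treat it as a grayscale image, and for these particular colors you don't need to do anything else yourself.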

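import sys
import numpy
from PIL import Image

# Collapse RGB to a single 8-bit channel; each region keeps its own gray value
img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img)
colors = set(numpy.unique(im))
colors.discard(255)  # skip the white background (no error if it is absent)

for color in colors:
    py, px = numpy.where(im == color)
    # left, top, right, bottom
    print(px.min(), py.min(), px.max(), py.max())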
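Whether the grayscale shortcut is safe for a given image can be checked by comparing the number of distinct RGB colors with the number of distinct gray values after conversion. The following is a minimal sketch, not part of the original answer; the helper name `grayscale_keeps_colors_distinct` and the tiny test arrays are made up for illustration.

```python
import numpy as np
from PIL import Image

def grayscale_keeps_colors_distinct(rgb):
    """Return True if convert('L') maps every distinct RGB color
    in the array to a distinct gray value (hypothetical helper)."""
    rgb = np.ascontiguousarray(np.asarray(rgb, dtype=np.uint8)[..., :3])
    n_colors = len(np.unique(rgb.reshape(-1, 3), axis=0))
    gray = np.array(Image.fromarray(rgb).convert('L'))
    return len(np.unique(gray)) == n_colors

# Two colors whose gray values differ: white and dark green
safe = np.zeros((2, 3, 3), dtype=np.uint8)
safe[0, :] = (255, 255, 255)
safe[1, :] = (0, 128, 0)

# (0,0,0) and (1,0,0) both end up as gray value 0, so they collide
clash = np.zeros((2, 3, 3), dtype=np.uint8)
clash[1, :] = (1, 0, 0)

print(grayscale_keeps_colors_distinct(safe))
print(grayscale_keeps_colors_distinct(clash))
```

If the check fails, two regions would be merged into one bounding box, and working on the packed RGB values is the safer route.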

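If you can't rely on convert('L') to give unique values (i.e., you are using colors other than the ones in the given image), you can pack your image and get the unique colors: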

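...
# Use a wide integer dtype so the shifts below do not overflow uint8
im = numpy.array(img, dtype=int)

# Pack each pixel into a single 24-bit value: 0xRRGGBB
packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed.ravel()))
colors.discard(255<<16 | 255<<8 | 255)  # skip the white background

for color in colors:
    py, px = numpy.where(packed == color)
    # left, top, right, bottom
    print(px.min(), py.min(), px.max(), py.max())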

As an aside, I would also suggest removing small connected components before finding the bounding boxes.
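One possible way to drop small connected components, sketched here under the assumption that SciPy is available (the answer does not say which tool to use). The helper names `drop_small_components` and `bbox`, and the `min_pixels` threshold, are illustrative only.

```python
import numpy as np
from scipy import ndimage

def drop_small_components(mask, min_pixels):
    """Return a boolean mask without the connected components that
    have fewer than min_pixels pixels (hypothetical helper)."""
    labels, count = ndimage.label(mask)    # default: 4-connectivity in 2-D
    if count == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel())    # sizes[0] counts the background
    keep = sizes >= min_pixels
    keep[0] = False                        # never keep the background
    return keep[labels]

def bbox(mask):
    """Return (left, top, right, bottom) of the True pixels, or None."""
    py, px = np.nonzero(mask)
    if py.size == 0:
        return None
    return (int(px.min()), int(py.min()), int(px.max()), int(py.max()))

# Synthetic mask for one color: a 3x4 block plus a stray pixel
demo = np.zeros((10, 10), dtype=bool)
demo[2:5, 3:7] = True
demo[9, 9] = True

print(bbox(demo))                             # box stretched by the stray pixel
print(bbox(drop_small_components(demo, 4)))   # box of the block only
```

In the loops above, `mask` would be `im == color` (or `packed == color`), filtered before taking the minimum and maximum coordinates.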

score 2 · Accepted Answer

Edit: putting everything together in a working program, using the image you posted:

from __future__ import division
import numpy as np
import itertools
from PIL import Image

img = np.array(Image.open('test_img.png'))

def bounding_boxes(img) :
    r, g, b = [np.unique(img[..., j]) for j in (0, 1, 2)]
    bounding_boxes = {}
    for r0, g0, b0 in itertools.product(r, g, b) :
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        if len(rows) :
            bounding_boxes[(r0, g0, b0)] = (np.min(rows), np.max(rows),
                                            np.min(cols), np.max(cols))
    return bounding_boxes

In [2]: %timeit bounding_boxes(img)
1 loops, best of 3: 30.3 s per loop

In [3]: bounding_boxes(img)
Out[3]: 
{(0, 0, 255): (3011, 3176, 755, 2546),
 (0, 128, 0): (10, 2612, 0, 561),
 (0, 128, 128): (1929, 1972, 985, 1438),
 (0, 255, 0): (10, 166, 562, 868),
 (0, 255, 255): (2938, 2938, 680, 682),
 (1, 0, 0): (10, 357, 987, 2591),
 (128, 0, 128): (417, 1873, 984, 2496),
 (205, 186, 150): (11, 56, 869, 1752),
 (255, 0, 0): (3214, 3223, 570, 583),
 (255, 20, 147): (2020, 2615, 956, 2371),
 (255, 255, 0): (3007, 3013, 600, 752),
 (255, 255, 255): (0, 3299, 0, 2591)}

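Not very fast, even though the number of colors actually checked is small...

You can find the bounding box of a color r0, g0, b0 with something along the lines of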

rows, cols = np.where((ra == r0) & (ga == g0) & (ba == b0))
top, bottom = np.min(rows), np.max(rows)
left, right = np.min(cols), np.max(cols)

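Instead of iterating over all 2**24 combinations of RGB colors, you can greatly reduce the search space by using only the Cartesian product of the nonzero histogram bins: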

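# np.nonzero returns a 1-tuple of index arrays, hence the [0]
for r0, g0, b0 in itertools.product(np.nonzero(rhist)[0],
                                    np.nonzero(ghist)[0],
                                    np.nonzero(bhist)[0]) :
    ...  # compute rows, cols for (r0, g0, b0) as shown above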

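Some combinations that do not exist in the image will leak through; you can filter them out by checking that rows and cols are not empty. But in your example this reduces the search space from 2**24 combinations to just 125 (5 x 5 x 5 nonzero bins).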
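A variation on the same idea, offered only as a sketch: rather than forming the per-channel product and filtering out empty results, `np.unique` with `axis=0` can list just the colors that actually occur in the image. The function name `bounding_boxes_unique` and the small test array are assumptions made for illustration.

```python
import numpy as np

def bounding_boxes_unique(img):
    """Map each (r, g, b) present in an H x W x 3 array to
    (top, bottom, left, right), as in the program above."""
    pixels = img[..., :3].reshape(-1, 3)
    colors = np.unique(pixels, axis=0)     # only colors that really occur
    boxes = {}
    for r0, g0, b0 in colors:
        rows, cols = np.where((img[..., 0] == r0) &
                              (img[..., 1] == g0) &
                              (img[..., 2] == b0))
        boxes[(int(r0), int(g0), int(b0))] = (int(rows.min()), int(rows.max()),
                                              int(cols.min()), int(cols.max()))
    return boxes

# Synthetic 6x8 image: white background with two colored regions
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (1, 0, 0)
demo[4:6, 0:2] = (0, 128, 0)

for color, box in sorted(bounding_boxes_unique(demo).items()):
    print(color, box)
```

Each color still costs one full pass over the image, so the gain comes from skipping the combinations that never appear.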

score 0 · Accepted Answer

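This is just a solution off the top of my head. You can iterate over the pixels of the image from the top-left to the bottom-right and keep a top, bottom, left and right value for each color. For a given color, the top value will be the first row in which you see that color and the bottom value will be the last such row; the left value will be the minimum column of the pixels in that color, and the right value will be the maximum column you find.

Then, for each color, you can draw a rectangle from top-left to bottom-right in the desired color.

I don't know whether this qualifies as a good bounding box algorithm, but I guess it's okay.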
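The single-pass idea described above could look roughly like this. It is a minimal sketch with an illustrative function name and test array; a pure-Python pixel loop like this is likely to be slow on large images.

```python
import numpy as np

def bounding_boxes_single_pass(img):
    """Scan an H x W x 3 array once, row by row, and return
    {(r, g, b): (top, bottom, left, right)} (hypothetical helper)."""
    boxes = {}
    height, width = img.shape[:2]
    for row in range(height):
        for col in range(width):
            color = tuple(int(c) for c in img[row, col, :3])
            box = boxes.get(color)
            if box is None:
                # First sighting: this row is the top of the box
                boxes[color] = [row, row, col, col]
            else:
                box[1] = row               # rows are visited in increasing order
                if col < box[2]:
                    box[2] = col
                if col > box[3]:
                    box[3] = col
    return {color: tuple(box) for color, box in boxes.items()}

# Synthetic 6x8 image: white background with two colored regions
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (1, 0, 0)
demo[4:6, 0:2] = (0, 128, 0)

for color, box in sorted(bounding_boxes_single_pass(demo).items()):
    print(color, box)
```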
