
I am writing Python code to accelerate a region properties function for labeled objects in a binary image. The following code will calculate the number of border pixels of a labeled object in a binary image given the indices of the object. The main() function will cycle through all labeled objects in a binary image 'mask' and calculate the number of border pixels for each one.

I am wondering what the best way is to pass or return my variables in this Cython code. The variables are either NumPy arrays or typed memoryviews. I've messed around with passing/returning the variables in the different formats, but cannot deduce what the best/most efficient way is. I am new to Cython, so memoryviews are still fairly abstract to me, and whether there is a difference between the two methods remains a mystery. The images I am working with contain 100,000+ labeled objects, so operations such as these need to be fairly efficient.

To summarize:

Should I pass/return my variables as typed memoryviews rather than NumPy arrays for very repetitive computations, and if so, when? Is one way best, or are they exactly the same?

%%cython --annotate

import numpy as np
import cython
cimport numpy as np

DTYPE = np.intp
ctypedef np.intp_t DTYPE_t

@cython.boundscheck(False)
@cython.wraparound(False)
def erode(DTYPE_t [:,:] img):

    # Image dimensions
    cdef int height, width, local_min
    height = img.shape[0]
    width = img.shape[1]

    # Padded Array
    padded_np = np.zeros((height+2, width+2), dtype = DTYPE)
    cdef DTYPE_t[:,:] padded = padded_np
    padded[1:height+1,1:width+1] = img

    # Eroded image
    eroded_np = np.zeros((height,width),dtype=DTYPE)
    cdef DTYPE_t[:,:] eroded = eroded_np

    cdef DTYPE_t i,j
    for i in range(height):
        for j in range(width):
            local_min = min(padded[i+1,j+1], padded[i,j+1], padded[i+1,j],padded[i+1,j+2],padded[i+2,j+1])
            eroded[i,j] = local_min
    return eroded_np


@cython.boundscheck(False)
@cython.wraparound(False)
def border_image(slice_np):

    # Memoryview of slice_np
    cdef DTYPE_t [:,:] slice = slice_np

    # Image dimensions
    cdef Py_ssize_t ymax, xmax, y, x
    ymax = slice.shape[0]
    xmax = slice.shape[1]

    # Erode image
    eroded_image_np = erode(slice_np)
    cdef DTYPE_t[:,:] eroded_image = eroded_image_np

    # Border image
    border_image_np = np.zeros((ymax,xmax),dtype=DTYPE)
    cdef DTYPE_t[:,:] border_image = border_image_np
    for y in range(ymax):
        for x in range(xmax):
            border_image[y,x] = slice[y,x]-eroded_image[y,x]
    return border_image_np.sum()


@cython.boundscheck(False)
@cython.wraparound(False)
def main(DTYPE_t[:,:] mask, int numobjects, Py_ssize_t[:,:] indices):

    # Memoryview of boundary pixels
    boundary_pixels_np = np.zeros(numobjects,dtype=DTYPE)
    cdef DTYPE_t[:] boundary_pixels = boundary_pixels_np

    # Loop through each object
    cdef Py_ssize_t y_from, y_to, x_from, x_to, i
    cdef DTYPE_t[:,:] slice
    for i in range(numobjects):
        y_from = indices[i,0]
        y_to = indices[i,1]
        x_from = indices[i,2]
        x_to = indices[i,3]
        slice = mask[y_from:y_to, x_from:x_to]
        boundary_pixels[i] = border_image(slice)

    return boundary_pixels_np
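A plain NumPy version of the same computation can be handy for checking the Cython output on small test masks. The sketch below is an illustrative reference, not the asker's code. It zero-pads each bounding-box slice, takes the minimum over the same five-point cross used in erode(), and sums slice minus eroded slice. It still loops over objects in Python, so it is meant for validating results, not for speed on 100,000+ objects. The function names border_pixel_count and border_pixels_reference are hypothetical.

```python
import numpy as np


def border_pixel_count(obj):
    """Count border pixels of one object slice, mirroring erode()/border_image().

    Pixels outside the slice are treated as 0, like the zero-padded array in erode().
    """
    obj = np.asarray(obj, dtype=np.intp)
    padded = np.pad(obj, 1)  # constant zero padding by default
    # Minimum over the centre pixel and its four edge neighbours (a "+" stencil).
    eroded = np.minimum.reduce([
        padded[1:-1, 1:-1],  # centre -> padded[i+1, j+1]
        padded[:-2, 1:-1],   # above  -> padded[i,   j+1]
        padded[1:-1, :-2],   # left   -> padded[i+1, j]
        padded[1:-1, 2:],    # right  -> padded[i+1, j+2]
        padded[2:, 1:-1],    # below  -> padded[i+2, j+1]
    ])
    return int((obj - eroded).sum())


def border_pixels_reference(mask, indices):
    """Hypothetical pure-NumPy counterpart of main(): one count per bounding box."""
    return np.array(
        [border_pixel_count(mask[y0:y1, x0:x1]) for y0, y1, x0, x1 in indices],
        dtype=np.intp,
    )


mask = np.zeros((8, 8), dtype=np.intp)
mask[0:3, 0:3] = 1  # 3x3 square: only the centre survives erosion
mask[4:8, 4:8] = 1  # 4x4 square: the inner 2x2 survives erosion
indices = np.array([[0, 3, 0, 3], [4, 8, 4, 8]])
print(border_pixels_reference(mask, indices))  # [ 8 12]
```

Comparing this against main() on a few hand-made masks is a cheap way to confirm that the stencil offsets in erode() are the intended ones.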

1 Answer


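Memoryviews are a more recent addition to Cython, designed as an improvement over the original np.ndarray syntax. For this reason they are slightly preferred. That said, it usually doesn't make much difference which one you use. Here are a few things to consider: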

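Speed

In terms of speed it makes very little difference. My experience is that memoryviews as function arguments are marginally slower, but it's hardly worth worrying about.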

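Generality

Memoryviews are designed to work with any type that exposes Python's buffer interface (for example, the standard library array module). Typing a variable as np.ndarray only works with NumPy arrays. In principle, memoryviews can also support a wider range of memory layouts, which can make interfacing with C code easier (in practice I've never actually seen this be useful).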
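The buffer-interface point can be seen without compiling anything, because Python's built-in memoryview consumes the same buffer protocol that a Cython typed memoryview does. The plain-Python sketch below is an illustration, not Cython code. It shows that a NumPy float64 array and a standard-library array.array("d") both export a 1-D buffer of C doubles (format "d"), which is the kind of object a Cython double[:] argument accepts. An np.ndarray-typed argument would accept only the first. The helper name describe_buffer is hypothetical.

```python
import array

import numpy as np


def describe_buffer(obj):
    """Return (format, ndim, shape) as seen through the buffer protocol."""
    view = memoryview(obj)  # works for any object that exports a buffer
    return view.format, view.ndim, view.shape


numpy_data = np.zeros(3, dtype=np.float64)
stdlib_data = array.array("d", [0.0, 0.0, 0.0])

# Both expose a 1-D buffer of C doubles.
print(describe_buffer(numpy_data))
print(describe_buffer(stdlib_data))

# Writes through the buffer land in the original object (no copy is made).
memoryview(stdlib_data)[0] = 2.5
print(stdlib_data[0])
```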

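As a return value

When returning an array from Cython to Python code, the user will probably be happier with a NumPy array than with a memoryview. If you're working with memoryviews, you can do either of the following: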

return np.asarray(mview)
return mview.base
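One caveat when choosing between the two options. As an analogy, consider Python's built-in memoryview, whose .obj attribute plays a role similar to .base on a Cython memoryview. The attribute refers to the whole object the view was originally created from, so after slicing it no longer matches the view. np.asarray, by contrast, wraps exactly the viewed region without copying. The sketch below assumes Cython's .base behaves like .obj here (worth checking on your Cython version). If so, np.asarray(mview) is the safer choice when the memoryview was sliced inside the function.

```python
import numpy as np

data = np.arange(10.0)

# Slice a built-in memoryview: the view covers elements 2..4 only.
window = memoryview(data)[2:5]

# `.obj` still refers to the whole original array, not to the slice.
print(window.obj is data)  # True
print(len(window.obj))     # 10

# np.asarray wraps just the viewed region and shares memory with `data`.
result = np.asarray(window)
print(result)                          # [2. 3. 4.]
print(np.shares_memory(result, data))  # True
```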

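Ease of compiling

If you're using np.ndarray, you have to set the include directory with np.get_include() in your setup.py file. You don't have to do this with memoryviews, which often means you can skip setup.py and just use the cythonize command-line tool or pyximport for simpler projects.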
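For the np.ndarray (cimport numpy) route, the include-directory requirement amounts to one extra argument on the extension definition. The sketch below wraps it in a hypothetical helper, numpy_extension, so the definition can be inspected without running a build. The module and file names border_pixels / border_pixels.pyx are also hypothetical. The question's code would still need this, because it uses cimport numpy for np.intp_t even though its arguments are memoryviews.

```python
import numpy as np
from setuptools import Extension


def numpy_extension(name, sources):
    """Build an Extension for a .pyx file that does `cimport numpy`.

    That route needs NumPy's C headers on the include path; a file that uses
    only typed memoryviews (and no `cimport numpy`) can leave include_dirs out.
    """
    return Extension(name, sources, include_dirs=[np.get_include()])


# In setup.py this would typically be passed through Cython's cythonize, e.g.
#     from Cython.Build import cythonize
#     from setuptools import setup
#     setup(ext_modules=cythonize([numpy_extension("border_pixels", ["border_pixels.pyx"])]))
ext = numpy_extension("border_pixels", ["border_pixels.pyx"])  # hypothetical names
print(ext.include_dirs)
```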

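Parallelization

This is the big advantage of memoryviews compared to NumPy arrays (if you want to use it). Taking a slice of a memoryview does not require the global interpreter lock, whereas slicing a NumPy array does. This means that the following code outline can work in parallel with a memoryview: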

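from cython.parallel import prange

cdef void somefunc(double[:] x) nogil:
    pass  # implementation goes here

def process_rows(double[:, :] array_2d):  # hypothetical wrapper around the loop
    cdef Py_ssize_t i
    for i in prange(array_2d.shape[0], nogil=True):
        somefunc(array_2d[i, :])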

If you aren't using Cython's parallel features, this doesn't apply.

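cdef classes

You can use memoryviews as attributes of cdef classes, but not buffer-typed np.ndarrays (such as np.ndarray[double, ndim=2]). You can (of course) store NumPy arrays as untyped object attributes.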

answered 2018-04-12T19:15:49.070