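I have a Python function that I am trying to port to Cython. I have tested two implementations, but I don't understand why the second one is slower than the first. I would also like to make it faster, but I don't know how.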

Base code

import numpy as np
cimport numpy as np
cimport cython
DTYPE = np.int
ctypedef np.int_t DTYPE_t
cdef inline int int_max(int a, int b): return a if a >= b else b
cdef inline int int_min(int a, int b): return a if a <= b else b
cdef extern from "math.h":
    double exp(double x)

@cython.boundscheck(False)
@cython.wraparound(False)
def bilateral_filter_C(np.ndarray[np.float_t, ndim=1] samples, int w=20):
    # Filter Parameters
    cdef Py_ssize_t size = samples.shape[0]
    cdef float rang
    cdef float sigma = 2*3.0*3.0
    cdef int j, L
    cdef unsigned int a, b
    cdef np.float_t W, num, sub_sample, intensity

    # Initialization
    cdef np.ndarray[np.float_t, ndim=1] gauss = np.zeros(2*w+1, dtype=np.float)
    cdef np.ndarray[np.float_t, ndim=1] sub_samples, intensities = np.empty(size, dtype=np.float)
    cdef np.ndarray[np.float_t, ndim=1] samples_filtered = np.empty(size, dtype=np.float)

    L = 2*w+1
    for j in xrange(L):
        rang = -w+1.0/L
        rang *= rang
        gauss[j] = exp(-rang/sigma)


    <CODE TO IMPROVE>

    return samples_filtered

I tried inserting each of the following two code samples into the <CODE TO IMPROVE> section:

Most efficient approach

    for i in xrange(size):            
        a = <unsigned int>int_max(i-w, 0)
        b = <unsigned int>int_min(i+w, size-1)
        L = b-a        

        sub_samples = samples[a:b]-samples[i]
        sub_samples *= sub_samples
        for j in xrange(L):
            sub_samples[j] = exp(-sub_samples[j]/sigma)
        intensities = gauss[w-i+a:w-i+b]*sub_samples

        num = 0.0
        W = 0.0        
        for j in xrange(L):
            W += intensities[j]
            num += intensities[j]*samples[a+j]

        samples_filtered[i] = num/W

Result

%timeit -n1 -r10 bilateral_filter_C(x, 20)
1 loop, best of 10: 45 ms per loop

Less efficient approach

    for i in xrange(size):            
        a = <unsigned int>int_max(i-w, 0)
        b = <unsigned int>int_min(i+w, size-1)

        num = 0.0
        W = 0.0        
        for j in xrange(b-a):
            sub_sample = samples[a+j]-samples[i]
            intensity1 = gauss[w-i+a+j]*exp(-sub_sample*sub_sample/sigma)
            W += intensity
            num += intensity*samples[a+j]

        samples_filtered[i] = num/W    

Result

%timeit -n1 -r10 bilateral_filter_C(x, 20)
1 loop, best of 10: 125 ms per loop

1 Answer

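You have a few typos:

1) You forgot to declare i; just change the declaration to cdef int i, j, L. As written, i is an untyped Python object, which slows down every loop that uses it.

2) In the second algorithm you wrote intensity1 = gauss[w-i+a+j]*exp(-sub_sample*sub_sample/sigma); it should be intensity, without the 1

3) I would add @cython.cdivision(True) to avoid the check for division by zero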

With these changes, and x = np.random.rand(10000), I got the following results

%timeit bilateral_filter_C1(x, 20) # First code
10 loops, best of 3: 74.1 ms per loop
%timeit bilateral_filter_C2(x, 20) # Second code
100 loops, best of 3: 9.5 ms per loop

And, checking that both versions give the same result:

np.all(np.equal(bilateral_filter_C1(x, 20), bilateral_filter_C2(x, 20)))
True
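
For an independent check, the second loop can also be written as a slow pure-Python reference and compared against the compiled output. The following is only a sketch: the name bilateral_filter_ref and the choice to pass gauss in as an argument are assumptions made for illustration, not part of the original code. It keeps the question's half-open window [a, b), so it expects at least two samples and w >= 1.

```python
import math

def bilateral_filter_ref(samples, gauss, w=20, sigma=2 * 3.0 * 3.0):
    """Pure-Python version of the second loop (hypothetical helper).

    `gauss` must hold 2*w+1 spatial weights, indexed by offset + w.
    Assumes len(samples) >= 2 and w >= 1 so every window is non-empty.
    """
    size = len(samples)
    out = [0.0] * size
    for i in range(size):
        a = max(i - w, 0)
        b = min(i + w, size - 1)
        num = 0.0
        W = 0.0
        # Same half-open window [a, b) as in the question's loops.
        for j in range(b - a):
            diff = samples[a + j] - samples[i]
            intensity = gauss[w - i + a + j] * math.exp(-diff * diff / sigma)
            W += intensity
            num += intensity * samples[a + j]
        out[i] = num / W
    return out

# Usage: a constant signal must come back unchanged.
flat = bilateral_filter_ref([5.0] * 50, [1.0] * 41, w=20)
```

When comparing against the Cython output, a tolerance-based comparison such as np.allclose is probably safer than exact equality, since the two versions may accumulate floating-point sums in a different order.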

To catch problems like these, I suggest running cython my_file.pyx -a, which generates an HTML file that highlights places where the code may have problems.

Edit

Reading the code again, there seem to be more errors:

for j in xrange(L):
    rang = -w+1.0/L
    rang *= rang
    gauss[j] = exp(-rang/sigma)

gauss always has the same value, because -w+1.0/L does not depend on j. What is rang supposed to be?
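
One plausible reading, and it is only a guess at the intent, is that rang should be the signed distance j - w from the centre of the window. That would match the way the loops index gauss[w-i+a+j], where index w corresponds to offset 0. A minimal pure-Python sketch under that assumption (gaussian_kernel is a hypothetical name):

```python
import math

def gaussian_kernel(w, sigma=2 * 3.0 * 3.0):
    """Spatial weights for offsets -w..w (assumed intent of the gauss loop)."""
    L = 2 * w + 1
    gauss = [0.0] * L
    for j in range(L):
        rang = float(j - w)  # assumption: signed distance from the window centre
        gauss[j] = math.exp(-rang * rang / sigma)
    return gauss

kernel = gaussian_kernel(20)
```

With this definition the weights peak at the centre and fall off symmetrically on both sides, which a constant rang cannot do.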

Answered 2017-01-20T15:11:01.393