numpy - Passing a numpy integer array to C code

Question

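I am trying to write Cython code that dumps a dense feature matrix and target vector pair to libsvm format faster than sklearn's built-in code. I get a compilation error complaining about a type issue when passing the target vector (a numpy integer array) to the relevant C function.

Here is the code: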

import numpy as np
cimport numpy as np
cimport cython

cdef extern from "cdump.h":
    int filedump( double features[], int numexemplars, int numfeats, int target[], char* outfname)

@cython.boundscheck(False)
@cython.wraparound(False)
def fastdumpdense_libsvmformat(np.ndarray[np.double_t,ndim=2] X, y, outfname):
    if X.shape[0] != len(y):
        raise ValueError("X and y need to have the same number of points")

    cdef int numexemplars = X.shape[0]
    cdef int numfeats = X.shape[1]

    cdef bytes py_bytes = outfname.encode()
    cdef char* outfnamestr = py_bytes

    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)

    return retval

When I try to compile this code using distutils, I get the error

cythoning fastdump_svm.pyx to fastdump_svm.cpp

Error compiling Cython file:
------------------------------------------------------------ ...

    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
                                                         ^
------------------------------------------------------------

fastdump_svm.pyx:24:58: Cannot assign type 'int_t *' to 'int *'

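Any idea how to fix this error? I was originally following the pattern of passing y_c.data, which works, but that is apparently not the recommended way.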

score 3 · Accepted Answer

You can also use dtype=np.dtype("i") when creating the numpy array, so that it matches the C int on your machine.

cdef int [:] y_c
y_c = np.ascontiguousarray(y, dtype=np.dtype("i"))
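
The effect of that conversion can be checked from plain Python before involving Cython at all. The sketch below is only an illustration: the values in y are made up, and it simply confirms that the resulting array is C-contiguous and that its element size equals the C int size reported by ctypes.

```python
import ctypes
import numpy as np

# Hypothetical target labels; any integer sequence would do.
y = [1, -1, 1, 1]

# "i" is the numpy type code for the platform's C int.
y_c = np.ascontiguousarray(y, dtype=np.dtype("i"))

# Element size in bytes, compared with the C int size seen by ctypes.
print(y_c.dtype, y_c.itemsize, ctypes.sizeof(ctypes.c_int))
print("C-contiguous:", y_c.flags["C_CONTIGUOUS"])
```

If the two sizes printed on the first line agree, a pointer to the array's first element can be treated as an int * on the C side.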

score 3 · Accepted Answer

The problem is that numpy.int_t is not the same as int. You can easily check this by having your program print sizeof(numpy.int_t) and sizeof(int).

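int is a C int, which the C standard defines as at least 16 bits, but it is 32 bits on my machine. numpy.int_t is usually 32 or 64 bits, depending on whether you are using a 32-bit or 64-bit build of numpy, though of course there are exceptions (probably for Windows users). If you want to know which numpy dtype matches your c_int, you can do np.dtype(ctypes.c_int).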
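
A rough way to see the mismatch on a given machine is to compare the sizes from Python. This sketch assumes that numpy.int_t in the Cython declarations corresponds to a C long, which is an assumption about the numpy.pxd in use rather than something stated above; the printed sizes depend on the platform.

```python
import ctypes
import numpy as np

# numpy dtypes matching the platform's C int and C long.
c_int_dtype = np.dtype(ctypes.c_int)
c_long_dtype = np.dtype(ctypes.c_long)  # assumed counterpart of numpy.int_t

print("C int :", c_int_dtype, c_int_dtype.itemsize, "bytes")
print("C long:", c_long_dtype, c_long_dtype.itemsize, "bytes")

# When the sizes differ, a long * cannot be passed where an int * is expected.
sizes_match = c_int_dtype.itemsize == c_long_dtype.itemsize
print("sizes match:", sizes_match)
```

On a typical 64-bit Linux build the two sizes differ, which is consistent with the compiler refusing to assign 'int_t *' to 'int *'.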

So, to pass your numpy array to the C code, you can do the following:

import ctypes
cdef np.ndarray[int, ndim=1, mode="c"] y_c
y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
