python - How do I build an int array in Python and manipulate it as efficiently as in C?

Question

From http://www.cs.bell-labs.com/cm/cs/pearls/sol01.html:

The C code looks like this:

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];

void set(int i) {        a[i>>SHIFT] |=  (1<<(i & MASK)); }
void clr(int i) {        a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int  test(int i){ return a[i>>SHIFT] &   (1<<(i & MASK)); }

I found ctypes, BitArrays, and numpy, but I'm not sure whether they can be as efficient as the C code above.

For example, if I write code like this:

from ctypes import c_int
a=[c_int(9)]*1024*1024

does it use 1M bytes of space, or more?

Does anyone know of a good library that can do the same thing in Python?

score 3 · Accepted Answer

Numpy or ctypes are both good choices. But are you sure your Python code really needs to be as efficient as C, and are you sure this code is a performance hotspot?

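The best thing to do is to use the Python profiler to confirm that this code really needs to be as efficient as C. If it does, it will probably be easiest to keep the code in C and link to it with something like ctypes or SWIG.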
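As a rough illustration of that profiling step, here is a minimal sketch using the standard-library cProfile and pstats modules. The names set_bit and workload, and the sizes used, are hypothetical stand-ins for the real program.

```python
import cProfile
import io
import pstats

# A plain list of ints, used here only so there is something to profile.
words = [0] * (1 + 100_000 // 32)

def set_bit(i):
    # Python counterpart of the C set(): a[i>>SHIFT] |= (1<<(i & MASK))
    words[i >> 5] |= 1 << (i & 0x1F)

def workload():
    # Hypothetical stand-in for the code you suspect is slow.
    for i in range(0, 100_000, 3):
        set_bit(i)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Write the report to a string, sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

If set_bit does not account for a meaningful share of the total time in the real program, rewriting it for C-like speed is unlikely to pay off.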

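Edit: To answer your updated question, a numpy array of N elements with an element size of M bytes will contain N*M bytes of contiguous memory, plus a header and a few bytes for views.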
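The N*M figure can be checked directly. The following is a sketch that assumes NumPy is installed; the helper names set_bit and is_set are illustrative and not part of NumPy.

```python
import numpy as np

N = 10_000_000
BITS_PER_WORD = 32

# One 32-bit word per 32 flags, as in the C declaration int a[1 + N/BITSPERWORD].
a = np.zeros(1 + N // BITS_PER_WORD, dtype=np.uint32)

# nbytes is the size of the data buffer only: element count times element size.
print(a.size, a.itemsize, a.nbytes)

def set_bit(i):
    # Same index and mask arithmetic as the C set().
    a[i >> 5] |= np.uint32(1 << (i & 0x1F))

def is_set(i):
    # Same index and mask arithmetic as the C test(), returned as a bool.
    return bool(a[i >> 5] & np.uint32(1 << (i & 0x1F)))

set_bit(12345)
print(is_set(12345), is_set(12346))
```

Calling these helpers once per bit from Python still pays interpreter overhead on every call, so the compact storage does not by itself give C-like speed.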


score 2 · Accepted Answer

You can also check out the built-in array module:

>>> import array
>>> help(array)
Help on built-in module array:

NAME
    array

FILE
    (built-in)

DESCRIPTION
    This module defines an object type which can efficiently represent
    an array of basic values: characters, integers, floating point
    numbers.  Arrays are sequence types and behave very much like lists,
    except that the type of objects stored in them is constrained.  The
    type is specified at object creation time by using a type code, which
    is a single character.  The following type codes are defined:

        Type code   C Type             Minimum size in bytes 
        'b'         signed integer     1 
        'B'         unsigned integer   1 
        'u'         Unicode character  2 (see note) 
        'h'         signed integer     2 
        'H'         unsigned integer   2 
        'i'         signed integer     2 
        'I'         unsigned integer   2 
        'l'         signed integer     4 
        'L'         unsigned integer   4 
        'f'         floating point     4 
        'd'         floating point     8
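As a sketch of how the C bit vector from the question might be ported to the array module: the function names below are illustrative, and type code 'I' is assumed to be at least 4 bytes wide, which holds on common platforms but is not guaranteed by the table above, so the code checks it.

```python
from array import array

N = 10_000_000
BITS_PER_WORD = 32
SHIFT = 5
MASK = 0x1F

# Zero-filled array of unsigned ints, one word per 32 flags.
a = array('I', [0]) * (1 + N // BITS_PER_WORD)

# The table only guarantees 2 bytes for 'I', so verify that a word holds 32 bits.
assert a.itemsize * 8 >= BITS_PER_WORD

def set_bit(i):
    a[i >> SHIFT] |= 1 << (i & MASK)

def clr_bit(i):
    a[i >> SHIFT] &= ~(1 << (i & MASK))

def is_set(i):
    return bool(a[i >> SHIFT] & (1 << (i & MASK)))

set_bit(42)
print(is_set(42), is_set(43))
clr_bit(42)
print(is_set(42))

# Storage used by the items themselves, in bytes.
print(len(a) * a.itemsize)
```

The items are stored as packed C values rather than as separate Python objects, so the memory footprint is close to that of the C array, although each call still goes through the interpreter.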

score 2 · Accepted Answer

This:

a=[c_int()]

makes a list which contains a reference to a c_int object.

Multiplying the list merely duplicates the references, so:

a = [c_int()] * 1024 * 1024

actually creates a list of 1024 * 1024 references to the same single c_int object.

If you want an array of 1024 * 1024 c_ints, build the array type and then call it to create the array:

a = (c_int * (1024 * 1024))()
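Both behaviors can be checked with a short sketch. The variable names are illustrative. Note that c_int * n produces an array type, which has to be called to allocate an actual array.

```python
import ctypes
from ctypes import c_int

# List multiplication copies references: every slot is the same c_int object.
shared = [c_int(9)] * 4
shared[0].value = 1
print([x.value for x in shared])  # every slot reflects the change

# A real C array: build the array type, then call it to allocate zeroed storage.
n = 1024 * 1024
IntArray = c_int * n
arr = IntArray()
arr[0] = 9
print(len(arr), ctypes.sizeof(arr))  # sizeof is n * sizeof(c_int)
```

This also addresses the space question: the list version stores 1024 * 1024 pointer-sized references rather than packed ints, so it takes more than 1M bytes, while the ctypes array takes n * sizeof(c_int) bytes of contiguous storage.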
