3

我想遍历 Python 程序中的大型数据结构并为每个元素执行一项任务。为简单起见,假设元素是整数,而任务只是一个增量。最后,最后一个递增的元素作为(虚拟)结果返回。为了寻找最好的结构/方法,我比较了这些结构在纯 Python 和 Cython 中的时间(我在其他地方找不到它们的直接比较):

  • Python列表
  • NumPy 数组/类型化内存视图
  • 具有底层 C++ 向量的 Cython 扩展类型

我计时的迭代是:

  • 列表迭代中的 Python foreach (it_list)
  • 具有显式元素访问的 Cython 列表迭代 (cit_list)
  • 数组迭代中的 Python foreach (it_nparray)
  • Python NumPy 向量化操作 (vec_nparray)
  • 具有显式元素访问的 Cython 内存视图迭代 (cit_memview)
  • 底层向量迭代中的 Python foreach (it_pyvector)
  • 通过 __iter__ (it_pyvector_iterator) 在底层向量迭代中的 Python foreach
  • 具有显式元素访问的 Cython 向量迭代 (cit_pyvector)
  • 通过 vector.iterator (cit_pyvector_iterator) 进行 Cython 向量迭代

我由此得出结论(时间如下):

  • NumPy 数组上的普通 Python 迭代非常慢(比 Python 列表迭代慢大约 10 倍)-> 不是一个好主意
  • 包装的 C++ 向量上的 Python 迭代也很慢(比 Python 列表迭代慢大约 1.5 倍)-> 不是一个好主意
  • Cython 对包装的 C++ 向量的迭代是最快的选择,大约等于 C 连续内存视图
  • 使用显式元素访问对向量的迭代比使用迭代器稍快 -> 为什么要使用迭代器?
  • 内存视图方法具有比扩展类型方法更大的开销

我现在的问题是:我的数字可靠吗(我做错了什么或错过了什么)?这是否符合您对现实世界示例的经验?我还能做些什么来改进迭代吗?在我使用的代码和时间下方。顺便说一句,我在 Jupyter 笔记本中使用它。非常感谢您的建议和评论!

相对时间(最小值 1.000),对于不同的数据结构大小 n:

================================================================================
Timings for n = 1:
--------------------------------------------------------------------------------
    cit_pyvector_iterator:   1.000
             cit_pyvector:   1.005
                 cit_list:   1.023
                  it_list:   3.064
              it_pyvector:   4.230
     it_pyvector_iterator:   4.937
              cit_memview:   8.196
              vec_nparray:  20.187
               it_nparray:  25.310
================================================================================ 

================================================================================
Timings for n = 1000:
--------------------------------------------------------------------------------
    cit_pyvector_iterator:   1.000
             cit_pyvector:   1.001
              cit_memview:   2.453
              vec_nparray:   5.845
                 cit_list:   9.944
                  it_list: 137.694
              it_pyvector: 199.702
     it_pyvector_iterator: 218.699
               it_nparray: 1516.080
================================================================================ 

================================================================================
Timings for n = 1000000:
--------------------------------------------------------------------------------
             cit_pyvector:   1.000
              cit_memview:   1.056
    cit_pyvector_iterator:   1.197
              vec_nparray:   2.516
                 cit_list:   7.089
                  it_list:  87.099
     it_pyvector_iterator: 143.232
              it_pyvector: 162.374
               it_nparray: 897.602
================================================================================ 

================================================================================
Timings for n = 10000000:
--------------------------------------------------------------------------------
             cit_pyvector:   1.000
              cit_memview:   1.004
    cit_pyvector_iterator:   1.060
              vec_nparray:   2.721
                 cit_list:   7.714
                  it_list:  88.792
     it_pyvector_iterator: 130.116
              it_pyvector: 149.497
               it_nparray: 872.798
================================================================================

赛通代码:

%%cython --annotate

# distutils: language = c++
# cython: boundscheck = False
# cython: wraparound = False

from libcpp.vector cimport vector
from cython.operator cimport dereference as deref, preincrement as princ


# Extension type wrapping a vector
cdef class pyvector:
    cdef vector[long] _data

    cpdef void push_back(self, long x):
        self._data.push_back(x)

    def __iter__(self):
        cdef size_t i, n = self._data.size()

        for i in range(n):
            yield self._data[i]

    @property
    def data(self):
        return self._data


# Cython iteration over Python list      
cpdef long cit_list(list l):
    cdef:
        long j, ii
        size_t i, n = len(l)

    for i in range(n):
        ii = l[i]
        j = ii + 1  
    return j


# Cython iteration over NumPy array
cpdef long cit_memview(long[::1] v) nogil:
    cdef:
        size_t i, n = v.shape[0]
        long j

    for i in range(n):
        j = v[i] + 1   
    return j


# Iterate over pyvector
cpdef long cit_pyvector(pyvector v) nogil:
    cdef:
        size_t i, n = v._data.size()
        long j

    for i in range(n):
        j = v._data[i] + 1
    return j    


cpdef long cit_pyvector_iterator(pyvector v) nogil:
    cdef:
        vector[long].iterator it = v._data.begin()
        long j

    while it != v._data.end():
        j = deref(it) + 1
        princ(it)
    return j

Python代码:

# Python iteration over Python list
def it_list(l):
    for i in l:
        j = i + 1    
    return j


# Python iteration over NumPy array
def it_nparray(a):
    for i in a:
        j = i + 1    
    return j


# Vectorised NumPy operation
def vec_nparray(a):
    a + 1
    return a[-1]


# Python iteration over C++ vector extension type
def it_pyvector_iterator(v):
    for i in v:
        j = i + 1
    return j


def it_pyvector(v):
    for i in v.data:
        j = i + 1
    return j

对于基准:

import numpy as np
from operator import itemgetter


def bm(sizes):
    """Call functions with data structures of varying length"""

    Timings = {}
    for n in sizes:
        Timings[n] = {}

        # Python list
        list_ = list(range(n))

        # NumPy array
        a = np.arange(n, dtype=np.int64)

        # C++ vector extension type
        pyv = pyvector()
        for i in range(n):
            pyv.push_back(i)


        calls = [
            (it_list, list_),
            (cit_list, list_),
            (it_nparray, a),
            (vec_nparray, a),
            (cit_memview, a),
            (it_pyvector, pyv),
            (it_pyvector_iterator, pyv),
            (cit_pyvector, pyv),
            (cit_pyvector_iterator, pyv),
        ]

        for fxn, arg in calls:
            Timings[n][fxn.__name__] = %timeit -o fxn(arg)

    return Timings


def ratios(timings, base=None):
    """Show relative performance of runs based on `timings` dict"""

    if base is not None:
        base = timings[base].average
    else:
        base = min(x.average for x in timings.values())

    return sorted([
        (k, v.average / base)
        for k, v in timings.items()
        ], key=itemgetter(1))


Timings = {}
sizes = [1, 1000, 1000000, 10000000]
Timings.update(bm(sizes))
for s in sizes:
    print("=" * 80)
    print(f"Timings for n = {s}:")
    print("-" * 80)
    for x in ratios(Timings[s]):
        print(f"{x[0]:>25}: {x[1]:7.3f}")
    print("=" * 80, "\n")
4

0 回答 0