python - NumPy memmap performance issue

Question

I have a large (75000 x 5 x 6000) 3D array stored as a NumPy memory map. If I simply iterate over the first dimension like this:

import numpy as np
import time

# Open the existing file read-only as a (75000, 5, 6000) float32 array
a = np.memmap(r"S:\bin\Preprocessed\mtb.dat", dtype='float32', mode='r', shape=(75000, 5, 6000))
l = []
start = time.time()
# Visit the records in random order
index = np.arange(75000)
np.random.shuffle(index)
for i in index:
    l.append(np.array(a[i]) * 0.7)  # np.array copies the record into RAM
print(time.time() - start)

>>> 0.503

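The iteration is very fast. However, when I iterate over the same memmap inside a larger program, individual accesses to the memmap take up to 0.1 seconds, and extracting all 75000 records takes nearly 10 minutes.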

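The larger program is too long to reproduce here, so my question is: are there any known issues that can make memmap access slow down significantly, perhaps when a lot of data is held in Python memory?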

In the larger program, the usage looks like this:

import time
import numpy as np

# Excerpt from a method: cumulative and contributions are initialised elsewhere
array = np.memmap(self.path, dtype='float32', mode='r', shape=self.shape)
for i, (scenario_id, area) in enumerate(self.scenario_areas):
    address = scenario_matrix.lookup.get(scenario_id)
    if address is not None:  # "if address:" would also skip row 0
        scenario_output = array[address]
        output_total = scenario_output * float(area)
        cumulative += output_total  # Add results to cumulative total
        contributions[int(scenario_id.split("cdl")[1])] = output_total[:2].sum()
del array

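The second example takes more than 10 minutes to run. Timing the line scenario_output = array[address], which only pulls a record out of the memmap, gives values between 0.0 and 0.5 seconds, so extracting a single record can take half a second.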
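One thing worth checking about what that timing actually measures (my assumption, the question does not say what type address has): with a plain integer address, array[address] returns a view into the mapping and, as far as I understand, reads nothing yet; the bytes come from disk or the OS page cache only when they are first touched, for example by scenario_output * float(area). If address is a list or array instead, NumPy uses advanced indexing, which always copies. The sketch below times the two steps separately on a small throwaway file. make_demo_memmap and time_view_vs_read are hypothetical helper names, and the file and shape stand in for mtb.dat.

```python
import os
import tempfile
import time

import numpy as np


def make_demo_memmap(path, shape=(100, 5, 600)):
    """Create a small float32 memmap on disk (hypothetical stand-in for mtb.dat)."""
    mm = np.memmap(path, dtype="float32", mode="w+", shape=shape)
    mm[:] = 1.0
    mm.flush()
    del mm
    # Reopen read-only, as in the question
    return np.memmap(path, dtype="float32", mode="r", shape=shape)


def time_view_vs_read(array, address):
    """Time creating the view separately from actually reading its bytes."""
    t0 = time.perf_counter()
    view = array[address]  # basic integer indexing: a view, no data copied
    t1 = time.perf_counter()
    data = np.array(view)  # copying forces every element to be read
    t2 = time.perf_counter()
    return view, data, t1 - t0, t2 - t1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        arr = make_demo_memmap(os.path.join(tmp, "demo.dat"))
        view, data, t_view, t_read = time_view_vs_read(arr, 42)
        print(f"index: {t_view:.6f}s  read: {t_read:.6f}s")
        del arr, view  # release the mapping before the directory is removed
```

If the indexing step is consistently near zero and the read step carries the cost, the slow part in the larger program is the page-in, not the indexing itself.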

score 2 · Accepted Answer

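As far as I know, memmap in Python has no limitations beyond the general OS-level ones. So my guess is that either you have an OS-level memory bottleneck (possibly an interaction between the caches of several large mmaps), or your problem lies elsewhere.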

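It's very good that you already have a reference implementation showing how fast the operation should be. Now you need to test the possible causes systematically. Here are a few directions that should help pin down the cause.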

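First, use cProfile to compare the larger program against the reference implementation and get a better picture of where the bottleneck is. You will get a list of function calls and the time spent in each one, and the results may be surprising. Some guesses: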
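A minimal sketch of profiling a single function call with the standard-library cProfile and pstats modules, assuming the slow loop can be wrapped in a function. run_scenarios and lookup_address are hypothetical, simplified stand-ins for the code in the question (plain lists instead of the memmap), so only the mechanics carry over, not the numbers. For a whole script, python -m cProfile -s cumulative your_script.py prints a similar report.

```python
import cProfile
import io
import pstats


def lookup_address(lookup, scenario_id):
    # Hypothetical stand-in for scenario_matrix.lookup.get(scenario_id)
    return lookup.get(scenario_id)


def run_scenarios(scenario_areas, lookup, rows):
    """Hypothetical, simplified version of the loop from the question."""
    cumulative = 0.0
    for scenario_id, area in scenario_areas:
        address = lookup_address(lookup, scenario_id)
        if address is not None:
            cumulative += sum(rows[address]) * float(area)
    return cumulative


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    areas = [(f"cdl{i}", 2.0) for i in range(1000)]
    lookup = {f"cdl{i}": i % 10 for i in range(1000)}
    rows = [[1.0, 2.0, 3.0]] * 10
    total, report = profile_call(run_scenarios, areas, lookup, rows)
    print(total)
    print(report)
```

Sorting by cumulative time puts the outer loop first and then shows how much of it each callee accounts for, which maps directly onto the guesses that follow.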

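Is it actually true that most of the time is spent in the code you posted? If not, the profile may point you in another direction.
Is self.scenario_areas list-like, or is it an iterator that does some hidden, expensive computation?
Maybe the lookup scenario_matrix.lookup.get(scenario_id) is slow. Check it.
Is contributions a regular Python list or dict, or does it do something strange behind the scenes on assignment?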
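If the profiler output is hard to map onto these questions, a cruder check is to time each step of the loop directly. This is a hypothetical instrumented copy of the loop from the question; it assumes lookup is a plain dict and that cumulative and contributions are NumPy arrays, which the question does not show. Iteration is driven with next() so that the cost of producing each item of scenario_areas is timed on its own.

```python
import time
from collections import defaultdict

import numpy as np


def timed_loop(scenario_areas, lookup, array, n_contrib):
    """Hypothetical instrumented loop: accumulates seconds spent in each step."""
    timings = defaultdict(float)
    cumulative = np.zeros(array.shape[1:], dtype=np.float64)
    contributions = np.zeros(n_contrib)

    it = iter(scenario_areas)
    while True:
        t0 = time.perf_counter()
        try:
            scenario_id, area = next(it)  # cost of the iterable itself
        except StopIteration:
            break
        t1 = time.perf_counter()
        address = lookup.get(scenario_id)  # cost of the lookup
        t2 = time.perf_counter()
        timings["iterate"] += t1 - t0
        timings["lookup"] += t2 - t1
        if address is None:
            continue
        output_total = np.asarray(array[address]) * float(area)  # touches the data
        t3 = time.perf_counter()
        cumulative += output_total
        contributions[int(scenario_id.split("cdl")[1])] = output_total[:2].sum()
        t4 = time.perf_counter()
        timings["read+scale"] += t3 - t2
        timings["accumulate+assign"] += t4 - t3
    return cumulative, contributions, dict(timings)
```

Whichever bucket dominates in the real program tells you which of the guesses above to chase first.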

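Only once you have confirmed that the time really is spent on the line scenario_output = array[address] would I start suspecting interactions between mmap files. If that turns out to be the case, start commenting out the parts of the code that involve other memory access, and profile again after each change to get a better idea of what is happening.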
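To confirm that step, a small helper (hypothetical, not from either post) can time a full read of every record and list the slowest ones. Running it once in isolation and once at the point in the larger program where the loop runs, with the same addresses in the same order, shows whether the same records are slow in both settings. Keeping the order matters because the OS page cache and read-ahead can behave quite differently for sequential and random access.

```python
import time

import numpy as np


def time_record_reads(array, addresses):
    """Time reading each record in full; returns per-address seconds."""
    durations = np.empty(len(addresses))
    for k, address in enumerate(addresses):
        t0 = time.perf_counter()
        np.array(array[address])  # the copy forces the pages to be read
        durations[k] = time.perf_counter() - t0
    return durations


def slowest(addresses, durations, n=5):
    """Return the n slowest (address, seconds) pairs, slowest first."""
    order = np.argsort(durations)[::-1][:n]
    return [(addresses[i], float(durations[i])) for i in order]
```

If the slow records are the same everywhere, the file layout or disk is the likely suspect; if they are only slow inside the larger program, memory pressure from the rest of the program becomes more plausible.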

I hope this helps.

score 0 · Accepted Answer

You may not be able to avoid these performance problems with np.memmap.

I'd suggest trying something like SFrame: https://turi.com/products/create/docs/generated/graphlab.SFrame.html

SFrame/SArray let you read tabular data directly from disk, which is often faster for large data files.

It is open source and available at https://github.com/turi-code/SFrame
