
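I've run into a memory problem that I can't seem to understand.

I'm on a 64-bit Windows 7 machine with 8GB of RAM, running a 32-bit Python program.

The program reads 5,118 compressed numpy files (npz). Windows reports that the files take up 1.98 GB of disk space.

Each npz file contains two pieces of data: "arr_0" is of type np.float32 and "arr_1" is of type np.uint8.

The Python script reads each file, appends its data to two lists, and then closes the file.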
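The loop described above might look roughly like the following. This is a hypothetical reconstruction, since the actual script isn't shown; `load_all` and `total_mib` are illustrative names. Summing `ndarray.nbytes` is a simple way to see how much memory the arrays themselves occupy, independent of the on-disk (compressed) size.

```python
import numpy as np

def load_all(paths):
    """Load each .npz file and append its two arrays to two lists.

    Hypothetical sketch of the loop described in the question.
    """
    floats, uint8s = [], []
    for path in paths:
        # NpzFile works as a context manager, so the underlying zip
        # file is closed after the two arrays have been read out.
        with np.load(path) as data:
            floats.append(data["arr_0"])   # np.float32 array
            uint8s.append(data["arr_1"])   # np.uint8 array
    return floats, uint8s

def total_mib(arrays):
    """Total in-memory size of a list of arrays, in MiB."""
    return sum(a.nbytes for a in arrays) / 2**20
```

The names `arr_0` and `arr_1` are what `np.savez`/`np.savez_compressed` assign to arrays passed positionally.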

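Around file 4284/5118, the program raises a MemoryError.

However, Task Manager says the memory usage of python.exe *32 at the time of the error is 1,854,848K ~= 1.8GB. That is far below my 8 GB limit, or the 4GB limit for a 32-bit program.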

In the program I catch the memory error and report: each list has length 4285. The first list holds a total of 1,928,588,480 bits of float32 data ~= 229.9 MB. The second list holds 12,342,966,272 bits of uint8 data ~= 1,471.3 MB.
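The MB figures in the question only come out right if the two large counts are bits rather than element counts (1,928,588,480 float32 elements would be about 7.2 GiB on their own). A quick check of that arithmetic, assuming the counts are bits:

```python
MIB = 2**20

def bits_to_mib(bits):
    """Convert a bit count to mebibytes (MiB)."""
    return bits / 8 / MIB

float_mib = bits_to_mib(1_928_588_480)    # float32 list
uint8_mib = bits_to_mib(12_342_966_272)   # uint8 list
total_gib = (float_mib + uint8_mib) / 1024

print(f"float32: {float_mib:.1f} MiB")
print(f"uint8:   {uint8_mib:.1f} MiB")
print(f"total:   {total_gib:.2f} GiB")
```

The arrays alone come to roughly 1.66 GiB, before counting the interpreter itself, loaded DLLs, and per-object overhead, which is consistent with the ~1.8GB Task Manager reports.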

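So everything seems to check out, except for the part where I get a memory error. I definitely have more memory available, and the file it crashes on is ~800KB, so it isn't failing while reading some huge file.

Also, the file isn't corrupted. I can read it just fine if I haven't used up all that memory beforehand.

To make things more confusing, all of this seems to work fine on my Linux machine (although it has 16GB of memory rather than the 8GB in my Windows machine), but I don't think the machine's RAM is what's causing this problem.

Why does Python raise a memory error when I expect it to be able to allocate another 2GB of data?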


1 Answer


I don't know why you think your process should be able to access 4GB. According to Memory Limits for Windows Releases at MSDN, on 64-bit Windows 7, a default 32-bit process gets 2GB.* Which is exactly where it's running out.

So, is there a way around this?

Well, you could make a custom build of 32-bit Python that uses the IMAGE_FILE_LARGE_ADDRESS_AWARE flag, and rebuild numpy and all of your other extension modules. I can't promise that all of the relevant code really is safe to run with the large-address-aware flag; there's a good chance it is, but unless someone's already done it and tested it, "a good chance" is the best anyone is likely to know.
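Whether an executable already carries that flag can be read straight out of its PE header. The sketch below follows the Microsoft PE/COFF layout (the `e_lfanew` pointer at offset 0x3C, the `PE\0\0` signature, then the 20-byte COFF header whose last field is `Characteristics`); the function name `is_large_address_aware` is just illustrative.

```python
import struct

E_LFANEW_OFFSET = 0x3C                  # DOS header field: offset of the PE header
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020

def is_large_address_aware(image: bytes) -> bool:
    """Return True if a PE image has the LARGE_ADDRESS_AWARE bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    (pe_offset,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The COFF file header follows the 4-byte signature; Characteristics
    # is its last 2-byte field, 18 bytes into the 20-byte header.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

For example, `open(sys.executable, "rb").read(4096)` is usually enough bytes to cover the headers of python.exe, though that is an assumption about where the PE header sits rather than a guarantee.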

Or, more obviously, just use 64-bit Python instead.
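To confirm which build you are actually running, check the interpreter's pointer width. `sys.maxsize > 2**32` is the idiom the `platform` module documentation recommends; `struct.calcsize("P")` gives the same answer as a byte count. The helper names here are illustrative.

```python
import struct
import sys

def python_bits() -> int:
    """Pointer width of the running interpreter: 32 or 64."""
    return struct.calcsize("P") * 8

def is_64bit_python() -> bool:
    return sys.maxsize > 2**32

print(f"{python_bits()}-bit Python")
```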


The amount of physical RAM is completely irrelevant. You seem to think that you have an "8GB limit" with 8GB of RAM, but that's not how it works. Your system takes all of your RAM plus whatever swap space it needs and divides it up between apps; an app may be able to get 20GB of virtual memory without getting a memory error even on an 8GB machine. And meanwhile, a 32-bit app has no way of accessing more than 4GB, and the OS will use up some of that address space (half of it by default, on Windows), so you can only get 2GB even on an 8GB machine that's not running anything else. (Not that it's possible to ever be "not running anything else" on a modern OS, but you know what I mean.)
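The arithmetic behind that is worth spelling out: the ceiling comes from the pointer width and the OS's reservation, and physical RAM does not appear anywhere in it. A minimal sketch (the function name is illustrative):

```python
GIB = 2**30

def user_address_space_gib(pointer_bits: int, os_reserved_gib: float) -> float:
    """Virtual address space left for a process after the OS's reservation."""
    return 2**pointer_bits / GIB - os_reserved_gib

# 32-bit pointers can name at most 2**32 bytes = 4 GiB ...
print(user_address_space_gib(32, 0))   # 4.0
# ... and by default Windows keeps the upper half for itself.
print(user_address_space_gib(32, 2))   # 2.0
```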


So, why does this work on your linux box?

Because your linux box is configured to give 32-bit processes 3.5GB of virtual address space, or 3.99GB, or… Well, I can't tell you the exact number, but every distro I've seen for many years has been configured for at least 3.25GB.


* Also note that you don't even get that full 2GB for your data. Most of what the OS and its drivers make accessible to your code sits in the other half, but some bits sit in your half, along with your program itself, every DLL you load and any space they need, and various other things. It doesn't add up to much, but it's not zero.

answered 2013-08-16T22:16:00.270