python - How to load part of a single TIF image into a numpy array without loading the whole image into memory?

Question

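I need to process a 4 GB .TIF image. Because of memory constraints I can't load the whole image into a numpy array, so I need to load it lazily from disk. It has to be done in Python as a project requirement. I also looked at the tifffile library on PyPI, but didn't find anything useful. Please help.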

score 1 · Accepted Answer

pyvips can do this. For example:

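#!/usr/bin/env python3
import sys
import numpy as np
import pyvips

# Open in sequential mode: pixels are read from the file on demand
image = pyvips.Image.new_from_file(sys.argv[1], access="sequential")

# Walk down the image in strips of up to 100 scanlines
for y in range(0, image.height, 100):
    area_height = min(image.height - y, 100)
    area = image.crop(0, y, image.width, area_height)
    # Wrap the strip's raw pixel bytes as a (height, width, bands) array
    array = np.ndarray(buffer=area.write_to_memory(),
                       dtype=np.uint8,
                       shape=[area.height, area.width, area.bands])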

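The access option to new_from_file turns on sequential mode: pyvips will only load pixels from the file on demand, with the restriction that you must read pixels from top to bottom.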

The loop runs down the image in blocks of 100 scanlines. You can tune this, of course.
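The strip arithmetic in that loop can be checked on its own, without opening any image. This is only a minimal sketch: `strip_bounds` is a made-up helper name, not part of pyvips, and it assumes nothing beyond an integer image height and a positive strip height.

```python
def strip_bounds(image_height, strip_height=100):
    """Yield (y, height) for each horizontal strip, top to bottom.

    Hypothetical helper for illustration only; not part of pyvips.
    """
    if image_height < 0 or strip_height <= 0:
        raise ValueError("image_height must be >= 0 and strip_height > 0")
    for y in range(0, image_height, strip_height):
        # The last strip is shorter when image_height is not an exact
        # multiple of strip_height.
        yield y, min(image_height - y, strip_height)


if __name__ == "__main__":
    for y, height in strip_bounds(250, 100):
        print(y, height)
```

Each (y, height) pair maps directly onto image.crop(0, y, image.width, height) in the loop above. Taller strips mean fewer crops but more memory held per strip.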

I can run it like this:

$ vipsheader eso1242a-pyr.tif 
eso1242a-pyr.tif: 108199x81503 uchar, 3 bands, srgb, tiffload_stream
$ /usr/bin/time -f %M:%e ./sections.py ~/pics/eso1242a-pyr.tif
273388:479.50

So on this sad old laptop, scanning a 108,000 x 82,000 pixel image takes 8 minutes and needs a peak of 270 MB of memory.
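A quick back-of-the-envelope calculation shows why reading in strips matters for this file. The sketch below uses only the dimensions reported by vipsheader above (108199 x 81503, 3 bands, uchar, so 1 byte per sample); `raw_bytes` is a made-up helper name for illustration.

```python
def raw_bytes(width, height, bands, bytes_per_sample=1):
    """Size in bytes of an uncompressed pixel buffer (illustrative helper)."""
    return width * height * bands * bytes_per_sample


# Dimensions taken from the vipsheader output for eso1242a-pyr.tif
WIDTH, HEIGHT, BANDS = 108199, 81503, 3

full = raw_bytes(WIDTH, HEIGHT, BANDS)   # whole image as one numpy array
strip = raw_bytes(WIDTH, 100, BANDS)     # one 100-scanline strip

print(f"full image: {full / 2**30:.1f} GiB")
print(f"one strip:  {strip / 2**20:.1f} MiB")
```

The whole image would need roughly 24.6 GiB as a single uint8 array, while one 100-line strip is about 31 MiB. The rest of the measured peak is presumably taken up by the library's own buffers and the Python process itself.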

What processing are you doing? You may be able to do the whole thing in pyvips. It's quite a bit faster than numpy.

score 1 · Accepted Answer

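import pyvips

# Sequential access streams the file instead of loading it whole
img = pyvips.Image.new_from_file("space.tif", access="sequential")
# Shrink to 1% of the original size in each dimension
out = img.resize(0.01, kernel="linear")
out.write_to_file("resized_image.jpg")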

If you just want to convert the file to another format at a smaller size, this code is enough. It does the job without any memory spike and in very little time.
