python - numpy.memmap from numpy operations

Question

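I am working with fairly large arrays created from large image files. I was running into problems with using too much memory and decided to try numpy.memmap arrays instead of the standard numpy.array. I was able to create a memmap and load the data into it from my image file in chunks, but I'm not sure how to load the result of an operation into a memmap.

For example, my image files are read into numpy as binary integer arrays. I have written a function that buffers (expands) any region of True cells by a specified number of cells. This function converts the input array to Boolean using array.astype(bool). How would I make the new Boolean array created by array.astype(bool) a numpy.memmap array?

Also, if there is a True cell closer to the edge of the input array than the specified buffer distance, the function adds rows and/or columns to the edge of the array to allow for a complete buffer around the existing True cell. This changes the shape of the array. Is it possible to change the shape of a numpy.memmap?

Here is my code: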

import re
from os import path

import numpy as np

import readRaster  # assumed to be the asker's own helper module (not shown)

def getArray(dataset):
    '''Dataset is an instance of the GDALDataset class from the
    GDAL library for working with geospatial datasets

    '''
    chunks = readRaster.GetArrayParams(dataset, chunkSize=5000)
    datPath = re.sub(r'\.\w+$', '_temp.dat', dataset.GetDescription())
    pathExists = path.exists(datPath)
    # 'r+' requires an existing file; 'w+' creates it on the first run
    arr = np.memmap(datPath, dtype=int, mode='r+' if pathExists else 'w+',
                    shape=(dataset.RasterYSize, dataset.RasterXSize))
    if not pathExists:
        # first run: copy the raster into the memmap one chunk at a time
        for chunk in chunks:
            xOff, yOff, xWidth, yWidth = chunk
            chunkArr = readRaster.GetArray(dataset, *chunk)
            arr[yOff:yOff + yWidth, xOff:xOff + xWidth] = chunkArr
    return arr

from scipy.ndimage import binary_dilation, binary_erosion

def Buffer(arr, dist, ring=False, full=True):
    '''Applies a buffer to any non-zero raster cells'''
    arr = arr.astype(bool)
    nzY, nzX = np.nonzero(arr)
    minY, maxY = np.amin(nzY), np.amax(nzY)
    minX, maxX = np.amin(nzX), np.amax(nzX)
    # Rows/columns needed on each side so the buffer fits. All four are
    # computed from the original shape, before any padding shifts indices.
    padTop = max(0, dist - minY)
    padBottom = max(0, maxY + dist - arr.shape[0] + 1)
    padLeft = max(0, dist - minX)
    padRight = max(0, maxX + dist - arr.shape[1] + 1)
    arr = np.pad(arr, ((padTop, padBottom), (padLeft, padRight)),
                 mode='constant', constant_values=False)
    # Positive distances dilate (grow) regions; negative ones erode them
    buffOp = binary_dilation if dist >= 0 else binary_erosion
    bufDist = abs(dist) * 2 + 1
    k = np.ones((bufDist, bufDist))
    bufArr = buffOp(arr, k)
    return bufArr.astype(int)

score 1 · Accepted Answer

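Let me try to answer the first part of your question: loading a result into a memmap datastore.

Note that I will assume a memmap file already exists on disk; it will be the input file. Call it MemmapInput, created as follows: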

import numpy as np

# 'w+' creates each file on disk; del flushes and releases the memmap
fpInput = np.memmap('MemmapInput', dtype='bool', mode='w+', shape=(3,4))
del fpInput
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='w+', shape=(3,4))
del fpOutput

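In your case the output file may not exist yet, but according to the documentation:

'r+' Open existing file for reading and writing.

'w+' Create or overwrite existing file for reading and writing.

So the first time you create a memmap file it must be opened with 'w+'. After that, use 'r+' to modify or overwrite the file, and 'r' to get read-only access. For more information, see http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html.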
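To make the difference between the three modes concrete, here is a minimal sketch. The scratch file name `modes_demo.dat` and the temporary directory are assumptions made only for this illustration; they are not part of the original question.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file used only for this illustration.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.dat')

# 'w+' creates the file (or overwrites it) and zero-fills it.
fp = np.memmap(demo_path, dtype='bool', mode='w+', shape=(3, 4))
fp[0, 0] = True
fp.flush()   # push the change to disk
del fp       # drop the mapping

# 'r+' reopens the existing file for reading and writing.
fp = np.memmap(demo_path, dtype='bool', mode='r+', shape=(3, 4))
fp[1, 1] = True
fp.flush()
del fp

# 'r' reopens it read-only; assigning to it raises an error.
fp = np.memmap(demo_path, dtype='bool', mode='r', shape=(3, 4))
print(fp[0, 0], fp[1, 1], fp[2, 2])
```

Both writes survive across the reopen because each one was flushed to the same file before the mapping was dropped.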

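Now we will read this file in and perform some operations on it. The key point is that to load a result into a memmap file, you must first create the memmap array and attach it to a file.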

fpInput = np.memmap('MemmapInput', dtype='bool', mode='r', shape=(3,4))
fpOutput = np.memmap('MemmapOutput', dtype='bool', mode='r+', shape=(3,4))

Do whatever you want with the fpOutput memmap file, for example:

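rows, cols = fpOutput.shape
# nonzero returns paired index arrays, so iterate over them together
for indexI, indexJ in zip(*np.nonzero(fpInput)):
    # mark the four neighbours, skipping any that fall outside the array
    for y, x in ((indexI - 1, indexJ), (indexI, indexJ - 1),
                 (indexI + 1, indexJ), (indexI, indexJ + 1)):
        if 0 <= y < rows and 0 <= x < cols:
            fpOutput[y, x] = True
fpOutput.flush()  # write the changes to disk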
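Applied to the astype(bool) conversion from the question, the same idea might look like the sketch below: create the Boolean output memmap first, then fill it one block of rows at a time so that only one block is converted in memory. Rather than reshaping an existing memmap, the sketch allocates the output with the final, larger shape up front and writes the data into its interior, which leaves a border of False cells for the buffer. The helper name `to_bool_memmap`, its `pad` and `block_rows` parameters, and the file names are all hypothetical; this is one possible approach, not the asker's code or a NumPy API.

```python
import os
import tempfile

import numpy as np


def to_bool_memmap(src, out_path, pad=0, block_rows=1024):
    """Write src.astype(bool) into a new memmap, with `pad` False cells
    added on every side. Hypothetical helper, not part of NumPy."""
    rows, cols = src.shape
    out = np.memmap(out_path, dtype=bool, mode='w+',
                    shape=(rows + 2 * pad, cols + 2 * pad))
    # 'w+' zero-fills the file, so the border is already False.
    for start in range(0, rows, block_rows):
        stop = min(start + block_rows, rows)
        # Only this block is converted in memory at any one time.
        out[pad + start:pad + stop, pad:pad + cols] = src[start:stop].astype(bool)
    out.flush()
    return out


# Usage with a small stand-in for the integer raster memmap.
tmp = tempfile.mkdtemp()
src = np.memmap(os.path.join(tmp, 'src.dat'), dtype=int, mode='w+', shape=(5, 6))
src[2, 3] = 7
result = to_bool_memmap(src, os.path.join(tmp, 'bool.dat'), pad=2, block_rows=2)
print(result.shape, result[4, 5])
```

The non-zero cell at (2, 3) in the input ends up at (4, 5) in the output because of the two-cell border on each side. In practice `pad` would be chosen from the buffer distance, and `block_rows` from how much memory a single block may use.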
