python - How to grid large xyz files with missing records without running out of memory

Question

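I have xyz text files that need to be gridded. For each xyz file I know the origin coordinates, the cell size, and the number of rows and columns. However, records without a z value are missing from the xyz file, so building a grid from the present records alone fails because of the missing values. So I tried this: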

import numpy as np

nxyz = np.loadtxt(infile, delimiter=",", skiprows=1)  # infile: path to the xyz file

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25

grid = np.zeros((nrows, ncols))

for item in nxyz:
    # Array indices must be integers; int() truncates the quotient.
    idx = int((item[0] - xllcorner) / cellsize)
    idy = int((item[1] - yllcorner) / cellsize)
    grid[idy, idx] = item[2]

# grid[::-1] flips the rows so the northernmost row is written first.
with open(r"e:\test\myrasout.txt", "w") as outfile:
    np.savetxt(outfile, grid[::-1], fmt="%.2f", delimiter=" ")

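This gives me a grid with zeros wherever the xyz file has no record. It works for smaller files, but I get an out-of-memory error for a 290 MB file (~8,900,000 records). And that is not the largest file I have to process.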

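So I tried another (iterative) approach for loading the xyz file, by Joe Kington, which I found here. It worked for the 290 MB file, but failed with an out-of-memory error on the next larger file (533 MB, ~15,600,000 records).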

How can I grid these larger files correctly (accounting for the missing records) without running out of memory?

score 2 · Accepted Answer

Following the comments, I changed the code to:

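import numpy as np

ncols = 4781
nrows = 4405
xllcorner = 682373.533843
yllcorner = 205266.898604
cellsize = 1.25
grid = np.zeros((nrows, ncols))

with open(file) as f:  # file: path to the xyz file
    next(f)  # skip the header row, as skiprows=1 did in the question
    for line in f:
        item = line.split()  # fill with whatever is separating the values
        # split() returns strings, so convert to float before the arithmetic
        idx = int((float(item[0]) - xllcorner) / cellsize)
        idy = int((float(item[1]) - yllcorner) / cellsize)
        #...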
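The line-by-line idea above can be carried through to a filled grid. The following is only a minimal sketch, not the answerer's code: the function name `grid_xyz_streaming`, the `nodata` and `skip_header` parameters, the `float32` grid, and the use of `round` are all assumptions. In particular, rounding assumes each record's coordinates lie on `xllcorner + k * cellsize`; if they are cell centers, truncation would be the right choice instead.

```python
import numpy as np

def grid_xyz_streaming(lines, nrows, ncols, xllcorner, yllcorner, cellsize,
                       delimiter=",", skip_header=1, nodata=0.0):
    """Fill a (nrows, ncols) grid from an iterable of 'x,y,z' text lines.

    Only the grid itself is held in memory; the records are never
    collected into one big array.
    """
    grid = np.full((nrows, ncols), nodata, dtype=np.float32)
    for lineno, line in enumerate(lines):
        # Skip the header row(s) and any blank lines.
        if lineno < skip_header or not line.strip():
            continue
        x, y, z = (float(v) for v in line.split(delimiter)[:3])
        # Assumption: coordinates sit on grid nodes, so round to the nearest index.
        col = int(round((x - xllcorner) / cellsize))
        row = int(round((y - yllcorner) / cellsize))
        # Ignore records that fall outside the declared grid extent.
        if 0 <= row < nrows and 0 <= col < ncols:
            grid[row, col] = z
    return grid

# Tiny in-memory demonstration; an open file object works the same way.
demo_lines = ["x,y,z", "10.0,20.0,1.5", "12.5,21.25,2.5"]
demo_grid = grid_xyz_streaming(demo_lines, 3, 4, 10.0, 20.0, 1.25)
```

With a real file this would be called as `with open(path) as f: grid = grid_xyz_streaming(f, ...)`, and the result could then be written with `np.savetxt(outfile, grid[::-1], fmt="%.2f")` as in the question. A pure Python loop is slow for millions of records, so this trades speed for a small memory footprint.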

score 1 · Accepted Answer

You can use fancy indexing with NumPy. Try something like this instead of the loop, which may be the root of your problem:

grid = np.zeros((nrows,ncols))
# Index arrays must be integers; rows come from y, columns from x.
grid[nxyz[:,1].astype(int),nxyz[:,0].astype(int)] = nxyz[:,2]

With the conversion from the origin and cell size, it gets a little more involved:

grid = np.zeros((nrows,ncols))
cols = ((nxyz[:,0]-xllcorner)/cellsize).astype(int)  # x -> column index
rows = ((nxyz[:,1]-yllcorner)/cellsize).astype(int)  # y -> row index
grid[rows,cols] = nxyz[:,2]
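To make the index conversion concrete, here is a small worked example on a made-up 3 x 4 grid (the origin, cell size, and records are invented for illustration). It uses `np.rint` before the integer cast, which is an assumption on my part: it guards against a quotient such as 2.9999999 being truncated to 2, and it presumes the coordinates lie on `xllcorner + k * cellsize`. Cells without a record simply keep the initial zero.

```python
import numpy as np

# Hypothetical grid definition, much smaller than the one in the question.
xllcorner, yllcorner, cellsize = 10.0, 20.0, 1.25
nrows, ncols = 3, 4

# Three records (x, y, z); the other nine cells have no record.
nxyz = np.array([
    [10.00, 20.00, 1.5],
    [12.50, 21.25, 2.5],
    [13.75, 22.50, 3.5],
])

# Convert coordinates to integer indices: x gives the column, y gives the row.
cols = np.rint((nxyz[:, 0] - xllcorner) / cellsize).astype(int)
rows = np.rint((nxyz[:, 1] - yllcorner) / cellsize).astype(int)

grid = np.zeros((nrows, ncols))
grid[rows, cols] = nxyz[:, 2]  # one vectorized assignment, no Python loop
```

If two records map to the same cell, the assignment keeps only one of them, so duplicates are worth checking for separately.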

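If this doesn't help, then the nxyz array is too large, but I doubt it. If it is, you can load the text file in several parts and apply the above to each part in turn.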
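Loading the file in parts could look roughly like the sketch below. The function name `grid_xyz_chunked`, the `chunk_lines` default, the `float32` grid, and the `np.rint` rounding are assumptions rather than part of the answer. It also assumes every record lies inside the declared grid: an index past the edge raises `IndexError`, and a negative index would silently wrap around.

```python
import itertools
import numpy as np

def grid_xyz_chunked(lines, nrows, ncols, xllcorner, yllcorner, cellsize,
                     delimiter=",", skip_header=1, chunk_lines=100_000):
    """Fill a grid by parsing the input a fixed number of lines at a time."""
    grid = np.zeros((nrows, ncols), dtype=np.float32)
    it = iter(lines)
    # Consume the header row(s) before parsing any data.
    for _ in itertools.islice(it, skip_header):
        pass
    while True:
        raw = list(itertools.islice(it, chunk_lines))
        if not raw:
            break  # input exhausted
        chunk = [ln for ln in raw if ln.strip()]  # drop blank lines
        if not chunk:
            continue
        # np.loadtxt accepts a list of strings; ndmin=2 keeps a 2-D shape
        # even when the chunk holds a single record.
        nxyz = np.loadtxt(chunk, delimiter=delimiter, ndmin=2)
        cols = np.rint((nxyz[:, 0] - xllcorner) / cellsize).astype(int)
        rows = np.rint((nxyz[:, 1] - yllcorner) / cellsize).astype(int)
        grid[rows, cols] = nxyz[:, 2]  # fancy indexing, one chunk at a time
    return grid
```

An open file object can be passed as `lines`, so only `chunk_lines` records are parsed and held at any moment in addition to the grid itself.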

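PS: You probably know the range of the data contained in the text file, and you can limit memory usage by stating it explicitly when you load the file. Like this, if you are dealing with at most 16-bit integers: np.loadtxt("myfile.txt", dtype=np.int16).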
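For the data in the question, 16-bit integers would not fit, since the coordinates are values like 682373.533843. A related option, offered here only as an assumption, is to keep the coordinates in the default `float64` but store the output grid as `float32`, which may be enough if the z values are written with two decimals. The helper `grid_megabytes` below is hypothetical and just computes the size of a grid without allocating it.

```python
import numpy as np

def grid_megabytes(nrows, ncols, dtype):
    """Size in MB (10**6 bytes) of an nrows x ncols array of the given dtype."""
    return nrows * ncols * np.dtype(dtype).itemsize / 1e6

nrows, ncols = 4405, 4781  # grid size from the question

mb64 = grid_megabytes(nrows, ncols, np.float64)  # NumPy's default float type
mb32 = grid_megabytes(nrows, ncols, np.float32)  # half the bytes per cell

print(f"float64 grid: {mb64:.1f} MB, float32 grid: {mb32:.1f} MB")
```

The grid is about 168 MB as `float64` and half that as `float32`. The coordinates themselves should stay `float64`: at magnitudes around 682373, `float32` values are spaced 0.0625 apart, which is too coarse relative to a cell size of 1.25 to trust the index arithmetic.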
