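A bit late, but maybe this answer will be useful to someone else, if not to you...
I've done this with NumPy and Pandas, and it's very fast. I'm working with TLS (terrestrial laser scanning) data and can process millions of points without trouble on a decent 2009-vintage laptop. The key is to "bin" the data by rounding the coordinates, then use Pandas' GroupBy to aggregate the points in each bin and compute their mean.
If you need to round to a power of 10, you can use np.round (via its decimals argument); otherwise you can round to an arbitrary value with a small function, which I wrote by modifying this SO answer.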
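A minimal sketch of the bin-then-group idea on a handful of made-up points (the coordinates, z values and the 0.5 bin size are illustrative assumptions, not TLS data). Each point is snapped to the nearest multiple of the bin size, and GroupBy then collapses all points sharing a bin into one row holding their mean z.

```python
import numpy as np
import pandas as pd

# Toy x, y, z points (made-up values for illustration only)
x = np.array([0.1, 0.2, 0.9, 1.1, 1.2])
y = np.array([0.1, 0.1, 0.1, 0.1, 0.1])
z = np.array([1.0, 3.0, 10.0, 20.0, 30.0])

cell = 0.5  # hypothetical bin size
df = pd.DataFrame({
    "x_round": np.round(x / cell) * cell,  # snap x to the nearest multiple of cell
    "y_round": np.round(y / cell) * cell,  # snap y the same way
    "z": z,
})

# One row per occupied bin, holding the mean z of the points that fell into it
binned = df.groupby(["x_round", "y_round"], as_index=False)["z"].mean()
print(binned)
# -> two bins: (0.0, 0.0) with mean z 2.0 and (1.0, 0.0) with mean z 20.0
```

Because only occupied bins produce rows, the output is usually far smaller than the raw point cloud, which is what makes the later interpolation step cheap.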
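A small sketch of both rounding routes, using the same divide-round-multiply trick as round_to_val in the code that follows (the input values are arbitrary). One thing worth knowing: np.round rounds exact halves to the nearest even integer, so a point sitting exactly on a bin boundary may go down or up depending on the bin. If that matters, flooring to the lower bin edge (a different binning rule, shown last) avoids the ambiguity.

```python
import numpy as np

def round_to_val(a, round_val):
    """Round every element of a to the nearest multiple of round_val."""
    return np.round(np.asarray(a, dtype=float) / round_val) * round_val

# Powers of ten: np.round's decimals argument is enough (-1 rounds to tens)
tens = np.round([1234.0, 1236.0], -1)          # -> [1230., 1240.]

# Any other step: divide, round, multiply back
halves = round_to_val([0.2, 0.3, 1.74], 0.5)   # -> [0.0, 0.5, 1.5]

# Ties: 0.25 / 0.5 = 0.5 rounds to 0, but 0.75 / 0.5 = 1.5 rounds to 2
ties = round_to_val([0.25, 0.75], 0.5)         # -> [0.0, 1.0]

# Alternative rule: always assign to the lower bin edge
floors = np.floor(np.array([0.25, 0.75]) / 0.5) * 0.5  # -> [0.0, 0.5]
```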
import numpy as np
import pandas as pd
# make rounding function:
def round_to_val(a, round_val):
    return np.round( np.array(a, dtype=float) / round_val) * round_val
# load data (hypothetical file name: an array of shape (n_data, 3) with x, y, z columns)
data = np.load('your_data.npy')
n_d = data.shape[0]
# round the data
d_round = np.empty( [n_d, 5] )
d_round[:,0] = data[:,0]
d_round[:,1] = data[:,1]
d_round[:,2] = data[:,2]
del data # free up some RAM
d_round[:,3] = round_to_val( d_round[:,0], 0.5)
d_round[:,4] = round_to_val( d_round[:,1], 0.5)
# sorting data
ind = np.lexsort( (d_round[:,4], d_round[:,3]) )
d_sort = d_round[ind]
# making dataframes and grouping stuff
df_cols = ['x', 'y', 'z', 'x_round', 'y_round']
df = pd.DataFrame( d_sort)
df.columns = df_cols
df_round = df[['x_round', 'y_round', 'z']]
group_xy = df_round.groupby(['x_round', 'y_round'])
# calculate the mean z per bin and write it to CSV; the file has the columns
# [x_round, y_round, z], where z is the bin mean. You can exit Python and start
# a new session later to free memory if that's an issue.
group_mean = group_xy.mean()
group_mean.to_csv('your_binned_data.csv')
# Restarting...
import numpy as np
from scipy.interpolate import griddata
binned_data = np.loadtxt('your_binned_data.csv', skiprows=1, delimiter=',')
x_bins = binned_data[:,0]
y_bins = binned_data[:,1]
z_vals = binned_data[:,2]
pts = np.array( [x_bins, y_bins])
pts = pts.T
# make grid (with borders rounded to 0.5...)
xmax, xmin = 640000.5, 637000
ymax, ymin = 6070000.5, 6067000
# bounds must run low -> high with a positive step; the stop value itself is excluded
grid_x, grid_y = np.mgrid[xmin:xmax:0.5, ymin:ymax:0.5]
# interpolate onto grid
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method='cubic')
# save to ascii
np.savetxt('data_grid.txt', data_grid)
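A tiny self-contained check of the interpolation step, assuming hypothetical bin centres taken from a plane z = x + 2y so the expected result is known. It writes the np.mgrid bounds low to high (with a positive step, np.mgrid expects start < stop) and uses method="linear", which reproduces a plane exactly inside the convex hull of the points; the answer itself uses method="cubic".

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical bin centres: corners of a unit square plus its centre
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
z_vals = pts[:, 0] + 2.0 * pts[:, 1]  # samples of the plane z = x + 2y

# Grid bounds low -> high; adding one step to the stop keeps xmax/ymax in the grid
xmin, xmax, ymin, ymax, step = 0.0, 1.0, 0.0, 1.0, 0.25
grid_x, grid_y = np.mgrid[xmin:xmax + step:step, ymin:ymax + step:step]

# Result is indexed [x, y], matching grid_x / grid_y; NaN outside the hull
data_grid = griddata(pts, z_vals, (grid_x, grid_y), method="linear")
```

Note that the output array's first axis is x and its second axis is y, which matters when the grid is later written out as an image or raster (rows of an image usually run along y).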
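After that, I saved the output as .npy, converted it to a TIFF with the Image library (PIL), and georeferenced it in ArcMap. There's probably a way to do this with osgeo (GDAL), but I didn't use it.
Hope this helps someone, at least...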
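A different route from the one described above (not what was done here): ArcGIS can read an ESRI ASCII raster (.asc), whose text header carries the georeferencing, so no separate georeferencing step should be needed. A minimal sketch with plain NumPy, assuming the grid nodes are cell centres and the array is indexed [x, y] as griddata returns it on an np.mgrid grid; write_esri_ascii is a hypothetical helper name.

```python
import os
import tempfile

import numpy as np

def write_esri_ascii(path, data_grid, x0, y0, cell, nodata=-9999.0):
    """Hypothetical helper: write a grid indexed [x, y] to an ESRI ASCII raster.

    x0, y0 are the coordinates of the first grid node, treated here as the
    centre of the lower-left cell; cell is the grid spacing.
    """
    # ASCII grids store rows north -> south and columns west -> east,
    # so transpose [x, y] -> [y, x] and flip the rows
    rows = np.asarray(data_grid, dtype=float).T[::-1]
    # griddata leaves NaN outside the convex hull; map it to the nodata value
    rows = np.where(np.isnan(rows), nodata, rows)
    nrows, ncols = rows.shape
    header = (
        f"ncols {ncols}\n"
        f"nrows {nrows}\n"
        f"xllcorner {x0 - cell / 2}\n"
        f"yllcorner {y0 - cell / 2}\n"
        f"cellsize {cell}\n"
        f"NODATA_value {nodata}"
    )
    # comments="" stops savetxt from prefixing the header lines with '#'
    np.savetxt(path, rows, fmt="%.3f", header=header, comments="")

# Usage with a made-up 2 x 3 grid (x by y), one cell outside the hull
g = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, np.nan]])
out = os.path.join(tempfile.gettempdir(), "example_grid.asc")
write_esri_ascii(out, g, x0=637000.0, y0=6067000.0, cell=0.5)
```

Writing the file as text keeps everything in NumPy, at the cost of large files for big grids; a GeoTIFF via GDAL would be more compact.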