我想从采样数据构建一个网格。我可以使用机器学习 - 聚类算法,如 k-means,但我想限制中心大致均匀分布。
我想出了一种使用 scikit-learn 最近邻搜索的方法:随机选择一个点,删除半径 r 内的所有点,然后重复。这很好用,但想知道是否有人有更好(更快)的方式来做到这一点。
作为对评论的回应,我尝试了两种替代方法,一种结果慢得多,另一种几乎相同......
方法0(我的第一次尝试):
def get_centers0(X, r):
N = X.shape[0]
D = X.shape[1]
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
while N > 0:
nearest.fit(X)
x = X[int(random()*N), :]
_, del_x = nearest.radius_neighbors(x)
X = np.delete(X, del_x[0], axis = 0)
grid = np.vstack([grid, x])
N = X.shape[0]
return grid
方法 1(使用预计算图):
def get_centers1(X, r):
N = X.shape[0]
D = X.shape[1]
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
nearest.fit(X)
graph = nearest.radius_neighbors_graph(X)
#This method is very slow even before doing any 'pruning'
方法二:
def get_centers2(X, r, k):
N = X.shape[0]
D = X.shape[1]
k = k
grid = np.zeros([0,D])
nearest = near.NearestNeighbors(radius = r, algorithm = 'auto')
while N > 0:
nearest.fit(X)
x = X[np.random.randint(0,N,k), :]
#min_dist = near.NearestNeighbors().fit(x).kneighbors(x, n_neighbors = 1, return_distance = True)
min_dist = dist(x, k, 2, np.ones(k)) # where dist is a cython compiled function
x = x[min_dist < 0.1,:]
_, del_x = nearest.radius_neighbors(x)
X = np.delete(X, del_x[0], axis = 0)
grid = np.vstack([grid, x])
N = X.shape[0]
return grid
运行这些如下:
N = 50000
r = 0.1
x1 = np.random.rand(N)
x2 = np.random.rand(N)
X = np.vstack([x1, x2]).T
tic = time.time()
grid0 = get_centers0(X, r)
toc = time.time()
print 'Method 0: ' + str(toc - tic)
tic = time.time()
get_centers1(X, r)
toc = time.time()
print 'Method 1: ' + str(toc - tic)
tic = time.time()
grid2 = get_centers2(X, r)
toc = time.time()
print 'Method 1: ' + str(toc - tic)
方法0和2差不多...
Method 0: 0.840130090714
Method 1: 2.23365592957
Method 2: 0.774812936783