r - Terra R - 使用自定义函数加速栅格数据的聚合（）

Question

我想使用包中的aggregate函数terra R以分位数方法聚合栅格作为聚合函数。下面，我使用quantile函数 fromR base使用本地包目录中的栅格计算第 50 个百分位数（即中位数）。我选择了第 50 个百分位数与中位数进行比较，但我的目标确实是计算其他分位数......

library(terra)
# load elevation coming with the terra pakage
r <- rast( system.file("ex/elev.tif", package="terra") )

plot(r)

# number of iteration
n_it <- 20

# with a custom function
start_time <- Sys.time()
for (i in 1:n_it){
ra <- aggregate(r, 2 , fun = function(x) quantile(x, probs = .5, na.rm = T))
}
end_time <- Sys.time()

我的电脑花了大约。6秒做20次。

print(end_time-start_time)

时差 6.052727 秒

当我aggregate使用中值内置函数运行相同的运行时，大约需要。执行相同的 20 次迭代的时间减少了 40 倍！

# with a built-in function
start_time <- Sys.time()
for (i in 1:n_it){
  ra <- aggregate(r, 2 , fun = median)
}
end_time <- Sys.time()
print(end_time-start_time)

时间差 0.1456101 秒

由于我想计算第 50 位以外的其他百分位数，有人可以提供一些建议来加快aggregate使用自定义函数的速度吗？

score 3 · Accepted Answer

aggregate()使用自定义函数本身并不慢。quantile()相反，使用而不是median()获得中位数更昂贵。这可能是由于计算本身的成本（terra 使用C++ 实现来计算比任意分位数更快的中位数），并且还因为quantile()执行更多检查并因此在过程中调用更多附加函数。当操作被多次执行时，这种更高的计算成本会增加aggregate。

如果您有一个更大的栅格，则使用该cores参数将计算分布在多个核心上可能是有益的，请参阅?terra::aggregate. 但是，我认为这不是elev数据的选择，因为开销太大。

如果你想调用aggregate许多不同probs的，你可以并行化循环，例如使用foreach包。

score 3 · Accepted Answer

根据回复，我测试了两个选项：使用tdigest包terra和来自包（cores参数）的内置并行化例程。Dunning 等人 (2019)的 t-Digest 构造算法使用一维 k 均值聚类的变体来生成非常紧凑的数据结构，从而可以准确估计分位数。我建议使用该tquantile函数，它可以将测试数据集的处理时间减少三分之一。

对于那些正在考虑foreach并行化的人来说，没有简单的解决方案可以使用 terra 对象运行 foreach 循环。对于此类任务，我仍在使用良好的旧光栅包。这是计划中的更新，但不是短期的 - 请参阅此处。更多详情如下。

玩具数据集

library(terra)
# load elevation coming with the terra pakage
r <- rast( system.file("ex/elev.tif", package="terra") )

plot(r)

# number of iteration
n_it <- 20

# With `stats::quantile()` function
start_time <- Sys.time()
for (i in 1:n_it){
  ra <- aggregate(r, 2 , fun = function(x) quantile(x, probs = .5, na.rm = T))
}
end_time <- Sys.time()
print(end_time-start_time)

时差 6.013551 秒

和`tdigest::tquantile()`

library(tdigest)

start_time_tdigest <- Sys.time()
for (i in 1:n_it){
  ra_tdigest <- aggregate(r, 2 , fun = function(x) tquantile(tdigest(na.omit(x)), probs = .5))
  }
end_time_tdigest <- Sys.time()
print(end_time_tdigest-start_time_tdigest)

时差 1.922526 秒

正如 Martin 所怀疑的那样，cores在函数中使用参数terra:aggregate并没有提高处理时间：

`stats::quantile()`+ 并行化

start_time_parallel <- Sys.time() 
for (i in 1:n_it){
ra_tdigest_parallel <- aggregate(r, 2 , fun = function(x) quantile(x, probs = .5, na.rm = T), cores = 2) 
} 
end_time_parallel <- Sys.time() 
print(end_time_parallel-start_time_parallel)

时差 8.537751 秒

`tdigest::tquantile()`+ 并行化

tdigest_quantil_terra <- function(x) {   
require(tdigest)   
tquantile(tdigest(na.omit(x)), probs = .5) }
        
start_time_tdigest_parallel <- Sys.time() for (i in 1:n_it){
ra_tdigest_parallel <- aggregate(r, 2 , 
fun = function(x, ff) ff(x), cores = 2 , 
ff = tdigest_quantil_terra)     
} 

end_time_tdigest_parallel <- Sys.time() 
print(end_time_tdigest_parallel-start_time_tdigest_parallel)

时差 7.520231 秒

简而言之：

1 tdigest 1.922526 秒

2 base_quantile 6.013551 秒

3 tdigest_parallel 7.520231 秒

4 base_quantile_parallel 8.537751 秒

r - Terra R - 使用自定义函数加速栅格数据的聚合（）

2 回答 2

玩具数据集

和tdigest::tquantile()

stats::quantile()+ 并行化

tdigest::tquantile()+ 并行化

Related

Reference

和`tdigest::tquantile()`

`stats::quantile()`+ 并行化

`tdigest::tquantile()`+ 并行化