
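At present it does not seem possible to predict a multinomial gbm model directly to a raster brick. For relatively small raster grids there is a simple workaround, which is explained below. With large rasters, many classes (in my case vegetation communities) and many predictor variables, however, that workaround is very slow and not without its challenges. I hope the information below is useful to anyone facing the same problem.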

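Below I try to predict the probability of occurrence of 36 vegetation communities using a multinomial gbm model and 20 predictor variables. My study area is a 30x30 m raster grid with 213,000,000 pixels, but the code below relates to a small grid of 1221 cells that I used to develop and test the process.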

> require (gbm)
> require (raster)
> require (rgdal)

> load("gbmmodel_p20.Rda") 

> print(gbmmodel)

gbm(formula = as.formula(Nclustal_1 ~ tcd_coast_disa_f + tce_raddq_f + 
tce_radwq_f + tct_temp_minwin_f + tct_tempdq_f + tcw_clim_etaaann_f + 
tcw_precipseas_f + tcw_precipwq_f + tcw_rain1mm_f + tdd_strmdstge6_i + 
tlf_logre10_f + tlf_rough0500_f + trs_land_pfc_2008 + trs88_sspr_g_50p + 
trs88_ssum_b_50p + trs88_ssum_d_50p + tsp_bd200_f + tsp_cly200a_f + 
tsp_ph200_f + tsp_tn060a_f), distribution = "multinomial", 
data = gbmdata, n.trees = 2500, interaction.depth = 2, n.minobsinnode = 3, 
shrinkage = 0.003, bag.fraction = 0.75, train.fraction = 1, 
cv.folds = 8, keep.data = TRUE, verbose = TRUE, class.stratify.cv = TRUE, 
n.cores = 8)

A gradient boosted model with multinomial loss function.
2500 iterations were performed.
The best cross-validation iteration was 2500.
There were 20 predictors of which 20 had non-zero influence.

I stacked the predictor variables into a raster stack as follows:

> img.files <- list.files("/mnt/scratch/mcilwea/R/TSG/inmodel20_test",
pattern='\\.img$', full.names=TRUE)
> rasStack <- stack(img.files)
> NAvalue(rasStack) <- -9999
> projection(rasStack)
"+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"

It is important to check that the names in rasStack are the same as those in the model above:

> names(rasStack)
 [1] "tcd_coast_disa_f"   "tce_raddq_f"        "tce_radwq_f"
 [4] "tct_tempdq_f"       "tct_temp_minwin_f"  "tcw_clim_etaaann_f"
 [7] "tcw_precipseas_f"   "tcw_precipwq_f"     "tcw_rain1mm_f"
[10] "tdd_strmdstge6_i"   "tlf_logre10_f"      "tlf_rough0500_f"
[13] "trs88_sspr_g_50p"   "trs88_ssum_b_50p"   "trs88_ssum_d_50p"
[16] "trs_land_pfc_2008"  "tsp_bd200_f"        "tsp_cly200a_f"
[19] "tsp_ph200_f"        "tsp_tn060a_f"
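
The name check above can also be done programmatically. The following is a minimal sketch, not part of the original workflow: `check_predictor_names` is a hypothetical helper, and it assumes the model's predictor names are available as a character vector (for a gbm object this is, as far as I know, `gbmmodel$var.names`). It compares the two sets of names rather than their order, because the stack above lists the layers in a different order from the model formula.

```r
# Hypothetical helper: report predictors that are missing from the stack,
# layers the model does not use, and whether the order is identical.
check_predictor_names <- function(model_vars, stack_names) {
  list(
    missing_from_stack = setdiff(model_vars, stack_names),
    not_in_model       = setdiff(stack_names, model_vars),
    same_order         = identical(model_vars, stack_names)
  )
}

# Toy example using three of the names from this post, in two different orders.
model_vars  <- c("tct_temp_minwin_f", "tct_tempdq_f", "tsp_ph200_f")
stack_names <- c("tct_tempdq_f", "tct_temp_minwin_f", "tsp_ph200_f")
res <- check_predictor_names(model_vars, stack_names)
print(res)
```

With the real objects the call would presumably be `check_predictor_names(gbmmodel$var.names, names(rasStack))`. Two empty `setdiff` results mean every predictor has a matching layer, even when `same_order` is FALSE.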

Before running predict.gbm I retrieve the best iteration of the model:

> best.iter <- gbm.perf(gbmmodel, method = "cv", plot.it = TRUE)

For my test area of 1221 cells, I can create a set of raster output images by converting the grid cells to a set of spatial points, as shown below.

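# Use the first predictor layer as a template for the cell locations
points <- raster(img.files[1])
# One row per non-NA cell, with x/y at the cell centre
points.df <- as.data.frame(rasterToPoints(points))
coordinates(points.df) <- ~x+y   # promote to a spatial points object
plot(points.df)
coords <- coordinates(points.df)
# Sample every predictor layer at each point
rasterOut <- extract(rasStack, coords)
outTable <- as.data.frame(cbind(coords, rasterOut))
outTable[1:1,1:22]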

         x        y tcd_coast_disa_f tce_raddq_f tce_radwq_f tct_temp_minwin_f tct_tempdq_f
149.1269 -35.6457         1.052329    10.82778    23.63533        -0.9852222     5.928154
  tcw_clim_etaaann_f tcw_precipseas_f tcw_precipwq_f tcw_rain1mm_f tdd_strmdstge6_i tlf_logre10_f
600         13.93321       179.9841       80.2064              491      1.945529
  tlf_rough0500_f trs_land_pfc_2008 trs88_sspr_g_50p trs88_ssum_b_50p trs88_ssum_d_50p tsp_bd200_f
15.6701                 0             0.38       0.09000003             0.55    1.590021
  tsp_cly200a_f tsp_ph200_f tsp_tn060a_f
33.33834    5.648166   0.03193555

Run the predict.gbm model:

predtable <- as.data.frame(predict.gbm(gbmmodel, outTable, n.trees=best.iter, type="response"))
predout <- cbind(coords,predtable)
predout[1:1,1:38]

             x        y    e24.2500     e26.2500    e59.2500 g152.2500    g157.2500     g94.2500   m31.2500
    149.1269 -35.6457 0.001286283 0.0006473167 0.002043077 0.4973372 8.686316e-05 0.0006710651 0.01067058
         m36.2500    m68.2500    MU11.2500    MU45.2500 OTHER.2500  p14.2500     p15.2500     p17.2500
    0.004314056 0.007128109 0.0005012718 0.0006254022  0.1727706 0.1411112 0.0009099294 0.0002520156
         p19.2500    p20.2500     p22.2500   p220.2500    p23.2500   p24.2500    p27.2500   p338.2500
    0.003205936 0.002534798 0.0001474091 0.001214219 0.008455798 0.01701965 0.001879607 0.002238932
        p420.2500  p520.2500     p54.2500    p9.2500    u118.2500  u179.2500  u21.2500    u22.2500
   0.001456685 0.00108458 0.0003695966 0.02501649 0.0005977814 0.01711885 0.0558054 0.002357498
        u23.2500    u27.2500     u28.2500   u78.2500   Unit5.2500
   0.00040357 0.001422519 0.0002764237 0.01699094 4.835942e-05

    write.csv(predout, "Predout.csv", row.names=TRUE)

I can write the probability-of-occurrence values in predtable to a set of 36 new raster images as follows:

# Write one HFA (.img) raster per class-probability column
classNames <- names(predtable)
for (i in seq_along(classNames)) {
  SpatialPointspredTable <- SpatialPointsDataFrame(coords=coords, data=predtable[i])
  gridded(SpatialPointspredTable) <- TRUE   # points on a regular grid -> pixels
  rasValues <- raster(SpatialPointspredTable)
  projection(rasValues) <- "+proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs"
  plot(rasValues)
  writeRaster(rasValues, filename=classNames[i], format="HFA", overwrite=TRUE)
}

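This gives me the output I want. However, rather than having to predict to a data frame, the process would be faster and more efficient if it were possible to predict directly to a RasterBrick.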
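
To illustrate why the data-frame route gets hard on the full grid: 213,000,000 cells times 36 classes, held as 8-byte doubles, is roughly 61 GB for the predictions alone, so the table would have to be processed in pieces. The following is a minimal base-R sketch of that idea. It is not the code I used on the full grid, and `chunk_ranges`, `predict_in_chunks` and `toy_predict` are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: start/end row indices for blocks of at most `size` rows.
chunk_ranges <- function(n, size) {
  starts <- seq(1, n, by = size)
  data.frame(start = starts, end = pmin(starts + size - 1, n))
}

# Hypothetical helper: apply `predict_fun` to one block of rows at a time
# and stack the per-block results back together.
predict_in_chunks <- function(data, predict_fun, size) {
  blocks <- chunk_ranges(nrow(data), size)
  out <- lapply(seq_len(nrow(blocks)), function(i) {
    rows <- blocks$start[i]:blocks$end[i]
    predict_fun(data[rows, , drop = FALSE])
  })
  do.call(rbind, out)
}

# Toy stand-in for a model: two "class probabilities" per row.
toy_predict <- function(d) cbind(a = d$x / 10, b = 1 - d$x / 10)
toy <- data.frame(x = 1:10)

chunked <- predict_in_chunks(toy, toy_predict, size = 4)
whole   <- toy_predict(toy)
print(isTRUE(all.equal(chunked, whole)))
```

With the real model, `predict_fun` would presumably wrap the predict.gbm call used above (including the `as.data.frame` step), and each block's result would be written to disk rather than kept in memory.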

If I run

predict(rasStack,
         gbmmodel,
         n.trees=best.iter,
         filename="multiclass_BRT_20p_test_idrisi",
         format="IDRISI",
         na.rm=FALSE,
         type="response",
         overwrite=TRUE,
         progress="text",
         cores=8)

The output is a raster grid representing the first vegetation community that I am trying to predict:

|=========================================================| 100%

class       : RasterLayer
dimensions  : 33, 37, 1221  (nrow, ncol, ncell)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : 149.1268, 149.1371, -35.65473, -35.64556  (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : /mnt/scratch/mcilwea/R/TSG/multiclass_BRT_20p_test_idrisi.rdc
names       : layer
values      : 3.762369e-06, 0.9337785  (min, max)

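The IDRISI file format does not support multiband images, so I cannot add index=1:36 to the call to produce a multiband raster brick as output. If I try this with format="GTiff" or "HFA" (or any other format that requires rgdal), I get the error message:

"Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) : Failure during raster IO"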

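However, if I set format="raster" I can get a RasterBrick output, but it does not let me read or write any data other than what is in the IDRISI image (the first output column of the predict.gbm model).

"Warning message: In .rasterFromRasterFile(grdfile, band = band, objecttype, ...) : size of values file does not match the number of cells (given the data type)"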

predrast <- predict(object=rasStack,
        model=gbmmodel,
        n.trees=best.iter,
        filename="multi_test",
        fun=predict.gbm,
        format="raster",
        index=1:5,
        bandorder="BIL",
        ext=extent(rasStack[[1:20]]), 
        na.rm=FALSE,
        type="response",
        datatype="FLT4S",
        overwrite=TRUE,
        progress="text",
        cores=8) 
|=====================================================================100%

predrast

class       : RasterBrick 
dimensions  : 33, 37, 1221, 5  (nrow, ncol, ncell, nlayers)
resolution  : 0.0002777778, 0.0002777778  (x, y)
extent      : 149.1268, 149.1371, -35.65473, -35.64556  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +ellps=GRS80 +towgs84=-16.237,3.51,9.939,1.4157e-006,2.1477e-006,1.3429e-006,1.91e-007 +no_defs 
data source : C:\Data\FINAL_TSG\test\multi_test.grd 
names       :      layer.1,      layer.2,      layer.3,      layer.4,      layer.5 
min values  : 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06, 3.762369e-06 
max values  :    0.9337785,    0.9337785,    0.9337785,    0.9337785,    0.9337785 

If I try to convert the raster brick above to a set of individual raster images

writeRaster(predrast, filename="multi_test.img", format="HFA", bylayer=TRUE, suffix="numbers", overwrite=TRUE)

None of the resulting images makes any sense.
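
One possible explanation, offered as an assumption rather than a confirmed diagnosis: all five layers of the brick above have the same minimum and maximum as the single-layer IDRISI output, and the `.2500` suffix on the predtable column names is consistent with predict.gbm returning a three-dimensional array (cells x classes x number of n.trees values) for a multinomial model. If raster's predict expects `fun` to return a plain matrix with one column per layer when `index` is used, a small wrapper that drops the third dimension might help. The sketch below shows only the reshaping step on a toy array; `as_class_matrix` and `predict_multinomial` are hypothetical names, and the raster call at the end is untested.

```r
# Hypothetical helper: collapse an (n x classes x 1) array to an (n x classes) matrix.
as_class_matrix <- function(p) {
  d <- dim(p)
  if (length(d) == 3) {
    stopifnot(d[3] == 1)  # only a single n.trees value is supported here
    p <- matrix(p, nrow = d[1], ncol = d[2],
                dimnames = list(NULL, dimnames(p)[[2]]))
  }
  p
}

# Hypothetical wrapper with the (model, data, ...) argument shape of a `fun`
# supplied to raster's predict; extra arguments are forwarded to predict().
predict_multinomial <- function(model, data, ...) {
  as_class_matrix(predict(model, data, ...))
}

# Toy array shaped like the assumed multinomial output: 2 cells, 3 classes, 1 n.trees value.
p <- array(seq_len(6) / 10, dim = c(2, 3, 1),
           dimnames = list(NULL, c("e24", "e26", "e59"), "2500"))
m <- as_class_matrix(p)
print(dim(m))

# Untested usage sketch with the objects from this post:
# predrast <- predict(rasStack, gbmmodel, fun = predict_multinomial,
#                     n.trees = best.iter, type = "response", index = 1:36,
#                     filename = "multi_test", format = "raster", overwrite = TRUE)
```

The reshape keeps the class names as column names and leaves ordinary matrices or vectors untouched, so the same wrapper could be used with non-multinomial models.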

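It is also somewhat puzzling that, if I try to write a multiband CDF image instead, I get a different set of error messages, this time raised by ncdf::put.var.ncdf rather than rgdal: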

    |   0%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |========                                                       |  25%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |==================                                              |  50%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 11988 values, but the passed data array only has 11840 entries!
  |===============================================                   |  75%
Error in ncdf::put.var.ncdf(nc, x@title, v, start = c(1, start, lstart),  : 
  put.var.ncdf: error: you asked to write 7992 values, but the passed data array only has 7955 entries!
  |=============================================================| 100%

I am not sure what is happening here.
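
One observation that may help narrow it down, under the assumption that the CDF run asked for 36 layers: the requested counts in the messages above factor exactly into columns x rows-per-block x layers for the 33 x 37 test grid, and each shortfall is a whole multiple of the 37 columns. The arithmetic below only restates numbers from the error messages; it does not explain why fewer values were passed.

```r
ncol_grid <- 37   # columns in the test grid
nrow_grid <- 33   # rows in the test grid
n_layers  <- 36   # assumed number of output layers (one per vegetation community)

requested <- c(11988, 11988, 11988, 7992)  # "you asked to write ..."
passed    <- c(11840, 11840, 11840, 7955)  # "... only has ... entries"

# Rows written per block if every block carries all 36 layers
rows_per_block <- requested / (ncol_grid * n_layers)
print(rows_per_block)   # 9 9 9 6

# Missing values per block, expressed in units of one grid row of one layer
shortfall_rows <- (requested - passed) / ncol_grid
print(shortfall_rows)   # 4 4 4 1
```

The four blocks add up to the 33 rows of the grid, so the writer appears to expect a full set of 36 layers per block while the prediction step hands over slightly less.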

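It would be great if someone knowledgeable could work with the authors of the gbm package so that it becomes possible to predict directly to a raster brick without running into any of the problems above.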

If anyone would like to see the code I used on the full raster grid, please leave a comment below and I will be happy to provide it.

Cheers, Allen

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ncdf_1.6.8      rgdal_0.9-1     gbm_2.1         lattice_0.20-30 survival_2.37-7 raster_2.3-24   sp_1.0-17      

loaded via a namespace (and not attached):
[1] grid_3.1.2  tools_3.1.2

# Traceback error for
Error in rgdal::putRasterData(x@file@transient, v, band = 1, offset = off) : 
  Failure during raster IO

> traceback()
7: .Call("RGDAL_PutRasterData", raster, rasterData, as.integer(offset), 
       PACKAGE = "rgdal")
6: rgdal::putRasterData(x@file@transient, v, band = 1, offset = off)
5: writeValues(predrast, predv, tr$row[i])
4: writeValues(predrast, predv, tr$row[i])
3: .local(object, ...)
2: predict(object = rasStack, model = gbmmodel, n.trees = best.iter, 
       filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response", 
       datatype = "FLT4S", overwrite = TRUE, progress = "text")
1: predict(object = rasStack, model = gbmmodel, n.trees = best.iter, 
       filename = "multi_img", format = "HFA", na.rm = FALSE, type = "response", 
       datatype = "FLT4S", overwrite = TRUE, progress = "text")
