I am trying to apply ddply to a large data.frame (38000 rows, 10 variables), but I get an error:
ddply(uncertainty.long, .(Species), "nrow")
This returns the error:
Error in attributes(out) <- attributes(col) :
'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i)
{
piece <- pieces[[i]]
if (.inform) {
res <- try(.fun(piece, ...))
if (inherits(res, "try-error")) {
piece <- paste(capture.output(print(piece)), collapse = "\n")
stop("with piece ", i, ": \n", piece, call. = FALSE)
}
}
else {
res <- .fun(piece, ...)
}
progress$step()
res
})(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
.inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")
More details about my data.frame:
> head(uncertainty.long)
Stack Variable PARun Model Species value year scenario GCM sp
1 sync_current Total PA1 GLM Arctosafulvolineata 100.0000 NA <NA> <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020 Total PA1 GLM Arctosafulvolineata 134.6840 2020 B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050 Total PA1 GLM Arctosafulvolineata 153.7617 2050 B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080 Total PA1 GLM Arctosafulvolineata 195.7176 2080 B2A cgcm2 Arctosa\nfulvolineata
5 sync_mk2_B2A_2020 Total PA1 GLM Arctosafulvolineata 172.2967 2020 B2A mk2 Arctosa\nfulvolineata
6 sync_mk2_B2A_2050 Total PA1 GLM Arctosafulvolineata 198.9391 2050 B2A mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame': 38000 obs. of 10 variables:
$ Stack : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
$ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
$ PARun : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Model : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr "1" "1" "1" "1" ...
$ value : num 100 135 154 196 172 ...
$ year : num NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
$ scenario: chr NA "B2A" "B2A" "B2A" ...
$ GCM : chr NA "cgcm2" "cgcm2" "cgcm2" ...
$ sp : chr "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...
Here is my sessionInfo():
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] parallel splines grid stats graphics grDevices utils datasets methods base
other attached packages:
[1] reshape2_1.2.2 Hmisc_3.12-2 Formula_1.1-1 RCurl_1.95-4.1 bitops_1.0-6 biomod2_3.0.3 pROC_1.5.4 plyr_1.8
[9] rpart_4.1-3 randomForest_4.6-7 mda_0.4-4 class_7.3-9 gbm_2.1 survival_2.37-4 nnet_7.3-7 rasterVis_0.21
[17] hexbin_1.26.2 latticeExtra_0.6-26 RColorBrewer_1.0-5 lattice_0.20-23 abind_1.4-0 raster_2.1-49 sp_1.0-13 ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] cluster_1.14.4 colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3 gtable_0.1.2 labeling_0.2 MASS_7.3-29 munsell_0.4.2 proto_0.3-10 scales_0.2.3
[11] stringr_0.6.2 tools_3.0.1 zoo_1.7-10
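I tried to reproduce the error with fewer columns (2 columns), and that changed nothing. However, if I reduce the number of rows, the call works as long as the grouping variable "Species" has only one value present: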
> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> ddply(small.df, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
But if I include one more row:
> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
Stack Variable PARun Model Species value year scenario GCM sp
3800 sync_hadcm3_A1B_2080 Lost PA5 MAXENT Arctosafulvolineata -54.90872 2080 A1B hadcm3 Arctosa\nfulvolineata
3801 sync_current Total PA1 GLM Argyronetaaquatica 100.00000 NA <NA> <NA> Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) :
'names' attribute [3801] must be the same length as the vector [3800]
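The two outcomes above are consistent with a full-length "names" attribute being copied onto a shorter per-group piece. The following base-R sketch reproduces the same kind of failure without plyr. It is an illustration only, not plyr's actual code: `col` and `copy_attributes` are hypothetical names, and the toy factor stands in for the Species column.

```r
# Toy factor carrying a stray "names" attribute, like Species in str() above
col <- factor(c("A", "A", "B"))
names(col) <- c("1", "1", "1")

# Take the selected integer codes, then copy every attribute of the full column
copy_attributes <- function(idx) {
  out <- as.integer(col)[idx]
  attributes(out) <- attributes(col)  # same assignment that appears in the error
  out
}

# Piece shorter than the column: 3 names cannot be attached to 2 values
short <- tryCatch(copy_attributes(1:2), error = function(e) conditionMessage(e))

# Piece as long as the column: lengths match, so the assignment succeeds
full <- tryCatch(copy_attributes(1:3), error = function(e) conditionMessage(e))
```

Here `short` ends up holding an error message and `full` holds a factor, which mirrors why the 3800-row subset with a single species worked while the 3801-row subset failed.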
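I found that someone else had a similar problem: https://stackoverflow.com/a/14162351/2788395.
However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me. Does anyone know about this problem and/or how to solve it?
Thanks!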
Problem solved. The problem was the "names" attribute on the "Species" column. I removed it with the following code, and ddply now works:
> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
Species nrow
1 Arctosafulvolineata 3800
2 Argyronetaaquatica 3800
3 Dolomedesplantarius 3800
4 Enoplognathamordax 3800
5 Iciussubinermis 3800
6 Neonvalentulus 3800
7 Pardosabifasciata 3800
8 Pardosaoreophila 3800
9 Piratauliginosus 3800
10 Trochosaspinipalpis 3800
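If several columns might carry a stray "names" attribute, the same cleanup can be applied to all of them at once. A minimal sketch, assuming a hypothetical toy data frame `toy` in place of uncertainty.long; `named_cols` and `counts` are illustrative names, and the row count uses base R's table() so the example runs without plyr.

```r
# Hypothetical toy data standing in for uncertainty.long
toy <- data.frame(
  Species = factor(rep(c("A", "B"), each = 3)),
  value   = c(1, 2, 3, 4, 5, 6)
)

# Attach a stray "names" attribute to one column, as str() showed for Species
names(toy$Species) <- rep("1", nrow(toy))
has_names_before <- !is.null(names(toy$Species))

# List every column that carries a "names" attribute
named_cols <- names(toy)[vapply(toy, function(col) !is.null(names(col)), logical(1))]

# Strip the attribute from every column; unname() leaves the values untouched
toy[] <- lapply(toy, unname)
has_names_after <- !is.null(names(toy$Species))

# Per-group row counts in base R, comparable to ddply(toy, .(Species), "nrow")
counts <- as.data.frame(table(Species = toy$Species), responseName = "nrow")
```

Checking `named_cols` before cleaning shows which columns were affected. Such attributes can be introduced, for example, when a named vector is assigned to a column with `$<-`.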