I am trying to apply ddply to a large data.frame (38,000 rows, 10 variables), but I get an error:

ddply(uncertainty.long, .(Species), "nrow")

This returns the error:

Error in attributes(out) <- attributes(col) : 
  'names' attribute [38000] must be the same length as the vector [3800]
> traceback()
11: FUN(1:10[[5L]], ...)
10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
9: extract_rows(x$data, x$index[[i]])
8: `[[.indexed_df`(pieces, i)
7: pieces[[i]]
6: (function (i) 
   {
       piece <- pieces[[i]]
       if (.inform) {
           res <- try(.fun(piece, ...))
           if (inherits(res, "try-error")) {
               piece <- paste(capture.output(print(piece)), collapse = "\n")
               stop("with piece ", i, ": \n", piece, call. = FALSE)
           }
       }
       else {
           res <- .fun(piece, ...)
       }
       progress$step()
       res
   })(1L)
5: .Call("loop_apply", as.integer(n), f, env)
4: loop_apply(n, do.ply)
3: llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress, 
       .inform = .inform, .parallel = .parallel, .paropts = .paropts)
1: ddply(uncertainty.long, .(Species), "nrow")

More details about my data.frame:

> head(uncertainty.long)
                Stack Variable PARun Model             Species    value year scenario   GCM                    sp
1        sync_current    Total   PA1   GLM Arctosafulvolineata 100.0000   NA     <NA>  <NA> Arctosa\nfulvolineata
2 sync_cgcm2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 134.6840 2020      B2A cgcm2 Arctosa\nfulvolineata
3 sync_cgcm2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 153.7617 2050      B2A cgcm2 Arctosa\nfulvolineata
4 sync_cgcm2_B2A_2080    Total   PA1   GLM Arctosafulvolineata 195.7176 2080      B2A cgcm2 Arctosa\nfulvolineata
5   sync_mk2_B2A_2020    Total   PA1   GLM Arctosafulvolineata 172.2967 2020      B2A   mk2 Arctosa\nfulvolineata
6   sync_mk2_B2A_2050    Total   PA1   GLM Arctosafulvolineata 198.9391 2050      B2A   mk2 Arctosa\nfulvolineata
> str(uncertainty.long)
'data.frame':   38000 obs. of  10 variables:
 $ Stack   : Factor w/ 19 levels "sync_cgcm2_B2A_2020",..: 7 1 2 3 14 15 16 11 12 13 ...
 $ Variable: Factor w/ 5 levels "Lost","NetChange",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ PARun   : Factor w/ 5 levels "PA1","PA2","PA3",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Model   : Factor w/ 8 levels "CTA","FDA","GAM",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
 $ value   : num  100 135 154 196 172 ...
 $ year    : num  NA 2020 2050 2080 2020 2050 2080 2020 2050 2080 ...
 $ scenario: chr  NA "B2A" "B2A" "B2A" ...
 $ GCM     : chr  NA "cgcm2" "cgcm2" "cgcm2" ...
 $ sp      : chr  "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" "Arctosa\nfulvolineata" ...

Here is my sessionInfo():

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
 [1] parallel  splines   grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.2.2      Hmisc_3.12-2        Formula_1.1-1       RCurl_1.95-4.1      bitops_1.0-6        biomod2_3.0.3       pROC_1.5.4          plyr_1.8           
 [9] rpart_4.1-3         randomForest_4.6-7  mda_0.4-4           class_7.3-9         gbm_2.1             survival_2.37-4     nnet_7.3-7          rasterVis_0.21     
[17] hexbin_1.26.2       latticeExtra_0.6-26 RColorBrewer_1.0-5  lattice_0.20-23     abind_1.4-0         raster_2.1-49       sp_1.0-13           ggplot2_0.9.3.1    

loaded via a namespace (and not attached):
 [1] cluster_1.14.4   colorspace_1.2-2 dichromat_2.0-0  digest_0.6.3     gtable_0.1.2     labeling_0.2     MASS_7.3-29      munsell_0.4.2    proto_0.3-10     scales_0.2.3    
[11] stringr_0.6.2    tools_3.0.1      zoo_1.7-10      

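I tried to reproduce the error with fewer columns (2 columns), and that changed nothing. However, if I reduce the number of rows, ddply works as long as the grouping variable "Species" contains only one level: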

> small.df <- uncertainty.long[1:3800, ]
> unique(small.df$Species)
[1] Arctosafulvolineata
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis 
> ddply(small.df, .(Species), "nrow")
              Species nrow
1 Arctosafulvolineata 3800

But if I include one more row, so that a second level appears:

> small.df <- uncertainty.long[1:3801, ]
> unique(small.df$Species)
[1] Arctosafulvolineata Argyronetaaquatica 
10 Levels: Arctosafulvolineata Argyronetaaquatica Dolomedesplantarius Enoplognathamordax Iciussubinermis Neonvalentulus Pardosabifasciata Pardosaoreophila ... Trochosaspinipalpis
> small.df[3800:3801, ]
                    Stack Variable PARun  Model             Species     value year scenario    GCM                    sp
3800 sync_hadcm3_A1B_2080     Lost   PA5 MAXENT Arctosafulvolineata -54.90872 2080      A1B hadcm3 Arctosa\nfulvolineata
3801         sync_current    Total   PA1    GLM  Argyronetaaquatica 100.00000   NA     <NA>   <NA>  Argyroneta\naquatica
> ddply(small.df, .(Species), "nrow")
Error in attributes(out) <- attributes(col) : 
  'names' attribute [3801] must be the same length as the vector [3800]

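I found that someone else had a similar problem: https://stackoverflow.com/a/14162351/2788395.

However, their workaround (reinstalling plyr 1.7 instead of 1.8) did not work for me. Does anyone know about this issue and/or how to fix it?

Thanks!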

Problem solved: the issue was the "names" attribute on the "Species" column. I removed it with the following code, and ddply worked:

> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
               Species nrow
1  Arctosafulvolineata 3800
2   Argyronetaaquatica 3800
3  Dolomedesplantarius 3800
4   Enoplognathamordax 3800
5      Iciussubinermis 3800
6       Neonvalentulus 3800
7    Pardosabifasciata 3800
8     Pardosaoreophila 3800
9     Piratauliginosus 3800
10 Trochosaspinipalpis 3800

1 Answer

The problem was the "names" attribute on the "Species" column, which is visible in the str() output:

$ Species : Factor w/ 10 levels "Arctosafulvolineata",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...

I removed it with the following code, and ddply worked:

> names(uncertainty.long$Species) <- NULL
> ddply(uncertainty.long, .(Species), "nrow")
               Species nrow
1  Arctosafulvolineata 3800
2   Argyronetaaquatica 3800
3  Dolomedesplantarius 3800
4   Enoplognathamordax 3800
5      Iciussubinermis 3800
6       Neonvalentulus 3800
7    Pardosabifasciata 3800
8     Pardosaoreophila 3800
9     Piratauliginosus 3800
10 Trochosaspinipalpis 3800
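
For anyone who wants to check a whole data.frame for this kind of stray attribute, here is a minimal base-R sketch. It does not try to reproduce the plyr error itself. The toy data and the variable names (df, sp, has_names, counts) are made up for illustration, and the data.frame is assembled with structure() only so that the "names" attribute is sure to survive construction.

```r
# Hypothetical toy data standing in for uncertainty.long.
sp <- factor(c("Arctosafulvolineata", "Arctosafulvolineata",
               "Argyronetaaquatica", "Argyronetaaquatica"))
names(sp) <- c("1", "1", "2", "2")  # stray names, like those shown by str()

# Assemble the data.frame by hand so the column keeps its attributes.
df <- structure(
  list(value = c(100, 135, 154, 172), Species = sp),
  class = "data.frame",
  row.names = c(NA, -4L)
)

# Flag every column that carries a "names" attribute.
has_names <- vapply(df, function(col) !is.null(names(col)), logical(1))
print(names(df)[has_names])

# Drop the attribute from the flagged columns; unname() keeps the values.
for (nm in names(df)[has_names]) {
  df[[nm]] <- unname(df[[nm]])
}

# Base-R row count per group, analogous to ddply(df, .(Species), "nrow").
counts <- as.data.frame(table(Species = df$Species), responseName = "nrow")
print(counts)
```

Running the same check on uncertainty.long should flag only "Species", going by the str() output in the question.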
Answered 2013-09-18T10:44:30.340