r - Colwise eats column names in ddply

Question

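I'm trying to chunk a data frame, find the sub-data frames that are unbalanced, and add 0 values for the levels of a factor that are missing. To do this inside ddply, I do a quick comparison against a set of vectors to determine which levels of the factor should be present. I then create some new rows by copying the first row of the sub-dataset and modifying their values, and rbind them to the old dataset.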
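The padding step described above can be sketched in base R alone, without ddply or colwise. This is a minimal sketch under assumed names: the data frame `d`, its columns `grp` (the chunking variable), `lvl` (the factor whose levels should all be present) and `val` (the value to zero-fill), and the helper `pad_chunk` are all hypothetical and do not come from the question.

```r
# Hypothetical example data: `grp`, `lvl` and `val` are illustrative names.
d <- data.frame(
  grp = c("x", "x", "y"),
  lvl = factor(c("lo", "hi", "lo"), levels = c("lo", "hi")),
  val = c(3, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add a zero row for every expected level of `lvl`
# that this chunk lacks.
pad_chunk <- function(chunk, expected) {
  absent <- setdiff(expected, as.character(chunk$lvl))
  if (length(absent) == 0) return(chunk)
  # copy the chunk's first row once per absent level
  newrows <- chunk[rep(1, length(absent)), , drop = FALSE]
  newrows$lvl <- factor(absent, levels = levels(chunk$lvl))
  newrows$val <- 0
  rbind(chunk, newrows)
}

# split into chunks, pad each one, then stack the results again
pieces <- lapply(split(d, d$grp), pad_chunk, expected = levels(d$lvl))
padded <- do.call(rbind, unname(pieces))
rownames(padded) <- NULL
padded
```

Chunk `y` has no `hi` row, so it gains one with `val` set to 0, while chunk `x` is returned unchanged.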

I use colwise to do the copying.

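This works great outside of ddply. Inside of ddply... the identifying column gets eaten, and my rbind borks. It's strange behavior. See the following code, which includes some debugging print statements so you can see the difference in results: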

library(plyr)

#a test data frame
g <- data.frame(a=letters[1:5], b=1:5)

#repeat rows using colwise
rep.row <- function(r, n){
  colwise(function(x) rep(x, n))(r)
}

#if I want to do this with just one row, I get all of the columns
rep.row(g[1,],5)

Great. It prints:

  a b
1 a 1
2 a 1
3 a 1
4 a 1
5 a 1
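For comparison, the same per-column replication can be written with base R's lapply, which is roughly what colwise(function(x) rep(x, n)) does to each column. A minimal sketch; rep_row_base is a hypothetical name, not part of plyr:

```r
g <- data.frame(a = letters[1:5], b = 1:5)

# Hypothetical base-R counterpart of rep.row: apply rep() to every column
# and reassemble the results into a data frame.
rep_row_base <- function(r, n) {
  as.data.frame(lapply(r, function(x) rep(x, n)), stringsAsFactors = FALSE)
}

rep_row_base(g[1, ], 5)
```

lapply iterates over every column of the data frame and ignores any extra attributes attached to it, so this version keeps both a and b.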

#but, as soon as I use ddply to create some new data
#and try and smoosh it to the old data, I get errors
ddply(g, .(a), function(x) {

  newrows <- rep.row(x[1,],5)
  newrows$b<-0
  rbind(x, newrows)

})

This gives

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

You can see the problem in this debug version:

#So, what is going on here?
ddply(g, .(a), function(x) {
  newrows <- rep.row(x[1,],5)
  newrows$b<-0
  print(x)
  print("\n\n")
  print(newrows)
  rbind(x, newrows)

})

You can see that x and newrows have different columns: newrows is missing a.

  a b
1 a 1
[1] "\n\n"
  b
1 0
2 0
3 0
4 0
5 0
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match
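The error itself is ordinary rbind behaviour: rbind on data frames requires every argument to have the same number of columns. A minimal reproduction without ddply, using hand-built frames shaped like the debug output above (x with columns a and b, newrows with only b):

```r
# Frames shaped like the printed x and newrows above
x <- data.frame(a = "a", b = 1)
newrows <- data.frame(b = rep(0, 5))

# Capture the error message instead of stopping
msg <- tryCatch(rbind(x, newrows), error = function(e) conditionMessage(e))
msg
```

So the rbind failure is only a symptom; the real question is why newrows lost column a.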

What is going on here? Why does the identifying column get eaten when I use colwise on a sub-data frame?

score 2 · Accepted Answer

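There seems to be a funny interaction between ddply and colwise. More specifically, the problem occurs when colwise calls strip_splits and finds the vars attribute that ddply attached to the piece: strip_splits then drops the columns named in vars, which here is the split variable a.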
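The mechanism can be reproduced without plyr. The sketch below is a hypothetical re-creation, not plyr's source: strip_vars stands in for strip_splits and assumes it simply drops every column named in the "vars" attribute, and the attribute is set by hand to mimic what ddply attaches to each piece.

```r
# Hypothetical stand-in for plyr's internal strip_splits(): drop every
# column whose name appears in the "vars" attribute.
strip_vars <- function(df) {
  df[setdiff(names(df), attr(df, "vars"))]
}

piece <- data.frame(a = "a", b = 1)
attr(piece, "vars") <- "a"      # mimic the attribute ddply sets on a piece

first <- piece[1, ]             # row subsetting keeps the attribute
attr(first, "vars")             # "a"
names(strip_vars(first))        # "b": the split column is gone

attr(first, "vars") <- NULL     # the workaround from this answer
names(strip_vars(first))        # "a" "b"
```

This also explains why rep.row(g[1,],5) works at top level: g never carries a vars attribute.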

As a workaround, try putting this as the first line of your function:

   attr(x, "vars") <- NULL
   # your code follows
   newrows <- rep.row(x[1,],5)
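Another option, not part of this answer, is to skip colwise and replicate the row by indexing: x[rep(1, n), , drop = FALSE] repeats row 1 n times and keeps every column, because `[` does not look at the "vars" attribute. A minimal sketch that uses base R's split/lapply in place of ddply so it runs without plyr; rep_row_idx is a hypothetical name, and the vars attribute is set by hand to mimic ddply:

```r
g <- data.frame(a = letters[1:5], b = 1:5)

# Hypothetical replacement for rep.row: repeat row 1 of `r` n times by indexing
rep_row_idx <- function(r, n) {
  r[rep(1, n), , drop = FALSE]
}

# Same per-group body as the question's ddply call, run via split/lapply
pieces <- lapply(split(g, g$a), function(x) {
  attr(x, "vars") <- "a"        # mimic ddply's attribute; indexing ignores it
  newrows <- rep_row_idx(x, 5)
  newrows$b <- 0
  rbind(x, newrows)
})
out <- do.call(rbind, unname(pieces))
dim(out)                        # 30 2
```

Presumably the same indexing line can replace rep.row(x[1,],5) inside the question's ddply call, since it never goes through strip_splits.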
