r - Why does rbindlist not respect column names?

Question

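I just discovered this bug, only to find that some people call it a "feature". It makes rbindlist NOT like do.call("rbind", l), because rbind WILL respect column names. Furthermore, this entirely unexpected behavior is not mentioned anywhere in the documentation. Is this really intentional?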

Code example:

> library(data.table)
> DT1 <- data.table(a=1, b=2)
> DT2 <- data.table(b=3, a=4)
> DT1
a b
1: 1 2
> DT2
b a
1: 3 4

I would expect rbind-ing these to produce columns a = 1,4 and b = 2,3. I get exactly that with both rbind.data.table and rbind.data.frame, although rbind.data.table produces a warning.

> rbind(DT1, DT2)
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.
> rbind(as.data.frame(DT1), as.data.frame(DT2))
a b
1 1 2
2 4 3
> do.call('rbind', list(DT1, DT2))
a b
1: 1 2
2: 4 3
Warning message:
In data.table::.rbind.data.table(...) :
Argument 2 has names in a different order. Columns will be bound by name for consistency with base. You can drop names (by using an unnamed list) and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this warning.

rbindlist, however, happily and silently corrupts the data:

> rbindlist(list(DT1, DT2))
a b
1: 1 2
2: 3 4
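The whole difference comes down to how columns are paired. The base-R sketch below (plain data.frames, no data.table needed) reproduces both behaviors with Map(): pairing columns by position, which is what rbindlist did above, and pairing them by name after reordering the second item's columns to match the first.

```r
# Same data as above, as plain data.frames
df1 <- data.frame(a = 1, b = 2)
df2 <- data.frame(b = 3, a = 4)

# Positional pairing: column 1 with column 1, column 2 with column 2.
# Names come from the first item, so df2's "b" values land under "a".
positional <- as.data.frame(Map(c, df1, df2))

# Name-based pairing: reorder df2's columns to df1's order first.
by_name <- as.data.frame(Map(c, df1, df2[names(df1)]))

positional  # a = 1, 3; b = 2, 4  (the silent mix-up)
by_name     # a = 1, 4; b = 2, 3  (what rbind returns)
```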

score 8 · Accepted Answer

This feature is now implemented in commit 1266 of v1.9.3. From NEWS:

o  'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
   entirely in C. Closes #5249    
  -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
     names by default)
  -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
     is TRUE by default, for compatibility with base (and backwards compatibility).
  -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
  -> At least one item of the input list has to have non-null column names.
  -> Duplicate columns are bound in the order of occurrence, like base.
  -> Attributes that might exist in individual items would be lost in the bound result.
  -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
  -> And incredibly fast ;).
  -> Documentation updated in much detail. Closes DR #5158.
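The coercion rule in the NEWS entry ("coerced to the highest SEXPTYPE") can be illustrated in base R. The helper coerce_highest() below is hypothetical and not part of data.table: it ranks the basic atomic types by their internal SEXPTYPE codes (logical 10, integer 13, double 14, character 16) and converts every piece of a column to the highest type present before concatenating. It is only a sketch of the rule as worded; it ignores factors, lists and the other types the real C implementation has to handle.

```r
# Hypothetical helper (not data.table code): combine the pieces of one column,
# coercing each piece to the highest SEXPTYPE present among them.
sexptype_code <- c(logical = 10L, integer = 13L, double = 14L, character = 16L)

coerce_highest <- function(pieces) {
  types  <- vapply(pieces, typeof, character(1))
  target <- names(which.max(sexptype_code[types]))  # highest code wins
  unlist(lapply(pieces, as.vector, mode = target), use.names = FALSE)
}

typeof(coerce_highest(list(1L, 2.5)))  # "double"
coerce_highest(list(TRUE, 2L))         # 1 2 (integer)
coerce_highest(list(1L, "a"))          # "1" "a"
```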

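With this, you can set use.names=TRUE to bind by name. It defaults to FALSE for backwards compatibility. Alternatively, you can keep using rbind(...), where use.names=TRUE is the default, for compatibility with base (and backwards compatibility).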
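To make the use.names switch concrete, here is a hypothetical base-R function, bind_list(), that mimics the two modes for a list of data.frames. With use.names = FALSE it keeps columns in position and relabels them with the first item's names; with use.names = TRUE it reorders each item's columns to match the first item and stops if the column sets differ. This illustrates the semantics only; the real rbindlist is written in C and also handles fill, coercion and duplicate names.

```r
# Hypothetical illustration of rbindlist's use.names switch (not data.table code)
bind_list <- function(l, use.names = FALSE) {
  ref <- names(l[[1]])
  if (use.names) {
    l <- lapply(l, function(d) {
      if (!setequal(names(d), ref))
        stop("column names differ from the first item's")
      d[ref]                        # reorder columns to match the first item
    })
  } else {
    l <- lapply(l, setNames, ref)   # keep positions, relabel with first item's names
  }
  do.call(rbind, l)
}

df1 <- data.frame(x = 1, y = 2)
df2 <- data.frame(y = 1, x = 2)
bind_list(list(df1, df2))                    # x = 1, 1; y = 2, 2
bind_list(list(df1, df2), use.names = TRUE)  # x = 1, 2; y = 2, 1
```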

For more examples, see this post; for benchmarks, see this post.

Examples:

1) Just set use.names=TRUE:

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)

rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
#    x y
# 1: 1 2
# 2: 2 1

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)

# returns error when fill=FALSE but can't be bound without fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
    # Answer requires 3 columns whereas one or more item(s) in the input 
    # list has only 2 columns. ...

2) Duplicate column names are also bound in order of occurrence:

DT1 <- data.table(x=1, x=2, y=10, y=20, y=30)
DT2 <- data.table(y=-10, x=-2, y=-20, x=-1, y=-30)

rbindlist(list(DT1,DT2), use.names=TRUE)

#     x  x   y   y   y
# 1:  1  2  10  20  30
# 2: -2 -1 -10 -20 -30
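"In order of occurrence" means the k-th x in one item is paired with the k-th x in the other, and likewise for y. One way to sketch that pairing in base R is to tag each name with its occurrence count using make.unique() and then match on the tagged names. The named vectors below stand in for the single rows of DT1 and DT2; this illustrates the rule and is not how data.table implements it.

```r
# One row from each table; named vectors allow duplicate names
row1 <- c(x = 1, x = 2, y = 10, y = 20, y = 30)
row2 <- c(y = -10, x = -2, y = -20, x = -1, y = -30)

# Tag the k-th occurrence of each name: "x", "x.1", "y", "y.1", "y.2"
key1 <- make.unique(names(row1))
key2 <- make.unique(names(row2))

# Reorder row2 so that each tagged name lines up with row1
aligned <- unname(row2)[match(key1, key2)]

out <- rbind(unname(row1), aligned)
colnames(out) <- names(row1)
out  # second row is -2 -1 -10 -20 -30, as in the rbindlist result above
```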

3) Use fill=TRUE if you want to bind by name and fill in missing columns:

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)

rbindlist(list(DT1, DT2), fill=TRUE)
#     x y  z
# 1:  1 2 NA
# 2: NA 2 -1
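What fill=TRUE does can be sketched in base R: take the union of all column names in order of first appearance, and for each item supply NA for any column it lacks. The function fill_bind() below is hypothetical and not part of data.table; it works on plain data.frames and relies on R's usual NA coercion when concatenating.

```r
# Hypothetical base-R sketch of rbindlist(..., fill = TRUE) (not data.table code)
fill_bind <- function(l) {
  all_names <- unique(unlist(lapply(l, names)))  # union, in order of first appearance
  cols <- lapply(all_names, function(nm)
    unlist(lapply(l, function(d)
      if (nm %in% names(d)) d[[nm]] else rep(NA, nrow(d)))))  # NA-pad missing columns
  names(cols) <- all_names
  as.data.frame(cols)
}

df1 <- data.frame(x = 1, y = 2)
df2 <- data.frame(y = 2, z = -1)
fill_bind(list(df1, df2))
#    x y  z
# 1  1 2 NA
# 2 NA 2 -1
```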

HTH
