r - Why can't I use names to [- subset (i.e. drop) columns?

Question

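I'm very worried that this has already been asked and will be downvoted, but I couldn't find the answer in the documentation (?"[") and it turns out to be hard to search for.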

data(wines)
# This is allowed:
alcoholic <- wines[, 1]
alcoholic <- wines[, "alcohol"]
nonalcoholic <- wines[, -1]
# But this is not:
fail <- wines[, -"alcohol"]

I know two workarounds, but I'm frustrated at needing them.

win <- wines[, !colnames(wines) %in% "alcohol"]  # snappy
win <- wines[, -which(colnames(wines) %in% "alcohol")]  # snappier!

score 18 · Accepted Answer

When you do

wines[, -1]

the -1 is evaluated before it is used by [. As you know, the unary - operator does not work on objects of class character, so doing the same with "alcohol" gives you:

Error in -"alcohol" : invalid argument to unary operator
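The failure does not depend on the data frame at all: the index expression is ordinary R code, so it can be reproduced on its own. A minimal sketch (the names idx and msg are illustrative and not from the question):

```r
# Unary minus on a number is fine; this is what wines[, -1] relies on.
idx <- -1
stopifnot(identical(idx, -1))

# Unary minus on a character string fails by itself, with no data frame involved.
msg <- tryCatch(
  -"alcohol",
  error = function(e) conditionMessage(e)
)
print(msg)
```

Because the error is raised while computing the index, the subsetting operator never receives anything it could interpret as "all columns except this name".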

You can add the following to your alternatives:

wines[, -match("alcohol", colnames(wines))]
wines[, setdiff(colnames(wines), "alcohol")]

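But you should be aware of the risks of negative indexing: see, for example, what happens if you mistype the name as "alcool" (sic). So your first suggestion and the last one here (@Ananda's) should be preferred. You might also want to write a function that errors out if you provide a name that is not part of your data.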
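A minimal sketch of such a function, together with the typo risk mentioned above. The helper name drop_named and the toy data frame df are hypothetical, made up for illustration:

```r
# drop_named() is a hypothetical helper, not part of base R.
drop_named <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even when a single column remains
  df[, !names(df) %in% cols, drop = FALSE]
}

# Toy data standing in for the wines data set.
df <- data.frame(alcohol = 1:3, sugar = 4:6, ph = 7:9)

# With a typo, -which() yields an empty index and silently drops every column.
typo_result <- df[, -which(names(df) %in% "alcool")]
ncol(typo_result)

# The helper stops with an informative message instead.
res <- tryCatch(drop_named(df, "alcool"), error = function(e) conditionMessage(e))
print(res)

# With the correct name it behaves like the logical-index solution.
ok <- drop_named(df, "alcohol")
names(ok)
```

Whether a missing name should be an error or just a warning is a design choice; an error seems the safer default for scripts that run unattended.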

score 8 · Accepted Answer

Another possibility:

subset(wines,select=-alcohol)

You can even do

subset(wines,select=-c(alcohol,other_drop))

In fact, if you want to drop a contiguous group of columns, you can even do

subset(wines,select=-(first_drop:last_drop))

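which can be handy (although IMO it depends dangerously on the order of the columns, which might be fragile: I would probably prefer a grep-based solution if there were some way to identify the columns, or a more explicit separate definition of the column groups).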

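In this case subset uses non-standard evaluation, which, as discussed elsewhere, can be dangerous in some contexts. But I still like it for simple top-level data manipulation because of its readability.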
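The three forms above can be tried on the built-in mtcars data, used here as a stand-in because other_drop, first_drop and last_drop are placeholder names in this answer. A small sketch:

```r
data(mtcars)

# Drop a single column by bare name.
one_dropped <- subset(mtcars, select = -mpg)

# Drop several columns at once.
two_dropped <- subset(mtcars, select = -c(mpg, cyl))

# Drop a contiguous range; this depends on the current column order.
range_dropped <- subset(mtcars, select = -(mpg:disp))

names(one_dropped)
names(two_dropped)
names(range_dropped)
```

The column names are written unquoted because subset evaluates select with the names standing for column positions, which is the non-standard evaluation mentioned above.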

score 6 · Accepted Answer

Another approach that uses numeric indexing and generalizes to the case where you want to drop a bunch of similarly named columns:

dfrm[ , -grep("^val", names(dfrm) )] #remove columns starting with "val"

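(I upvoted flodel because his answer explains why a minus sign doesn't work: mainly because the R authors did not overload the "-" operator for this purpose. They also did not overload "+" to do concatenation the way some languages do.)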
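One caveat with negating grep is that a pattern matching nothing produces an empty index, which selects zero columns instead of all of them. A sketch contrasting it with a logical grepl index, using a made-up data frame (the column names and the "^zzz" pattern are illustrative):

```r
# Toy data frame; column names are invented for the example.
dfrm <- data.frame(id = 1:3, val_a = 4:6, val_b = 7:9)

# When the pattern matches, negative grep drops the matching columns.
by_grep <- dfrm[, -grep("^val", names(dfrm)), drop = FALSE]
names(by_grep)

# When nothing matches, -integer(0) is still integer(0): no columns are kept.
no_match <- dfrm[, -grep("^zzz", names(dfrm)), drop = FALSE]
ncol(no_match)

# A logical index from grepl() keeps every column in that case.
by_grepl <- dfrm[, !grepl("^zzz", names(dfrm)), drop = FALSE]
ncol(by_grepl)
```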

score 3 · Accepted Answer

Write a simple little function and stick it in your .Rprofile. Something like...

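# Drop the columns named in `cols` from the data frame `df`.
dropcols <- function( df , cols ){
  # drop = FALSE keeps the result a data frame even if only one column remains
  out <- df[ , !names(df) %in% cols , drop = FALSE ]
  return( out )
}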

#  To use it....
data( mtcars )
head( dropcols( mtcars , "mpg" ) )
#                  cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4           6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag       6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant             6  225 105 2.76 3.460 20.22  1  0    3    1

score 3 · Accepted Answer

I couldn't find this in the documentation, but the following syntax works with data.table:

library(data.table)
dt = data.table(wines)

dt[, !"alcohol", with = FALSE]

If you like, you can also give a vector of column names:

dt[, !c("Country", "alcohol"), with = FALSE]

It seems it was only just documented, in the NEWS for v1.8.4:

When with=FALSE, "!" may also be a prefix on j, #1384ii. This selects all but the named columns.

DF[,-match("somecol",names(DF))]
# works when somecol exists. If not, NA causes an error.

DF[,-match("somecol",names(DF),nomatch=0)]
# works when somecol exists. Empty data.frame when it doesn't, silently.

DT[,-match("somecol",names(DT)),with=FALSE]
# same issues.

DT[,setdiff(names(DT),"somecol"),with=FALSE]
# works but you have to know order of arguments, and no warning if missing

vs

DT[,!"somecol",with=FALSE]
# works and easy to read. With (helpful) warning if somecol isn't there.

But all of the above copy every column except the deleted one. More usually:

DT[,somecol:=NULL]

to delete the column by name, by reference.
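The difference between the copying form and the by-reference form can be seen in a short sketch. It assumes the data.table package is installed and uses the built-in mtcars data as a stand-in for DT; the names dt and dt_copy are illustrative:

```r
library(data.table)

dt <- as.data.table(mtcars)

# Copying form: returns a new data.table without mpg; dt itself is unchanged.
dt_copy <- dt[, !"mpg", with = FALSE]
had_mpg_after_copy <- "mpg" %in% names(dt)
had_mpg_after_copy

# By-reference form: removes mpg from dt in place, without copying the other columns.
dt[, mpg := NULL]
has_mpg_after_delete <- "mpg" %in% names(dt)
has_mpg_after_delete
```

The by-reference form changes dt for every variable that points to the same table, so the copying form may still be preferable when the original has to stay intact.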

score 0 · Accepted Answer

You can get the behavior you want like this:

data(iris)
str(iris)
delete <- which(colnames(iris) == "Species")
iris2 <- iris[, -delete]
str(iris2)
