r - How to delete a row from a data.frame without losing the attributes

Question

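For starters: I have been searching on this problem for hours now, so if the answer should be trivial, please forgive me...

What I want to do is delete a row (no. 101) from a data.frame. It contains test data and should not appear in my analyses. My problem is: whenever I subset the data.frame, the attributes (especially comments) are lost.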

str(x)
# x has comments for each variable
x <- x[1:100,]
str(x)
# now x has lost all comments

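It is well documented that subsetting drops all attributes; so far, that is perfectly clear. The manual (e.g. http://stat.ethz.ch/R-manual/R-devel/library/base/html/Extract.data.frame.html) even suggests a way to preserve the attributes: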

## keeping special attributes: use a class with a
## "as.data.frame" and "[" method:


as.data.frame.avector <- as.data.frame.vector

`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

d <- data.frame(i= 0:7, f= gl(2,4),
                u= structure(11:18, unit = "kg", class="avector"))
str(d[2:4, -1]) # 'u' keeps its "unit"

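I am not yet far enough into R to understand exactly what happens here. However, simply running these lines (except the last three) does not change the behavior of my subsetting. Using the command subset() with an appropriate vector (100 times TRUE + 1 FALSE) gives me the same result. And simply storing the attributes in a variable and restoring them after the subset does not work either.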

# Does not work...
tmp <- attributes(x)
x <- x[1:100,]
attributes(x) <- tmp

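Of course, I could write all comments to a vector (var=>comment), subset, and write them back using a loop, but that does not seem like a well-founded solution. And I am quite sure I will encounter datasets with other relevant attributes in future analyses.

So this is where my efforts with stackoverflow, Google, and brain power got stuck. I would very much appreciate it if anyone could help me out with a hint. Thanks!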
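
For what it is worth, the store-and-restore attempt cannot bring the comments back, because comments set on variables live on the individual column vectors, not on the data.frame. attributes(x) only holds the frame's own names, class and row.names (the latter still describing 101 rows). The loop mentioned above could be sketched roughly as follows. The helper name subset_keep_comments and the sample data are made up for illustration, and the sketch only handles the comment attribute:

```r
# Hypothetical helper: drop rows, then copy each column's comment back.
subset_keep_comments <- function(x, rows) {
  out <- x[rows, , drop = FALSE]
  for (nm in names(out)) {
    # comment() returns NULL for columns without a comment; assigning
    # NULL simply leaves such a column without one.
    comment(out[[nm]]) <- comment(x[[nm]])
  }
  out
}

# Made-up sample data: 100 real rows plus one test row (row 101).
x <- data.frame(a = 1:101, b = rep(c("u", "v"), length.out = 101))
comment(x$a) <- "running id"
comment(x$b) <- "group label"

x <- subset_keep_comments(x, -101)
str(x)
```

This has to be repeated for every subset and would need extending for attributes other than comment, which is the drawback described above.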

score 12 · Accepted Answer

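If I understand correctly, you have some data in a data.frame, and the columns of the data.frame have comments associated with them. Perhaps something like the following?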

set.seed(1)

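# stringsAsFactors=TRUE makes bb a factor in R >= 4.0 as well,
# matching the output shown below
mydf<-data.frame(aa=rpois(100,4),bb=sample(LETTERS[1:5],
  100,replace=TRUE),stringsAsFactors=TRUE)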

comment(mydf$aa)<-"Don't drop me!"
comment(mydf$bb)<-"Me either!"

So this would give you something like

> str(mydf)
'data.frame':   100 obs. of  2 variables:
 $ aa: atomic  3 3 4 7 2 7 7 5 5 1 ...
  ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2 2 5 4 2 1 3 5 3 ...
  ..- attr(*, "comment")= chr "Me either!"

When you subset this, the comments are dropped:

> str(mydf[1:2,]) # comment dropped.
'data.frame':   2 obs. of  2 variables:
 $ aa: num  3 3
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2

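To preserve the comments, define the function [.avector, as you did above (from the documentation), then add the appropriate class attributes to each of the columns in your data.frame (EDIT: to keep the factor levels of bb, add "factor" to the class of bb):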

mydf$aa<-structure(mydf$aa, class="avector")
mydf$bb<-structure(mydf$bb, class=c("avector","factor"))

So that the comments are preserved:

> str(mydf[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Class 'avector'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"

EDIT:

If there are many columns in your data.frame that have attributes you want to preserve, you could use lapply (EDITED to include the original column class):

mydf2 <- data.frame( lapply( mydf, function(x) {
  structure( x, class = c("avector", class(x) ) )
} ) )

However, this drops comments associated with the data.frame itself (such as comment(mydf)<-"I'm a data.frame"), so if you have any, assign them to the new data.frame:

comment(mydf2)<-comment(mydf)

And then you have

> str(mydf2[1:2,])
'data.frame':   2 obs. of  2 variables:
 $ aa:Classes 'avector', 'numeric'  atomic [1:2] 3 3
  .. ..- attr(*, "comment")= chr "Don't drop me!"
 $ bb: Factor w/ 5 levels "A","B","C","D",..: 4 2
  ..- attr(*, "comment")= chr "Me either!"
 - attr(*, "comment")= chr "I'm a data.frame"

score 5 · Accepted Answer

For those looking for the "all-in" solution based on BenBarnes' explanation: here it is.

(If this works for you, please give your "up" to BenBarnes' post.)

# Define the avector-subselection method (from the manual)
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x,i,...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Assign each column in the data.frame the (additional) class avector
# Note that this will "lose" the data.frame's attributes, therefore write to a copy
df2 <- data.frame(
  lapply(df, function(x) {
    structure( x, class = c("avector", class(x) ) )
  } )
)

# Finally copy the attribute for the original data.frame if necessary
mostattributes(df2) <- attributes(df)

# Now subselects work without losing attributes :)
df2 <- df2[1:100,]
str(df2)

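The good thing: once the class has been attached to all of the data.frame's elements, subselections no longer disturb the attributes.

Okay, sometimes I am stunned by how complicated it is to do the simplest operations in R. But I surely would not have learned about the "class" feature if I had just marked and deleted the case in SPSS ;)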
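
Since df is not defined in the snippet above, here is a self-contained sketch of the same steps that can be run as-is. The sample data frame dat (100 rows plus one test row) and its comments are made up for illustration, and the test row is removed with a negative index rather than 1:100:

```r
# Method from the manual: subsetting an "avector" re-applies its attributes.
as.data.frame.avector <- as.data.frame.vector
`[.avector` <- function(x, i, ...) {
  r <- NextMethod("[")
  mostattributes(r) <- attributes(x)
  r
}

# Made-up sample data: 100 real rows plus one test row (row 101).
set.seed(1)
dat <- data.frame(aa = rpois(101, 4),
                  bb = sample(LETTERS[1:5], 101, replace = TRUE),
                  stringsAsFactors = TRUE)
comment(dat$aa) <- "Don't drop me!"
comment(dat$bb) <- "Me either!"

# Prepend "avector" to the class of every column.
dat2 <- data.frame(lapply(dat, function(x) {
  structure(x, class = c("avector", class(x)))
}))

# Drop the test row; the column comments are still there afterwards.
dat2 <- dat2[-101, ]
comment(dat2$aa)
comment(dat2$bb)
```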

score 3 · Accepted Answer

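This is addressed by the sticky package. (Full disclosure: I am the package author.) Apply sticky() to your vectors and the attributes are preserved through subset operations. For example: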

> df <- data.frame( 
+   sticky   = sticky( structure(1:5, comment="sticky attribute") ),
+   nonstick = structure( letters[1:5], comment="non-sticky attribute" )
+ )
> 
> comment(df[1:3, "nonstick"])
NULL
> comment(df[1:3, "sticky"])
[1] "sticky attribute"

This works for any attribute, not only comment.

See the sticky package for details:

score 0 · Accepted Answer

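I spent hours trying to figure out how to retain attribute data (specifically variable labels) when subsetting a data frame (removing columns). The answer was so simple that I could not believe it. Just use the function spss.get from the Hmisc package, and then no matter how you subset, the variable labels are retained.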
