r - Problem subsetting a data frame by column value in R

Question

I am having trouble subsetting a data frame in R. The data frame att2 has a column filter_name that I want to subset on. The unique values of this column are as follows.

unique(att2[["filter_name"]])
# [1] title             Type        Operating_System         Occasion           Brand
148 Levels: Accessories Age Antennae Art_Style Aspect_ratio ... Zoom

This shows that Brand is a value of the filter_name column. But when I subset the data frame with the code below, it returns 0 rows, as shown.

att3 <- subset(att2, filter_name == 'Brand')
> att3
[1] a      b         c  filter_name
<0 rows> (or 0-length row.names)

I cannot work out why. Has anyone run into this kind of problem?

score 2 · Accepted Answer

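All we can do is guess at what the source of your problem might be.

Here is my best guess: the values in your "filter_name" column contain whitespace, so you should not actually be looking for "Brand" until you have stripped that whitespace.

If my guess is right, here is a minimal example that reproduces your problem:

First, some sample data: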

mydf <- data.frame(Param =  c("   Brand   ", "Operating System", 
                              "Type ", "   Brand   ", "Type ", 
                              "Type ", "   Brand   ", "Type ", 
                              "   Brand   "), Value = 1:9)
unique(mydf[["Param"]])
# [1]    Brand         Operating System Type            
# Levels:    Brand    Operating System Type 

subset(mydf, Param == "Brand")
# [1] Param Value
# <0 rows> (or 0-length row.names)

Use print with the argument quote = TRUE to see the whitespace in your data.frame:

print(mydf, quote = TRUE)
#                Param Value
# 1      "   Brand   "   "1"
# 2 "Operating System"   "2"
# 3            "Type "   "3"
# 4      "   Brand   "   "4"
# 5            "Type "   "5"
# 6            "Type "   "6"
# 7      "   Brand   "   "7"
# 8            "Type "   "8"
# 9      "   Brand   "   "9"
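For a column with many distinct values, scanning the printed output by eye may be impractical. A minimal sketch of a programmatic check follows, assuming a hypothetical character vector x that stands in for the real column; it flags entries with leading or trailing whitespace using the same pattern as the gsub call in this answer.

```r
# Hypothetical stand-in for the values of the real column.
x <- c("   Brand   ", "Operating System", "Type ")

# TRUE where a value starts or ends with whitespace.
has_pad <- grepl("^\\s+|\\s+$", x)
has_pad

# Show only the offending values, quoted so the padding is visible.
print(x[has_pad], quote = TRUE)
```

For a factor column, the same check can be run on levels(column) instead of the column itself.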

If this does turn out to be your problem, gsub should fix it quickly:

mydf$Param <- gsub("^\\s+|\\s+$", "", mydf$Param)
unique(mydf[["Param"]])
# [1] "Brand"            "Operating System" "Type"  

subset(mydf, Param == "Brand")
#   Param Value
# 1 Brand     1
# 4 Brand     4
# 7 Brand     7
# 9 Brand     9
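The column in the question is a factor (148 levels), and gsub returns a character vector. If keeping the factor matters, one possible variant is to trim the levels in place. The sketch below uses trimws from base R (available since R 3.2.0) on a hypothetical data frame att2_demo; it is an illustration under that assumption, not the asker's actual data.

```r
# Hypothetical data: two differently padded spellings of "Brand".
att2_demo <- data.frame(
  filter_name = factor(c(" Brand", "Brand ", "Type")),
  a = 1:3
)

# Trim the levels; levels that become identical are merged into one.
levels(att2_demo$filter_name) <- trimws(levels(att2_demo$filter_name))

levels(att2_demo$filter_name)
brand_rows <- subset(att2_demo, filter_name == "Brand")
brand_rows
```

Because assigning duplicate level names merges them, both padded spellings end up as the single level "Brand".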

You may also want to look at the strip.white argument of read.table, which defaults to FALSE. Try re-reading your data with strip.white = TRUE and then try your subset again.
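As a rough illustration of the difference, the sketch below reads the same hypothetical comma-separated lines twice, once with the default and once with strip.white = TRUE. The data and the names input_lines, raw and clean are made up for the example. Note that, per the read.table documentation, strip.white is only used when sep has been specified.

```r
# Hypothetical input; each element is one line of a comma-separated file.
input_lines <- c(
  "Param,Value",
  "   Brand   ,1",
  "Operating System,2",
  "Type ,3"
)

# Default: padding around unquoted fields is kept.
raw <- read.table(text = input_lines, sep = ",", header = TRUE,
                  stringsAsFactors = FALSE)

# strip.white = TRUE: leading and trailing whitespace is removed on read.
clean <- read.table(text = input_lines, sep = ",", header = TRUE,
                    strip.white = TRUE, stringsAsFactors = FALSE)

nrow(subset(raw, Param == "Brand"))
nrow(subset(clean, Param == "Brand"))
```

With real data, the text argument would be replaced by the file path.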

score 0 · Accepted Answer

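First, you really should read this Stack Overflow post on how to ask a good question.

As for your problem, something like this should work (it is hard to say when you don't post a reproducible example, as Arun pointed out above):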

 att2 <- (data.frame(v=rnorm(10), filter_name=c('Brand','Not Brand')))

 att2[att2$filter_name == 'Brand', ]
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

 subset(att2, filter_name == 'Brand')
            v filter_name
1 -1.84217530       Brand
3 -0.36199449       Brand
5 -0.54431665       Brand
7 -0.05659442       Brand
9  1.29753513       Brand

There is more on subsetting here.

score 0 · Accepted Answer

Using the stringr package, you can do something like this:

   library(stringr)  # provides str_trim
   att2$filter_name_trim <- str_trim(att2$filter_name)
   att3 <- subset(att2, filter_name_trim == 'Brand')
