r - How to delete records in a data frame

Question

I need to delete specific rows of my data frame, but I'm having trouble doing it. The dataset looks like this:

> head(mergedmalefemale)
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
3        1      W    3    3     NA
4        1      W    2    4  17.75
5        1      W    3    5 -28.40
6        2      W    1    1  42.30

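What I want to do is delete all records of every coupleid that has at least one amount that is NA or negative. In the example above, all rows with coupleid "1" should be deleted, because that couple has a row with a negative amount and a row with NA. I tried functions such as na.omit(mergedmalefemale), but that only removes the rows containing NA, not the other rows with the same coupleid. I'm a beginner, so I'd be grateful if someone could help me.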

score 2 · Accepted Answer

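Since you don't want to omit only the rows whose amount is NA or negative, but all the data with the same id, you first have to find the ids to delete and then remove every row with those ids.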

mergedmalefemale <- read.table(text="
    coupleid gender shop time amount
    1        1      W    3    1  29.05
    2        1      W    1    2  31.65
    3        1      W    3    3     NA
    4        1      W    2    4  17.75
    5        1      W    3    5 -28.40
    6        2      W    1    1  42.30", 
    header=TRUE)

# Find NA and negative amounts
del <- is.na(mergedmalefemale[,"amount"]) | mergedmalefemale[,"amount"]<0
# Find coupleid with NA or negative amounts
ids <- unique(mergedmalefemale[del,"coupleid"])
# Remove data with coupleid such that amount is NA or negative
mergedmalefemale[!mergedmalefemale[,"coupleid"] %in% ids,]
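The same three steps (flag the offending rows, collect their ids, drop every row with those ids) can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name `drop_flagged_groups` and its arguments are hypothetical, and the data frame is rebuilt with `data.frame()` so the snippet runs on its own.

```r
# Hypothetical helper (not from the original answer): drops every row whose
# group id has at least one NA or negative value in the given column.
drop_flagged_groups <- function(df, id_col, value_col) {
  vals <- df[[value_col]]
  # TRUE for rows that should trigger removal of their whole group;
  # is.na(vals) | vals < 0 is never NA, because TRUE | NA is TRUE
  flagged <- is.na(vals) | vals < 0
  # ids of groups containing at least one flagged row
  bad_ids <- unique(df[[id_col]][flagged])
  df[!df[[id_col]] %in% bad_ids, , drop = FALSE]
}

# The example data from the question
mergedmalefemale <- data.frame(
  coupleid = c(1, 1, 1, 1, 1, 2),
  gender   = "W",
  shop     = c(3, 1, 3, 2, 3, 1),
  time     = c(1, 2, 3, 4, 5, 1),
  amount   = c(29.05, 31.65, NA, 17.75, -28.40, 42.30)
)

result <- drop_flagged_groups(mergedmalefemale, "coupleid", "amount")
print(result)
```

Only the row for coupleid 2 should remain, matching the result of the code above.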

score 1 · Accepted Answer

Here is another option. Assume your data.frame is called df:

> na.omit(df[ rowSums(df[, sapply(df, is.numeric)]< 0, na.rm=TRUE)  ==0, ])
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
4        1      W    2    4  17.75
6        2      W    1    1  42.30
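Note that the output above still contains rows 1, 2 and 4 of coupleid 1: this one-liner removes only the offending rows (like `na.omit()` does for NA), not every row of an affected couple. A possible extension, sketched here and not part of the original answer, spreads the row-level flag to the whole couple with base R's `ave()`. Like the answer, it checks every numeric column for negatives, not just `amount`.

```r
df <- data.frame(
  coupleid = c(1, 1, 1, 1, 1, 2),
  gender   = "W",
  shop     = c(3, 1, 3, 2, 3, 1),
  time     = c(1, 2, 3, 4, 5, 1),
  amount   = c(29.05, 31.65, NA, 17.75, -28.40, 42.30)
)

# Row-level flag, as in the answer above: TRUE if any numeric column is
# negative, or if the row has any NA.
num_cols <- sapply(df, is.numeric)
row_bad <- rowSums(df[, num_cols] < 0, na.rm = TRUE) > 0 | !complete.cases(df)

# Spread the flag to the whole couple: ave() applies any() within each
# coupleid and returns a vector aligned with the rows of df.
couple_bad <- as.logical(ave(row_bad, df$coupleid, FUN = any))

result <- df[!couple_bad, ]
print(result)
```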

score 1 · Accepted Answer

Yet another good opportunity to use data.table:

require(data.table)
mergedmalefemale <- as.data.table(mergedmalefemale)
mergedmalefemale[, if(!any(is.na(amount) | amount < 0)) .SD, by=coupleid]

#   coupleid gender shop time amount
#1:        2      W    1    1   42.3
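The data.table idiom evaluates `if (...) .SD` once per coupleid and keeps a group's rows only when the condition holds. For comparison, a rough base R analogue of the same per-group logic (a sketch, not a data.table feature) uses `split()`, `lapply()` and `rbind()`:

```r
mergedmalefemale <- data.frame(
  coupleid = c(1, 1, 1, 1, 1, 2),
  gender   = "W",
  shop     = c(3, 1, 3, 2, 3, 1),
  time     = c(1, 2, 3, 4, 5, 1),
  amount   = c(29.05, 31.65, NA, 17.75, -28.40, 42.30)
)

# One data frame per couple; keep a piece only if none of its amounts is NA
# or negative. An `if` without `else` returns NULL when the test fails.
pieces <- lapply(split(mergedmalefemale, mergedmalefemale$coupleid), function(d) {
  if (!any(is.na(d$amount) | d$amount < 0)) d
})

# rbind() skips the NULL pieces; if every couple were dropped, the result
# would be NULL rather than an empty data frame.
result <- do.call(rbind, pieces)
print(result)
```

The data.table version is usually faster on large data, since it groups without materialising one data frame per couple.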

score 0 · Accepted Answer

Here is a rather hacky way:

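# identify the coupleids to keep (1) or remove (0); na.action=na.pass keeps
# the NA rows, which aggregate() would otherwise drop before FUN sees them
agg <- aggregate(amount ~ coupleid, data=mergedmalefemale,
                 FUN=function(x) min(!is.na(x) & x >= 0),
                 na.action=na.pass)

# merge the flag back; both data frames have an "amount" column, so merge()
# names them "amount.x" (original values) and "amount.y" (the 0/1 flag)
df.1 <- merge(mergedmalefemale, agg, by="coupleid")

# delete the rows of flagged couples, then drop the flag column
df.1 <- df.1[df.1$amount.y == 1, ]
df.1$amount.y <- NULL
names(df.1)[names(df.1) == "amount.x"] <- "amount"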
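One pitfall with this approach: the formula method of `aggregate()` uses `na.action = na.omit` by default, so rows with NA in `amount` are removed before `FUN` is called, and a couple whose only problem is an NA would be kept. The toy data below is made up purely to show the difference; `ok_flag` is a hypothetical helper name.

```r
# Toy data (hypothetical): couple 1 has an NA but no negative amount.
toy <- data.frame(coupleid = c(1, 1, 2, 2), amount = c(10, NA, 5, 7))

# Flag per couple: 1 if every amount is non-missing and non-negative, else 0.
ok_flag <- function(x) min(!is.na(x) & x >= 0)

# Default na.action (na.omit) drops the NA row before ok_flag sees it ...
agg_default <- aggregate(amount ~ coupleid, data = toy, FUN = ok_flag)
# ... whereas na.pass hands the NA through to ok_flag.
agg_pass <- aggregate(amount ~ coupleid, data = toy, FUN = ok_flag,
                      na.action = na.pass)

print(agg_default)  # couple 1 wrongly flagged as OK
print(agg_pass)     # couple 1 correctly flagged for removal
```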
