I have a df as follows:

   names  fruit
7   john  apple
13  john orange
14  john  apple
2   mary orange
5   mary  apple
8   mary orange
10  mary  apple
12  mary  apple
1    tom  apple
6    tom  apple

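I want to do two things. First, count the number of unique names that have both apples and oranges (here 2: mary and john). Second, remove those names from the data.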



toremove<-unique(data[data$fruit=='apple' & data$fruit=='orange',"names"])  ##this part doesn't work, if it had I would have used the below code to remove the names identified
data2<-data[!data$names %in% toremove,]
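Because `&` compares the two conditions element by element, `data$fruit=='apple' & data$fruit=='orange'` asks whether a single row is both fruits at once, which is never true. Below is a minimal base R sketch of the intended logic. It assumes the example data from the question, re-typed here without the original row names. The idea is to collect the names seen with each fruit and then intersect the two sets.

```r
# Example data from the question (row names omitted)
data <- data.frame(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Names that appear with at least one apple / at least one orange
has_apple  <- unique(data$names[data$fruit == "apple"])
has_orange <- unique(data$names[data$fruit == "orange"])

# Names that have both fruits
toremove <- intersect(has_apple, has_orange)
length(toremove)  # 2 for the example data

# Drop every row belonging to those names
data2 <- data[!data$names %in% toremove, ]
data2
```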

Really, I'd like to use grepl because my real data is a bit more complex than fruit. Here's what I tried (after first converting to a data.table):

z<-data1[,ind := grepl('app.*? & orang.*?', fruit), by='names']
## This works fine when I just use 'app.*?' but breaks when I try to add the & sign, so I'm making an error with the operator.
## In addition, by='names' doesn't work out for me, which is important.
## My plan was to create an indicator (if an individual has an apple and an orange, they get ind == 1) and then filter them out on the basis of this indicator.



The desired output is:

names fruit
1   tom apple
6   tom apple
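In the grepl attempt above, grepl() tests each fruit string on its own. A pattern like 'app.*? & orang.*?' therefore only matches a single value that literally contains " & ", and it never compares rows with each other. Below is a minimal data.table sketch of the indicator idea. It runs one grepl() per pattern and uses any() to ask whether some row in each name's group matches. The patterns "^app" and "^orang" are hypothetical placeholders for the real, more complex ones.

```r
library(data.table)

data1 <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# Per name: TRUE if any fruit matches the first pattern AND any matches the second.
# The length-1 result is recycled to every row of that name's group.
data1[, ind := any(grepl("^app", fruit)) & any(grepl("^orang", fruit)), by = names]

# First task: number of distinct names flagged as having both
n_both <- uniqueN(data1[ind == TRUE, names])

# Second task: keep only the rows of names that were not flagged
data2 <- data1[ind == FALSE]
```

Combining the two any() calls with & happens after the regex matching, at the group level. This is where the "and" belongs, rather than inside the regular expression.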

2 Answers


If you are only looking for the names that have only apples, here is a simple data.table approach:

setDT(data)[ , if(all(fruit == "apple")) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple


To count the names that have both apples and oranges:

data[, any(fruit == "apple") & any(fruit == "orange"), by = names][, sum(V1)]
## [1] 2 
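The same any()/any() test can also drive the removal step: return a group's rows (.SD) only when it does not have both fruits. The sketch below assumes the question's example data as a data.table. Unlike the all(fruit == "apple") filter, it would also keep a name that has only oranges.

```r
library(data.table)

data <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# Groups for which the if() yields NULL (i.e. names with both fruits) are dropped
data2 <- data[, if (!(any(fruit == "apple") && any(fruit == "orange"))) .SD, by = names]
data2
```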

Finally, if all you are looking for is users with only one unique fruit, you could try uniqueN from the development version on GH (or length(unique())):

data[, if(uniqueN(fruit) < 2L) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple
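If uniqueN is not available, the length(unique()) alternative mentioned above can be written the same way. This is a sketch assuming the question's example data:

```r
library(data.table)

data <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# Keep a name's rows only if it has fewer than 2 distinct fruits
single <- data[, if (length(unique(fruit)) < 2L) .SD, by = names]
single
```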
answered 2015-09-02T10:22:14.747

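I'm using the dplyr package to flag/find users who have oranges and users who have both fruits. (I added an extra row at the end to get a case with only oranges.)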

library(dplyr)

data %>%
  group_by(names) %>%                            # for each user name
  mutate(N_dist = n_distinct(fruit),             # count distinct number of fruits
         N_oranges = sum(fruit == "orange")) %>% # count number of oranges
  filter(N_oranges == 0 & N_dist < 2) %>%        # keep users with no oranges and only one kind of fruit
  select(names, fruit)

#   names fruit
# 1   tom apple
# 2   tom apple


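Without the final filter() and select() steps, the intermediate result (including the extra kathy row) is:

#    names  fruit N_dist N_oranges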
# 1   john  apple      2         1
# 2   john orange      2         1
# 3   john  apple      2         1
# 4   mary orange      2         2
# 5   mary  apple      2         2
# 6   mary orange      2         2
# 7   mary  apple      2         2
# 8   mary  apple      2         2
# 9    tom  apple      1         0
# 10   tom  apple      1         0
# 11 kathy orange      1         1
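For the first part of the question (counting the names with both fruits), the same dplyr pipeline style can summarise instead of mutate. The sketch below assumes the example data plus the extra kathy row. The `.groups = "drop"` argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

data <- data.frame(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom", "kathy"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple", "orange"),
  stringsAsFactors = FALSE
)

both_count <- data %>%
  group_by(names) %>%
  # one row per name: does it have at least one apple AND at least one orange?
  summarise(both = any(fruit == "apple") & any(fruit == "orange"), .groups = "drop") %>%
  # count the names flagged TRUE
  summarise(n_both = sum(both))
both_count
```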


answered 2015-09-02T10:17:07.540