I'm struggling to come up with a vectorized solution to the following problem. I have two data frames:

> people <- data.frame(name = c('Fred', 'Bob'), profession = c('Builder', 'Baker'))
> people
  name profession
1 Fred    Builder
2  Bob      Baker

> allowed <- data.frame(name = c('Fred', 'Fred', 'Bob', 'Bob'), profession = c('Builder', 'Baker', 'Barman', 'Biker'))
> allowed
  name profession
1 Fred    Builder
2 Fred      Baker
3  Bob     Barman
4  Bob     Biker

That is, I want to check that each person in people has an allowed profession, and return the names of anyone who doesn't.

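For example, Fred can be a Builder or a Baker, so he's fine. However, Bob can be a Barman or a Biker, but not a Baker (note: in my use case there are only ever two allowed professions).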
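The check described here comes down to a membership test on (name, profession) pairs, which `%in%` performs for every row at once. A minimal base R sketch, assuming names and professions never contain a tab character (used as the key separator); `not_permitted()` is a hypothetical helper name, not something from the question:

```r
# Hypothetical helper: names in `people` whose (name, profession) pair
# does not appear anywhere in `allowed`
not_permitted <- function(people, allowed) {
  # One key per row; assumes "\t" never occurs in the data
  key_people  <- paste(people$name,  people$profession,  sep = "\t")
  key_allowed <- paste(allowed$name, allowed$profession, sep = "\t")
  # %in% is vectorized: one TRUE/FALSE per row of people
  as.character(people$name[!key_people %in% key_allowed])
}

people  <- data.frame(name = c("Fred", "Bob"),
                      profession = c("Builder", "Baker"))
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"))

not_permitted(people, allowed)
# [1] "Bob"
```

Because the key includes the name, a profession only counts as allowed for the person it is listed against.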

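I want to return a data frame of the names that don't have an allowed profession, together with the professions they are permitted: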

  name profession permitted
1  Bob      Baker     Biker
2  Bob      Baker    Barman

Thanks for your help.

4 Answers

A simple base R solution. I'm sure someone can come up with something better.

out <- allowed[!allowed$name %in% merge(people, allowed)$name, ]

This gets you the people you want, along with their allowed professions. If you also want their actual profession:

names(out)[2] <- "permitted"
out <- merge(people, out, all.y=TRUE)
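The same steps as a self-contained script, with the first merge stored in `ok` (an illustrative name not used in the answer) so each stage can be inspected. The first `merge()` has no `by` argument, so it joins on every column the two data frames share, here both `name` and `profession`, which keeps only people whose current profession is allowed:

```r
people  <- data.frame(name = c("Fred", "Bob"),
                      profession = c("Builder", "Baker"))
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob"),
                      profession = c("Builder", "Baker", "Barman", "Biker"))

# Step 1: join on (name, profession); only Fred survives here
ok <- merge(people, allowed)
# Step 2: keep the allowed rows of everyone who did not survive (Bob)
out <- allowed[!allowed$name %in% ok$name, ]
# Step 3: rename, then attach each person's actual profession by name
names(out)[2] <- "permitted"
out <- merge(people, out, all.y = TRUE)
out
```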
answered 2013-06-13T11:03:22.993

Here's a more readable data.table solution. You can also do the last step on the same line to make it a one-liner, if you think that is still readable.

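# load data.table; the on = argument needs a reasonably recent version
library(data.table)
people = as.data.table(people)
allowed = as.data.table(allowed)

# anti-join: people whose (name, profession) pair does not appear in allowed
bad = people[!allowed, on = c("name", "profession")]

# join them to every permitted profession for their name; the profession
# column coming from `bad` (the i table) is returned as i.profession
result = allowed[bad, on = "name"]
setnames(result, c("profession", "i.profession"), c("permitted", "profession"))
setcolorder(result, c("name", "profession", "permitted"))

result
#    name profession permitted
# 1:  Bob      Baker    Barman
# 2:  Bob      Baker     Biker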
answered 2013-06-13T15:01:40.043

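There may be another way, but this should work. I've added a third person with a profession that isn't permitted, to show how to apply the function to the whole dataset.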

currentprof <-structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Bob", 
"Fred", "Jan"), class = "factor"), profession = structure(c(3L, 
2L, 1L), .Label = c("Analyst", "Baker", "Builder"), class = "factor")), .Names = c("name", 
"profession"), class = "data.frame", row.names = c(NA, -3L))

allowed <- structure(list(name = structure(c(2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Bob", 
"Fred", "Jan"), class = "factor"), profession = structure(c(4L, 
1L, 2L, 3L, 6L, 5L), .Label = c("Baker", "Barman", "Biker", "Builder", 
"Driver", "Teacher"), class = "factor")), .Names = c("name", 
"profession"), class = "data.frame", row.names = c(NA, -6L))

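# For one person: if their current profession is not allowed, return one row
# per permitted profession; otherwise return an empty data frame
checkprof <- function(name) {
  allowedn     <- allowed[allowed$name == name, ]
  currentprofn <- currentprof[currentprof$name == name, ]
  # assumes exactly one current profession per person
  if (!currentprofn$profession %in% allowedn$profession) {
    result <- merge(currentprofn, allowedn, by = "name", all.x = TRUE)
  } else {
    result <- data.frame(col1 = character(),
                         col2 = character(),
                         col3 = character(),
                         stringsAsFactors = FALSE)
  }
  colnames(result) <- c("name", "profession", "permitted")
  return(result)
}

# Apply to every person in currentprof (levels(allowed$name) could include
# names with no current profession); rbind drops the empty results
do.call(rbind, lapply(unique(as.character(currentprof$name)), checkprof))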
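The per-name loop can also be written without `lapply`: merge each person with all of their permitted professions once, flag the names where some permitted profession equals the current one, and drop those names. A sketch using the same three people, built with plain `data.frame()` calls instead of `dput` output; `combos`, `hit` and `ok_names` are illustrative names:

```r
currentprof <- data.frame(name = c("Fred", "Bob", "Jan"),
                          profession = c("Builder", "Baker", "Analyst"),
                          stringsAsFactors = FALSE)
allowed <- data.frame(name = c("Fred", "Fred", "Bob", "Bob", "Jan", "Jan"),
                      profession = c("Builder", "Baker", "Barman", "Biker",
                                     "Teacher", "Driver"),
                      stringsAsFactors = FALSE)

# Every (person, permitted profession) combination
combos <- merge(currentprof, allowed, by = "name",
                suffixes = c("", ".permitted"))
# TRUE where a permitted profession matches the current one
hit <- combos$profession == combos$profession.permitted
# Names with at least one match are fine; keep everyone else
ok_names <- unique(combos$name[hit])
result <- combos[!combos$name %in% ok_names, ]
names(result)[names(result) == "profession.permitted"] <- "permitted"
rownames(result) <- NULL
result
```

Everything here operates on whole columns, so the cost is one merge plus a few vector comparisons rather than one subset-and-merge per person.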
answered 2013-06-13T11:27:13.603

Here's my take, though it may need more testing and I'm open to suggestions myself. It works for your example, but I'm not sure it generalizes.

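# build one "name<TAB>profession" key per row so each person is matched
# against their own allowed rows (allowed$name == people$name would instead
# recycle the shorter vector element-wise)
people$check <- ifelse(paste(people$name, people$profession, sep = "\t") %in%
                         paste(allowed$name, allowed$profession, sep = "\t"),
                       TRUE, FALSE)

# keep the people whose current profession is not permitted
people_select <- people[people$check == FALSE, ]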

Edit: just to clarify, in case this is holding back your vote: ifelse is vectorized and runs very fast.
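A caution on generalizing: `==` between vectors of different lengths recycles the shorter one element by element, so a comparison like `allowed$name == people$name` does not look up each person's own rows. It only happens to give the right answer for the example's particular row order. A small demonstration with plain character vectors (the variable names are illustrative):

```r
allowed_names <- c("Fred", "Fred", "Bob", "Bob")
people_names  <- c("Fred", "Bob")

# Recycling: allowed_names is compared against Fred, Bob, Fred, Bob
same_order <- allowed_names == people_names
same_order
# [1]  TRUE FALSE FALSE  TRUE

# The same names listed Bob first select a different set of rows
swapped <- rev(allowed_names) == people_names
swapped
# [1] FALSE  TRUE  TRUE FALSE
```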

answered 2013-06-13T12:51:01.647