
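I'm not sure how to describe what I'm trying to do. I have a data frame with two columns (movie and actor). I want to build a list of unique two-actor combinations based on the movies they appeared in together. Below is code that creates an example of the data frame I have, plus a second data frame showing the result I want.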


start_data <- tibble::tribble(
  ~movie, ~actor,
  "titanic", "john",
  "star wars", "john",
  "baby driver", "john",
  "shawshank", "billy",
  "titanic", "billy",
  "star wars", "sarah",
  "titanic", "sarah"
)

end_data <- tibble::tribble(
  ~movie, ~actor1, ~actor2,
  "titanic", "john", "billy",
  "titanic", "john", "sarah",
  "titanic", "billy", "sarah",
  "star wars", "john", "sarah"
)

Any help is appreciated, thanks! Bonus points for a short solution.


2 Answers


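You can use combn(..., 2) to find the two-actor combinations, convert them to a two-column tibble, and store that tibble in a list column with summarize. To get a flat data frame, use unnest.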

library(tidyverse)

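start_data %>% 
    group_by(movie) %>% 
    summarise(acts = list(
        # combn() needs at least two actors; single-actor movies get an empty tibble
        if (length(actor) > 1) {
            combos <- combn(actor, 2)  # 2 x k matrix, one column per pair
            tibble(actor1 = combos[1, ], actor2 = combos[2, ])
        } else tibble()
    )) %>% 
    unnest(cols = acts)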

# A tibble: 4 x 3
#      movie actor1 actor2
#      <chr>  <chr>  <chr>
#1 star wars   john  sarah
#2   titanic   john  billy
#3   titanic   john  sarah
#4   titanic  billy  sarah
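For comparison, here is a minimal base-R sketch of the same per-movie combn() idea, assuming R >= 4.0 and no tidyverse. It rebuilds start_data as a plain data.frame; the names per_movie, combos and result are illustrative only.

```r
# Base-R sketch: one combn() call per movie, then stack the results
start_data <- data.frame(
  movie = c("titanic", "star wars", "baby driver", "shawshank",
            "titanic", "star wars", "titanic"),
  actor = c("john", "john", "john", "billy", "billy", "sarah", "sarah"),
  stringsAsFactors = FALSE
)

per_movie <- lapply(split(start_data, start_data$movie), function(d) {
  # combn() needs at least two actors, so skip single-actor movies
  if (nrow(d) < 2) return(NULL)
  combos <- combn(d$actor, 2)  # 2 x k matrix, one column per pair
  data.frame(movie = d$movie[1], actor1 = combos[1, ], actor2 = combos[2, ],
             stringsAsFactors = FALSE)
})

# Drop the skipped movies and bind the per-movie data frames together
result <- do.call(rbind, Filter(Negate(is.null), per_movie))
rownames(result) <- NULL
result
```

split() orders the movies alphabetically, and combn() keeps each movie's actors in their original order, so the rows come out the same as in the tidyverse output above.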
answered 2017-11-07T02:16:02.880
library(tidyverse)
library(stringr)

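# Self-join on movie to pair every actor with every co-star in the same film
inner_join(start_data, start_data, by = "movie") %>% 
  filter(actor.x != actor.y) %>%   # drop actor-with-self rows
  rowwise() %>% 
  # Put each pair in alphabetical order so (john, billy) and (billy, john) collapse
  mutate(combo = str_c(min(actor.x, actor.y), "_", max(actor.x, actor.y))) %>% 
  ungroup() %>%
  select(movie, combo) %>% 
  distinct() %>% 
  separate(combo, c("actor1", "actor2"), sep = "_")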
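A shorter variant of the same self-join, as a sketch assuming dplyr >= 1.1.0 (for the relationship argument of inner_join). Filtering on actor.x < actor.y keeps each unordered pair exactly once and drops self-pairs in one step, so the rowwise()/str_c()/separate() round trip is not needed. Like the code above, it puts each pair in alphabetical order (billy before john), not in the order shown in end_data. The name co_stars is illustrative.

```r
library(dplyr)

start_data <- tibble::tribble(
  ~movie, ~actor,
  "titanic", "john",
  "star wars", "john",
  "baby driver", "john",
  "shawshank", "billy",
  "titanic", "billy",
  "star wars", "sarah",
  "titanic", "sarah"
)

co_stars <- start_data %>%
  # each movie's cast joined to itself: every (actor, co-star) combination
  inner_join(start_data, by = "movie", relationship = "many-to-many") %>%
  # strict "<" keeps one row per unordered pair and removes actor-with-self rows
  filter(actor.x < actor.y) %>%
  select(movie, actor1 = actor.x, actor2 = actor.y) %>%
  arrange(movie, actor1, actor2)

co_stars
```

Comparing strings with < relies on the current collation, which is unambiguous for lowercase names like these; separate() in the original is also sensitive to names that contain the separator, which this version avoids.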
answered 2017-11-07T03:10:39.923