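I'm using tidygraph to work with organisational chart data. One calculation I'm trying to perform is to summarise, for each line manager, the number of direct reports and the total (cumulative) number of reports who sit beneath them.
I used a version of the code provided in the answer to this question (Tidygraph: calculate child summaries at parent level), and it works exactly as it should.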
library(tidyverse)
library(tidygraph)

set.seed(1234)

# Simulated org chart: 40 positions, up to 3 direct reports per manager
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(name = runif(40, min = 3333, max = 8888),
         fte_person = round(runif(40, min = 0, max = 1), 1))

# Sum fte_person over a neighbourhood, dropping its first row
# (intended to exclude the manager's own value)
tot_fte_person <- function(neighborhood, ...) {
  neighborhood %>%
    activate(nodes) %>%
    slice(-1) %>%
    pull(fte_person) %>%
    sum()
}

# order = 1 covers direct reports; order = 3 covers everyone up to 3 levels down
nodes <- tree %>%
  mutate(direct_fte_person = map_local_dbl(order = 1, mode = "out", .f = tot_fte_person),
         total_fte_person = map_local_dbl(order = 3, mode = "out", .f = tot_fte_person)) %>%
  as_tibble() %>%
  print()
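However, my dataset has around 28,000 records, and this code takes about an hour to run, which feels far too long.
By comparison, this code, which counts the number of nodes rather than summing a column, takes about 20 seconds on the same 28,000 records: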
nodes <- tree %>%
  mutate(direct_positions = local_size(order = 1, mindist = 1, mode = "out"),
         total_positions = local_size(order = 3, mindist = 1, mode = "out")) %>%
  as_tibble() %>%
  print()
Does anyone have suggestions on how to speed up these fairly simple calculations?
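One direction I have been considering, as a minimal sketch only: my understanding is that map_local_dbl() materialises a separate neighbourhood graph for every node before calling the function, which is probably where most of the time goes, whereas local_size() only counts. The sketch below instead asks igraph for the neighbourhood vertex indices with igraph::ego() and sums a plain numeric vector. The helper name sum_fte_within is made up for illustration, and I have only tried this on the small simulated tree, not on the 28,000-record data.

```r
library(tidygraph)

set.seed(1234)

# Small simulated tree; the column name fte_person follows the code above
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(fte_person = round(runif(40, min = 0, max = 1), 1))

# Pull the node attribute out once as a plain numeric vector
fte <- igraph::vertex_attr(tree, "fte_person")

# Hypothetical helper: sum `values` over each node's out-neighbourhood,
# up to `order` steps away. mindist = 1 leaves the node itself out,
# mirroring the local_size() call.
sum_fte_within <- function(graph, values, order) {
  hoods <- igraph::ego(graph, order = order, nodes = igraph::V(graph),
                       mode = "out", mindist = 1)
  # as.integer() turns each vertex sequence into plain node indices
  vapply(hoods, function(vs) sum(values[as.integer(vs)]), numeric(1))
}

nodes <- tree %>%
  mutate(direct_fte_person = sum_fte_within(tree, fte, order = 1),
         total_fte_person = sum_fte_within(tree, fte, order = 3)) %>%
  as_tibble()
```

Like the original code, order = 3 only reaches three levels down, so on a deeper hierarchy it would need to be raised to cover the full reporting line.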
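Update: I've been working on this for a few hours. Borrowing some code from here (https://chapmandu2.github.io/orgsurveyr_docs/articles/organisations_with_ggraph.html#simulate-the-people-in-the-organisation) gave me the following approach, which completes the calculation on 40,000 simulated records in 1-2 minutes. Much better...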
set.seed(1234)

# Simulated org chart: 40,000 positions, up to 10 direct reports per manager
tree <- create_tree(40000, children = 10, directed = TRUE, mode = "out") %>%
  mutate(name = runif(40000, min = 40000, max = 90000),
         fte_person = round(runif(40000, min = 0, max = 1)),
         fte_position = 1,
         unit_id = row_number())
# Create a tibble of all reporting relationships, recursively
mapping <- tree %>%
  mutate(child_id = map_bfs_back(unit_id, .f = function(node, path, ...) {
    .N()[c(node, path$node), ]$unit_id
  })) %>%
  activate(nodes) %>%
  as_tibble() %>%
  transmute(parent_id = unit_id, unit_id = child_id) %>%
  unnest(cols = c(unit_id)) %>%
  arrange(parent_id) %>%
  filter(parent_id != unit_id)  # Excludes the manager's own position from the total
# Group and sum by manager
nextstep <- mapping %>%
  inner_join(tree %>%
               activate(nodes) %>%
               as_tibble(),
             by = "unit_id") %>%
  group_by(parent_id) %>%
  summarise(tot_fte_pos = sum(fte_position),
            tot_fte_per = sum(fte_person))
# Join back to the original graph
final_step <- tree %>%
  activate(nodes) %>%
  full_join(nextstep, by = c("unit_id" = "parent_id"))

final_step
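Since the approach above first expands every manager-to-descendant pair, another option I am weighing is a single bottom-up pass that hands each node's subtree total to its manager. This is a rough sketch under the assumption that the graph is a strict tree with edges pointing from manager to report; the variable names parent and subtree_fte are my own, and it has only been checked on a 40-node simulated tree.

```r
library(tidygraph)

set.seed(1234)

# Small simulated tree; the column name fte_person follows the code above
tree <- create_tree(40, children = 3, directed = TRUE) %>%
  mutate(fte_person = round(runif(40, min = 0, max = 1), 1))

fte <- igraph::vertex_attr(tree, "fte_person")
n <- igraph::gorder(tree)

# parent[v] is the manager of node v (NA for the root).
# Assumes every edge runs manager -> report and each node has one manager.
edges <- igraph::as_edgelist(tree, names = FALSE)
parent <- rep(NA_integer_, n)
parent[edges[, 2]] <- as.integer(edges[, 1])

# With mode = "out", managers appear before their reports, so walking the
# order backwards completes each subtree total before passing it upwards
ord <- as.integer(igraph::topo_sort(tree, mode = "out"))

subtree_fte <- numeric(n)
for (v in rev(ord)) {
  p <- parent[v]
  if (!is.na(p)) {
    subtree_fte[p] <- subtree_fte[p] + fte[v] + subtree_fte[v]
  }
}

result <- tree %>%
  mutate(total_fte_person = subtree_fte) %>%
  as_tibble()
```

Each node is visited once, so the work grows with the number of positions rather than with the number of manager-descendant pairs, and there is no depth limit to choose. The totals exclude the manager's own value, matching the filter(parent_id != unit_id) step.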