I'm trying to create a Sankey diagram by following the example for the R port of d3Network (described here: https://christophergandrud.github.io/networkD3/). I load the following sample "energy" dataset:

    # Load energy projection data

    URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/",
    "master/JSONdata/energy.json")

    Energy <- jsonlite::fromJSON(URL)

Importing the "Energy" dataset produces a list containing two data.frames: nodes and links. Inspecting the links data shows the following format:

    head(Energy$links)
      source target   value
    1      0      1 124.729
    2      1      2   0.597
    3      1      3  26.862
    4      1      4 280.322
    5      1      5  81.144
    6      6      2  35.000

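The "source" column gives the origin node, the "target" column gives the destination node, and the "value" column gives the value of each individual link.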
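
To make the indexing concrete, here is a minimal sketch of how the two tables relate. The node names and values are made up for illustration and are not taken from the energy data; the only thing assumed from networkD3 is that `source` and `target` are zero-based row positions in the nodes table.

```r
# Hypothetical three-node network (names and values invented for illustration)
nodes <- data.frame(name = c("A", "B", "C"), stringsAsFactors = FALSE)

# source/target are zero-based positions in `nodes`
links <- data.frame(
  source = c(0, 0, 1),
  target = c(1, 2, 2),
  value  = c(10, 5, 3)
)

# R indexes from 1, so add 1 when looking up the matching node names
links$source_name <- nodes$name[links$source + 1]
links$target_name <- nodes$name[links$target + 1]

print(links)
```

Going the other way, from node names to zero-based indices, is the step I can't work out for my own data.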

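Although this is conceptually fairly simple, I'm having enormous difficulty getting a dataset into the same format as the Energy$links data.frame. I've been able to get my data into the following format, but I'm drawing a complete blank on how to transform it further: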

    head(sampleSankeyData, n = 10L)
       clientID                node1
          <int>                <chr>
     1    23969 1 Community Services
     2    39199      1 Youth Justice
     3    23595      1 Mental Health
     4    15867 1 Community Services
     5    18295            3 Housing
     6    18295            2 Housing
     7    18295 1 Community Services
     8    18295            4 Housing
     9    15253            1 Housing
    10    27839 1 Community Services

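What I want to do is aggregate the number of unique clients for each link. For example, in the data subset above, the link from "1 Community Services" to "2 Housing" should have a value of 1 because of client 18295 (as should the links from "2 Housing" to "3 Housing" and from "3 Housing" to "4 Housing"). So I'd like to get the data into the same format as Energy$links from the Sankey diagram example.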

1 Answer

Try this:

    library(tidyverse)
    library(stringr)

    # Recreate the sample data from the question
    df <- tribble(
      ~number, ~clientID, ~node1,
      1 ,    23969, '1 Community Services',
      2 ,    39199,      '1 Youth Justice',
      3 ,    23595,      '1 Mental Health',
      4 ,    15867, '1 Community Services',
      5 ,    18295,            '3 Housing',
      6 ,    18295,            '2 Housing',
      7 ,    18295, '1 Community Services',
      8 ,    18295,            '4 Housing',
      9 ,    15253,            '1 Housing',
      10,    27839, '1 Community Services')

    # Take the step number from the first character of node1 (single-digit
    # steps only), spread the nodes into one column per step (step_1 ... step_4),
    # then collapse to one row per client
    df2 <- mutate(df, step=as.numeric(str_sub(node1, end=1))) %>%
      spread(step, node1, sep='_') %>%
      group_by(clientID) %>%
      summarise(step1 = sort(unique(step_1))[1],
                step2 = sort(unique(step_2))[1],
                step3 = sort(unique(step_3))[1],
                step4 = sort(unique(step_4))[1])

    # Stack the step1->step2, step2->step3 and step3->step4 pairs, then count
    # the clients on each source/target pair
    df3 <- bind_rows(select(df2,1,source=2,target=3),
                select(df2,1,source=3,target=4),
                select(df2,1,source=4,target=5)) %>%
      group_by(source, target) %>%
      summarise(clients=n())

And to use it with networkD3...

    links <- df3 %>%
      dplyr::ungroup() %>% # ungroup just to be safe
      dplyr::filter(!is.na(source) & !is.na(target)) # remove lines without a link

    # build the nodes data frame based on nodes in your links data frame
    nodeFactors <- factor(sort(unique(c(links$source, links$target))))
    nodes <- data.frame(name = nodeFactors)

    # convert the source and target values to the index of the matching node in the
    # nodes data frame
    links$source <- match(links$source, levels(nodeFactors)) - 1
    links$target <- match(links$target, levels(nodeFactors)) - 1

    # plot
    library(networkD3)
    sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
                  Target = 'target', Value = 'clients', NodeID = 'name')
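
The code above hard-codes four steps. If the number of steps varies, a base R variant along the following lines might work instead. It is only a sketch under two assumptions that go beyond the question: each node label starts with an integer step number, and each client has at most one node per step (a skipped step would link the two neighbouring steps directly). The names `d` and `pairs` are just illustrative.

```r
# Sample data copied from the question
sampleSankeyData <- data.frame(
  clientID = c(23969, 39199, 23595, 15867, 18295,
               18295, 18295, 18295, 15253, 27839),
  node1 = c("1 Community Services", "1 Youth Justice", "1 Mental Health",
            "1 Community Services", "3 Housing", "2 Housing",
            "1 Community Services", "4 Housing", "1 Housing",
            "1 Community Services"),
  stringsAsFactors = FALSE
)

# Drop duplicate client/node rows so a client counts once per link
d <- unique(sampleSankeyData[, c("clientID", "node1")])

# Step number = leading integer of the label (assumption: labels start with one)
d$step <- as.integer(sub("^([0-9]+).*$", "\\1", d$node1))

# Order each client's nodes by step
d <- d[order(d$clientID, d$step), ]

# Pair every row with the next row when both belong to the same client
n <- nrow(d)
same_client <- d$clientID[-n] == d$clientID[-1]
pairs <- data.frame(
  clientID = d$clientID[-n][same_client],
  source   = d$node1[-n][same_client],
  target   = d$node1[-1][same_client],
  stringsAsFactors = FALSE
)

# Count the unique clients on each source/target pair
pairs <- unique(pairs)
links <- aggregate(clientID ~ source + target, data = pairs, FUN = length)
names(links)[names(links) == "clientID"] <- "clients"

print(links)
```

The resulting `links` has the same `source`, `target` and `clients` columns as `df3`, so the node-building and index-conversion steps shown earlier should apply to it unchanged.
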
Answered 2017-10-23T01:10:11.353