r - Visualizing flows from one set of objects to another in R

Question

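I am interested in how the NIH reviews grants. The grant review process works like this: Congress allocates funding to the various institutes (for example, the National Cancer Institute, or NCI), and individual grants are submitted to those institutes. The institutes are organized around funding priorities (for example, cancer, infectious disease, and so on).

However, when grants are reviewed, they are typically (but not always) sent to individual study sections, which are organized more around scientific disciplines. So if a researcher submits a grant to the National Heart, Lung, and Blood Institute (NHLBI) to study leukemia, the "Tumor Progression" study section can find itself reviewing grants from both the National Cancer Institute and NHLBI.

I have a data frame in R that looks like this: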

grant_id <- 1:100
funding_agency <- sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20))
study_section <- sample(rep(c("Tumor Cell Biology", "Tumor Progression", 
                              "Vector Biology", "Molecular Genetics", 
                              "Medical Imaging", "Macromolecular Structure",
                              "Infectious Diseases", "Drug Discovery", 
                              "Cognitive Neuroscience", "Aging and Geriatrics"), 
                            10)
                        )
total_cost <- rnorm(100, mean = 30000, sd = 10000)
d <- data.frame(grant_id, funding_agency, study_section, total_cost)

library(car)  # some() comes from the car package; it prints a random sample of rows
some(d)

   grant_id funding_agency          study_section total_cost
15       15          NINDS         Vector Biology   25242.19
19       19            NCI    Infectious Diseases   29075.21
50       50            NCI         Drug Discovery   25176.35
62       62            NCI      Tumor Progression   14264.34
64       64          NIAID     Tumor Cell Biology   30024.13

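I would like to create two visualizations of these data, preferably in R: one showing how the grants submitted to each institute are distributed across study sections, and a second showing the dollar amount of grants that each institute allocates to each study section. What I ultimately want is something like the charts on the following sites:

Migration flows

College majors to jobs pipeline

Does anyone know of an R package, and/or have some example code, for creating charts like the ones on the sites above? Alternatively, should I be considering a different visualization to achieve the same goal?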

score 9 · Accepted Answer

Here is how to do it with rCharts. You can view the final Sankey plot here.

d <- data.frame(
  id = grant_id, 
  source = funding_agency, 
  target = study_section, 
  value = total_cost
)
# One-time install of the rCharts dev branch from GitHub:
# devtools::install_github("ramnathv/rCharts", ref = "dev")
library(rCharts)
sankeyPlot <- rCharts$new()
sankeyPlot$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')
sankeyPlot$set(
  data = d,
  nodeWidth = 15,
  nodePadding = 10,
  layout = 32,
  width = 750,
  height = 500,
  labelFormat = ".1%"
)
sankeyPlot

To save the chart, you can run

sankeyPlot$save('mysankey.html')
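
The code above passes one row per grant and uses total_cost as the link width. The question also asks for a count-based view, so here is a minimal sketch of collapsing the data to one link per institute/study-section pair with both a dollar total and a grant count. It is not part of the original answer. The seed, the n_grants helper column, and the use of the networkD3 package as the renderer are all assumptions made for illustration; the rCharts call above could be fed the same aggregated table instead.

```r
# Recreate the question's example data (the seed is an added assumption).
set.seed(1)
d <- data.frame(
  grant_id = 1:100,
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section = sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                               "Vector Biology", "Molecular Genetics",
                               "Medical Imaging", "Macromolecular Structure",
                               "Infectious Diseases", "Drug Discovery",
                               "Cognitive Neuroscience", "Aging and Geriatrics"), 10)),
  total_cost = rnorm(100, mean = 30000, sd = 10000)
)

# One row per (institute, study section) pair: summed dollars and number of grants.
d$n_grants <- 1  # hypothetical helper column so that sum() counts grants
links <- aggregate(cbind(total_cost, n_grants) ~ funding_agency + study_section,
                   data = d, FUN = sum)

# Node table: institutes first, then study sections.
nodes <- data.frame(
  name = c(unique(as.character(links$funding_agency)),
           unique(as.character(links$study_section)))
)

# networkD3 expects zero-based indices into the node table.
links$source <- match(links$funding_agency, nodes$name) - 1
links$target <- match(links$study_section, nodes$name) - 1

# Build the diagram only if networkD3 is installed (assumed alternative renderer).
if (requireNamespace("networkD3", quietly = TRUE)) {
  sankey <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target",
    Value = "total_cost",  # switch to "n_grants" for the count-based diagram
    NodeID = "name", units = "USD", fontSize = 12, nodeWidth = 15
  )
  # networkD3::saveNetwork(sankey, "grants_sankey.html")  # optional: write to HTML
}
```

Printing sankey in an interactive session shows the widget. Switching Value between "total_cost" and "n_grants" gives the two views asked for in the question.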


score 1 · Accepted Answer

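This does not help much with the visualization part, but what you are looking for is a two-way table of the data.

Using the reshape2 package and ignoring grant_id: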

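library(reshape2)
# Long format: one row per grant, keyed by institute and study section
d1 <- melt(d[, 2:4], id.vars = c("funding_agency", "study_section"))
# Wide format: sum total_cost for each study section / institute pair
d2 <- dcast(d1, study_section ~ funding_agency, sum)
d2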
              study_section      NCI     NHLBI     NIAID     NIGMS     NINDS
1      Aging and Geriatrics 28598.04  76524.55      0.00 109492.59 138330.12
2    Cognitive Neuroscience 76484.18  88217.42  78126.55  71546.62  73132.14
3            Drug Discovery 43667.30  39683.03  23797.24  46363.75 105655.61
4       Infectious Diseases 65375.44 136462.03  96413.08  34653.48  13835.22
5  Macromolecular Structure 84308.64  42290.61  39886.87  61645.00  67550.41
6           Medical Imaging 26264.32  86736.36 106356.13  41001.21  35549.83
7        Molecular Genetics 49473.72      0.00 110201.52  69468.03  86688.24
8        Tumor Cell Biology 99930.88  50862.39  95394.23  26269.98  46944.60
9         Tumor Progression 58719.89  52669.80  86874.89      0.00 119264.59
10           Vector Biology 64251.66  30880.81  66734.26 125524.72      0.00

This tells you how much funding each study_section received from each funding agency. How to display it is a separate question. Perhaps take a look at http://statmath.wu.ac.at/projects/vcd/
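
For comparison, the same two-way table can be built in base R, and a count table can be drawn as a mosaic plot. This is a minimal sketch, not the answerer's code. The seed is an added assumption, and mosaicplot() is the base graphics function, used here as a stand-in for the richer displays in the vcd package linked above.

```r
# Recreate the question's example data (the seed is an added assumption).
set.seed(1)
d <- data.frame(
  grant_id = 1:100,
  funding_agency = sample(rep(c("NIAID", "NIGMS", "NHLBI", "NCI", "NINDS"), 20)),
  study_section = sample(rep(c("Tumor Cell Biology", "Tumor Progression",
                               "Vector Biology", "Molecular Genetics",
                               "Medical Imaging", "Macromolecular Structure",
                               "Infectious Diseases", "Drug Discovery",
                               "Cognitive Neuroscience", "Aging and Geriatrics"), 10)),
  total_cost = rnorm(100, mean = 30000, sd = 10000)
)

# Dollars per study section (rows) and institute (columns); empty cells are 0.
cost_tab <- xtabs(total_cost ~ study_section + funding_agency, data = d)

# Number of grants per study section and institute.
count_tab <- xtabs(~ study_section + funding_agency, data = d)

print(round(cost_tab))

# Mosaic plot of the counts: tile area is proportional to the number of grants.
mosaicplot(count_tab, las = 2, color = TRUE,
           main = "Grants by study section and institute")
```

A mosaic plot shows the same institute-to-study-section split as a Sankey diagram, but as tile areas instead of flows, which can be easier to read when most cells are non-empty.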
