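I am new to the ggalluvial package. I am currently working with a donation data set and would like to represent it with an alluvial plot. Here is a sample of the data set I am using: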

   donor_ID recip_name donation_amt month_year    
   <chr>    <chr>             <dbl> <chr>         
 1 1        B, P                 25 September 2019
 2 2        S, B                 27 July 2019     
 3 3        K, A                 50 June 2019     
 4 1        H, K                100 April 2019    
 5 2        W, E                  3 December 2019 
 6 3        S, B                  9 August 2019   
 7 1        C, J                 25 September 2019
 8 2        B, J                 50 October 2019
 9 3        W, E                400 August 2019
10 1        S, B                 20 December 2019

The output of dput() on this data set is as follows:

structure(list(donor_ID = c("1", "2", "3", "1", "2", "3", "1", 
"2", "3", "1"), recip_name = c("B, P", "S, B", "K, A", "H, K", 
"W, E", "S, B", "C, J", "B, J", "W, E", "S, B"), donation_amt = c(25, 
27, 50, 100, 3, 9, 25, 50, 400, 20), month_year = c("September 2019", 
"July 2019", "June 2019", "April 2019", "December 2019", "August 2019", 
"September 2019", "October 2019", "August 2019", "December 2019"
)), class = "data.frame", row.names = c(NA, -10L))

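I want to represent the choices individual donors make about who receives their donations (recip_name), which may vary from month to month (donor preference); donor_ID identifies each individual donor. The resulting alluvial plot should show these month-to-month changes in a way that is also proportional to the total donation amount (donation_amt) each recipient receives. Here is the script I wrote to accomplish this: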

df$recip_name <- as.factor(df$recip_name)
df %>% 
  filter(transaction_dt < as.Date("2020-01-01")) %>% 
  select(donor_ID, recip_name, donation_amt, month_year) %>% 
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")

After running this R code, this is the error I receive:

Error in f(...) : 
  Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).

I have researched how to set up data properly for use with ggalluvial, but to no avail. How can I produce the desired alluvial plot from these data?

1 Answer

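Currently, the error thrown by the plot layers is less informative than the errors thrown by the tests for alluvial structure themselves. Those tests also use different terminology: id for alluvium, key for x, and value for stratum. (My apologies for this! These will be changed in a future version.) Your data are trying to take lodes (long) form, and the is_lodes_form() test (below) indicates that there are duplicated id-axis pairings.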

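I didn't notice this before, but there is indeed at least one duplicated pairing: two rows have donor_ID = 1 and month_year = September 2019. An alluvial plot requires each alluvium (id) to pass through each axis at most once. After removing this duplicate and one other (the second of the two rows with donor_ID = 3 and month_year = August 2019), an alluvial plot does render (below).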
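
One way to locate the offending rows before plotting is to count the rows for each id-axis pairing. The following is a minimal sketch using dplyr, not a feature of ggalluvial; the object names dupes and deduped are illustrative, and keeping only the first row of each pairing is an assumption that happens to match the slice(-c(7, 9)) call in the code that follows.

```r
library(dplyr)

df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# Donor (id) and month (axis) pairings that occur more than once
dupes <- df %>%
  count(donor_ID, month_year) %>%
  filter(n > 1)
dupes

# Assumption: keep the first row of each pairing and drop the others
deduped <- df %>%
  distinct(donor_ID, month_year, .keep_all = TRUE)
nrow(deduped)
```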

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
library(ggalluvial)
#> Loading required package: ggplot2

df <- structure(list(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E", "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019", "December 2019", "August 2019", "September 2019", "October 2019", "August 2019", "December 2019")
), class = "data.frame", row.names = c(NA, -10L))
df$recip_name <- as.factor(df$recip_name)

is_lodes_form(df, key = month_year, value = recip_name, id = donor_ID)
#> Duplicated id-axis pairings.
#> [1] FALSE

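df %>%
  # drop rows 7 and 9: the second row of each duplicated donor-month pairing
  slice(-c(7, 9)) %>%
  # month number (1-12); computed here but not mapped to an aesthetic below
  mutate(month = match(str_remove(month_year, " 2019"), month.name)) %>%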
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")

Created on 2022-01-30 by the reprex package (v2.0.1)

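The plot is quite sparse, presumably because this is only a sample of your data. You will also need to clean up the plot, for example by converting the character values of month_year to a factor or to dates so that the months appear in chronological order.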
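
As a rough illustration of that clean-up step, month_year can be turned into a factor whose levels follow the calendar. This sketch uses only base R and assumes every value falls in 2019, as in the sample; data spanning several years would need real dates instead.

```r
df <- data.frame(
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# month.name is a base R constant holding the English month names in order.
# Assumption: all observations are from 2019.
df$month_year <- factor(df$month_year, levels = paste(month.name, 2019))

# Months present in the data, now in chronological order
levels(droplevels(df$month_year))
```

A discrete ggplot2 axis drops unused factor levels by default, so months with no donations would not leave empty positions on the x axis.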

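If you want to distinguish donations by the same donor to different recipients, then the unit of observation you want may be the interaction of donor_ID and recip_name. Passing that to the alluvium aesthetic, recip_name to stratum, and donor_ID to fill might produce the plot you want.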
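
A sketch of that suggestion might look as follows. The column name donor_recip is hypothetical, and the plot object is only constructed here, not rendered or checked. In the ten-row sample every donor-recipient pair occurs in a single month, so no flows would connect the axes; the approach is meant for the full data set.

```r
library(ggalluvial)  # also attaches ggplot2

df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# Hypothetical helper column: one alluvium per donor-recipient pair
df$donor_recip <- interaction(df$donor_ID, df$recip_name, drop = TRUE)

# Returns 0 when every (alluvium, axis) pairing occurs at most once
anyDuplicated(df[c("donor_recip", "month_year")])

# Aesthetic mapping as suggested above; call print(p) to render
p <- ggplot(df, aes(x = month_year, y = donation_amt,
                    alluvium = donor_recip, stratum = recip_name,
                    fill = donor_ID)) +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom")
```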

Answered 2022-01-30T13:59:18.437