I want to use geom_area() to create a stacked area chart for a dataset with dates on the x-axis and frequency on the y-axis. My dataset looks like this:

Date            Variant       Frequency
2020-08-01      AY.1          0
2020-08-01      B.1.351       0
2020-08-01      B.1.617.1     0
2020-08-01      B.1.617.2     0
2020-08-01      B.1.617.3     0
2020-08-01      others        1
2020-08-01      others        1
2020-09-01      AY.1          0
2020-09-01      B.1.351       0
2020-09-01      B.1.617.1     0
2020-09-01      B.1.617.2     0
2020-09-01      B.1.617.3     0
2020-09-01      others        1
2020-09-01      others        1
.
.
.
.
2021-08-03      B.1.617.3 0.00564 
2021-08-03      others    0.36    
2021-08-03      others    0.36    
2021-08-04      AY.1      0.000713
2021-08-04      AY.4      0.42    
2021-08-04      B.1.1.7   0.00546 
2021-08-04      B.1.351   0.00137 
2021-08-04      B.1.617.1 0.0109  
2021-08-04      B.1.617.2 0.22    

I tried to create the stacked area chart with the following code:

data %>% 
  ggplot(aes(x=Date, y=Frequency, fill=Variant)) + 
  geom_area(position = 'fill', alpha=0.8) +
  scale_x_date(date_breaks = '1 month', date_labels = '%b-%y',expand = c(0.01,0)) 

However, I ended up with unexpected output that looks like this:

Image: geom_area(position='fill')

I tried changing the position setting to 'identity' with geom_area(position='identity'), which gave an improved output, but not what I want.

Image: geom_area(position='identity')

I would like the output to look like a basic stacked area chart in R:

Image: basic R stacked area chart

I also tried geom_bar(), which gave me a stacked bar chart, but I want to create a similar chart using areas.

1 Answer

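To create a stacked area chart:

  • The most important thing is the shape of the data, as r2evans already mentioned.

  • Here I took your data frame fragment and modified it with additional columns to show how the data should be organized to draw this kind of plot.

  • Basically, you need a repeating set of groups with one value at each point in time; here that is groups a:g, time 1:6, and a Frequency value for each.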
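
As a rough sketch of what "the right shape" could mean for data like the question's, the snippet below drops exact duplicate rows (such as the repeated "others" rows) with distinct() and fills in missing Date/Variant combinations with complete(), so that every variant has exactly one value per date. The toy data frame and its values are made up for illustration. Treating the repeated rows as accidental duplicates is also an assumption; if they are separate measurements, summing them with summarise() may be more appropriate.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from the question): "others" is duplicated on
# the first date, and "AY.4" has no row at all on the first date.
toy <- data.frame(
  Date = as.Date(c("2021-08-01", "2021-08-01", "2021-08-01",
                   "2021-08-02", "2021-08-02", "2021-08-02")),
  Variant = c("AY.1", "others", "others",
              "AY.1", "AY.4", "others"),
  Frequency = c(0.2, 0.4, 0.4, 0.1, 0.5, 0.4)
)

tidy <- toy %>%
  # Assumption: rows that are identical in every column are duplicates
  distinct() %>%
  # Add a row with Frequency = 0 for every missing Date/Variant pair
  complete(Date, Variant, fill = list(Frequency = 0))

tidy
```

After this step each of the two dates has one row for each of the three variants, which is the repeating group-by-time layout described above.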

Modified fake data:

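library(tidyverse)
# df is the data frame defined under "Fake data" below
data <- df %>% 
    mutate(Date = lubridate::ymd(Date)) %>% 
    # time index: 1 for rows 1-7, 2 for rows 8-14, and so on
    mutate(time = rep(row_number(), each=7, length.out = n())) %>% 
    # groups a:g, repeated within each time step
    mutate(group = rep(letters[1:7], length.out = n())) %>% 
    # 29 random values between 34 and 100, recycled to fill every row
    mutate(Frequency = rep(runif(29, 34, 100), length.out = n()))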

Plot code:

library(tidyverse)
data %>% 
    ggplot(aes(x=time, y=Frequency, fill=group)) + 
    geom_area(alpha=0.8) 

Resulting plot: (image)
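
The same idea should carry over to a real Date axis and to proportions, which is closer to what the question asks for. This is only a minimal sketch, assuming balanced data with exactly one row per Date/Variant pair. The toy data frame, its dates, and its random values are invented here. position = "fill" rescales the stack at each date so that it sums to 1.

```r
library(ggplot2)

set.seed(1)  # only so the made-up values are reproducible

# Hypothetical balanced data: every variant appears once on every date
dates <- seq(as.Date("2021-01-01"), by = "month", length.out = 6)
variants <- c("AY.1", "B.1.351", "others")
toy <- data.frame(
  Date = rep(dates, each = length(variants)),
  Variant = rep(variants, times = length(dates))
)
toy$Frequency <- runif(nrow(toy), 34, 100)

p <- ggplot(toy, aes(x = Date, y = Frequency, fill = Variant)) +
  # "fill" stacks the areas and normalises each date to a total of 1
  geom_area(position = "fill", alpha = 0.8) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%y")

p
```

Leaving out position = "fill" gives the default stacked areas on the original scale, as in the plot code above.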

Fake data:

df <- structure(list(Date = structure(c(18475, 18475, 18475, 18475, 
18475, 18475, 18475, 18506, 18506, 18506, 18506, 18506, 18506, 
18506, 18539, 18539, 18539, 18539, 18539, 18539, 18539, 18475, 
18475, 18475, 18475, 18475, 18475, 18475, 18506, 18506, 18506, 
18506, 18506, 18506, 18506, 18539, 18539, 18539, 18539, 18539, 
18539, 18539), class = "Date"), Variant = c("AY.1", "B.1.351", 
"B.1.617.1", "B.1.617.2", "B.1.617.3", "others", "others", "AY.1", 
"B.1.351", "B.1.617.1", "B.1.617.2", "B.1.617.3", "others", "others", 
"AY.1", "B.1.351", "B.1.617.1", "B.1.617.2", "B.1.617.3", "others", 
"others", "AY.1", "B.1.351", "B.1.617.1", "B.1.617.2", "B.1.617.3", 
"others", "others", "AY.1", "B.1.351", "B.1.617.1", "B.1.617.2", 
"B.1.617.3", "others", "others", "AY.1", "B.1.351", "B.1.617.1", 
"B.1.617.2", "B.1.617.3", "others", "others"), Frequency = c(57.1679558907636, 
63.9314113892615, 36.638229729142, 94.1662813336588, 75.5338987568393, 
40.2345195417292, 69.4448804655112, 45.4072538088076, 60.232708573807, 
59.4782519731671, 94.5594410258345, 91.2153454185463, 79.8043070686981, 
56.9402130353265, 48.7265761620365, 72.413387727458, 67.7010886864737, 
55.5641814963892, 69.7157254447229, 86.2067115586251, 63.0903459019028, 
73.7501232894138, 92.7098404220305, 53.342769942712, 61.7025430542417, 
72.0743641522713, 90.9143544523977, 66.1621201317757, 91.2102537448518, 
57.1679558907636, 63.9314113892615, 36.638229729142, 94.1662813336588, 
75.5338987568393, 40.2345195417292, 69.4448804655112, 45.4072538088076, 
60.232708573807, 59.4782519731671, 94.5594410258345, 91.2153454185463, 
79.8043070686981), time = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L), group = c("a", "b", "c", "d", "e", "f", "g", "a", "b", "c", 
"d", "e", "f", "g", "a", "b", "c", "d", "e", "f", "g", "a", "b", 
"c", "d", "e", "f", "g", "a", "b", "c", "d", "e", "f", "g", "a", 
"b", "c", "d", "e", "f", "g")), class = "data.frame", row.names = c(NA, 
-42L))
answered 2021-09-08T17:07:03.637