r - Creating density plots from two different data frames using ggplot2

Question

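My goal is to compare the distribution of various socioeconomic factors (income, for example) across years, to see how the population of a particular region has changed over a 5-year period. The primary data for this comes from the Public Use Microdata Sample. I am using R + ggplot2 as my preferred tools.

To compare two years of data (2005 and 2010), I have two data frames, hh2005 and hh2010, holding the household data for each year. Income for both years is stored in the variable hincp in each data frame. Using ggplot2, I create the density plot for an individual year (2010, for example) as follows: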

    p1 <- ggplot(data = hh2010, aes(x=hincp))+
      geom_density()+
      labs(title = "Distribution of income for 2010")+
      labs(y="Density")+
      labs(x="Household Income")
    p1

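How can I overlay the 2005 density on this plot? I can't figure out how to do it, since data has already been set to hh2010, and I'm not sure how to proceed. Should I have handled the data in a completely different way from the start?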

score 11 · Accepted Answer

You can pass a data argument to an individual geom, so you should be able to add the second density as a new geom layer, like this:

p1 <- ggplot(data = hh2010, aes(x=hincp))+
  geom_density() +
  # Change the fill colour to differentiate it
  geom_density(data=hh2005, fill="purple") +
  # The plot now shows both years, so the title names both
  labs(title = "Distribution of income for 2005 and 2010")+
  labs(y="Density")+
  labs(x="Household Income")
p1
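
One limitation of the layered approach is that a fixed fill such as "purple" produces no legend. A minimal sketch of one way around that, assuming synthetic stand-ins for hh2005 and hh2010 (the income values below are made up, not PUMS data): map fill to a constant label inside aes() in each layer, then assign the colours with scale_fill_manual().

```r
library(ggplot2)

# Synthetic stand-ins for the real data frames (values are invented)
set.seed(1)
hh2005 <- data.frame(hincp = rlnorm(500, meanlog = 10.8, sdlog = 0.6))
hh2010 <- data.frame(hincp = rlnorm(500, meanlog = 10.9, sdlog = 0.6))

# Each layer supplies its own data frame. Mapping fill to a constant
# label inside aes() makes ggplot2 create a legend entry per layer.
p <- ggplot(mapping = aes(x = hincp)) +
  geom_density(data = hh2005, aes(fill = "2005"), alpha = 0.5) +
  geom_density(data = hh2010, aes(fill = "2010"), alpha = 0.5) +
  scale_fill_manual(name = "Year",
                    values = c("2005" = "purple", "2010" = "grey60")) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density")
p
```

The alpha value and the colours are arbitrary choices here; the point is only that the two data frames stay separate while the legend is still generated.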

score 1 · Accepted Answer

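This is how I solved the problem:

Tag each data frame with the variable of interest (in this case, the year)
Merge the two data sets
Update the 'fill' aesthetic in the ggplot function

For example: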

# tag each data frame with the year^
hh2005$year <- as.factor(2005)
hh2010$year <- as.factor(2010)

# merge the two data sets
d <- rbind(hh2005, hh2010)
d$year <- as.factor(d$year)

# update the aesthetic
p1 <- ggplot(data = d, aes(x=hincp, fill=year)) +
  geom_density(alpha=.5) +
  labs(title = "Distribution of income for 2005 and 2010") +
  labs(y="Density") +
  labs(x="Household Income")
p1

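^ Note that the 'fill' argument seems to work best when you use a factor, which is why I defined year that way. I also used the 'alpha' argument to set the transparency of the overlapping density plots.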
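
rbind() on whole data frames only works when both have the same columns. If the two years differ in anything other than hincp, one option is to stack just the income column together with a year label. The sketch below assumes synthetic data; the extra columns serialno and puma are hypothetical and exist only to make the two frames incompatible.

```r
library(ggplot2)

# Synthetic stand-ins with deliberately different extra columns
# (serialno and puma are hypothetical names, values are invented)
set.seed(2)
hh2005 <- data.frame(hincp = rlnorm(300, 10.8, 0.6), serialno = 1:300)
hh2010 <- data.frame(hincp = rlnorm(400, 10.9, 0.6), puma = 101:500)

# Keep only the shared variable and tag each block with its year
d <- rbind(
  data.frame(hincp = hh2005$hincp, year = "2005"),
  data.frame(hincp = hh2010$hincp, year = "2010")
)
d$year <- factor(d$year)

# One data frame, one layer; the fill aesthetic splits it by year
p2 <- ggplot(d, aes(x = hincp, fill = year)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density")
p2
```

Because each density is estimated separately per year, the differing sample sizes (300 and 400 rows here) do not change the relative heights of the two curves.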
