
I have already binned some data, and I now have a data frame with two columns, one giving the bin range and the other the frequency, like this:

> head(data)
      binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16

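I want to use this to plot a histogram and a density plot, but I can't seem to find a way to do so without generating new bins and so on. Using this solution here, I tried the following: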

p <- ggplot(data, aes(x= binRange, y=Frequency)) + geom_histogram(stat="identity")

But it fails. Does anyone know how to handle this?

Thanks


2 Answers


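The problem is that ggplot doesn't understand the way your data is entered; you need to reshape it like this (I'm no regex guru, so there is surely a better way):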

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(stringr)
library(splitstackshape)
library(ggplot2)
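# extract the numeric part of each label, e.g. "(0,0.025]" -> "0,0.025"
df$binRange <- str_extract(df$binRange, "[0-9].*[0-9]+")

# split on the comma into two columns:
# binRange_1 (start point) and binRange_2 (end point)
df <- cSplit(df, "binRange", sep = ",")

# plot it, centring each bar on the midpoint of its bin so that the bar
# covers the bin's range; geom_bar() has no `breaks` argument, so the axis
# breaks are set on the scale instead
ggplot(df, aes(x = (binRange_1 + binRange_2) / 2, y = Frequency)) +
    geom_col(width = 0.025) +
    scale_x_continuous(name = "binRange", breaks = seq(0, 0.15, by = 0.025))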
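
The regular-expression step above could also be done in base R without extra packages. This is only a minimal sketch, assuming every label has the `(lower,upper]` form shown in the question; the names `labels`, `lower`, `upper` and `mid` are illustrative and do not come from the original answer:

```r
# Bin labels copied from the six rows shown in the question.
labels <- c("(0,0.025]", "(0.025,0.05]", "(0.05,0.075]",
            "(0.075,0.1]", "(0.1,0.125]", "(0.125,0.15]")

# Capture the text between "(" and "," as the lower edge,
# and the text between "," and "]" as the upper edge.
lower <- as.numeric(sub("^\\((.+),.*$", "\\1", labels))
upper <- as.numeric(sub("^.*,(.+)\\]$", "\\1", labels))

# Midpoints are a convenient x position for bars that should span each bin.
mid <- (lower + upper) / 2
```

The resulting `lower`, `upper` and `mid` vectors can be added to the data frame as columns and mapped to `x` in ggplot in the same way as `binRange_1` and `binRange_2`.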

Alternatively, if you don't want the data to be interpreted numerically, you can simply do the following:

df <- read.table(header = TRUE, text = "
                 binRange Frequency
1    (0,0.025]        88
2 (0.025,0.05]        72
3 (0.05,0.075]        92
4  (0.075,0.1]        38
5  (0.1,0.125]        20
6 (0.125,0.15]        16")

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_bar(stat = "identity")

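You won't be able to draw a density plot from your data, because it is categorical rather than continuous, which is why I actually prefer the second way of displaying it.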
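
A kernel density estimate does need the raw observations, but if the bin edges are treated as numbers, the counts can at least be rescaled to a histogram on the density scale (bar areas summing to 1). The following is a rough sketch under that assumption, using only the six rows shown in the question (the full data may contain more bins); the data frame `binned` and its column names are illustrative:

```r
# Bin edges and counts copied from the six rows shown in the question.
binned <- data.frame(
  lower     = c(0, 0.025, 0.05, 0.075, 0.1, 0.125),
  upper     = c(0.025, 0.05, 0.075, 0.1, 0.125, 0.15),
  Frequency = c(88, 72, 92, 38, 20, 16)
)

# Density height of each bar = count / (total count * bin width),
# so that the bar areas add up to 1.
binned$width   <- binned$upper - binned$lower
binned$density <- binned$Frequency / (sum(binned$Frequency) * binned$width)

# The plot is optional; the density values above need only base R.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(binned) +
    geom_rect(aes(xmin = lower, xmax = upper, ymin = 0, ymax = density),
              colour = "white") +
    labs(x = "binRange", y = "density")
}
```

Printing `p` draws the rescaled histogram. It is still a step function over the original bins, not a smoothed density curve.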

answered 2015-04-29T17:22:52.347

You can try

library(ggplot2)
ggplot(df, aes(x = binRange, y = Frequency)) + geom_col()
answered 2021-10-11T19:04:27.080