You should look at the cut() function in base R. Before venturing any further, you should also take note of the last line of this answer.
> set.seed(42)
> cut(runif(50), 6)
[1] (0.825,0.99] (0.825,0.99] (0.167,0.332] (0.825,0.99]
[5] (0.496,0.661] (0.496,0.661] (0.661,0.825] (0.00296,0.167]
[9] (0.496,0.661] (0.661,0.825] (0.332,0.496] (0.661,0.825]
[13] (0.825,0.99] (0.167,0.332] (0.332,0.496] (0.825,0.99]
[17] (0.825,0.99] (0.00296,0.167] (0.332,0.496] (0.496,0.661]
[21] (0.825,0.99] (0.00296,0.167] (0.825,0.99] (0.825,0.99]
[25] (0.00296,0.167] (0.496,0.661] (0.332,0.496] (0.825,0.99]
[29] (0.332,0.496] (0.825,0.99] (0.661,0.825] (0.661,0.825]
[33] (0.332,0.496] (0.661,0.825] (0.00296,0.167] (0.825,0.99]
[37] (0.00296,0.167] (0.167,0.332] (0.825,0.99] (0.496,0.661]
[41] (0.332,0.496] (0.332,0.496] (0.00296,0.167] (0.825,0.99]
[45] (0.332,0.496] (0.825,0.99] (0.825,0.99] (0.496,0.661]
[49] (0.825,0.99] (0.496,0.661]
6 Levels: (0.00296,0.167] (0.167,0.332] (0.332,0.496] ... (0.825,0.99]
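cut() returns a factor that indicates, for each observation, which of the 6 groups it falls into. Here it simply splits the range of the data into 6 intervals of equal width. Read ?cut for details on how values at the extremes of the intervals are handled.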
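To make the idea of a factor that indexes the groups concrete, here is a minimal sketch. The variable names grp, idx and counts are only illustrative and are not from the question.

```r
set.seed(42)
x <- runif(50)

# With a single number as `breaks`, cut() divides the range of x into
# that many equal-width intervals and returns a factor.
grp <- cut(x, 6)

# The integer codes (1 to 6) say which interval each observation is in.
idx <- as.integer(grp)

# Number of observations in each interval.
counts <- table(grp)
print(counts)
```

The factor keeps the interval labels, while as.integer() gives plain group numbers if that is all you need.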
Your code fails because the object returned by hist() is a list that contains far more than your data split into groups:
> foo <- hist(runif(50), breaks = 6, plot = FALSE)
> str(foo)
List of 7
$ breaks : num [1:6] 0 0.2 0.4 0.6 0.8 1
$ counts : int [1:5] 12 13 7 13 5
$ intensities: num [1:5] 1.2 1.3 0.7 1.3 0.5
$ density : num [1:5] 1.2 1.3 0.7 1.3 0.5
$ mids : num [1:5] 0.1 0.3 0.5 0.7 0.9
$ xname : chr "runif(50)"
$ equidist : logi TRUE
- attr(*, "class")= chr "histogram"
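So you can't convert it to a factor; R doesn't know how to do that. Note also that hist() does not return the data broken down into 6 groups; it returns other information that is useful for building the histogram. Note too that, unlike with cut(), the breaks fall on nice, "pretty" numbers. If you want those pretty breaks, we can reproduce what hist() does via: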
> set.seed(42)
> x <- runif(50)
> brks <- pretty(range(x), n = 6, min.n = 1)
> cut(x, breaks = brks)
[1] (0.8,1] (0.8,1] (0.2,0.4] (0.8,1] (0.6,0.8] (0.4,0.6] (0.6,0.8]
[8] (0,0.2] (0.6,0.8] (0.6,0.8] (0.4,0.6] (0.6,0.8] (0.8,1] (0.2,0.4]
[15] (0.4,0.6] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.4,0.6] (0.8,1]
[22] (0,0.2] (0.8,1] (0.8,1] (0,0.2] (0.4,0.6] (0.2,0.4] (0.8,1]
[29] (0.4,0.6] (0.8,1] (0.6,0.8] (0.8,1] (0.2,0.4] (0.6,0.8] (0,0.2]
[36] (0.8,1] (0,0.2] (0.2,0.4] (0.8,1] (0.6,0.8] (0.2,0.4] (0.4,0.6]
[43] (0,0.2] (0.8,1] (0.4,0.6] (0.8,1] (0.8,1] (0.6,0.8] (0.8,1]
[50] (0.6,0.8]
Levels: (0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1]
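As an alternative sketch, assuming you already have (or are happy to compute) the histogram object, you could pass its breaks straight to cut(). The names h and grp are illustrative. include.lowest = TRUE is used here to mirror the default binning of hist().

```r
set.seed(42)
x <- runif(50)

# Compute the histogram without drawing it and keep its breakpoints.
h <- hist(x, breaks = 6, plot = FALSE)

# hist() by default uses right-closed bins and puts the smallest value
# in the first bin; include.lowest = TRUE makes cut() do the same.
grp <- cut(x, breaks = h$breaks, include.lowest = TRUE)

# Compare the per-bin counts from cut() with those stored by hist().
print(table(grp))
print(h$counts)
```

For data like these the two sets of counts should agree, which is a quick check that the factor really reflects the bins the histogram would draw.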
But you should ask yourself why you want to discretize the data in the first place, and whether doing so makes sense.