python - Ordered quantile means via Rpy

Question

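The real goal here is to find quantile means (or sums, or medians, etc.) in Python. Since I'm not an advanced Python user but have been using R for a while, the route I chose was through Rpy. However, I ran into a problem: the returned list of means does not follow the order of the quantiles. Specifically, I have the following in R: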

> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
[1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2]  (8.2,10] 
      3        30       300      3000     30000
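For reference, here is a worked version of what the R session above computes, assuming R's default quantile() definition (type = 7), which interpolates linearly between order statistics. With p = k/5 for k = 0 to 5 it gives the breaks 1, 2.8, 4.6, 6.4, 8.2, 10 that appear in the labels. cut() then uses right-closed intervals (lowest one closed on both sides because of include.lowest = TRUE), so each bin holds exactly two values of a.

```latex
% R's default quantile (type = 7) for sorted data x_{(1)} \le \dots \le x_{(n)}:
h = (n - 1)\,p + 1, \qquad j = \lfloor h \rfloor, \qquad
Q(p) = x_{(j)} + (h - j)\bigl(x_{(j+1)} - x_{(j)}\bigr)
% (for p = 1, h = n and the second term vanishes)

% Example with n = 10, a = (1, 2, \dots, 10), p = 0.2:
h = 9 \cdot 0.2 + 1 = 2.8, \qquad j = 2, \qquad
Q(0.2) = 2 + 0.8\,(3 - 2) = 2.8

% The first bin [1, 2.8] holds a = 1, 2, so its mean of b is
\bar{b}_1 = \tfrac{2 + 4}{2} = 3
```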

This all works fine. However, if I translate the code to Rpy, I get

>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]

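Note that the final list is in the wrong order (we know this because in this case both a and b are sorted). In general, I just can't recover the correct order, from the lowest quantile to the highest, in Rpy. Any suggestions?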

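Also (not as a replacement, since I'd like to know the answer to the question above), if you could suggest a way to do this analysis directly in Python, that would be great too. (I don't have numpy or scipy installed.) Thanks!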

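EDIT: To clarify, a and b are paired but not necessarily sorted. For example, a is the size of the eyes and b is the size of the nose. I'm trying to find the mean of the corresponding b values within each quantile of a. Thanks.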
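For the pure-Python part of the question, here is a minimal sketch using only the standard library (no numpy/scipy). It assumes R's default quantile definition (type 7) and mimics cut(..., include.lowest = TRUE, labels = FALSE) with right-closed intervals. The function names are hypothetical, and unlike R's cut() the sketch does not reject duplicated breaks (which can occur when a has many ties). Because each a value stays paired with its b value and groups are indexed by integer position, the means come back ordered from the lowest quantile to the highest whether or not a and b are sorted.

```python
import bisect
import math


def quantile_type7(values, probs):
    """Sample quantiles using R's default definition (quantile type = 7).
    Assumes values is non-empty and every p is in [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    result = []
    for p in probs:
        h = (n - 1) * p                # 0-based position in the sorted data
        lo = int(math.floor(h))
        hi = min(lo + 1, n - 1)
        result.append(xs[lo] + (h - lo) * (xs[hi] - xs[lo]))
    return result


def cut_codes(values, breaks):
    """Bin numbers 1..k for intervals (breaks[i-1], breaks[i]], with the
    first interval also closed on the left, like R's
    cut(x, breaks, include.lowest = TRUE, labels = FALSE).
    Values outside [breaks[0], breaks[-1]] get None."""
    codes = []
    for v in values:
        i = bisect.bisect_left(breaks, v)  # first i with v <= breaks[i]
        if i == 0 and v == breaks[0]:
            i = 1                          # include.lowest = TRUE
        codes.append(i if 1 <= i < len(breaks) else None)
    return codes


def quantile_group_means(a, b, k=5):
    """Mean of b within each of the k quantile groups of a (a, b paired),
    ordered from the lowest group to the highest. Empty groups give None
    (R would report NA)."""
    breaks = quantile_type7(a, [i / float(k) for i in range(k + 1)])
    sums = [0.0] * k
    counts = [0] * k
    for code, y in zip(cut_codes(a, breaks), b):
        if code is not None:
            sums[code - 1] += y
            counts[code - 1] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
print(quantile_group_means(a, b))  # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```

Swapping sums for medians or totals only changes the last step of quantile_group_means; the binning is the part that fixes the order.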

score 4 · Accepted Answer

Try rpy2.

With rpy2 >= 2.1.0, this could be:

from rpy2.robjects.vectors import IntVector
from rpy2.robjects.packages import importr
base = importr('base')
stats = importr('stats')

a = IntVector((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
b = IntVector((2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000))
prob = base.seq(0,5).ro / 5
br = stats.quantile(a,prob)
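# cut() is generic with formals (x, ...), so pass the dotted R argument name directly
rcut = base.cut(a, br, **{'include.lowest': True})
# mean() lives in base, not stats
quintile_means = base.tapply(b, rcut, base.mean)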
print(quintile_means)

score 2 · Accepted Answer

If you don't need the labels (e.g. (8.2,10]), you can call cut with labels=FALSE. This should preserve the order (and speed up your code for free).
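One way to see why integer codes help: if the group means end up keyed by their interval labels and are then ordered by that text, "[1,2.8]" sorts after every "(..." label, because "(" comes before "[" in ASCII. The sketch below reproduces exactly the order Rpy printed in the question. It is only an illustration; it does not establish that Rpy orders the result this way internally. Integer codes, as returned with labels=FALSE, sort numerically instead.

```python
# Interval labels produced by R's cut() for the example data, with the
# matching group means (taken from the R output in the question)
labels = ["[1,2.8]", "(2.8,4.6]", "(4.6,6.4]", "(6.4,8.2]", "(8.2,10]"]
means = [3.0, 30.0, 300.0, 3000.0, 30000.0]

# Keyed by label text: "(" (0x28) sorts before "[" (0x5B)
by_label = dict(zip(labels, means))
label_order = [by_label[key] for key in sorted(by_label)]
print(label_order)  # [30.0, 300.0, 3000.0, 30000.0, 3.0]

# Keyed by integer bin code, as with cut(..., labels = FALSE)
by_code = dict(zip(range(1, len(means) + 1), means))
code_order = [by_code[key] for key in sorted(by_code)]
print(code_order)   # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```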

score 0 · Accepted Answer

"I just can't recover the correct order, from the lowest quantile to the highest, in Rpy"

If sorting the list from lowest to highest solves your problem, try sorted(quintile_means).
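Note that sorted(quintile_means) orders the means by value, so it only recovers the quantile order when the group means increase with a. That holds in the example, but per the question's edit a and b are not necessarily ordered. A more robust sketch, assuming the interval labels are available alongside the means in R's default "(lo,hi]" format (for example as the names of the R result), orders by each interval's left endpoint instead; order_by_interval is a hypothetical helper, not part of Rpy.

```python
def order_by_interval(labels, values):
    """Return values ordered by the left endpoint of R-style interval
    labels such as "[1,2.8]" or "(2.8,4.6]" (hypothetical helper)."""
    def left_endpoint(label):
        # Drop the opening bracket, keep the number before the comma
        return float(label[1:].split(",")[0])
    pairs = sorted(zip(labels, values), key=lambda lv: left_endpoint(lv[0]))
    return [v for _, v in pairs]


# Labels and means in the scrambled order shown in the question
labels = ["(2.8,4.6]", "(4.6,6.4]", "(6.4,8.2]", "(8.2,10]", "[1,2.8]"]
means = [30.0, 300.0, 3000.0, 30000.0, 3.0]
print(order_by_interval(labels, means))  # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```

Parsing with float() also accepts negative endpoints and the scientific notation R uses for large breaks (e.g. "1e+03").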
