python - pandas: binning a list based on qcut of another list

Question

Say I have a list:

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]

and a sublist:

b = [5, 2, 6, 8]

I want to get the bins from pd.qcut(a, 2) and count how many values of list b fall into each bin. That is:

In[84]: pd.qcut(a,2)
Out[84]: 
Categorical: 
[[1, 3], (3, 8], [1, 3], [1, 3], [1, 3], [1, 3], (3, 8], [1, 3], (3, 8], (3, 8], (3, 8]]
Levels (2): Index(['[1, 3]', '(3, 8]'], dtype=object)

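Now I know the bins are [1, 3] and (3, 8], and I want to know how many values of list "b" fall into each bin. I can do this by hand when the number of bins is small, but what is the best approach when the number of bins is large?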

score 4 · Accepted Answer

You can use the retbins argument to get the bins back from qcut:

>>> q, bins = pd.qcut(a, 2, retbins=True)

Then use pd.cut to get the index of each value of b with respect to the bins:

>>> b = np.array(b)
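>>> # .labels was removed from pandas; .codes replaces it and is read-only, hence the copy
>>> hist = pd.cut(b, bins, right=True).codes.copy()
>>> hist[b == bins[0]] = 0
>>> hist
array([1, 0, 1, 1], dtype=int8)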

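Note that you have to handle the corner case of bins[0] separately, because it is not included in the leftmost bin: with right=True the intervals are open on the left, so a value equal to bins[0] gets the index -1.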
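
To go from bin indices to the per-bin counts the question asks for, one possible sketch follows. It assumes every value of b lies within the range of a, and it uses include_lowest=True in place of the manual bins[0] fix-up above. The names idx and counts are illustrative only.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
b = [5, 2, 6, 8]

# Bin edges computed from the quantiles of a
_, bins = pd.qcut(a, 2, retbins=True)

# Integer bin index for each value of b. include_lowest=True closes the
# leftmost bin on the left, so a value equal to bins[0] lands in bin 0.
# Assumption: all values of b are inside [bins[0], bins[-1]].
idx = pd.cut(np.asarray(b), bins, include_lowest=True, labels=False)

# Number of values of b per bin; minlength keeps empty bins in the result
counts = np.bincount(idx, minlength=len(bins) - 1)
print(counts)
```

Because np.bincount only needs the integer indices, this works the same way for any number of bins.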

score 0 · Accepted Answer

As the previous answer shows, you can get the bin boundaries from qcut with the retbins argument, like this:

q, bins = pd.qcut(a, 2, retbins=True)

You can then use cut to place the values of another list into those bins. For example:

myList = np.random.random(100)
# Define bin bounds that cover the range returned by random()
bins = [0, .1, .9, 1] 
# Now we can get the "bin number" of each value in myList:
binNum = pd.cut(myList, bins, labels=False, include_lowest=True)
# And then we can count the number of values in each bin number:
np.bincount(binNum)

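Make sure your bin range covers the whole range of values that appear in the second list. To ensure this, you can pad the bin boundaries with the smallest and largest possible values, here negative and positive infinity. For example, with bins being the array returned by qcut: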

cutBins = [float('-inf')] + bins.tolist() + [float('inf')]
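
A minimal sketch of the padding idea, assuming bins is the NumPy array returned by qcut. The list other is made-up data for illustration. Note that padding adds two extra outer bins rather than widening the existing ones, so out-of-range values are counted separately, and a value equal to the original bins[0] falls into the first padded bin.

```python
import numpy as np
import pandas as pd

a = [3, 5, 1, 1, 3, 2, 4, 1, 6, 4, 8]
other = [0, 2, 5, 8, 10]  # hypothetical values, some outside the range of a

_, bins = pd.qcut(a, 2, retbins=True)

# Pad the edges so that every finite value falls into some bin
cut_bins = [float('-inf')] + bins.tolist() + [float('inf')]

# Bins are (-inf, 1], (1, 3], (3, 8], (8, inf]
idx = pd.cut(other, cut_bins, labels=False)
counts = np.bincount(idx, minlength=len(cut_bins) - 1)
print(counts)
```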
