python - Count occurrences of a row with pandas in Python

Question

I have a pandas DataFrame with thousands of rows and 4 columns, i.e.:

A B C D 
1 1 2 0
3 3 2 1
3 1 1 0
....

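Is there a way to count how many times a particular row occurs? For example, can I find how many times [3,1,1,0] appears and return the indices of those rows?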

score 4 · Accepted Answer

If you're only looking for one row, then I might do something like

>>> df.index[(df == [3, 1, 1, 0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)

--

The explanation follows. Starting from:

>>> df
   A  B  C  D
0  1  1  2  0
1  3  3  2  1
2  3  1  1  0
3  3  1  1  0
4  3  3  2  1
5  1  2  3  4

we compare it against our target:

>>> df == [3,1,1,0]
       A      B      C      D
0  False   True  False   True
1   True  False  False  False
2   True   True   True   True
3   True   True   True   True
4   True  False  False  False
5  False  False  False  False

find the rows where every column matches:

>>> (df == [3,1,1,0]).all(axis=1)
0    False
1    False
2     True
3     True
4    False
5    False

and use this boolean Series to select from the index:

>>> df.index[(df == [3,1,1,0]).all(axis=1)]
Int64Index([2, 3], dtype=int64)
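
Since the question also asks for the count, here is a minimal sketch that puts the steps above together. It assumes the same six-row example frame; the names `target`, `mask`, `count` and `matching` are only illustrative. Summing the boolean Series gives the number of matching rows, because each True counts as 1.

```python
import pandas as pd

# Example frame from the walkthrough above
df = pd.DataFrame(
    [[1, 1, 2, 0],
     [3, 3, 2, 1],
     [3, 1, 1, 0],
     [3, 1, 1, 0],
     [3, 3, 2, 1],
     [1, 2, 3, 4]],
    columns=list("ABCD"),
)

target = [3, 1, 1, 0]

# True for rows where every column equals the target
mask = (df == target).all(axis=1)

count = int(mask.sum())            # number of matching rows
matching = df.index[mask].tolist() # index labels of those rows

print(count, matching)  # 2 [2, 3]
```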

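If you aren't counting the occurrences of a single row, but instead want to do this repeatedly for every row, so that you really want to locate all rows at once, then there are much faster ways than doing the above again and again. But this should work well enough for one row.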
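
One possible way to handle all rows at once (not necessarily what the answer had in mind) is to group by every column. This is only a sketch on the same example frame; `counts` and `row_labels` are illustrative names.

```python
import pandas as pd

# Example frame from the walkthrough above
df = pd.DataFrame(
    [[1, 1, 2, 0],
     [3, 3, 2, 1],
     [3, 1, 1, 0],
     [3, 1, 1, 0],
     [3, 3, 2, 1],
     [1, 2, 3, 4]],
    columns=list("ABCD"),
)

# Group identical rows together by grouping on all columns
grouped = df.groupby(list(df.columns))

# Number of occurrences of each distinct row, indexed by the row's values
counts = grouped.size()

# Index labels of the rows in each group, keyed by the row's values
row_labels = {key: list(labels) for key, labels in grouped.groups.items()}

print(counts.loc[(3, 1, 1, 0)])   # 2
print(row_labels[(3, 1, 1, 0)])   # [2, 3]
```

The grouping is done once, so looking up many different rows afterwards does not rescan the frame each time.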

score 1 · Accepted Answer

First create a sample array:

>>> import numpy as np
>>> x = np.array([[1, 1, 2, 0],
...               [3, 3, 2, 1],
...               [3, 1, 1, 0],
...               [0, 1, 2, 3],
...               [3, 1, 1, 0]])

Then create a view of the array in which each row is a single element:

>>> y = x.view([('', x.dtype)] * x.shape[1])
>>> y
array([[(1, 1, 2, 0)],
       [(3, 3, 2, 1)],
       [(3, 1, 1, 0)],
       [(0, 1, 2, 3)],
       [(3, 1, 1, 0)]], 
      dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8'), ('f3', '<i8')])

Do the same for the element you want to find:

>>> e = np.array([[3, 1, 1, 0]])
>>> tofind = e.view([('', e.dtype)] * e.shape[1])

Now you can look for the element:

>>> y == tofind[0]
array([[False],
       [False],
       [ True],
       [False],
       [ True]], dtype=bool)
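
To turn a row mask like this into a count and a list of positions, something along these lines should work. Note that this sketch swaps in a plain broadcast comparison instead of the structured view above, and the names `target`, `mask` and `positions` are illustrative.

```python
import numpy as np

# Same sample array as above
x = np.array([[1, 1, 2, 0],
              [3, 3, 2, 1],
              [3, 1, 1, 0],
              [0, 1, 2, 3],
              [3, 1, 1, 0]])

target = np.array([3, 1, 1, 0])

# Broadcast the target against every row, then require all columns to match
mask = (x == target).all(axis=1)

count = int(mask.sum())          # number of matching rows
positions = np.flatnonzero(mask) # row positions where the mask is True

print(count, positions)  # 2 [2 4]
```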

score 1 · Accepted Answer

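You can also use a MultiIndex; once it is sorted, looking up the count is faster:

from io import StringIO
import pandas as pd

s = StringIO("""A  B  C  D
1  1  2  0
3  3  2  1
3  1  1  0
3  1  1  0
3  3  2  1
1  2  3  4""")
df = pd.read_csv(s, sep=r"\s+")
# One index level per column; the values are the original row positions
s = pd.Series(range(len(df)), index=pd.MultiIndex.from_arrays(df.values.T))
s = s.sort_index()
idx = s[3,1,1,0]
print(idx.count(), idx.values)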

Output:

2 [2 3]
